
 

 

X. Distribution

 

Introduction

This section looks at how users gain access to your digital objects and how they use them. Once the data (text, images, sound, moving images) and metadata have been created, there are two sets of decisions to be made: choosing appropriate delivery options and choosing modes of access. These are key decisions and are likely to be made at an early stage in the development cycle. Delivery options cover how to distribute the data assets and grant users access to them; in other words, they are concerned with dissemination, or ‘publishing’. Access modes, including applications and interface design decisions, affect how the material is presented to users and how they may search, retrieve, navigate and manipulate it.

 

Delivery options

Keep in mind that the delivery options discussed in this section pertain to delivery versions of data and to public versions of metadata. The storage and distribution of masters are covered in Sections XIII-XIV. Essentially, there are two choices for distribution: using a network as the medium (principally the Internet), or using portable media (usually magnetic disks, optical discs, and tapes). We consider both choices, noting the requirements for each, and then compare their advantages and disadvantages.

 

Portable and removable media

There are three categories of portable and removable media that projects may wish to use to deliver their digital content: tapes (e.g. Digital Audio Tape/Digital Data Storage (DAT/DDS) and Digital Linear Tape (DLT)); optical discs (CD-ROM or DVD); and magnetic disks (floppy diskettes, Zip disks, Jaz disks). There are a number of issues to consider: the robustness of these media, in particular the sensitivity of magnetic media to magnetic fields and of all these media to atmospheric conditions and temperature; cost; the level of hardware support among the user base; user preferences; the projected lifespan of the digital material; and the widely varying carrying capacities (for capacities in 2001, see the table of commonly used types of removable media below). For most projects, CD-ROM or DVD are the likeliest options, simply because users are more likely to have access to a CD-ROM or DVD drive. In addition, these media are fairly durable, and since they can be distributed in a read-only format, it is harder for users to modify or delete materials accidentally.

To deliver data on removable media, you need either the appropriate equipment for duplicating data onto blank media or the means to outsource duplication to a vendor. You may also need an installation program (if the applications do not run from the CD-ROM). Finally, you need a means of delivering the media to end users and keeping track of orders and fulfillment.

 

Definition Box:

Commonly Used Types of Removable Media (2001)

Media name           | Full name                               | Media category | Storage type | Capacity
DAT/DDS              | Digital Audio Tape/Digital Data Storage | tape           | magnetic     | DDS-3: 12 GB uncompressed[1], 24 GB compressed
DLT                  | Digital Linear Tape                     | tape           | magnetic     | 35 GB - 70 GB
CD-ROM               | Compact Disc Read-Only Memory           | disc           | optical      | 650 MB
DVD                  | Digital Versatile Disc                  | disc           | optical      | 4.7 GB (single-sided, single-layer)
Removable disk packs |                                         | disk           | magnetic     | 10 GB - 70 GB
Floppy disks         |                                         | disk           | magnetic     | 1.44 MB
Zip disks[2]         |                                         | disk           | magnetic     | 100 MB and 250 MB[3]
Jaz disks            |                                         | disk           | magnetic     | 1 GB and 2 GB

 

Networked delivery

Here the Internet and TCP/IP-based intranets are the chief options, although there are other types of network, such as AppleTalk or IPX/SPX. There are several mechanisms for delivering data to users, each suited to different purposes. HTTP[4] is the standard protocol for transmitting web documents, and given the ubiquity of browsers it offers a near-universal method of disseminating text and images without requiring special action on the part of the user. Streaming media (for example RealAudio, RealVideo or streamed MP3 audio) allows you to supply digital audio and video in real time, without necessarily allowing the user to download and save the data locally. FTP[5], which supports file transfer only, is suitable for making raw data, including installable applications, available to users, who may download files and save them locally. To deliver data of any of these types over the network, you need the following: a network path between the provider of the data and the users; server hardware; server software (an operating system and server applications); and client applications enabling users to access the data (though, as indicated above, this may simply mean a free web browser). If you prefer not to host the delivery yourself, you may be able to outsource it, either to an Internet service provider (ISP)[6] or to a partner or consortium member with the appropriate infrastructure.
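As a rough illustration of the "server software" requirement, the following Python sketch uses the standard library's http.server module to publish a folder of static files over HTTP. It is a minimal sketch for local testing only, assuming Python 3.7 or later; a production service would normally run a dedicated web server such as Apache. The function name start_static_server is hypothetical.

```python
import functools
import http.server
import threading

def start_static_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve the files in `directory` over HTTP from a background thread.

    Passing port=0 lets the operating system pick a free port; the chosen
    port is then available as server.server_address[1].
    """
    # SimpleHTTPRequestHandler maps URL paths onto files under `directory`.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Run the request loop in a daemon thread so the caller is not blocked.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

For example, `server = start_static_server("delivery")` would make a hypothetical "delivery" folder browsable at http://127.0.0.1:8000/ until `server.shutdown()` is called.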

 

Pros and cons of portable media

The advantages of portable media for producers of digital assets center chiefly on their cost and reliability, and on the fact that they provide a way to deliver very high-quality content to users. The media (e.g. CD-ROMs) are typically fairly cheap. Selling digital assets on physical media is relatively straightforward; the analogy with printed publications may also ease copyright negotiations, and assets can be licensed multiple times, extending the possibilities for generating revenue. There are also few potential points of failure in the use of removable media, no need to rely on network connections, and no bandwidth restrictions. There are advantages to users too: data presented in one physical package is attractive, and users can be offered much higher-quality data than would be feasible over the Web.

There are a number of disadvantages of portable media for producers, many of them involving loss of control and information. With portable media, producers hand over their project data to the user in one convenient package; once released, the materials cannot be updated or augmented except by an additional publication. Furthermore, access control is difficult compared to networked resources, and producers cannot gather usage data or other information about their users. Although portable media eliminate dependence on networks, and hence can be valuable where no network is available, they are vulnerable to loss, theft, damage, and physical deterioration. Their storage capacity, though increasing steadily, is nonetheless small compared to the size of a high-quality image archive, so they are better suited to delivering a focused sample of materials than a large collection in its entirety.

There are disadvantages from the development standpoint, too. Largely because it is often heavily customized, application development for portable media has traditionally been more expensive than for Internet-based delivery, and multi-platform development can be much more difficult (although Internet-based distribution poses its own challenges in supporting multiple browsers and access tools). Publishing on portable media also poses practical difficulties. It requires the production, packaging, storage, and shipping of physical objects, which may be part of a museum’s ordinary work but demands staffing and space that libraries or academic projects may lack. And as few museums have access to media distribution and sales channels, their products do not get the shelf exposure that CD-ROMs and DVDs released by traditional and new media publishers achieve. As a result, although a small number of cultural CDs have sold over 100,000 copies, most sell only a few hundred and are distributed only in the local museum store.
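The capacity point can be made concrete with a little arithmetic. The sketch below estimates how many volumes a collection would fill on different media, using approximate 2001 capacities (with DVD taken as a 4.7 GB single-layer disc). The archive size, the 50 MB average image size and the function name discs_needed are hypothetical assumptions for illustration, and the calculation assumes no file is split across volumes.

```python
import math

# Approximate 2001 capacities in decimal megabytes (illustrative assumptions).
MEDIA_CAPACITY_MB = {
    "CD-ROM": 650,
    "DVD (single layer)": 4700,
    "DLT tape": 35000,
}

def discs_needed(num_images: int, avg_image_mb: float, capacity_mb: float) -> int:
    """Volumes needed for num_images files of avg_image_mb each.

    Each volume holds only whole files, i.e. floor(capacity / file size).
    """
    per_volume = math.floor(capacity_mb / avg_image_mb)
    if per_volume == 0:
        raise ValueError("a single image is larger than the medium")
    return math.ceil(num_images / per_volume)

# Hypothetical archive: 10,000 high-quality images of about 50 MB each.
for name, cap in MEDIA_CAPACITY_MB.items():
    print(f"{name}: {discs_needed(10_000, 50, cap)} volumes")
```

Under these assumptions the archive would need 770 CD-ROMs or 107 single-layer DVDs, which illustrates why portable media suit a focused selection better than a complete collection.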

 

Pros and cons of networked delivery

The advantages of networked delivery for producers center on ease of development and production. If the project is well planned, a single source of data removes the need to duplicate it (a point that needs careful planning); updates and bug fixes reach users transparently; many client and server components are available ready-made; and multi-platform development is far easier than for removable media. It is easy to create tiered access tailored to the needs of different audiences, and measuring usage is relatively simple. In addition, browsers provide a useful set of generic client capabilities. Institutions that subscribe to many digital resources, such as academic libraries, overwhelmingly prefer networked access to portable media because it simplifies their task immensely: it eliminates the accession process, relieves them of the care of fragile physical objects, and broadens their users’ access to the materials.
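To show why measuring usage is relatively simple online, here is a minimal Python sketch that counts successful page requests from a web server access log. It assumes the server writes the widely used Common Log Format (as Apache does by default); the sample log lines and the function name count_page_views are invented for illustration. Real analysis tools also filter out robots, group requests into sessions, and so on.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def count_page_views(lines):
    """Count successful (HTTP 200) GET requests per requested path."""
    counts = Counter()
    for line in lines:
        m = CLF_PATTERN.match(line)
        if m and m.group("method") == "GET" and m.group("status") == "200":
            counts[m.group("path")] += 1
    return counts

# Invented sample lines using documentation-only IP addresses.
sample_log = [
    '192.0.2.10 - - [10/Mar/2003:13:55:36 -0500] "GET /images/item42.jpg HTTP/1.0" 200 51234',
    '192.0.2.11 - - [10/Mar/2003:13:56:02 -0500] "GET /images/item42.jpg HTTP/1.0" 200 51234',
    '192.0.2.12 - - [10/Mar/2003:13:57:14 -0500] "GET /missing.html HTTP/1.0" 404 310',
]
print(count_page_views(sample_log).most_common())
```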

The disadvantages of networked delivery stem from the relative unreliability of networks and network applications: there are many potential points of failure; application implementations (Java, JavaScript, CSS, XML) vary, depart from standards, and are often buggy; and browser clients limit the kinds of interaction possible. Being on the Internet also raises security concerns, although these are being addressed in a number of ways. Charging for networked access is not as straightforward as the simple purchase of a CD-ROM, although again there are several workable models currently in use. Finally, the quality of the digital material that heritage institutions can make accessible via the Internet is constrained by available bandwidth; bandwidth has been increasing steadily for the past two decades, but so has demand.
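The bandwidth constraint is easiest to see as a transfer-time calculation. The sketch below computes an idealized download time from file size and link speed; it assumes decimal units (1 MB = 1,000,000 bytes, 1 kbit/s = 1,000 bits per second) and ignores protocol overhead, latency and congestion, so real transfers will be slower. The file sizes are hypothetical.

```python
def transfer_seconds(size_megabytes: float, link_kbps: float) -> float:
    """Idealized transfer time for size_megabytes (decimal MB) over link_kbps (kbit/s)."""
    bits = size_megabytes * 1_000_000 * 8
    return bits / (link_kbps * 1000)

# Hypothetical files: a 50 MB master image versus a 0.1 MB web JPEG.
for label, kbps in [("56k modem", 56), ("T1 line (1.544 Mbit/s)", 1544)]:
    print(f"{label}: master {transfer_seconds(50, kbps) / 60:.1f} min, "
          f"JPEG {transfer_seconds(0.1, kbps):.1f} s")
```

On these assumptions the master image takes roughly two hours over a 56k modem and a little over four minutes over a T1 line, which is why networked delivery usually relies on smaller derivative files.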

For most purposes, some form of network-based web application is a good solution, and this is the approach most digital resource providers are now taking. Given the nature of digital collections, being able to update and add to the resource dynamically is a huge advantage for most content providers, as is the ability to add users effortlessly without producing and mailing additional discs and packaging. The Internet provides a direct distribution channel between the heritage institution and the user community. Most heritage institutions that use the Internet to make their digital assets accessible do so for free, perhaps because of current attitudes towards charging for networked information resources, although licensed digital resources such as journals, full-text collections, and bibliographies are widespread and take an increasingly large share of library purchasing budgets.

 

Modes of access

Assuming a web-based, networked delivery scenario, the next set of choices concerns the application model for web delivery, and the underlying technologies and data structures that support the digital publication. As with other decisions, these will be strongly influenced by the scale and purpose of the project or program. The needs of a small-scale project with limited, unchanging content will be very different from those of a large organization with high-volume delivery needs.

The simplest form of web delivery is a static HTML page: essentially a document that has been encoded so as to be readable using a web browser. Pages of this sort reside on a web server and are delivered to the user for viewing when requested. They can be searched in basic ways, either with a search engine installed locally or with one of the commercial search engines such as Excite or Google. Such pages are quite simple to create and maintain using standard HTML authoring tools such as Dreamweaver or Microsoft FrontPage. Because HTML is a very basic markup language, it is easy to understand, learn and use, and the software required to publish such pages is simple and easy to support.
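A locally installed search engine for static pages works, at its simplest, by building an inverted index that maps each word to the pages containing it. The following is a minimal Python sketch of that idea using only the standard library; the page names and contents are hypothetical, and real search engines add stemming, relevance ranking and much more.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def build_index(pages):
    """Map each lower-cased word to the set of page names containing it."""
    index = defaultdict(set)
    for name, html in pages.items():
        extractor = TextExtractor()
        extractor.feed(html)
        for word in re.findall(r"\w+", " ".join(extractor.parts).lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word of the query (boolean AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

pages = {
    "about.html": "<h1>About the project</h1><p>Letters of a Boston merchant.</p>",
    "letter1.html": "<p>Letter from Boston, 1792.</p>",
}
idx = build_index(pages)
print(sorted(search(idx, "boston letter")))
```

Because matching is on exact words, "letter" does not match "Letters"; handling such variants is one of the things fuller search engines add.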

Some of the limitations of this approach have already been indicated in Section V, Digitization and Encoding of Text: HTML has extremely limited descriptive capabilities and can represent very little of the intellectual structure or content of a document. Thus, while HTML is entirely adequate for delivering a project web site (home page, project description, contact information, background, and the like), it is not an appropriate way of capturing the actual intellectual content of a site (primary source documents, metadata, bibliographic records, image collections, and so forth). In addition, the structure of such a site, a simple collection of HTML files, offers no means of managing the data systematically, querying or displaying it powerfully, or maintaining it as it grows.

These two limitations, lack of descriptive power in the encoding and lack of overall data management, can be addressed separately or together, depending on the project’s needs and the nature of the data. The former can be addressed by adopting a more powerful descriptive markup, using an SGML or XML encoding system such as TEI, EAD, or one of the other high-quality encoding systems now available (see Section V for more details). A number of free open-source tools, as well as commercial systems, are now available for publishing XML documents on the web using XSLT (Extensible Stylesheet Language Transformations). If the content to be published consists of textual items with rich internal structure that may be of research interest to users, XML is an effective way of representing this information and exploiting it in the user interface. Good examples of this kind of content are scholarly editions or full-text collections of historical or literary documents. Similarly, if the content consists of highly structured data such as bibliographic records, finding aids, or metadata, XML may also be a useful way of capturing and publishing it, albeit with a different kind of DTD. In all of these cases, you can use XML tools to create a publication that enables users to query the information in detail, view and manipulate the results, browse documents in their entirety, and perform other central research activities. At the moment, the requisite publication tools require considerably more technical competence than those for basic HTML: the XML markup is likely to be more complex, and designing and implementing the XML delivery system usually requires an XML programmer. However, there is strong demand for XML publication tools that allow a comparatively non-technical person to manage and publish XML documents effectively, and such tools may well appear within the next few years.
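To suggest what "querying the information in detail" can mean, the sketch below queries a tiny XML edition of two letters. The element names are only loosely modelled on TEI (this is not a valid TEI document), the letters and correspondents are hypothetical, and Python's built-in ElementTree stands in for the XSLT or XML query tools a real publication system would more likely use.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified edition; element names loosely modelled on TEI.
EDITION = """<edition>
  <letter id="L1" date="1792-03-04">
    <sender>Mary Smith</sender>
    <p>Arrived in <placeName>Boston</placeName> this morning.</p>
  </letter>
  <letter id="L2" date="1793-07-19">
    <sender>John Ward</sender>
    <p>News from <placeName>Philadelphia</placeName> and <placeName>Boston</placeName>.</p>
  </letter>
</edition>"""

root = ET.fromstring(EDITION)

def letters_mentioning(place):
    """IDs of letters containing a <placeName> whose text equals `place`."""
    return [letter.get("id") for letter in root.iter("letter")
            if any(pn.text == place for pn in letter.iter("placeName"))]

def letters_from(sender):
    """IDs of letters whose <sender> element equals `sender`."""
    return [letter.get("id") for letter in root.iter("letter")
            if letter.findtext("sender") == sender]

def letters_before(iso_date):
    """IDs of letters dated before `iso_date` (YYYY-MM-DD strings sort correctly)."""
    return [letter.get("id") for letter in root.iter("letter")
            if letter.get("date") < iso_date]

print(letters_mentioning("Boston"))
print(letters_from("John Ward"))
print(letters_before("1793-01-01"))
```

The point is that such queries depend on the markup: a plain HTML version of the same letters would offer no reliable way to distinguish a place name from any other word.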

The second limitation, the need for more powerful data management, must be addressed at a deeper level: not in the encoding of the individual items, but in the creation of a powerful infrastructure in which they are stored and managed. Typically such systems are built on a database model, treating the individual items as database records and exploiting their regularity and structural similarity to allow effective searching and manipulation. Such systems may be quite small; for instance, many web sites are based on small-scale database applications such as Access or even FileMaker, and for very limited purposes (such as publishing a small bibliography) these might well be an improvement over static HTML pages. For large institutional projects, however, robust database systems such as Oracle, Informix, and MySQL (the last of these free and open-source) are more typical and are capable of managing very large collections. What the database architecture provides is the ability to query large numbers of records quickly, and to perform the kinds of processing that database tools are best at: sorting, statistical manipulation, and systematic comparison. Such a solution is ideal for a large collection of similarly structured items where retrieval speed is very important: for instance, large metadata collections, image libraries, historical census data, and the like.
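A minimal sketch of the database model, assuming Python's built-in sqlite3 module as a stand-in for a production system such as Oracle or MySQL. The table layout and the three image records are hypothetical; the point is that each item becomes a record with the same fields, which makes sorting and counting straightforward.

```python
import sqlite3

# In-memory database standing in for a production system; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        creator TEXT,
        year    INTEGER,
        subject TEXT
    )
""")
# An index on a frequently searched column keeps lookups fast as the table grows.
conn.execute("CREATE INDEX idx_images_subject ON images(subject)")
conn.executemany(
    "INSERT INTO images (title, creator, year, subject) VALUES (?, ?, ?, ?)",
    [
        ("Harbour at dawn", "Unknown", 1885, "maritime"),
        ("Fishing fleet", "A. Photographer", 1902, "maritime"),
        ("Town hall", "A. Photographer", 1899, "architecture"),
    ],
)

# Retrieval with sorting: all maritime images, oldest first.
maritime = conn.execute(
    "SELECT title, year FROM images WHERE subject = ? ORDER BY year", ("maritime",)
).fetchall()

# A simple statistical query: number of images per subject.
per_subject = conn.execute(
    "SELECT subject, COUNT(*) FROM images GROUP BY subject ORDER BY subject"
).fetchall()

print(maritime)
print(per_subject)
```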

Using a database architecture offers a number of advantages in addition to power and scale. Once the underlying design is complete, the database can grow almost without limit as new records are added, without any work other than that of entering the new data. The system can include an input form which allows non-technical staff to make updates or additions, and the form can be made accessible over the web to accommodate off-site staff or collaborators. The newly added information may become live immediately, or can be marked as provisional, to be published pending review by an editor. Such systems also offer powerful workflow management and tracking possibilities which can be a valuable investment for a long-term project or one involving several participants.
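The provisional-then-published workflow described above can be modelled with a status field on each record. In this minimal sketch (again using sqlite3, with a hypothetical schema and hypothetical function names), entries submitted through the input form start as provisional, and only records an editor has approved appear on the public site.

```python
import sqlite3

# Hypothetical schema: the public site only ever shows 'published' rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        id     INTEGER PRIMARY KEY,
        title  TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'provisional'
               CHECK (status IN ('provisional', 'published'))
    )
""")

def submit(title):
    """Called by the web input form: new entries start as provisional."""
    cur = conn.execute("INSERT INTO records (title) VALUES (?)", (title,))
    conn.commit()
    return cur.lastrowid

def approve(record_id):
    """Called by an editor after review: make the record publicly visible."""
    conn.execute("UPDATE records SET status = 'published' WHERE id = ?", (record_id,))
    conn.commit()

def public_titles():
    """Titles visible on the public web site, in order of entry."""
    rows = conn.execute(
        "SELECT title FROM records WHERE status = 'published' ORDER BY id"
    ).fetchall()
    return [title for (title,) in rows]

first = submit("Parish register, 1750-1760")
second = submit("Estate map, 1812")
approve(first)
print(public_titles())
```

Here only the parish register is visible until an editor also approves the estate map. Making records live immediately would simply mean giving the status column a default of 'published'.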

One of the most distinctive features of both the XML and the database-driven approaches is that the web page the user actually sees is generated dynamically from the source data (the XML document or database records), rather than being stored statically on the server. In the case of the database, the publication system retrieves information based on a user query, and then uses a style sheet to order, format, and display the results in HTML so that they can be viewed with a web browser. Similarly, although XML documents themselves are stored in a stable form, the XSLT stylesheets that turn them into HTML for web viewing can also perform more powerful transformations, such as reordering data elements, adding text and graphics, selecting and displaying small segments of the document, or even converting the information to another data format entirely. These dynamic capabilities open up a huge variety of interface options that go far beyond what is possible with simple HTML.
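The following sketch shows dynamic page generation in miniature: an HTML fragment is built on demand from the results of a database query instead of being stored as a file. It uses simple Python string templating in place of a style sheet, and the bibliography table, its contents and the function name render_results are hypothetical.

```python
import html
import sqlite3

# Hypothetical bibliography table; a real system would already hold this data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (author TEXT, title TEXT, year INTEGER)")
conn.executemany("INSERT INTO works VALUES (?, ?, ?)", [
    ("Smith, J.", "Harbours & Havens", 1998),
    ("Jones, M.", "Coastal <Trade> Routes", 1987),
    ("Brown, K.", "Inland Waterways", 2001),
])

def render_results(term: str) -> str:
    """Build an HTML fragment listing works whose title contains `term`.

    Results are ordered by year; html.escape prevents characters in the
    data or in the user's query from being interpreted as markup.
    """
    rows = conn.execute(
        "SELECT author, title, year FROM works WHERE title LIKE ? ORDER BY year",
        (f"%{term}%",),
    ).fetchall()
    items = "\n".join(
        f"  <li>{html.escape(a)}, <cite>{html.escape(t)}</cite> ({y})</li>"
        for a, t, y in rows
    )
    return (f"<h2>Results for &quot;{html.escape(term)}&quot;</h2>\n"
            f"<ul>\n{items}\n</ul>")

print(render_results("Trade"))
```

Changing the ORDER BY clause or the template changes every generated page at once, which is the practical advantage over editing many static files.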

Finally, as suggested above, these two approaches—XML encoding and database architecture—can also be combined to leverage the power of both, and indeed there are signs that the two technologies may be converging, as database systems become more complex and XML tools become more powerful.

In addition to the questions of technical implementation and user access discussed above, access models are also concerned with issues such as security, limitations on access and charging, which will be considered next.

Most of the projects interviewed in the survey for this Guide currently provide free access to their collections, but many are looking at the alternatives or have already chosen a licensing model. Models for charging for the use of digital assets over the Internet are still not as widespread and straightforward as they might be (as the continuing battles over Napster-type applications show). But considerations such as protection of rights, or the need to generate revenue from digital assets, give projects increasing motivation to limit access to their online resources.

These limitations can take several forms, which may be more or less burdensome to the user. Typically users need to identify themselves before gaining access, via some kind of authentication system. Such systems may involve direct user action, such as entering a username and password, but there are also modes of authentication that are invisible to the user, such as IP address authentication or access via an authenticating gateway. Whatever method is chosen, restricting access can be costly, determined hackers can find loopholes in most systems (particularly if the restricted data is valuable enough), and authentication systems may require a high level of maintenance and technical support. Projects or programs need to be sure that the revenue generated will justify these costs, and project planning must take proper account of the responsibilities and legal liability of the agency that controls the server distributing the assets. The advantages of free access to digital materials are therefore not just altruistic: limiting access can carry a significant overhead. Many of the projects and programs interviewed were looking at revenue generation as a means of sustainability, so demand for secure systems that are easier to implement is likely to grow, and such systems may become generally available in the near future.
Corbis and Getty Images both believe that future access to picture libraries will be via the Internet, and they are building libraries and systems to make this happen. Many cultural institutions have generated income from their existing collections through the licensing of analog copies, and they need to make the shift to digital delivery of the material.
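IP address authentication, mentioned above as a method invisible to the user, amounts to checking whether the address a request comes from lies within a network range registered to a subscriber. The Python sketch below shows that check with the standard ipaddress module. The subscriber names and ranges are hypothetical (drawn from address blocks reserved for documentation), and a real system would also need to handle proxies, off-site users and the maintenance burden discussed above.

```python
import ipaddress

# Hypothetical licence table: institution -> registered network ranges.
SUBSCRIBERS = {
    "Example University": ["192.0.2.0/24", "198.51.100.0/25"],
    "Example Public Library": ["203.0.113.16/28"],
}

# Parse each range once, up front.
NETWORKS = [
    (name, ipaddress.ip_network(cidr))
    for name, ranges in SUBSCRIBERS.items()
    for cidr in ranges
]

def subscriber_for(client_ip: str):
    """Return the subscribing institution for this address, or None.

    The user takes no action: the server checks the address the request
    came from. This identifies a network, not an individual user.
    """
    addr = ipaddress.ip_address(client_ip)
    for name, network in NETWORKS:
        if addr in network:
            return name
    return None

print(subscriber_for("198.51.100.42"))   # inside a licensed range
print(subscriber_for("198.51.100.200"))  # outside every licensed range
```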

 

Future Trend:

XML

 

The advantages and disadvantages of static and dynamic web pages have been outlined above. One possible distribution method that combines the simplicity of authoring and hosting HTML with the scalability and structure of dynamic pages is to store XML as the data format in a native XML database or content management system.

 

For further information see Ronald Bourret’s sites:

http://www.rpbourret.com/xml/XMLDatabaseProds.htm

http://www.rpbourret.com/xml/XMLAndDatabases.htm

 

Structural Metadata

 

As the digital representation of our cultural and heritage material grows and becomes more complex, information about the relationships of individual digital objects to each other, to the item from which they were derived, and to the collection to which they belong, as well as about the way the digital objects are stored and organized, will become increasingly important. It is this information that will enable future users not just to retrieve a faithful representation of an object but to reconstruct, navigate and understand the whole context in which the object was created and used. To achieve this, distribution systems will have to hold increasing amounts of structural metadata. This may also suggest holding such metadata in object-oriented rather than flat-file or relational database systems.
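Structural metadata of this kind can be pictured as a tree: a collection contains items, items contain divisions such as chapters, and divisions point to individual digital files. The Python sketch below models such a tree as linked objects, which also illustrates the object-oriented approach just mentioned. It is loosely analogous to the structural map in standards such as METS but implements no standard; the class name, labels and file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One unit in a structural hierarchy: a collection, an item,
    a division such as a chapter, or a single digital file."""
    label: str
    kind: str
    file: Optional[str] = None  # set only for leaf nodes (digital files)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        """Attach `child` below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Labels from the top of the hierarchy down to this node."""
        node, labels = self, []
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return list(reversed(labels))

    def files(self) -> List[str]:
        """All digital files under this node, in reading order."""
        if self.file is not None:
            return [self.file]
        return [f for child in self.children for f in child.files()]

collection = Node("Estate Papers", "collection")
diary = collection.add(Node("Diary of 1820", "item"))
january = diary.add(Node("January", "chapter"))
page1 = january.add(Node("Page 1", "page", file="diary1820_p001.tif"))
january.add(Node("Page 2", "page", file="diary1820_p002.tif"))

print(page1.path())   # ['Estate Papers', 'Diary of 1820', 'January', 'Page 1']
print(diary.files())  # ['diary1820_p001.tif', 'diary1820_p002.tif']
```

From any single page image, a user interface can walk up to the item and collection it belongs to, or down from an item to all of its files in order; this is the navigation that structural metadata makes possible.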

 

Metadata Harvesting

 

As digital objects, catalogues and finding aids proliferate on the World Wide Web, effectively searching and retrieving information across multiple servers, systems and domains becomes both increasingly important and increasingly difficult. This issue is a central concern for cultural heritage and humanities institutions, which share the goal of broadening access through digitization.

 

One solution to this challenge is metadata harvesting. Simply put, repositories expose their metadata on the World Wide Web through a common protocol, so that service providers can collect (harvest) it and build search services that span many repositories. The best-known system is the Open Archives Initiative Protocol for Metadata Harvesting (referred to here as MHP).
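To give a feel for how harvesting works in practice, the sketch below builds a harvesting request and parses a response. It assumes version 2.0 of the protocol (commonly called OAI-PMH), whose details differ from the earlier version described in some of the references below. The repository URL and the sample response are hypothetical; a real harvester would fetch each URL (for example with urllib.request), parse the reply, and repeat the request with the returned resumptionToken until no token is given.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses and its Dublin Core format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request for simple Dublin Core records."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_records(xml_text):
    """Return ([(identifier, title), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS)
        records.append((ident, title))
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty token marks the final batch

# Hypothetical response from a hypothetical repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-03-01T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:item-1</identifier>
        <datestamp>2003-02-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Map of the harbour</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>batch-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
print(parse_records(SAMPLE))
```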

 

Further details:

A non-technical introduction to MHP:

Clifford Lynch, "Metadata Harvesting and the Open Archives Initiative," ARL Bimonthly Report 217 (August 2001): http://www.arl.org/newsltr/217/mhp.html

and

Donald Waters, "The Metadata Harvesting Initiative of the Mellon Foundation," ARL Bimonthly Report 217 (August 2001): http://www.arl.org/newsltr/217/waters.htm

The OAI MHP protocol: http://www.openarchives.org/OAI_protocol/openarchivesprotocol.html

MHP tutorial: http://library.cern.ch/HEPLW/4/papers/3/

CIMI Working Group: http://www.cimi.org/wg/metadata/

CLIR Metadata harvesting project: http://www.clir.org/activities/details/metadata-docs.html

DLF and Metadata Harvesting: http://www.diglib.org/architectures/mdharvest.htm

University of Illinois at Urbana-Champaign Metadata Harvesting services: http://oai.grainger.uiuc.edu/

 

Conclusion

In determining the best delivery options, user needs should be balanced against producer advantages. Portable media are cheap, reliable and relatively easy to sell, have no bandwidth problems, and appear easy to archive, but networked delivery is generally easier and cheaper to produce and update. Online, it is also simple to offer tiered access to different audiences and to measure usage. Despite bandwidth restrictions and security concerns, networked delivery is the most common solution for most purposes, providing a direct distribution channel that often reaches unexpectedly wide audiences.

Depending on the scale and purpose of a project or program, a database-driven system will generally be preferable to static HTML pages. Although new developments such as metadata harvesting are making content stored in databases easier to discover, static HTML pages are still more easily indexed by external search engines than dynamically generated pages; within the resource itself, however, the reverse is true. XML-based publications, though more costly to develop, are more economical and easier to maintain and update, and they offer vastly improved possibilities for data description, manipulation, and reuse. As with other decisions, this one will depend on how users will use your material as well as on the scope of your project.

Access models also have security and economic aspects. Rights management and the need to generate income from digital assets may mandate access limits via identification and authentication systems, which, however, add another layer of cost for development and maintenance. It is important to watch for new security and economic models.

 


[1] Because of the way data are encoded when written to tape and the way error correction works, we recommend that information be written to tape in uncompressed format only.

[2] Zip and JAZ are proprietary systems manufactured by Iomega.

[3] Note that Zip 250 MB drives can read and write 100 MB disks (although writing to them is slower), but Zip 100 MB drives cannot read 250 MB disks.

[4] Hypertext Transfer Protocol: a set of rules that enables documents to be transmitted on the World Wide Web and viewed with browsers such as Internet Explorer or Netscape.

[5] File Transfer Protocol: enables files to be transferred from computer to computer using a simple program such as WS_FTP or Fetch.

[6] Internet service providers most typically provide internet access for users. But many of the larger ISPs also offer server services which can accommodate the publication of large web-based resources, for a fee. If your institution is not able to maintain its own web server, or if the resource you are creating is too large to be published from your local servers, an ISP may be an option worth considering.

 





abp~03/03