A New Framework for Web-based Contributory Encyclopedias

William J. McIver, Jr.
Scholarly Technology Group
Brown University
Providence, Rhode Island 02912-1843
E-mail: mciver@cs.brown.edu

Rayvon Fouché
Department of History
Purdue University
West Lafayette, Indiana 47907
E-mail: fouche@purdue.edu

Joy A. James
African-American Studies
Brown University
Providence, Rhode Island 02912
E-mail: joy_james@brown.edu

Paper Abstract for ACH/ALLC 2001

Extended Abstract

The current graph structure of the Web, which has been shown by recent studies to be remarkably weakly connected, suggests in practical terms the continued need for some centralizing mechanism for knowledge organization, like traditional encyclopedias. However, such a mechanism should extend current models of on-line encyclopedias to allow self-selecting communities of people to easily author and contribute articles to a common database. Furthermore, those articles should be automatically indexed and interconnected in useful ways, and users should be provided with sophisticated search mechanisms for locating related collections of articles in the database. This is what Pierre Levy calls a cosmopedia, an intentional knowledge space which is "deterritorialized," yet can be easily located and used for developing a "collective intelligence." Such a system is referred to generically as a contributory encyclopedia. Our framework for contributory encyclopedias is called a Communipedia. It is a generic framework that allows the creation of any number of distinct compilations of information. The paper proposed for ACH/ALLC 2001 will detail motivations, design and implementation of this framework for creating and managing Web-based contributory encyclopedias.
The encyclopedia represents an early advance in the organization and distribution of knowledge.
Traditional encyclopedias, whether in print form or, more recently, on CD-ROM, provide highly organized systems of information, but they are static and are often seen as elitist. Historically, encyclopedias have come to be seen not simply as forms of knowledge organization, but as social processes which seek to define which knowledge is valid and important. The editorial processes used to create encyclopedias are usually closed to all but a select editorial board, and both traditional and CD-ROM-based encyclopedias are expensive to produce and own.
The World-Wide Web (WWW) addresses one aspect of this problem, in that it represents a revolutionary step in democratizing the organization and dissemination of knowledge. It has allowed people all over the world to author and share information in a distributed and independent fashion. Recent studies have shown, however, that most of the WWW is poorly connected. This makes it difficult for users to find information in large parts of the WWW. It also makes it less likely that others will find the information that users intend to share. Overly simple indexing and searching mechanisms employed within individual Web sites compound the problem. One of the main virtues of the WWW is that it enables distributed and autonomous information sharing, but this situation makes clear that there is also a need for centralizing mechanisms for knowledge organization and dissemination that are available to all.
WWW-based encyclopedias fill this role to a limited extent. On-line encyclopedias can be dynamic and made accessible globally at a lower cost than traditional and CD-ROM-based encyclopedias, but, as with traditional encyclopedias, contributions are restricted to an elite group, and their enabling technologies are usually not in the public domain. Furthermore, WWW-based encyclopedias do not yet take advantage of more recent technologies, such as XML, for facilitating better authoring, indexing and hyperlinking.
The focus of this project is to reconceptualize the form and function of the encyclopedia so as to create a new form which will provide better support for the organization of historical knowledge and on-line access to that knowledge than existing WWW-based encyclopedias. This new form seeks to move the encyclopedia from a static, linear, and elite form of knowledge organization to one which is contributory, multi-dimensional, analyzable, and extensible. The design of this new form is geared toward use by average users. We call this new form a Communipedia.
The data model supports the representation of composite documents and will be amenable to both rich forms of hyperlinking and database indexing for efficient searches. An instance of the Communipedia framework is contributory in the sense that any member of a self-selecting community will be able to submit articles. It will be possible for new articles and revisions of existing articles to be added to the Communipedia at any time. The framework is designed to accept articles, provide sophisticated on-the-fly indexing and hyperlinking of incoming articles, and store them in an XML database which will support a sophisticated query mechanism. A complementary design goal has been to support sophisticated authoring processes for articles while keeping the system easy to use.
The Communipedia framework is extensible in that it is designed to allow any user to conveniently add features to its technical design. One application of this aspect of the framework is the development by users of new tools for analyzing a collection, or of new document type definitions (DTDs) to represent new types of entries.
The basic element of any collection managed by a Communipedia is the document. All documents exist as first-class objects in the database server and have a unique identity. Documents may be one of three semantically related types: article, annotation, or map.
All documents are semi-structured data objects modeled in an XML-based schema. Various built-in semantic relationships between these document types are supported.
A document of type article corresponds generally to what is thought of as an encyclopedia article, but also includes multimedia assets such as images, audio and video. Unlike traditional encyclopedia articles, however, each article may be composite in the object-oriented sense, containing other articles. Composite articles allow authors to reuse existing articles in composing new ones.
A document of type annotation is associated with a document of type article. An annotation may contain a natural language commentary on the associated article for human consumption, or it may be used to represent a workflow specification and log for the automated editorial workflow processing of that article. A special case of the former is an annotation consisting of only a link to another, already existing, document. Each article document may be associated with zero or more annotation documents.
Documents of type map serve as a higher-level form of knowledge organization than database index structures by representing specific semantic relationships and roles between articles. These are based on the concept of the topic map. Each map has a user-specified type, such as "is a", "authored by", or "is related to." The Communipedia system will have a set of built-in map types; however, users and other system components will also be able to define their own map types. Each document (i.e., article or map) which participates in a map has a role. A map of type "authored by," for example, implies the participation of a document which represents the author and the document authored by that person. Since a Communipedia may be linked to others for networked searches, each map must have a specific scope which defines the collection in which it exists.
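The three document types and their relationships can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Communipedia implementation: the class names, field names, the in-memory `store` dictionary, and the two lookup helpers are all hypothetical, and the actual system models documents in an XML-based schema rather than as Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Every document is a first-class object with a unique identity."""
    doc_id: str


@dataclass
class Article(Document):
    title: str = ""
    body: str = ""
    # A composite article lists the ids of the articles it contains.
    parts: List[str] = field(default_factory=list)


@dataclass
class Annotation(Document):
    target_id: str = ""       # id of the article this annotation belongs to
    kind: str = "commentary"  # assumed labels: "commentary", "workflow", "link"
    content: str = ""


@dataclass
class Map(Document):
    map_type: str = ""        # user-specified type, e.g. "authored by"
    scope: str = ""           # the collection in which the map exists
    # Role name -> id of the document playing that role.
    roles: Dict[str, str] = field(default_factory=dict)


def annotations_for(store, article_id):
    """Return all annotations attached to one article (zero or more)."""
    return [d for d in store.values()
            if isinstance(d, Annotation) and d.target_id == article_id]


def maps_involving(store, doc_id, scope):
    """Return the maps within a scope in which a document plays some role."""
    return [d for d in store.values()
            if isinstance(d, Map) and d.scope == scope
            and doc_id in d.roles.values()]


# Hypothetical example data; ids and titles are placeholders.
store = {
    "person-1": Article("person-1", title="An author"),
    "art-1": Article("art-1", title="An entry", parts=["art-2"]),
    "art-2": Article("art-2", title="A reused sub-entry"),
    "ann-1": Annotation("ann-1", target_id="art-1", content="A commentary."),
    "map-1": Map("map-1", map_type="authored by", scope="collection-a",
                 roles={"author": "person-1", "work": "art-1"}),
}
```

Storing roles as a name-to-id mapping keeps a map independent of the documents it relates, so a map can itself be stored, queried, and referenced by id like any other document.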
Map documents are intended to enable more sophisticated types of browsing and searching through a collection. Maps can be both the subjects and objects of queries and keyword searches, since they are first-class database objects.
An instance of the Communipedia can be multi-dimensional in several respects. On an article level, a Communipedia goes beyond the linearity of traditional encyclopedias to provide dynamic hyperlinking between articles tailored to the query being performed. On an editorial level, the framework supports optional editorial layers of articles. Using this feature, a Communipedia could be organized, for example, into one or more collections of articles developed by associated editorial boards, while other collections in the same Communipedia might consist of articles contributed by users independent of an editorial board.
This project envisions a key scenario in which this feature is critical. The ultimate goal of this work is the creation of an Encyclopedia Africana through the use of both an editorial board consisting of traditional, trained scholars and independent contributions from communities or individuals. Using this approach, an entry on African American inventors written by trained historians, for example, might be linked to contributions of related information by independent scholars or individuals. Such a contributor might add an account of experiencing the introduction of an invention or of an encounter with the inventor, a recording or image of the inventor that they own, or the results of their own research. Other layers can be added dynamically to hold alternate versions or perspectives of articles, or running commentaries on one or more other layers.
On an access level, Communipedias also support optional networked interconnection with other Communipedias, using a server-to-server information exchange protocol to create so-called Web syndicates at the server level.
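The editorial layering described above can be pictured with a small Python sketch. It assumes a hypothetical `LayeredCollection` class and the layer labels "board" and "community"; none of these names come from the Communipedia design, and the sketch stores plain strings where the real system would store XML documents.

```python
class LayeredCollection:
    """Hypothetical container: each layer maps a topic to its contributions."""

    def __init__(self):
        # layer name -> {topic: [contribution, ...]}, in order of layer creation
        self._layers = {}

    def add(self, layer, topic, contribution):
        """Add a contribution to a layer, creating the layer on first use."""
        self._layers.setdefault(layer, {}).setdefault(topic, []).append(contribution)

    def view(self, topic, layers=None):
        """Return (layer, contribution) pairs for a topic.

        If `layers` is given, only those layers are included; otherwise
        every layer contributes, in the order the layers were created.
        """
        wanted = self._layers.keys() if layers is None else layers
        return [
            (layer, item)
            for layer in self._layers
            if layer in wanted
            for item in self._layers[layer].get(topic, [])
        ]


collection = LayeredCollection()
collection.add("board", "inventors", "Entry written by the editorial board")
collection.add("community", "inventors", "Recollection from an independent contributor")
```

A reader could then request `collection.view("inventors")` to see every layer together, or pass `layers=["board"]` to restrict the view to board-reviewed material.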
The Communipedia is designed as an extensible, component-based system consisting of user interfaces for authors, editors and browsers; an XML database server; an XML-based workflow management subsystem; and a peer-to-peer communication system for interconnecting multiple Communipedias. In addition to keyword search and hyperlink browsing, general query expressions are supported using an XML-based query language. The workflow manager automates the routing of documents as part of the editorial process selected by users of a Communipedia. Workflows are configurable using specifications written in the WfMC extension to XML.
This project makes contributions both to the technical aspects of hypermedia systems and to their impact on social systems for knowledge organization and dissemination. African American Studies is the initial context for its evaluation. New approaches to organizing, searching and managing on-line encyclopedias using XML-based technologies have been developed, which are, in turn, serving as the basis for new approaches to pedagogy, collaboration, research, and writing in African American Studies and other disciplines. The Communipedia will serve as a framework for examining a broad range of research issues, including information needs and use models in marginal communities, comparative studies of traditional versus vernacular historiography, technology appropriation and collaborative design in marginal communities, and universal access to information.
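The workflow manager's routing of a document through a configurable editorial process, mentioned in the system overview above, might look roughly like the following Python sketch. The XML vocabulary (`workflow`, `step`, `role`, `action`) is invented here purely for illustration and is not the WfMC format; the function names and the `decisions` argument are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow specification; NOT the WfMC vocabulary.
SPEC = """
<workflow name="board-review">
  <step role="author" action="submit"/>
  <step role="editor" action="review"/>
  <step role="board" action="approve"/>
</workflow>
"""


def load_steps(spec_xml):
    """Parse a specification into an ordered list of (role, action) steps."""
    root = ET.fromstring(spec_xml.strip())
    return [(step.get("role"), step.get("action")) for step in root.findall("step")]


def route(doc_id, steps, decisions):
    """Walk a document through the steps and return the resulting log.

    `decisions` maps a role to True (accept) or False (reject); roles that
    are not listed are assumed to accept. Routing stops at the first
    rejection, so later steps never see a rejected document.
    """
    log = []
    for role, action in steps:
        accepted = decisions.get(role, True)
        log.append((doc_id, role, action, "ok" if accepted else "rejected"))
        if not accepted:
            break
    return log


steps = load_steps(SPEC)
log = route("art-1", steps, {"editor": False})
```

The returned log is the kind of record that, in the design described earlier, would be kept in an annotation document attached to the article being processed.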