TYPE OF PROPOSAL: paper
TITLE: Beyond the Web: TEI and the Ebook Revolution
KEYWORDS: ebooks, TEI, XML

AUTHOR: Matthew Gibson
AFFILIATION: Electronic Text Center, University of Virginia Library
E-MAIL: msg2d@virginia.edu

AUTHOR: Christine Ruotolo
AFFILIATION: Electronic Text Center, University of Virginia Library
E-MAIL: cjr2q@virginia.edu

CONTACT ADDRESS: Alderman Library, University of Virginia, Charlottesville, VA 22903-2498
FAX NUMBER: 804.924.1431
PHONE NUMBER: 804.924.3230

*Introduction*

From August through November 2000, the Electronic Text Center at the University of Virginia delivered over a million freely available electronic books to patrons in over 100 different countries. Distributed in a variety of formats, including .lit, .pdb, and .pdf, these ebooks have provided proof-of-concept for the adaptive uses of TEI standards beyond the World Wide Web -- standards that the Electronic Text Center has employed since its inception in 1992.

In this presentation, we will discuss the mechanics of our ebook production and the conversion workflow we hope to implement in the near future. We'll also talk about the user response to our ebook collection, and the advantages and disadvantages that different formats offer to scholars and instructors in the humanities.

For the purposes of this paper, an ebook is defined as an electronic full-text resource designed to be read on a screen, in something other than a web browser. Thus an ebook can be read on a PC, a laptop, a PDA, or a dedicated reading device, in one or more of the growing number of available formats and software applications.

*Methods of Ebook Production*

In its first phase of ebook production, the Etext Center repackaged a portion of its TEI-encoded collection as .lit files for use with the Microsoft Reader. Although the Reader is a proprietary piece of software, it is compliant with the Open E-Book (OEB) format, an XML-based standard to which TEI data can easily be adapted.
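
How TEI data might be adapted to an OEB-style document can be illustrated with a small sketch. It is not the Center's conversion code, which is written in Perl and is not reproduced in this proposal. The element mapping in TAG_MAP, the function names, and the sample fragment are all hypothetical, and the output is only OEB-like markup, not a validated OEB document. The sketch renames a few TEI Lite elements to HTML-style names and keeps each original TEI name (plus any rend value) in a class attribute, so that a stylesheet could still address the original tagging.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few TEI Lite element names to HTML-style
# names. The mapping actually used by the Center is not given in the text.
TAG_MAP = {
    "text": "body",
    "div1": "div",
    "head": "h2",
    "p": "p",
    "hi": "span",
    "lg": "div",
    "l": "div",
}


def convert(elem):
    """Return a renamed copy of elem and, recursively, of its children.

    The original TEI element name (and the rend value, if present) is
    stored in a class attribute for use by a stylesheet. Elements that
    are not in TAG_MAP keep their name. Other attributes are dropped
    in this simplified sketch.
    """
    new = ET.Element(TAG_MAP.get(elem.tag, elem.tag))
    classes = [elem.tag]
    if "rend" in elem.attrib:
        classes.append(elem.attrib["rend"])
    new.set("class", " ".join(classes))
    new.text = elem.text
    new.tail = elem.tail
    for child in elem:
        new.append(convert(child))
    return new


def tei_to_oeb_like(tei_xml):
    """Parse a TEI fragment and serialize the renamed tree as a string."""
    root = ET.fromstring(tei_xml)
    return ET.tostring(convert(root), encoding="unicode")


# Invented sample fragment, for illustration only.
SAMPLE = (
    '<div1 type="chapter"><head>Chapter 1</head>'
    '<p>Call me <hi rend="italic">Ishmael</hi>.</p></div1>'
)

if __name__ == "__main__":
    print(tei_to_oeb_like(SAMPLE))
```

With this approach a stylesheet rule such as `.italic { font-style: italic }` could restore the presentation implied by the original rend attribute, which is one way of preserving tagging through stylesheet instructions.
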
Using simple Perl scripts, we automated the conversion of over 1,500 existing TEIXLITE files into extended OEB, which allows most of the original tagging to be preserved and accommodated with stylesheet instructions. This conversion gave us a body of core ebook documents that we could repackage, through the use of a piece of commercial software, into the Reader format. Later, with some simple adjustments to the conversion scripts and the stylesheets, we were able to output our OEB files to the .pdb format for the Palm system and the .pdf format for the Adobe Reader.

At the moment all of our ebook files are static objects on the Etext server. However, taking as our model Xhub, the conversion application described and proposed by the Scholarly Technology Group at Brown*, we are working both to expand the number of formats we can process from SGML/XML content and to create those formats on the fly. Because the automation for .lit and Palm systems is already in place and we are about to begin dynamically transforming SGML/XML to PDF, a public auto-conversion interface for generating ebooks from TEI data is imminent.

Ultimately, we envision a delivery system where visitors to our website can choose to view and search our texts through the traditional web interface, or download them instantly in an ever-growing number of ebook formats. Patrons will have more control than ever before over the way they access and use our materials.

*User Response, Statistics, and Feedback*

Like similar TEI-based text repositories, Etext has prided itself on the usefulness of its encoded data for sophisticated searching and text analysis. Traditionally, though, we've considered issues of aesthetics, design, and interface to be of secondary importance. Our work with ebooks represents a new focus on the technologies of reading and how they affect our patrons.

Early analysis of user statistics for our ebooks indicates that, when users are given the choice between a downloadable MS Reader version of a text and a web-delivered XML/HTML version, they choose the former by a margin of about 2 to 1. As we make additional ebook formats available from our website, we will conduct careful analysis of usage patterns, with a particular eye to how format preference varies among individual titles or content categories. This analysis should prove useful to academic institutions and commercial publishers alike, as we are unaware of any substantial analysis of ebook usage patterns that has yet been published.

*Pedagogical Challenges*

In converting richly encoded TEI documents to ebook format for classroom use, we provide students with the advantages of portability and a user-friendly interface. However, the current ebook platforms limit the utility of these texts because they are not SGML/XML-aware and do not support, for example, the kinds of hierarchical searching and analysis that TEI markup allows. In our classroom pilot projects, we are therefore searching for implementation solutions that combine the functionality of encoded text with the ebook's ease of use. For example, we are using the original TEI texts to create stand-alone indices of all materials related to a particular course. Students can then perform web-based searches that take advantage of the markup and metadata, but will have the option of retrieving their results in the ebook format of their choice.

As ebooks provide instructors with more control over the presentation of classroom materials, we are recognizing the importance of working closely with them to determine the optimal format for their purposes, as different formats facilitate different types of scholarship. For example, raw page images loaded into a PDF-based ebook reader would have little utility for a scholar doing linguistic analysis.
For an instructor interested in the visual impact of book layout and typography, however, this presentation is preferable to full-text transcription and encoding. Thus we find that we can't allow our own standard practices or assumptions about humanities computing to limit the range of presentation options we offer to our patrons. Even within the scope of a single course, a variety of textual formats may be needed to meet the instructor's pedagogical goals.

*Conclusion*

Since it was established, the Electronic Text Center has maintained its two-fold mission of building SGML-based content while simultaneously educating and serving the community that will use this content. We see ebook production as an important part of both the research and public service aspects of our mission. As methods of delivering content change, and user expectations change with them, we must adapt to these changes and incorporate them into our existing workflows. Furthermore, we hope that our presence in the ebook world will, in some small way, help to foster a commitment to structured data and open standards in an industry that is increasingly dominated by big corporations and proprietary interests.

===============================================================

* See http://www.gca.org/attend/2000_conferences/Extreme_2000/Papers/Mylonas/Mylonas-Mah.html for a discussion of Xhub that Elli Mylonas and Carole Mah from STG gave at the Extreme Markup Languages 2000 Conference, Montréal, Canada, Aug. 15-18, 2000.