TYPE OF PROPOSAL: paper
TITLE: Using the TEI to author web sites
KEYWORDS: TEI, website, XSL
AUTHOR: Sebastian Rahtz
AFFILIATION: Oxford University
E-MAIL: sebastian.rahtz@oucs.ox.ac.uk
CONTACT ADDRESS: 
	Oxford University Computing Services
	13 Banbury Road
	Oxford OX2 6NN
	UK

FAX NUMBER: (+44) (0) 1865 273275
PHONE NUMBER: (+44) (0) 1865 283431

The Text Encoding Initiative DTD is usually classified as descriptive,
suitable for encoding digital versions of existing books, or
for creating specialized publications such as dictionaries. Less
commonly, it is used for academic books and papers, and original
documents such as the TEI Guidelines themselves. In this paper, we
address the problems of using TEI markup in an even less obvious domain,
that of `normal' web sites, using the Oxford University Computing
Services site (http://www.oucs.ox.ac.uk) as an example. Some examples
are also taken from the TEI web site itself. We cover:

 - the conversion of existing HTML documents (c.6000) to TEI XML
 - the (small) extensions to the TEI DTD which are needed, and usage notes
   (e.g. uses of `rend' and `type' attributes)
 - the development of a comprehensive, flexible set of XSLT specifications to
   convert the TEI XML documents into a linked web tree
 - consideration of management issues

Why choose the TEI DTD for a web site? The most important reason is
that we can harness experience with TEI paragraph-level markup, but
another argument in favour of TEI is the mature metadata support in
<teiHeader>. Using this leaves us the option of converting to RDF
later on (since the <teiHeader> should be just as rich), without the
unfamiliarity of new elements.

The conversion process is typical of an exercise that will have to be 
carried out by millions of people over the next few years; cleaning up
bad HTML is a well-understood process, but interpreting the result as
TEI is not always easy (structured divisions present special
problems). Less obviously, the conversion process often involves
manual stitching together of a set of HTML files into a single TEI
document for much easier editing and maintenance.
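The stitching step can be sketched in a few lines of Python. This is only
an illustration, not the procedure actually used at OUCS: it assumes the
HTML has already been cleaned up into well-formed, namespace-free XHTML,
and the sample pages and the function name `stitch' are invented for the
example. Each source file becomes one <div> of the combined TEI body,
with the old <title> as its <head>.

```python
import xml.etree.ElementTree as ET

# Two already-cleaned pages (invented sample content).
PAGES = [
    "<html><head><title>Email</title></head>"
    "<body><p>How to read mail.</p></body></html>",
    "<html><head><title>Printing</title></head>"
    "<body><p>Where the printers are.</p></body></html>",
]

def stitch(pages):
    """Combine several XHTML pages into a single TEI-style <body>."""
    body = ET.Element("body")
    for source in pages:
        html = ET.fromstring(source)
        div = ET.SubElement(body, "div")
        # The page title becomes the heading of the new division.
        ET.SubElement(div, "head").text = html.findtext("head/title")
        # Copy the page content across unchanged.
        for child in list(html.find("body")):
            div.append(child)
    return body

tei_body = stitch(PAGES)
print(ET.tostring(tei_body, encoding="unicode"))
```

In practice the copied content would also need its HTML elements mapped
to TEI ones, which is where most of the interpretation effort goes.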

The TEI DTD proves (perhaps surprisingly) perfectly suitable for
general web pages. Some of the extensions needed are:
 - the standard TEI Lite extensions
 - addition of short-cut attributes to <xptr> and <xref> to allow
   URLs directly (rather than via entities)
 - similarly, provision of `file', `scale', `width' and `height'
   attributes to <figure>, for practical authoring
 - addition of an <email> element to the allowed contents of <address>
 - addition of MathML as the content for <formula>

As with any TEI project, we need to build up a repertoire of `rend'
and `type' attribute values for various elements. These include
 - `fancy' and `doublespace' types for lists
 - `new' and `noframe' rend attributes for <xref> and <xptr>, to
    specify links which must start a new window, and escape from a
    frame, respectively
 - `code' rend attribute for <hi>, to mimic HTML's <b><code>
 - `interpret' type attribute for <xptr> to support transclusion
Most visible web effects, of course, are confined to the
stylesheets. XML processing instructions are used to generate some of
the HTML <meta> elements. Tables will probably need more and more
`rend' support in future, and this is the area where we are most
likely to drop TEI in favour of another table schema.
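As an illustration of how such values might be interpreted, the Python
sketch below maps the `new' and `noframe' rend values onto HTML target
attributes and passes a list's `type' through as a CSS class. The
particular targets (`_blank', `_top'), the use of a class attribute, the
`url' attribute name and both function names are assumptions made for
the example; the stylesheets themselves may well do this differently.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from `rend' values to HTML link targets.
LINK_TARGETS = {"new": "_blank", "noframe": "_top"}

def render_link(xref):
    """Turn a TEI <xref> into an HTML <a>, honouring its rend value."""
    a = ET.Element("a", {"href": xref.get("url", "")})
    target = LINK_TARGETS.get(xref.get("rend"))
    if target:
        a.set("target", target)
    a.text = xref.text
    return ET.tostring(a, encoding="unicode")

def render_list(tei_list):
    """Turn a TEI <list> into an HTML <ul>, exposing its type to CSS."""
    ul = ET.Element("ul")
    if tei_list.get("type"):
        ul.set("class", tei_list.get("type"))
    for item in tei_list.findall("item"):
        ET.SubElement(ul, "li").text = item.text
    return ET.tostring(ul, encoding="unicode")

link = ET.fromstring('<xref url="http://www.tei-c.org/" rend="new">TEI</xref>')
print(render_link(link))
fancy = ET.fromstring('<list type="fancy"><item>one</item><item>two</item></list>')
print(render_list(fancy))
```

Keeping the vocabulary in a small table like LINK_TARGETS reflects the
point made above: the document records intent, and the visible effect is
decided in one place.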

The greatest amount of work is in writing XSLT stylesheets to render TEI
to HTML (either dynamically on the web browser client, or on the web
server). The results are coupled with CSS stylesheets, but in this
system CSS is at present relegated to a fairly minor role, since we
need a considerable amount of the kind of transformation for which XSLT
is well suited. Most obviously, we often want to convert a single TEI document
into a set of HTML web pages, but there are many other examples of
generated or rearranged text. The resulting set of XSLT specifications
(over 3000 lines of code) is notable in three ways:
 1. It makes heavy use of the `import' feature of XSLT, allowing for
    modular and cascading stylesheets; a group of pages can easily have
    their own wrapper around the main stylesheets, and a particular page
    can have its own wrapper around the group one.

 2. There are over 60 points identified where the result is
    parameterized, allowing for simple overrides in a wrapper
    file. These cover everything from the words used for `Next page',
    through the depth at which <div> elements produce new pages, to a
    switch which generates an HTML frameset presentation.

 3. There is a web form which allows a new user to derive an XSLT
    wrapper around the stylesheets, in a manner analogous to the TEI
    Pizza Chef, without knowing very much about XSLT. More experienced
    programmers can override any aspects of the stylesheets, obviously.
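The layering of wrappers and the effect of one parameter can be imitated
in a short Python sketch. It is an analogy only: in XSLT the layering
comes from import precedence, where definitions in the importing
stylesheet win over those it imports. The parameter names (`split_depth',
`next_label', `use_frames'), their values and the sample document are all
invented here and are not the names used in the actual stylesheets.

```python
from collections import ChainMap
import xml.etree.ElementTree as ET

# Defaults, standing in for the parameters of the main stylesheets.
DEFAULTS = {"split_depth": 1, "next_label": "Next page", "use_frames": False}
# A group wrapper and a page wrapper each override only what they need.
GROUP = {"next_label": "Continue"}
PAGE = {"split_depth": 2}

params = ChainMap(PAGE, GROUP, DEFAULTS)  # the leftmost mapping wins

DOC = ET.fromstring("""<body>
  <div><head>Intro</head>
    <div><head>Aims</head></div>
    <div><head>Scope</head></div>
  </div>
  <div><head>Usage</head></div>
</body>""")

def pages(node, split_depth, depth=0):
    """Yield the heading of every <div> that starts a new page."""
    for div in node.findall("div"):
        if depth + 1 <= split_depth:
            yield div.findtext("head")
            yield from pages(div, split_depth, depth + 1)

titles = list(pages(DOC, params["split_depth"]))
# One navigation line per page that has a successor.
nav = [f"{here} | {params['next_label']}: {there}"
       for here, there in zip(titles, titles[1:])]
print(titles)
print(nav)
```

With the page wrapper removed, the split depth falls back to the default
and only the two top-level divisions become pages, while the wording
chosen by the group wrapper still applies.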

The paper shows examples of how the same document can be presented
on the web in a variety of ways by minor changes to the XSLT
stylesheet.

The management of web pages is always an issue, whatever
authoring system is used. We prefer to use a conventional change
management system which interacts with the <teiHeader>, and provides
plenty of flexibility for controlling a multi-author environment.

In conclusion, this paper demonstrates that authoring static web pages
in the TEI is reasonable, and that XSLT stylesheets provide a powerful
tool for manipulating them.