XML for Java Developers

G22.3033-002

Dr. Jean-Claude Franchitti

New York University

Computer Science Department

Courant Institute of Mathematical Sciences

 

Session 2: IE5’s Implementation of the XSL Specification

 

Course Title: XML for Java Developers
Course Number: g22.3033-002

Instructor: Jean-Claude Franchitti
Session: 2

The example in the handout on "XSL Transformations" shows an XSL style sheet that uses patterns to locate objects within the document tree, and it shows how you can specify template rules to format those objects.

 

The example is interesting because it demonstrates how to transform XML into HTML and how you can combine CSS style rules with the result to format the HTML output.

 

You can test the example with IBM's LotusXSL style sheet engine, either by serving the style sheet and its accompanying XML document from a Java servlet or by running the transformation from the command line.

 

The style sheet should run in most XSL processors, including the one in Internet Explorer 5. In IE5, however, the entire document is missing except for a single word, "by". Tracking down the problem reveals a few differences between the XSL draft specification and Internet Explorer's implementation.

 

The differences reflect how quickly the specification was changing. Even so, understanding how IE processes style sheets can save you hours of head scratching in other cases.

 

The Problem

 

The XML document, news.xml, presents a typical online news story. It contains elements that describe the various parts of the article, including its title, dek (subtitle), and byline, as well as formatting information within paragraphs. For example, the first character of the first paragraph is marked as a drop cap: a letter rendered larger than the rest of the text.

 

The style sheet that transforms the XML document works roughly as follows. A root template creates the boilerplate of the HTML output. Wherever text from the XML document should appear, a pattern selects that portion of the document, and <xsl:apply-templates> is called to process the selected nodes. For example, we call <xsl:apply-templates select="Story/SectionTitle"/> to include the title of the document at the top of the story.

 

When <xsl:apply-templates> is called, two things happen. First, the processor selects the nodes specified by the pattern; in this case, the Story/SectionTitle element, whose content is News&amp;Views. Next, <xsl:apply-templates> looks in the style sheet for any templates that apply to the selected node. In the original example, the story title is handled directly in the root template and no other template rule matches it, so its content is simply included with the formatting supplied by the root template.
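To see the two steps in isolation, here is a minimal sketch that uses the JDK's built-in JAXP XSLT processor. That processor implements the final XSLT 1.0 Recommendation, not LotusXSL's or IE5's draft dialect, and the one-line news document is a hypothetical stand-in for news.xml. Because no template matches SectionTitle, XSLT 1.0's built-in rules copy its text into the <TITLE> element, while the unselected Headline never appears.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class SectionTitleDemo {

    // Hypothetical two-element stand-in for news.xml.
    static final String XML =
        "<Story><SectionTitle>News&amp;Views</SectionTitle>"
        + "<Headline>Not selected here</Headline></Story>";

    // Only a root template: nothing matches SectionTitle, so the built-in
    // rules process its children and copy its text node to the output.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<TITLE><xsl:apply-templates select='Story/SectionTitle'/></TITLE>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies an XSLT 1.0 style sheet, given as a string, to an XML string. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Step 1 selects Story/SectionTitle; step 2 finds no matching rule,
        // so the built-in rules emit the title text inside <TITLE>.
        System.out.println(transform(XSL, XML));
    }
}
```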

 

Later in the style sheet, however, a chain of events occurs when <xsl:apply-templates select="Story//BodyText"/> is called. The relevant parts of the style sheet are as follows:

 

Listing One:

 

<xsl:stylesheet
 ...
      <TITLE><xsl:apply-templates select="Story/SectionTitle"/></TITLE>
 ...
             <H2><xsl:apply-templates select="Story/Headline"/></H2>
 ...
             <DIV Class="copy">
                <xsl:apply-templates select="Story//BodyText"/>
             </DIV>
 ...
  </xsl:template>

  <xsl:template match="BodyText">
     <P><xsl:apply-templates/></P>
  </xsl:template>

  <xsl:template match="DropCap">
     <DIV Class="DropCap"><xsl:apply-templates/></DIV>
  </xsl:template>

  <xsl:template match="bold">
     <B><xsl:apply-templates/></B>
  </xsl:template>
 ...
</xsl:stylesheet>

First, the BodyText element is selected. This time a template for BodyText exists, so it is instantiated and processed. The only thing the BodyText template does is call <apply-templates> to process its children. And herein lies the problem.

 

At the time the style sheet was written, it was not clear whether <apply-templates> should process all descendants of the current node, or just its immediate children. LotusXSL assumes that all descendants will be processed. From an author's perspective, this is better, because it means that you don't have to write a separate template rule for every element type in your document. Microsoft, on the other hand, assumed that only immediate child nodes should be processed. That means you must write a separate template for each and every element type in your document.
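For comparison, the final XSLT 1.0 Recommendation settled the question this way: <xsl:apply-templates/> without a select attribute selects only the immediate children, and the descent into grandchildren comes from the built-in template rules, which apply templates to the children of any element that has no matching rule of its own. The sketch below (JDK JAXP processor, hypothetical sample text) runs Listing One's BodyText rule twice: once as is, and once with an empty catch-all rule for elements, which roughly imitates a processor that does not recurse on its own. It illustrates the standard's semantics and is not a model of IE5's internals.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class ChildrenOnlyDemo {

    // Hypothetical fragment shaped like the story's body text.
    static final String XML =
        "<Story><BodyText><DropCap>W</DropCap>hen the <bold>news</bold> broke</BodyText></Story>";

    // Listing One's root and BodyText rules, plus optional extra rules.
    static String stylesheet(String extraRules) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'>"
            + "<DIV Class='copy'><xsl:apply-templates select='Story//BodyText'/></DIV>"
            + "</xsl:template>"
            + "<xsl:template match='BodyText'><P><xsl:apply-templates/></P></xsl:template>"
            + extraRules
            + "</xsl:stylesheet>";
    }

    /** Applies an XSLT 1.0 style sheet, given as a string, to an XML string. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Built-in rules recurse into DropCap and bold, so all of the text appears.
        System.out.println(transform(stylesheet(""), XML));
        // An empty rule for '*' (default priority -0.5, below BodyText's 0)
        // replaces that recursion: DropCap and bold now contribute nothing,
        // and only BodyText's own text children reach the output.
        System.out.println(transform(stylesheet("<xsl:template match='*'/>"), XML));
    }
}
```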

 

Solving the Mystery

 

As you might guess, to get the style sheet in Listing One to work properly in Internet Explorer 5, you must write template rules for all of your document's element types. Essentially, what you must do is get your templates to cascade down through the tree in order to touch all of the elements. To do this, most rules will simply call <apply-templates> to traverse to the next level of child nodes. If any element type requires special formatting, you can simply add it to that template.
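A minimal sketch of that cascade, again using the JDK's JAXP processor and a hypothetical fragment: each rule emits its HTML wrapper and calls <xsl:apply-templates/> to pass control down one level. In a conforming XSLT 1.0 processor, the text between the elements is copied by the built-in text rule; IE5 needed more than this, as the rest of the session explains.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class CascadeDemo {

    // Hypothetical fragment shaped like the story's body text.
    static final String XML =
        "<Story><BodyText><DropCap>W</DropCap>hen the <bold>news</bold> broke</BodyText></Story>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<DIV Class='copy'><xsl:apply-templates select='Story//BodyText'/></DIV>"
        + "</xsl:template>"
        // One rule per element type; each wraps its output and descends a level.
        + "<xsl:template match='BodyText'><P><xsl:apply-templates/></P></xsl:template>"
        + "<xsl:template match='DropCap'><DIV Class='DropCap'><xsl:apply-templates/></DIV></xsl:template>"
        + "<xsl:template match='bold'><B><xsl:apply-templates/></B></xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies an XSLT 1.0 style sheet, given as a string, to an XML string. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```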

 

Listing Two contains an excerpt from the revised style sheet. The first thing to note is that the elements in the XML document have been renamed for readability. In particular, the BodyText element is now aBody, and the Story element is now article.

 

Listing Two:

 


<xsl:stylesheet
 ...
  <xsl:template match="/">
      <TITLE><xsl:value-of select="article/headline"/></TITLE>
 ...
             <Span ID="BoxCopy">
                <xsl:value-of select="article/headline"/>
             </Span><BR></BR>

             <DIV Class="aBody">
                <P><xsl:apply-templates select="article//aBody"/></P>
             </DIV>
 ...
  </xsl:template>

  <xsl:template match="aBody">
     <P><xsl:apply-templates /></P>
  </xsl:template>

  <xsl:template match="para1">
     <P><xsl:apply-templates /></P>
  </xsl:template>

  <xsl:template match="para">
     <P><xsl:apply-templates /></P>
  </xsl:template>

  <xsl:template match="para2">
     <P><xsl:apply-templates /></P>
  </xsl:template>

  <xsl:template match="dropCap">
     <DIV Class="dropCap"><xsl:apply-templates /></DIV>
  </xsl:template>

  <xsl:template match="bold">
     <B><xsl:apply-templates /></B>
  </xsl:template>

  <xsl:template match="italic">
     <I><xsl:apply-templates /></I>
  </xsl:template>

  <xsl:template match="byline[@Email]">
     <A HREF="mailto:mfloyd@BeyondHTML.com"><xsl:apply-templates/></A>
  </xsl:template>

</xsl:stylesheet>

 

Browsing through Listing Two, you'll notice that the root template includes an <xsl:apply-templates> instruction that processes aBody elements. The pattern article//aBody says "start at the document element article and select any descendants that are aBody elements." This allows aBody elements that are nested within other elements to be processed. When this <apply-templates> is instantiated, the processor looks for any templates that match aBody. Since there is an aBody template, it is instantiated. The only statement in this template is an <apply-templates>, which says "process all immediate child nodes."
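The difference between // and / in such patterns is easy to check. In this sketch (JDK JAXP processor; the sidebar element is a hypothetical wrapper, not part of news.xml), article//aBody finds both aBody elements, while article/aBody finds only the one that is a direct child of article.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class DescendantPatternDemo {

    // One aBody directly under article, one nested in a hypothetical sidebar.
    static final String XML =
        "<article><aBody>top-level</aBody>"
        + "<sidebar><aBody>nested</aBody></sidebar></article>";

    // Text output: "<descendant count>,<child count>".
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='count(article//aBody)'/>"
        + "<xsl:text>,</xsl:text>"
        + "<xsl:value-of select='count(article/aBody)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies an XSLT 1.0 style sheet, given as a string, to an XML string. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```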

 

Within the tree structure, the child nodes of aBody include para1, para, and para2. Once again, the processor looks for templates for these element types and finds one for each. The template for para1 simply inserts an HTML paragraph element (<P>) and calls <apply-templates> to process its child nodes. Children of the para1 element include the dropCap, bold, and italic elements. Note that para1 also has text child nodes, which hold the character data between those elements. Again, templates exist for each of these element types. In the case of the dropCap template, an HTML <DIV Class="dropCap"> element is inserted. Don't mistake this class name for a reference to an XML element. The Class attribute of this <DIV> actually refers to a CSS rule of the same name, defined in a <STYLE> element in the root template (not shown).
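Because the point about text nodes is easy to overlook, here is a small sketch that counts the children of a hypothetical para1 element with the JDK's JAXP processor. node() matches every child, * matches only element children, and text() matches only text children; the runs of text between the elements count as children just as dropCap, bold, and italic do.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class ChildNodesDemo {

    // Hypothetical para1: three elements interleaved with three text runs.
    static final String XML =
        "<article><aBody><para1><dropCap>W</dropCap>hen <bold>big</bold>"
        + " news <italic>breaks</italic>!</para1></aBody></article>";

    // Text output: "<all children>,<element children>,<text children>".
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='count(article/aBody/para1/node())'/>"
        + "<xsl:text>,</xsl:text>"
        + "<xsl:value-of select='count(article/aBody/para1/*)'/>"
        + "<xsl:text>,</xsl:text>"
        + "<xsl:value-of select='count(article/aBody/para1/text())'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies an XSLT 1.0 style sheet, given as a string, to an XML string. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```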

 

Next, the dropCap template also calls <apply-templates> to process its child nodes. This time, the only child node is a text node representing the element's content, in this case the "W" character. This character is inserted into the <DIV> element, and the processor moves on to the remaining nodes.

 

You would expect this approach to solve the mystery: the title, dek, byline, and document text should all appear in the browser. But the same problem remains: only the solitary word "by" appears in the window.

 

It turns out that Microsoft requires you to create template rules for node types other than elements. This means you must create templates to process attributes, comments, processing instructions, and, yes, text nodes. That seems peculiar, since text is so common that XSL has a built-in template for handling it. Nevertheless, IE requires that you include the template rule found in Example 1 to display an element's content. With this tiny bit of code, the mystery is solved.

 

Example 1:

 

<xsl:template match="text()">
   <xsl:value-of select="."/>
</xsl:template>
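The symptom itself can be reproduced in a conforming processor. The sketch below assumes, as seems likely but is not shown in the listings, that "by" is literal text in the root template while the author's name comes from the byline element; the name "Jane Doe" is hypothetical. An empty text() rule stands in for a processor that has no way to output text nodes, so it imitates the symptom rather than IE5's internals. Written for XSLT 1.0, Example 1's rule needs an explicit select="." attribute, and it then behaves exactly like the built-in text rule.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class TextRuleDemo {

    // Hypothetical byline; "Jane Doe" is a placeholder name.
    static final String XML = "<article><byline>Jane Doe</byline></article>";

    // Text nodes are dropped: only the style sheet's literal "by " survives.
    static final String MUTED = "<xsl:template match='text()'/>";

    // Example 1 in XSLT 1.0 form: copy each text node's string value.
    static final String EXAMPLE_1 =
        "<xsl:template match='text()'><xsl:value-of select='.'/></xsl:template>";

    // Assumed root template: literal "by " followed by the byline content.
    static String stylesheet(String textRule) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'>"
            + "<P>by <xsl:apply-templates select='article/byline'/></P>"
            + "</xsl:template>"
            + textRule
            + "</xsl:stylesheet>";
    }

    /** Applies an XSLT 1.0 style sheet, given as a string, to an XML string. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(stylesheet(MUTED), XML));     // the lone "by"
        System.out.println(transform(stylesheet(EXAMPLE_1), XML)); // byline restored
        System.out.println(transform(stylesheet(""), XML));        // built-in rule, same result
    }
}
```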

 

Presumably, these implementation details were not made clear when Microsoft wrote its processor. In any case, this template rule should be included in any style sheet you design for use in IE.

 

More Q & A

 

You may want to dynamically build a hypertext link using XSL, where the target filename is an attribute of an XML element. In other words, you may need to create something like

<A HREF="target.xml">

where target comes from an attribute of an XML element such as

<GOHERE ref="target">

 

Assuming that you are outputting HTML from XSL (that is, transforming the XML to HTML), you could simply use a pattern to access the attribute, then generate an HTML anchor tag that uses the attribute's value for the HREF. The earlier example uses a similar template to turn the author's byline into a link, although there the pattern only tests whether the byline has an Email attribute, and the mailto address is hardcoded rather than read from that attribute.

 

The code is shown below:

 

<xsl:template match="byline[@Email]">
   <A HREF="mailto:jcf@cs.nyu.edu"><xsl:apply-templates/></A>
</xsl:template>
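To build the HREF from the attribute itself, as the question asks, XSLT 1.0 offers two routes: an <xsl:attribute> instruction, or an attribute value template in curly braces. Here is a sketch with the JDK's JAXP processor, using the GOHERE element from the question; the surrounding links element and the link text are hypothetical. Whether IE5's draft-based processor accepts the curly-brace form is not something this sketch verifies.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class AttributeLinkDemo {

    // GOHERE comes from the question; the wrapper and link text are hypothetical.
    static final String XML = "<links><GOHERE ref='target'>Go here</GOHERE></links>";

    // Route 1: build the HREF attribute explicitly from @ref plus ".xml".
    static final String WITH_ATTRIBUTE =
        "<xsl:template match='GOHERE'><A>"
        + "<xsl:attribute name='HREF'><xsl:value-of select='@ref'/>.xml</xsl:attribute>"
        + "<xsl:apply-templates/></A></xsl:template>";

    // Route 2: an attribute value template evaluates {@ref} in place.
    static final String WITH_AVT =
        "<xsl:template match='GOHERE'><A HREF='{@ref}.xml'><xsl:apply-templates/></A></xsl:template>";

    static String stylesheet(String goHereRule) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'><DIV><xsl:apply-templates select='links/GOHERE'/></DIV></xsl:template>"
            + goHereRule
            + "</xsl:stylesheet>";
    }

    /** Applies an XSLT 1.0 style sheet, given as a string, to an XML string. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(stylesheet(WITH_ATTRIBUTE), XML));
        System.out.println(transform(stylesheet(WITH_AVT), XML));
    }
}
```

Both rules produce the same anchor; the curly-brace form is shorter, while <xsl:attribute> is useful when the value needs conditional logic.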

 

 

The next question is how to tell whether you are outputting HTML or XML. After you try the approach above, the result might be a stunningly perfect

<A HREF="target.xml">xxx</A>

displayed on the screen when you open the XML file. It appears as plain text, however, not as a link that IE5 has interpreted.

 

You might conclude that the output tree makes no distinction between XML and HTML. Is there something you need to declare, such as a processing instruction (PI)?

 

As it turns out, a complete answer requires a lengthy explanation. First, how the output is interpreted depends on how you load the XML stream. For example, you could simply open the XML file in the browser and rely on XSL to process the document, or you could load it and process it via the DOM. The important point to keep in mind is that conceptually there are two trees: the source tree and the result tree. The source tree is constructed by parsing the original XML document and placing all of its elements, attributes, comments, processing instructions, and so on into a tree structure.

 

The result tree is constructed from what is specified in the XSL style sheet (in this case, transformed HTML). At this point, the tree nodes represent well-formed XML. If the output is to go to a file, then the output will look like HTML and can be processed by any HTML browser. Presumably, IE shortcuts this process. That is, when you launch the XML file directly in IE, it parses the document into the source tree, constructs the result tree, reads it, and processes the output as HTML.
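In the final XSLT 1.0 Recommendation, which postdates IE5's processor, the HTML-versus-XML question is answered by the method attribute of the <xsl:output> element. If no method is given and the root element of the result tree is named html (in any case), the processor defaults to the html method. This Java sketch, using the JDK's JAXP processor and a hypothetical two-line result tree, serializes the same tree both ways so you can compare how the empty br element is written.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class OutputMethodDemo {

    // The source document is irrelevant here; the root template builds everything.
    static final String XML = "<doc/>";

    // Same result tree, serialized with the given output method ("html" or "xml").
    static String stylesheet(String method) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='" + method + "'/>"
            + "<xsl:template match='/'>"
            + "<html><body>Line one<br/>Line two</body></html>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    /** Applies an XSLT 1.0 style sheet, given as a string, to an XML string. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(stylesheet("html"), XML));
        System.out.println(transform(stylesheet("xml"), XML));
    }
}
```

With method="html", the serializer writes br with no end tag and no XML declaration; with method="xml", it writes an empty-element tag and starts with an XML declaration.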

 

Conclusion

 

One thing you may have learned from all of this is that, despite vendors' best efforts to comply with existing standards, various XSL engines still exhibit peculiar differences. Part of the problem is that an engine's behavior depends on which version of the standard it implements and what state that version was in at the time. Web developers must still grapple with these differences.