The ABCs of XML, Part 4

This article provides more of what you need to know to survive in the world of connected data by identifying a few common problems you are likely to encounter transforming XML into other formats.

By John T. Sever


By John T. Sever, President, Cascade Controls

In the final installment of this series exploring the fundamentals of XML (eXtensible Markup Language) and its partner tool, XSLT (eXtensible Stylesheet Language Transformations), I’ll identify a few common problems you are likely to encounter and provide practical XSLT samples for transforming XML into other formats that may be more convenient for your purposes.

Problem 1 – Why do I get extra text in my output?
Answer: Built-in Templates. XSLT beginners often find unwanted unstructured text outside carefully designed and highly structured output. The extra text is likely the result of the XSLT processor applying a built-in template. When <xsl:apply-templates> is invoked and no matching template is found in the stylesheet, the processor invokes a default built-in template that lives inside the processor.

The primary built-in template matches any element (and the document root) and then calls <xsl:apply-templates>, passing the current node’s children, both elements and text nodes, for further processing. This template gives XSLT its automatic recursive behavior for all elements that have no explicit template in the transform.

The built-in template for text nodes (the text between an element’s opening and closing tags) copies the text to the output. This built-in template causes great confusion if you don’t understand the way the processor handles unmatched nodes.

You can easily see the results of this behavior by transforming an XML document with a stylesheet that contains no templates as shown here.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
</xsl:stylesheet>
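
To try this yourself, here is a minimal sketch of an input file. The document is hypothetical (the element names Recipe, Name and Step are invented for illustration); any small XML file will behave the same way.

```xml
<!-- Hypothetical input document, used only for illustration -->
<Recipe>
  <Name>Batch A</Name>
  <Step>Mix</Step>
</Recipe>
```

Run through a stylesheet with no templates, the built-in templates walk every element and copy every text node, so the result contains the text Batch A and Mix plus the white space between the elements, with none of the markup. Most processors also write an XML declaration first, because the default output method is XML.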

There are also built-in templates for comments and processing instructions that do absolutely nothing. Built-in templates have a lower priority than all other templates. Thus, you may override a built-in template by creating your own template.

To override the built-in template for text nodes, simply include the following template in your stylesheet.

<xsl:template match="text()"/>

If you have other templates for processing text nodes, their match patterns will be more specific than this pattern and will, therefore, be matched by the XSLT processor as having higher priority than this template. Add this one-line template to a blank stylesheet and see that it produces a blank output file—a great starting point when developing new transforms.
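
As a minimal sketch of how the override and a more specific template work together, the stylesheet below suppresses all unmatched text and then outputs only one value. The input document and the element names Recipe, Name and Step are assumptions made up for this illustration.

```xml
<!-- Hypothetical input, for illustration only:
     <Recipe><Name>Batch A</Name><Step>Mix</Step></Recipe> -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Overrides the built-in text template: unmatched text is dropped. -->
  <xsl:template match="text()"/>

  <!-- Explicit template: writes the value of Name followed by a new line. -->
  <xsl:template match="Name">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With that input, the only output is the line Batch A. The text inside Step is still visited by the built-in element template, but the empty text() template discards it.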

Problem 2 – None of my templates work when the XML file includes a default namespace.
Dealing with a default namespace can be a little tricky. Because the designers of XSLT decided that terseness is of minimal importance, XSLT is verbose and explicit in most cases, but default namespaces are uncharacteristically non-explicit and can result in confusion and frustration. 

Remember that, for matching purposes, every element in an XML document is treated as belonging to a namespace. This may seem confusing if you often work with XML documents that include no namespace declarations. Document elements with no declared namespace belong to what I will call the default null namespace (the XML Namespaces specification describes them as being in no namespace) and require no special namespace modifier in match patterns or XPath expressions. A non-null default namespace is one declared with the xmlns attribute alone, with no namespace alias prefix. The following examples will illustrate this fine point.

In this sample, RecipeElement and OtherElements are included in the default null namespace and may be matched in XSLT expressions as RecipeElement or OtherElements, respectively.

<RecipeElement>
 <OtherElements/>
</RecipeElement>

In this sample, RecipeElement and its children (OtherElements) belong to the default namespace, urn:Rockwell/MasterRecipe, because the xmlns declaration does not include a prefix. These will no longer be matched by RecipeElement and OtherElements. To match these, the namespace must be declared with a prefix in the stylesheet; for example, xmlns:rs="urn:Rockwell/MasterRecipe".

Remember the value of this attribute must exactly match the namespace as it is declared in the XML file. It is case-sensitive. This declaration includes the prefix rs that may now be used for match expressions, such as rs:RecipeElement and rs:OtherElements, respectively. Remember that because the namespace is declared as a default namespace in the XML document, all child elements inherit from this same namespace, unless explicitly overridden by use of another namespace prefix.

<RecipeElement xmlns="urn:Rockwell/MasterRecipe">
 <OtherElements/>
</RecipeElement>

In this sample, the namespace is assigned the explicit prefix rs and therefore is no longer the default namespace. Because no default namespace is declared, and because neither element uses the rs prefix, RecipeElement and OtherElements both belong to the default null namespace and may be matched as RecipeElement and OtherElements, respectively.

<RecipeElement xmlns:rs="urn:Rockwell/MasterRecipe">
 <OtherElements/>
</RecipeElement>

In this sample, the namespace is assigned the prefix rs, which RecipeElement uses (rs:RecipeElement). Therefore, matching this element requires a namespace declaration with a prefix in your stylesheet. Although it may be confusing, the XSLT stylesheet prefix is not required to match the XML source document prefix because a prefix is only a shorthand local alias of the full namespace. However, I recommend using identical prefixes in your transforms to avoid confusion.

You may be surprised to learn that OtherElements is included in the default null namespace! Only default namespaces are inherited by child nodes. The prefixed namespace used in this sample is not the default namespace, and therefore is not inherited by OtherElements. This means that RecipeElement and OtherElements are members of different namespaces.

 

Character Entity Escape Sequences

The most commonly used character entities and their escape sequences are shown here.

Character    Sequence
"            &quot;
TAB          &#09;
New Line     &#10;
&            &amp;
<            &lt;
>            &gt;
'            &apos;

<rs:RecipeElement xmlns:rs="urn:Rockwell/MasterRecipe">
 <OtherElements/>
</rs:RecipeElement>
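
A minimal sketch of a stylesheet for this mixed case follows. It assumes the sample above is the whole input document; the messages it prints are invented for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rs="urn:Rockwell/MasterRecipe">
  <xsl:output method="text"/>

  <!-- The parent is in urn:Rockwell/MasterRecipe, so the pattern needs the prefix. -->
  <xsl:template match="rs:RecipeElement">
    <xsl:text>parent: prefixed namespace&#10;</xsl:text>
    <!-- select="*" visits child elements only, skipping white space text nodes. -->
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!-- The child is in the default null namespace, so the pattern has no prefix. -->
  <xsl:template match="OtherElements">
    <xsl:text>child: null namespace&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Both messages are written, one per line. If you change the second pattern to rs:OtherElements, the child is no longer matched and only the first line appears.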

I used the top of an RSBatch master recipe for the above samples. Rockwell uses this default namespace for each recipe saved in XML format. Therefore, your transformation for any RSBatch recipe should look like this.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:rs="urn:Rockwell/MasterRecipe"
 exclude-result-prefixes="rs">
 <xsl:template match="rs:RecipeElement">…
 </xsl:template>
</xsl:stylesheet>
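
To check that the prefix is only a local alias, here is a small sketch that deliberately uses a different prefix, mr, for the same namespace. It assumes the input is the default-namespace sample shown earlier (RecipeElement with xmlns="urn:Rockwell/MasterRecipe" and one OtherElements child); the prefix mr and the printed labels are my own choices for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mr="urn:Rockwell/MasterRecipe">
  <xsl:output method="text"/>

  <xsl:template match="/mr:RecipeElement">
    <!-- Children in the Rockwell namespace are found with the prefix... -->
    <xsl:text>prefixed match: </xsl:text>
    <xsl:value-of select="count(mr:OtherElements)"/>
    <!-- ...and are not found without it. -->
    <xsl:text>&#10;unprefixed match: </xsl:text>
    <xsl:value-of select="count(OtherElements)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The prefixed count is 1 and the unprefixed count is 0, even though the source document never mentions the prefix mr. Only the namespace string itself has to agree.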

XML and XSLT namespaces can be one of the most frustrating aspects of developing XSLT transforms if you don’t understand these fine points, so I suggest you reread this section and make sure you understand how default namespaces differ from prefixed namespaces. A little experimentation can help too.

Problem 3 – My stylesheet automatically adds namespace attributes to output elements. Can I remove these?
You can never produce an output file using namespace prefixes that have not been declared; therefore, each namespace used in the output must be declared at least once. However, you may find that your output is littered with namespace declarations in nearly every element, whether they are used by that element or not. To suppress unnecessary namespace declarations, use the attribute exclude-result-prefixes in the xsl:stylesheet element (see previous example).

The value of this attribute is a list of namespace prefixes separated by white space. Remember this will not remove all namespace declarations, but it will remove unused and unnecessary namespace declarations.
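
As a minimal sketch of the effect, the stylesheet below writes one literal output element. The element name Summary is hypothetical, and the input is assumed to be an RSBatch-style document whose root is RecipeElement in the urn:Rockwell/MasterRecipe default namespace.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rs="urn:Rockwell/MasterRecipe"
    exclude-result-prefixes="rs">

  <xsl:template match="rs:RecipeElement">
    <!-- Summary is a literal result element in no namespace. -->
    <Summary>
      <xsl:value-of select="count(rs:OtherElements)"/>
    </Summary>
  </xsl:template>
</xsl:stylesheet>
```

With the attribute in place, the output element is written as a plain Summary element. Remove the attribute and the same element is written with xmlns:rs="urn:Rockwell/MasterRecipe" attached, even though nothing in the output uses the rs prefix.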

Problem 4 – Text Output
Generating text output such as a comma-separated-values (CSV) file can prove more difficult than you may expect until you understand how the processor handles text and white space. Often you may find it difficult to generate one particular character, like a quotation mark ("), because it has a specific meaning in XML or in the XSLT language. Here are a few pointers for generating plain text output.

Make sure to include the output element before your templates as follows: <xsl:output method="text"/>.
Use the concat() function when creating output that is a combination of many text nodes or a mix of text nodes and literal text. This sample will generate a single CSV row with two comma-separated columns, where each value is contained within quotation marks.

<xsl:value-of select="concat('&quot;', Value1, '&quot;,&quot;', Value2, '&quot;&#10;')"/>

Use the <xsl:text> element instead of literal text in your templates. This will give you more explicit control of your output while maintaining an easy-to-read transform. XSLT elements are not allowed inside an <xsl:text> element. The following verbose sample will produce the same output as the previous sample.

<xsl:text>"</xsl:text><xsl:value-of select="Value1"/>
<xsl:text>","</xsl:text><xsl:value-of select="Value2"/>
<xsl:text>"&#10;</xsl:text>

Use a CDATA section to simplify using special characters, because the XML parser treats everything inside a CDATA section as literal text rather than markup. Again, this will not work if your output is to be a mixture of literal text and XSLT elements, as in the previous example. A CDATA section begins with <![CDATA[ and ends with ]]>.
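
The pointers above can be combined into one small transform. This is only a sketch: Value1 and Value2 come from the samples above, while the Rows and Row elements, the column headings and the data values are hypothetical.

```xml
<!-- Hypothetical input, for illustration only:
     <Rows>
       <Row><Value1>TIC-101</Value1><Value2>75.5</Value2></Row>
       <Row><Value1>FIC-202</Value1><Value2>12</Value2></Row>
     </Rows> -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Drop unmatched text so stray white space never reaches the CSV. -->
  <xsl:template match="text()"/>

  <!-- Header row: the CDATA section lets the quotation marks appear as typed. -->
  <xsl:template match="/">
    <xsl:text><![CDATA["Tag","Value"]]></xsl:text>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- One quoted, comma-separated line per Row. -->
  <xsl:template match="Row">
    <xsl:value-of select="concat('&quot;', Value1, '&quot;,&quot;', Value2, '&quot;&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

For the input shown in the comment, this writes three lines: "Tag","Value" followed by "TIC-101","75.5" and "FIC-202","12".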

Practical Examples

Example 1: Import DeltaV Modules into MS Access Database
This example fills out the transform that I created for Part 3 of this series (Control, January 2007). (This XML file is large and may require a bit of time to unzip and view).


Example 2: Convert RSBatch Recipe to a Word Document
This example transforms a Rockwell RSBatch recipe XML file into an MS Word 2003 document. MS Word 2003 supports an XML structure (schema) that includes every feature available from inside Word 2003. A variety of information regarding XML in MS Office 2003 is available.

In case it has not yet sunk in, you can create a Word document with a text editor and some well-formed XML – without installing MS Word! Our transform simply converts the Rockwell RSBatch XML structure into a Word 2003 structure that can be edited inside MS Word 2003.

Most of our engineers can use XSLT effectively, but learning WordprocessingML is much more difficult. To bridge the gap, we have created our own simplified intermediate XML language called CascadeDocML.

To support CascadeDocML, we created an XSLT transform that converts from our simplified syntax to full WordprocessingML syntax. Therefore, generating a Word document is a two-step process: input XML to CascadeDocML, then CascadeDocML to WordprocessingML.

This is the methodology used for this example. The download for this example includes a document describing the functionality and structure of CascadeDocML, so you may use it for your own applications. (This XML file is large and may require a bit of time to unzip and view).


Example 3: Search and Replace
This example will generate an output file that is an exact copy of the input XML file with keywords replaced according to a list of search/replace pairs defined in a separate file. This transform may be used to duplicate a complete unit configuration where only the tags have changed. This transform is different from performing search/replace in a text editor such as Notepad, as it does not replace everything throughout the file. Our sample input file is a DeltaV XML file in which we want to be very selective about which code elements are available for search/replace.

Although this example may not work perfectly for your specific application, it is extremely modular and easy to extend to your own needs. (This XML file is large and may require a bit of time to unzip and view).
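
The downloadable transform is not reproduced here. As a rough sketch of the general copy-with-changes approach, and not the transform from the download, the widely used identity template can be combined with a narrow override. The element name Tag and the values TIC-101 and TIC-201 are hypothetical, and the replacement is hard-coded rather than read from a separate file.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical override: only text inside Tag elements that equals
       TIC-101 is replaced; the same string elsewhere is left alone. -->
  <xsl:template match="Tag/text()[. = 'TIC-101']">
    <xsl:text>TIC-201</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The override wins because its pattern is more specific than the identity pattern, which is what makes this kind of transform selective in a way that a text editor is not.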


Example 4: Generate Parameter Spreadsheet from RSBatch Area Model
This example will create an Excel spreadsheet of an RSBatch Area Model with each process cell, unit and phase. Each phase will include a detailed list of the phase parameters and reports. One transform in the example creates a CSV text file that may be opened in Excel. The other creates an Excel XML file instead of a CSV file. The Excel XML output includes spreadsheet formatting that is much more complete than a simple CSV file. (This XML file is large and may require a bit of time to unzip and view).


  About the Author

John T. Sever, president of Cascade Controls, can be reached at johnsever@cascon.com.
 
