The ABCs of XML - Part 3: XSLT

This article provides more of what you need to know to survive in the world of connected data by introducing the XSLT programming language, which brings the benefits of XML to industrial automation applications.

Share Print Related RSS

The ABCs of XMLBy John T. Sever, President, Cascade Controls

XML data is increasingly prevalent in the world of industrial automation. System vendors provide options for data interchange through XML, and industry organizations adopt XML as a standard for data transport, such as the OPC Foundation’s OPC XML-DA specification. Still, many front-line automation engineers find no benefit to old data restructured in XML alone. The benefits they seek may be found in technologies designed to support, consume and manipulate XML data. One such technology from the World Wide Web Consortium is a language specifically designed for transforming or restructuring an XML document into something else. The eXtensible Stylesheet Language Transformations (XSLT) is a simple, yet powerful language that can bring enormous benefit to those willing to learn it.

 

RELATED ARTICLES

The ABCs of XML: Part I
The ABCs of XML, Part 2


 

In this article, I will introduce the basics of XSLT through an example derived from an actual customer request. My coverage of the topic will be limited by the space available in this format however, I hope to awaken interest in a technology that might otherwise be overlooked and thereby give to some, a hint of the benefits waiting to be uncovered.

The Project
My customer maintains a large DeltaV automation system in an FDA-qualified environment. The customer would like a tool that simplifies daily demands of managing a large configuration. Such a tool, we have determined, is most easily developed around a relational database. Therefore we must transfer the configuration from control system to database in a structure that best supports the requirements of this tool.

A prototype of this tool is under development using Microsoft Access 2003 because of its built-in features for reporting and rapid application development. One such feature is the ability to import XML data through the File menu (File/Get External Data/Import), allowing the application of an XSLT transform before importing. This feature imports element-normal XML data by creating new tables with structures based upon those found in the XML document. The user optionally may choose to append data to existing tables following a set of mapping rules that match table and field names to element names in the XML document.

 

     FIGURE 1: THE XSLT TRANFORMATION PROCESS
  XSLT Transformation Process
 

Restructure XML data or transform XML into an alternate text format.

The Transformation
Unfortunately, the XML structure of a DeltaV configuration does not match the desired data structures to meet our analysis requirements, so the raw XML data must be restructured or transformed into a different XML structure better suited to the customer’s requirements. Before learning the benefits of XSLT, I would have solved this problem by writing VBA code to load the XML document and programmatically walk the tree of nodes, adding new records to database tables using either the DAO or ADO object models. The XSLT solution presented here requires less code, is more robust, easier to develop, easier to maintain and modify, and is significantly faster than the VBA solution.

The XSLT style sheet restructures the XML for importing control modules and their parameters (See Sample Files). By dissecting this style sheet, I will explain basic XSLT concepts, while explaining solutions to common data structure problems.

Before reading on, get familiar with the basic structure of a DeltaV XML file. Notice that each control module is defined in a <module> element that has attributes for tag, plant_area, category, user and time. Also review the <attribute> and <attribute_instance> elements that are children of a <module> element.

Style Sheet as Transformation
The top element in an XSLT transformation must be the <xsl:stylesheet> element, and it must include a version attribute and the namespace declaration. An XSLT transformation is also an XML document, and therefore it must have a single top level element <xsl:stylesheet>. This element must include the namespace http://www.w3.org/1999/XSL/Transform, and it must have a version attribute. By convention, the namespace is generally assigned the xsl: prefix used throughout the document to distinguish XSLT language elements from literal output.

XSLT Transformation Processors
Execution of a transformation is performed by an XSLT processor that accepts XML and XSLT documents as input and generates a document as output. Output is created through execution of templates declared within the XSLT document. The processor walks the XML tree matching each node to a template in the XSLT document. If the processor cannot find a matching template in the XSLT file, it executes a built-in template inside the processor.

The built-in template for element nodes selects child nodes of the current element and tries to find templates that match these children. This built-in behavior results in automatic recursion down every branch of the XML source document tree. There is also a built-in template for text nodes that copies the text to the output. The effect of this built-in template is usually unexpected and unwanted, but can be easily prevented if you understand this behavior.

When a matching template is found for a given node (not a built-in template), the processor will not automatically recurse its children, effectively ending the transformation process for all descendent nodes. The developer may programmatically choose to continue processing child nodes with <xsl:for-each> or by invoking <xsl:apply-templates>, providing fine control of how child nodes are processed.

Output Formatting
The second line of our transform, <xsl:output>, controls output format and must appear before any templates. The attributes specify output formatting instructions for the processor,
 <xsl:output encoding=”utf-8” indent=”yes”/>

Templates
In our example, the processor starts with the document root and, finding no matching templates, invokes the built-in template which selects children of the document root and looks again for matching templates. The only child of the document root is the <fhx> element, which is passed to our first template that matches <fhx> elements as children of the document root (identified by the slash).

This template generates a literal <root> element in the output. Inside the <root> element <xsl:apply-templates select=”module”/> invokes the processor to find matching templates for all <module> elements that are children of the current <fhx> element. The processor is instructed to sort the <module> elements in ascending order by the tag attribute of each <module>. The @ prefix identifies an attribute name instead of an element name.

 <xsl:template match=”/fhx”>
  <root>
   <xsl:apply-templates select=”module”>
    <xsl:sort select=”@tag” data-type=”text”
        order=”ascending”/>
   </xsl:apply-templates>
  </root>
 </xsl:template>

The processor executes our next template for each <module> element that is a child of the topmost <fhx> element. The module matching template performs a shallow copy of the current <module> element to the output using the <xsl:copy> element. A shallow copy is one that copies a single node without including attributes or descendants. The result is one <module> element in the output for each <module> element in the input. Inside <xsl:copy> are three more XSLT elements. Because these three lie inside the opening and closing <xsl:copy> tags, their results will be output inside the <module> element created by the <xsl:copy> statement. In this way, we can create rich hierarchical output from any type of input XML. The next <xsl:copy-of> element performs a deep copy which includes a copy of all attributes and descendants (children, grandchildren, and so on). The <xsl:copy-of> element includes a select attribute that identifies four elements separated by the pipe (|) operator, which translates as a logical OR. The result is a full copy of <description>, <period>, <controller> or <type> elements that are children of the current <module> element.

The final two statements of our <module> matching template instruct the processor to find matching templates for each of the attributes (@*) of the current <module> element and to find matching templates for each <attribute> element that is a child of the current <module> element. 

 <xsl:template match=”module”>
  <xsl:copy>
   <xsl:copy-of select=”description|period|
             controller|type”>
   <xsl:apply-templates select=”@*”/>
   <xsl:apply-templates select=”attribute”/>
  </xsl:copy>
 </xsl:template>

Because Access will not import attributes, the next template must turn these attributes into elements as shown here.

Notice that this template does not specify each attribute by name, but instead processes all <module> attributes irrespective of name or value. Notice how the name() function is used within curly braces {}. The curly braces provide a means for using an expression to create dynamically an attribute value. The expression inside the curly braces is evaluated, converted to a string and the results assigned to the attribute. The name() function returns the name of the current attribute, which is used by the <xsl:element> statement to create an element in the output with the name of the current attribute. The result is an element for each attribute where the new element name matches the name of the corresponding attribute. The <xsl:value-of> element outputs the value of the current attribute as the text content or value for the newly created element completing the transformation from attributes to elements.

 <xsl:template match=”module/@*”>
  <xsl:element name=”{name()}”>
   <xsl:value-of select=”.”/>
  </xsl:element>
 </xsl:template>

The final template creates a single <attribute> element in the output for each <attribute> child of a <module> in the source XML file. This template is a bit more involved, so I won’t dissect it line-by-line. Instead, I’ll describe what is accomplished by this template, and let you figure out how it is achieved. This template must include the following functionality:

The source XML document separates parameter definition (<attribute>) from its value (<attribute_instance>). This template must normalize this information into a single parameter element so the resultant parameter table in Access will include its value as well as its data type, category, position on the control diagram, etc. The template is applied to each <attribute> so a variable is created that points to an <attribute_instance> of the same name within the same <module> from which the value is output.
A <ModuleTag> element must be added to each <Parameter> as a foreign key for relating a parameter to its parent <module> in the database.

The XML for the value of an <attribute_instance> differs for different data types. For example, an integer type value 25 is stored as <value><cv>25</cv></value>, whereas a named set type is stored as <value><string_value>some string</string_value></value>. The transform must return the value no matter what the data type.

 

Attribute Normal

<module tag=”FIC-101”
 plant_area=”TANKS”
 category=”FLOW”
 user=”johnsev”
 time=” 1095903496”>

Element Normal

<module>
  <tag>FIC-101</tag>
  <plant_area>TANKS</plant_area>
  <category>FLOW</category>
  <user>johnsev</user>
  <time>1095903496</time>
</module>

 

To support DeltaV named set data types, the output must include an <Enum> element that will contain the name of the named set found in the <attribute_instance>. This element must be included but empty for non-named set data types.

An attribute may be assigned multiple categories. A good relational design would have each category as a separate record in a related table. However, for simplicity sake, the category will be stored as a single field in the Parameter table as a comma-separated list.

The name and type attributes must be transformed to elements.

The parameter’s position on a control drawing defined by elements <x>, <y>, <h> and <w> must be transformed as children of the output <Parameter>.

Running the Sample
The sample files, including the results of the transformation, can be downloaded  at  www.controlglobal.com/XML3.html). Import the original XML file into MS Access both with and without the XSLT transformation to see the value of this transform and how well it performs.

Try this style sheet on your own DeltaV configuration—if you have one. If you have other XML data you would like to import into Access, try importing the data without a transform first. The XML structure may be suitable for import without a transform. If not, you have a great way to use what you’ve learned here and expand your reach with XSLT.

Conclusion
XSLT is a relatively easy language to master compared to a procedural language like VB. If you want to learn more you’ll find extensive online content and numerous publications. My personal favorite is XSLT Programmer’s Reference by Michael Kay from Wrox Press. The second edition covers XSLT 1.0 and 1.1. The third edition is also available which covers XSLT 2.0.

The final installment will provide more examples of using XSLT to transform XML data into something more useful.

XPath

Another standard from the W3C called XML Path Language (XPath) was developed for addressing parts of an XML document from within XSLT. XPath is a sort of query language for returning data subsets from an XML document that has its own syntax and rules like most languages. Within an XSLT stylesheet, an XPath expression may be used anywhere an XSLT element includes a select=”” attribute.

XPath expression look very similar to the pattern matching expressions in <xsl:template match=””> however the syntax and meaning of these expressions (match expression vs. select expression) is not the same. Most XSLT references provide a comparison of the differences between match expressions and select expressions otherwise known as XSLT Pattern Matching and XPath Expressions.

XSLT Processors

There are many XSLT processors available for download such as Saxon, Xalan and Microsoft MSXML. If you are working on a Windows XP computer, you already have a version of MSXML—most likely version 3 or version 4. All of the samples in this article were tested against MSXML4. The .NET framework also provides a .NET XSLT processor as well. Command line versions are available online for the XSLT processors mentioned here.

Different processors may produce slightly different results or may provide different extension functions that are not portable so to avoid problems it is a good idea to develop your transformations with a specific processor in mind.

 


  About the Author
John T. Sever, president of Cascade Controls, can be reached at johnsever@cascon.com.
Share Print Reprints Permissions

What are your comments?

You cannot post comments until you have logged in. Login Here.

Comments

No one has commented on this page yet.

RSS feed for comments on this page | RSS feed for all comments