The ABCs of XML, Part 2
08/10/2006
THE WORLD WIDE Web Consortium's XML Recommendation opens with a list of 10 design goals. The first goal states: “XML shall be straightforwardly usable over the Internet.” Straightforward or not, Extensible Markup Language (XML) is used extensively over the Internet and has become a de facto standard for data interchange.
Because of its association with Internet applications, however, automation engineers have sidestepped XML, assuming it has little applicability to their daily work. This is a mistake. Though this technology has been used extensively for Internet-based applications, XML is an extremely simple and flexible data format with untold uses waiting to be uncovered and implemented in industrial controls and automation.
Alone, XML data is simply raw text that has little to offer automation engineers. But XML isn’t alone. Developers everywhere have jumped aboard the XML bandwagon to create a seemingly bottomless reservoir of tools, applications, services, and standards all designed to create, consume, translate, store, and present XML data. This infrastructure of supporting applications is what makes XML such a compelling choice for application data. This article will introduce XML’s fundamental concepts for those who have so far managed to avoid this important technology. Parts 3 and 4 of this four-part article will address XML supporting technologies. [“The ABCs of XML, Part 1” ran in CONTROL, June ’06.]
Not a Typical Language
XML isn’t a language in the sense that there are defined keywords, functions, or statements. XML is often compared to hypertext markup language (HTML) because it works well with HTML applications, has similar markup, and has been joined with HTML to create the XHTML specification. However, the HTML specification defines a list of element tags like <body>, <h1>, <b>, and <i> with defined behavior for HTML browsers.
XML lacks a defined set of tags and allows anyone to create their own set of tags and attributes to suit their own application needs. Instead, the XML specification defines a set of markup rules that must be followed for the marked up text to be interpreted as XML data.
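As a rough illustration of application-defined tags, the Python sketch below builds a small document with the standard library's xml.etree.ElementTree module. The names Pump, Tag, SpeedSetpoint, and Units are invented for this example and come from no vendor schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, chosen only for illustration.
pump = ET.Element("Pump", {"Tag": "P-101"})
setpoint = ET.SubElement(pump, "SpeedSetpoint", {"Units": "RPM"})
setpoint.text = "75"

# Serialize the tree to a text string (no XML declaration is added here).
xml_text = ET.tostring(pump, encoding="unicode")
print(xml_text)
```

Nothing in the XML specification knows what a Pump is. The meaning of the tags is an agreement between the applications that write and read the document.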
10 Well-Formed Rules
XML is organized in a logical or physical structure called a “document.” An XML document may be a file on disk, it may be streamed from a server, or it may be hard-coded text inside an HMI VBA application. Though the data may have many different sources, the document metaphor still applies as long as it’s “well formed.” To be considered well formed, an XML document must adhere to the constraints defined in the W3C’s XML specification. These constraints can be distilled into 10 easy rules.
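One practical way to test well-formedness is to hand the text to a parser and see whether it complains. The sketch below assumes Python's standard xml.etree.ElementTree module and leans on the parser rather than restating the rules. The function name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<Step><Name>FEED:1</Name></Step>"
bad_nesting = "<Step><Name>FEED:1</Step></Name>"  # tags closed in the wrong order
bad_case = "<Step></step>"                        # element names are case-sensitive

for sample in (good, bad_nesting, bad_case):
    print(is_well_formed(sample), sample)
```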
Comments
In addition, the markup for a comment in XML is identical to HTML. The comment opens with <!-- and closes with --> and may span multiple lines.
XML Declaration
An XML document may begin with an optional XML declaration. The XML declaration must precede all other content, and isn’t considered part of the XML document. It’s used to provide information to XML processors about the document's content. Because the declaration is not an element, it must not have a closing tag. The declaration looks like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
If the declaration is included, version is the only required attribute, and must have a value of either 1.0 or 1.1. Version 1.1 supports special Unicode character handling functionality that’s rarely needed, and, therefore, version 1.0 is used almost exclusively.
The encoding attribute defines the character encoding used by the document, so that an XML processor may properly parse the document. When a document carries no encoding declaration and no byte order mark, XML processors assume UTF-8.
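The effect of the encoding attribute is easiest to see with a non-ASCII character. The sketch below, which assumes Python's standard xml.etree.ElementTree module, stores the same element as UTF-8 and as ISO-8859-1 bytes. The Units element and the helper element_text are illustrative names only.

```python
import xml.etree.ElementTree as ET

def element_text(document: bytes) -> str:
    """Parse a one-element document and return the element's text."""
    return ET.fromstring(document).text

body = "<Units>\u00b0C</Units>"  # degree sign, a non-ASCII character

utf8_doc = b'<?xml version="1.0" encoding="utf-8"?>' + body.encode("utf-8")
latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body.encode("iso-8859-1")
undeclared_latin1 = body.encode("iso-8859-1")  # no declaration, so UTF-8 is assumed

utf8_value = element_text(utf8_doc)
latin1_value = element_text(latin1_doc)

# ISO-8859-1 bytes without a matching declaration are not valid UTF-8.
try:
    element_text(undeclared_latin1)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(utf8_value == latin1_value, undeclared_ok)
```

Both declared documents yield the same text, while the undeclared ISO-8859-1 bytes are rejected because the parser falls back to UTF-8.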
Processing Instructions
Any number of processing instructions may appear below the XML declaration and before the root element. Each must be enclosed in <? and ?>, like the XML declaration, and provides application-specific handling information. For example, a Microsoft Word 2003 XML document may include the following processing instruction, which tells the Windows operating system to identify the XML document as an MS Word file. When double-clicked, an XML file with this processing instruction will open in MS Word:
<?mso-application progid="Word.Document"?>
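An application can read processing instructions much as it reads elements. The sketch below assumes Python's standard xml.dom.minidom module and collects every processing instruction that precedes the root element. The Root element is a placeholder, not the root of a real Word document.

```python
from xml.dom import minidom

doc_text = (
    '<?xml version="1.0"?>'
    '<?mso-application progid="Word.Document"?>'
    '<Root/>'  # placeholder root element for illustration
)

dom = minidom.parseString(doc_text)

# Top-level nodes include processing instructions as well as the root element.
instructions = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

The XML declaration does not show up in the list, since the parser treats it separately from ordinary processing instructions.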
XML Validation
An XML document has a specific structure of element names, attribute names, and hierarchical parent-child relationships. As long as a document meets the requirements for “well-formedness,” it can have any structure and contain any data. This flexibility is what makes XML extensible.
However, applications that interpret XML documents expect the XML to adhere to a particular structure. Validation is the process of checking an XML document for conformance to a defined structure or schema. A schema can be defined in the XML document, or a reference can point to an external schema document. There are multiple standards for defining a schema, including Document Type Definition (DTD), XML Schema Definition (XSD) language, and XML Data Reduced (XDR). An XML document that adheres to a defined schema is judged to be “valid.”
A schema isn’t required when developing XML applications, and, in fact, can significantly complicate XML application development. When you control a document's content and related applications, you can work more efficiently without a schema.
Software vendors that support XML data normally publish a schema, so other applications can properly validate content before working with a document. A control system vendor that supports import of XML data into the control system will likely validate a document before the import process to prevent loading data that may lead to a control system fault.
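The idea of checking structure before an import can be sketched without a schema language. The Python example below is a hand-written stand-in, not DTD or XSD validation, which would normally be done by a validating parser or a schema library (Python's standard library does not include an XSD validator). The rule that a Step must contain Name and UnitAlias is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set for illustration; not taken from any vendor schema.
REQUIRED_STEP_CHILDREN = ("Name", "UnitAlias")

def validate_step(xml_text):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        step = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well formed: {err}"]
    problems = []
    if step.tag != "Step":
        problems.append(f"root element is {step.tag}, expected Step")
    for name in REQUIRED_STEP_CHILDREN:
        if step.find(name) is None:
            problems.append(f"missing required element {name}")
    return problems

ok = validate_step("<Step><Name>FEED:1</Name><UnitAlias>UNIT500</UnitAlias></Step>")
missing = validate_step("<Step><Name>FEED:1</Name></Step>")
print(ok, missing)
```

A published schema does the same job declaratively, so the rules live in one document rather than being scattered through application code.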
Namespaces
XML namespaces solve a problem that can occur when the same element name has different meanings within a single document. For example, the element name template is an XSLT keyword, and its meaning is different from the template element used in an MS Word XML document. All elements and attributes in an XML document are included in a namespace, even if a namespace isn’t explicitly declared. When no namespace is defined in a document, content is included in the default null namespace. A namespace may be defined as an attribute of the start tag of an element with the following format:
xmlns:prefix="namespaceURI"
Where a namespace is declared for an element, all child elements with the same prefix are included in the same namespace. The element where the namespace is declared may also be included in the namespace if the same prefix is used in the element name.
<cc:Recipe xmlns:cc="http://www.cascon.com/Recipe">
<cc:Step cc:XPos="600" cc:YPos="600" AcquireUnit="yes">
<cc:Name>Feed:1</cc:Name><UnitAlias>Unit500</UnitAlias>
</cc:Step></cc:Recipe>
In the sample above, the prefix cc refers to the namespace http://www.cascon.com/Recipe. Elements in this namespace include Recipe, Step, and Name, and the attributes XPos and YPos belong to it as well. The element UnitAlias and the attribute AcquireUnit are included in the default null namespace.
The namespace prefix cc serves as a shorthand or alias notation for the full namespace http://www.cascon.com/Recipe. The actual namespace may be any string value, but it is meant to be globally unique. XML parsers don’t enforce uniqueness, nor do they expect any particular notation such as a web uniform resource identifier (URI). A web-style URI is frequently used because a real web URI like http://www.cascon.com/ is guaranteed to be globally unique across the Internet, which greatly reduces the chance of colliding namespaces.
To simplify this example, the namespace can be declared as the default namespace with no prefix as shown in the following example:
<Recipe xmlns="http://www.cascon.com/Recipe">
<Step XPos="600" YPos="600" AcquireUnit="yes">
<Name>Feed:1</Name>
<UnitAlias>Unit500</UnitAlias>
</Step>
</Recipe>
Notice that the namespace attribute xmlns no longer includes the prefix definition cc. Without a prefix, a namespace becomes the default namespace for the element where it’s declared. This makes the Recipe element and all its child elements members of the namespace http://www.cascon.com/Recipe. Default namespaces apply to elements only, not to attributes. Therefore, the attributes of the Step element are included in a null namespace (equivalent to xmlns=""), and not the default namespace. This special behavior for attributes can be quite confusing, but it isn’t difficult to work around as long as you understand how it works.
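A parser makes these namespace rules visible. The sketch below assumes Python's standard xml.etree.ElementTree module, which reports each name as {namespaceURI}localname when a namespace applies. It parses the prefixed and default-namespace forms of the recipe. The Name and UnitAlias children of the prefixed form are filled in from the article's description, and the helper describe is an illustrative name.

```python
import xml.etree.ElementTree as ET

NS = "http://www.cascon.com/Recipe"

prefixed = """
<cc:Recipe xmlns:cc="http://www.cascon.com/Recipe">
  <cc:Step cc:XPos="600" cc:YPos="600" AcquireUnit="yes">
    <cc:Name>Feed:1</cc:Name>
    <UnitAlias>Unit500</UnitAlias>
  </cc:Step>
</cc:Recipe>
"""

default = """
<Recipe xmlns="http://www.cascon.com/Recipe">
  <Step XPos="600" YPos="600" AcquireUnit="yes">
    <Name>Feed:1</Name>
    <UnitAlias>Unit500</UnitAlias>
  </Step>
</Recipe>
"""

def describe(xml_text):
    """Return (element names, attribute names) for Step and its children."""
    step = ET.fromstring(xml_text)[0]
    element_names = [step.tag] + [child.tag for child in step]
    return element_names, sorted(step.attrib)

prefixed_elements, prefixed_attrs = describe(prefixed)
default_elements, default_attrs = describe(default)

print(prefixed_elements, prefixed_attrs)
print(default_elements, default_attrs)
```

In the prefixed form, only names carrying the cc prefix are reported with the namespace URI. In the default form, every element picks up the namespace, including UnitAlias, while all three attributes are reported with bare names. Code that searches for Step in the default form must therefore use the qualified name, while attribute lookups use the plain name.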
You'll likely not need to bother with namespaces in documents created for internal purposes. However, you'll need to understand namespaces when working with vendor-generated XML files. You will see the importance of namespaces in Part 3, which will cover XSLT.
THE SAMPLE fragment of XML (below) was lifted from a recipe exported from Rockwell Automation's RSBatch product. Rockwell defined the element and attribute names for describing an RSBatch recipe. Other system vendors may define a separate set of elements and attributes to describe a batch recipe. If you are a system integrator who works with batch recipe software from different system vendors, you may choose to define your own system-agnostic batch recipe XML data structure (or schema) for internal development purposes that can easily be converted to and from a vendor-specific structure.
<!-- This is an XML comment -->
<Step XPos="600" YPos="600" AcquireUnit="true">
<Name>FEED:1</Name>
<StepRecipeID>FEED</StepRecipeID>
<UnitAlias>UNIT500</UnitAlias>
<FormulaValue>
<Name>AGIT_SPEED_SP</Name>
<Display>true</Display>
<Value />
<Real>75</Real>
<EngineeringUnits>RPM</EngineeringUnits>
</FormulaValue>
</Step>
The first thing to notice about the code fragment is the angle brackets (< and >), which mark an XML “tag.” An XML “element” includes a start tag like <Step>, an end tag like </Step>, and everything between the two. Notice that the Step element contains four child elements: Name, StepRecipeID, UnitAlias, and FormulaValue. The FormulaValue element contains five child elements. This parent/child relationship between elements shows that XML can support hierarchical data structures.
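The hierarchy can be confirmed by walking the parsed tree. The sketch below assumes Python's standard xml.etree.ElementTree module and prints an indented outline of the sample fragment. The function name outline is chosen for this example.

```python
import xml.etree.ElementTree as ET

fragment = """
<Step XPos="600" YPos="600" AcquireUnit="true">
  <Name>FEED:1</Name>
  <StepRecipeID>FEED</StepRecipeID>
  <UnitAlias>UNIT500</UnitAlias>
  <FormulaValue>
    <Name>AGIT_SPEED_SP</Name>
    <Display>true</Display>
    <Value />
    <Real>75</Real>
    <EngineeringUnits>RPM</EngineeringUnits>
  </FormulaValue>
</Step>
"""

def outline(element, depth=0):
    """Return element names as indented lines, one level per generation."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

step = ET.fromstring(fragment)
step_children = [child.tag for child in step]
formula_children = [child.tag for child in step.find("FormulaValue")]

print("\n".join(outline(step)))
```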
XML is designed to be self-describing. A document's data is stored in element values and attribute values, so that element and attribute names describe the data they hold much like data is described in a relational database by table names and field or column names.
Data stored as an element value is the text between start and end tags. In this sample, the value for the element EngineeringUnits is RPM. Data stored as an attribute value is found on the quoted right side of a Name="Value" pair. In our example, the Step element contains three attributes named XPos, YPos, and AcquireUnit, which have attribute values of 600, 600, and true respectively.
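Reading element and attribute values takes only a few calls in most XML libraries. The sketch below assumes Python's standard xml.etree.ElementTree module and a shortened copy of the sample fragment. Converting the Real value to a number is a choice made for this example, since XML itself stores everything as text.

```python
import xml.etree.ElementTree as ET

fragment = """
<Step XPos="600" YPos="600" AcquireUnit="true">
  <Name>FEED:1</Name>
  <FormulaValue>
    <Name>AGIT_SPEED_SP</Name>
    <Real>75</Real>
    <EngineeringUnits>RPM</EngineeringUnits>
  </FormulaValue>
</Step>
"""

step = ET.fromstring(fragment)

# Attribute values come from the start tag of the element.
x_pos = step.get("XPos")
acquire_unit = step.get("AcquireUnit")

# Element values are the text between the start and end tags.
units = step.findtext("FormulaValue/EngineeringUnits")
speed = float(step.findtext("FormulaValue/Real"))  # text converted by the application

print(x_pos, acquire_unit, units, speed)
```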