The XML Recommendation from the World Wide Web Consortium (W3C) opens with a list of 10 design goals. The first goal states: "XML shall be straightforwardly usable over the Internet." Straightforward or not, Extensible Markup Language (XML) is used extensively over the Internet and has become a de facto standard for data interchange.
Because of its association with Internet applications, however, automation engineers have sidestepped XML, assuming it has little applicability to their daily work. This is a mistake. Though this technology has been used extensively for Internet-based applications, XML is an extremely simple and flexible data format with untold uses waiting to be uncovered and implemented in industrial controls and automation.
Alone, XML data is simply raw text that has little to offer automation engineers. But XML isn't alone. Developers everywhere have jumped aboard the XML bandwagon to create a seemingly bottomless reservoir of tools, applications, services, and standards, all designed to create, consume, translate, store, and present XML data. This infrastructure of supporting applications is what makes XML such a compelling choice for application data. This article will introduce XML's fundamental concepts for those who have so far managed to avoid this important technology. Parts 3 and 4 of this four-part article will address XML supporting technologies. [The ABCs of XML, Part 1 ran in CONTROL, June 06.]
Not a Typical Language
XML isn't a language in the sense that there are defined keywords, functions, or statements. XML is often compared to hypertext markup language (HTML) because it works well with HTML applications, has similar markup, and has been joined with HTML to create the XHTML specification. However, the HTML specification defines a list of element tags like <body>, <h1>, <b>, and <i> with defined behavior for HTML browsers.
XML has no defined set of tags; anyone can create their own tags and attributes to suit their own application needs. Instead, the XML specification defines a set of markup rules that must be followed for the marked-up text to be interpreted as XML data.
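As a rough illustration of a home-grown vocabulary, the sketch below parses a small document with Python's standard xml.etree.ElementTree module. The element and attribute names (Tank, Level, Temperature, id, units) are invented for this example and come from no standard; the parser accepts them simply because the markup follows the XML rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag and attribute names, made up for this illustration.
document = """<Tank id="T101">
  <Level units="percent">72.5</Level>
  <Temperature units="degC">41.0</Temperature>
</Tank>"""

tank = ET.fromstring(document)      # parse the text into an element tree
level = tank.find("Level")          # look up a child element by its tag name

print(tank.tag, tank.get("id"))     # element name and an attribute value
print(level.get("units"), float(level.text))
```

The parser has no idea what a Tank or a Level is. Giving those names meaning is left entirely to the application that consumes the data.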
10 Well-Formed Rules
XML is organized in a logical or physical structure called a document. An XML document may be a file on disk, it may be streamed from a server, or it may be hard-coded text inside an HMI VBA application. Though the data may have many different sources, the document metaphor still applies as long as it's well formed. To be considered well formed, an XML document must adhere to the constraints defined in the W3C's XML specification. These constraints can be distilled into 10 easy rules.
- XML is just plain old text
XML is designed to be human-readable as text. This means any text editor can be used with an XML document. A simple text editor will treat an XML document just as it would an INI file, a CSV file, or any text file.
- XML is data
XML is designed as a flexible, self-describing data structure. By itself, XML can't do anything, nor does it define how data should be processed or handled. By contrast, HTML includes both data and a description of how it should be displayed in a browser.
- XML documents must have one root element
There can be only one top-level root element in an XML document, and all other elements must be between the root element start and end tags.
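The single-root rule can be checked with any conforming parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the Alarm tags are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One top-level element wrapping everything else: accepted.
one_root = "<Alarms><Alarm>High level</Alarm><Alarm>Low flow</Alarm></Alarms>"

# Two top-level elements side by side: rejected.
two_roots = "<Alarm>High level</Alarm><Alarm>Low flow</Alarm>"

print(is_well_formed(one_root))   # True
print(is_well_formed(two_roots))  # False
```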
- XML white space data is preserved
HTML reduces consecutive white space characters to a single space character. With XML, white space is interpreted as data, just as any other character.
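A short sketch of this behavior for element content, again using Python's xml.etree.ElementTree. The Message tag and its text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Element content with leading, trailing, and repeated white space.
source = "<Message>  pump   1\n  tripped </Message>"

message = ET.fromstring(source)

# The parser hands back every space and line break exactly as written.
print(repr(message.text))
```

If an application wants the collapsed form a browser would show, it has to do that itself, for example with " ".join(message.text.split()).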
- XML naming rules
Element names can't include white space, must start with a letter or underscore, and can't include characters that are used for markup such as <, >, ;, &, among others. It's generally a good idea to limit element and attribute names to letters, numbers, and underscore.
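One way to try out candidate names is to hand them to a parser and see what it accepts. The sketch below is an assumption-laden illustration using Python's xml.etree.ElementTree; the helper name is_well_formed and the sample names are made up.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Letters, digits, and underscores, starting with a letter: accepted.
print(is_well_formed("<Pump_1/>"))      # True

# A name that starts with a digit: rejected.
print(is_well_formed("<1Pump/>"))       # False

# White space inside the name: rejected.
print(is_well_formed("<Pump One/>"))    # False

# A markup character inside the name: rejected.
print(is_well_formed("<Pump&Motor/>"))  # False
```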
- XML elements must be closed
An element can be closed with an end tag, or optionally with the shorthand notation for empty elements. By contrast, HTML doesn't require that elements be closed. In fact, most browsers will attempt to render any HTML element whether or not it's closed properly. XML is not so forgiving. An empty element is one with no value and no child elements, though it may have attributes. An empty element may be closed with the shorthand notation /> at the end of the start tag. For example, <Value /> is equivalent to <Value></Value>.
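The equivalence of the two empty-element forms, and the parser's refusal to accept an unclosed element, can be sketched with Python's xml.etree.ElementTree. The variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<Value />")
long_form = ET.fromstring("<Value></Value>")

# Both forms produce the same element: same name, no children,
# and the same serialized output.
same_tag = short_form.tag == long_form.tag
no_children = len(short_form) == 0 and len(long_form) == 0
same_output = ET.tostring(short_form) == ET.tostring(long_form)
print(same_tag, no_children, same_output)

# An element that is never closed is an error, not a guess.
try:
    ET.fromstring("<Value>")
    unclosed_accepted = True
except ET.ParseError:
    unclosed_accepted = False
print(unclosed_accepted)
```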
- XML elements must be properly nested
In practice, HTML browsers have tolerated overlapping elements like this: <b><i>bold and italic text</b> italic only</i>. This type of element crossing is not allowed in XML. An element that starts inside a parent element must end inside the same parent before the parent element is closed.
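The overlapping markup from the example above can be fed to an XML parser to see the rule enforced. This sketch uses Python's xml.etree.ElementTree; the surrounding <p> element and the helper name is_well_formed are assumptions added so each sample has a single root.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# <b> is closed while <i>, which started inside it, is still open: rejected.
crossed = "<p><b><i>bold and italic text</b> italic only</i></p>"

# Same visual intent, with each element closed inside its parent: accepted.
nested = "<p><b><i>bold and italic text</i></b><i> italic only</i></p>"

print(is_well_formed(crossed))  # False
print(is_well_formed(nested))   # True
```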
- XML is case sensitive