Settle differences to unlock models

Tony Paine of HighByte reports that implementing a DataOps layer can resolve differences between data sources, streamline analytics, and allow the publication of richer models

Jim Montague

May 20, 2021

As users, system integrators and suppliers grapple with accelerating digitalization, they're finding that making process data available means going beyond the usual historian to tap a wider circle of technologies and sources.

"Real-time process information is readily available to operations staff and those responsible for a site, but it must be further refined and prepared before it can be used in the cloud. Raw data by itself is useless on the IT side. What's needed is an information modeling framework," says Tony Paine, co-founder and CEO of HighByte, and formerly CEO of Kepware. "We used to collect operations data by leveraging a multitude of device protocols and making it available via OPC UA to industry-specific applications. This works great for on-premises solutions, where controls experts have innate knowledge of what the raw data means. Unfortunately, this isn't enough once the data leaves the site and enters the enterprise, where new classes of applications are used by non-operational users who can't draw any meaning from the raw data. The data simply lacks standardization and context.

"We believe a new layer in the technology stack is needed to abstract away the collection, standardization, normalization and contextualization of data that's necessary for generating information that any system, application or user can reuse. This data operations (DataOps) layer bundles information close to its source and publishes it as an enriched model that any platform can consume. Though users worldwide can analyze data within their own sites, the sites themselves are isolated by site-specific naming conventions for tags, maps and other functions, and the context of the data is typically locked up in various applications or, worse yet, known only to onsite staff. This is why we need a central location to model and maintain information for enterprise analytics, one that provides a common way to represent units, current values, temperatures, setpoints, minimum/maximum settings and other parameters, and that meets the data governance requirements set by IT.

"Today, this is being rolled out as custom software development efforts that leverage software development kits (SDKs) from Microsoft Azure, Amazon Web Services (AWS) or other platforms. But control personnel aren't software developers. They're less concerned with developing scalable and maintainable solutions, and more concerned with keeping their plants operating to meet the business's various KPIs. Custom development might work for small, proof-of-concept, site-to-cloud integrations, but it falls down quickly when the same concepts are extended across all the sites that need to tie into supply chain and customer networks."
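To make the site-isolation problem concrete, here is a minimal Python sketch of mapping two sites' tag naming conventions onto one shared model. It is not HighByte's implementation or API: the `TankModel` fields, the `SITE_TAG_MAPS` table, the tag names and the `to_common_model()` function are all hypothetical, and the sketch assumes both sites already report temperatures in degrees Celsius.

```python
from dataclasses import dataclass


@dataclass
class TankModel:
    """Hypothetical site-independent model of a tank."""
    site: str
    asset_id: str
    level_pct: float
    temperature_c: float
    setpoint_c: float


# Hypothetical per-site tag naming conventions, keyed by model field.
SITE_TAG_MAPS = {
    "plant_a": {"level_pct": "TK101.LVL", "temperature_c": "TK101.TEMP", "setpoint_c": "TK101.TEMP_SP"},
    "plant_b": {"level_pct": "Tank_1/Level", "temperature_c": "Tank_1/Temp", "setpoint_c": "Tank_1/TempSP"},
}


def to_common_model(site: str, asset_id: str, raw_tags: dict[str, float]) -> TankModel:
    """Translate one site's raw tag values into the shared TankModel."""
    tag_map = SITE_TAG_MAPS[site]
    # Look up each model field under whatever tag name this site uses for it.
    values = {field: raw_tags[tag] for field, tag in tag_map.items()}
    return TankModel(site=site, asset_id=asset_id, **values)


# Two plants with different tag names produce records of the same shape.
tank_a = to_common_model("plant_a", "TK101", {"TK101.LVL": 72.5, "TK101.TEMP": 41.0, "TK101.TEMP_SP": 40.0})
tank_b = to_common_model("plant_b", "Tank_1", {"Tank_1/Level": 55.0, "Tank_1/Temp": 38.2, "Tank_1/TempSP": 39.0})
print(tank_a)
print(tank_b)
```

Once both plants are expressed as `TankModel` records, downstream analytics can compare them without knowing either plant's tag names; in this sketch, onboarding another site means adding a mapping rather than writing new integration code.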

Paine reports that data analytics is moving beyond industry-specific networking and asset definition strategies, such as Electronic Device Description Language (EDDL), Field Device Tool (FDT) and even OPC, toward a more generalized DataOps layer. This layer can take in any kind of data, model it for data governance, and automatically transform the resulting information into a format that Azure, AWS, other databases or historians, or any other platform that eventually needs it can ingest. To serve as this layer, linking processes, technology and people, HighByte has developed its Intelligence Hub software. It provides connectivity to technologies such as OPC UA, MQTT, SQL, Azure, AWS and REST-based web services, with a user experience that's natural for operations personnel, while creating the models needed to meet IT's requirements.

"Users want visibility into the locations of tags and data points that may be locked in applications, so IT must work in concert with OT to guide and define policies for data consistency. This lets them model any asset, process or system across their sites," explains Paine. "Local engineers can then fill in the models by connecting to their local data streams. We provide Intelligence Hub as one place where users can document which assets are where, and then report what those assets are doing. This preparation includes removing noise from signals, normalizing data to the same units and definitions, and contextualizing data by adding metadata. For example, a user may have a number indicating temperature, but additional metadata is needed to show whether the value is in range, where it came from, its units of measure, and how it should be interpreted."
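The three preparation steps Paine lists can be sketched in a few lines of Python. This is an illustrative assumption, not Intelligence Hub's behavior: a trailing rolling median stands in for noise removal (one filter among many), a Fahrenheit-to-Celsius conversion stands in for unit normalization, and the `contextualize()` function, its metadata keys (`units`, `source`, `range`, `in_range`), the tag path and the 35 to 45 degC band are hypothetical choices made for this example.

```python
from statistics import median


def smooth(samples: list[float], window: int = 3) -> list[float]:
    """Suppress noise spikes with a trailing rolling median."""
    return [median(samples[max(0, i - window + 1): i + 1]) for i in range(len(samples))]


def f_to_c(value_f: float) -> float:
    """Normalize a Fahrenheit reading to Celsius."""
    return (value_f - 32.0) * 5.0 / 9.0


def contextualize(value_c: float, source: str, low_c: float, high_c: float) -> dict:
    """Wrap a bare number with metadata about its origin, units and validity."""
    return {
        "value": round(value_c, 2),
        "units": "degC",
        "source": source,  # hypothetical tag path identifying the origin
        "range": {"low": low_c, "high": high_c},
        "in_range": low_c <= value_c <= high_c,
    }


# Example: raw tank temperatures in degF, including one spurious spike.
raw_f = [104.0, 106.0, 105.0, 140.0, 105.0]
smoothed = smooth(raw_f)
latest = contextualize(f_to_c(smoothed[-1]), "plant_a/TK101.TEMP", low_c=35.0, high_c=45.0)
print(latest)
```

With these sample readings, the median filter suppresses the 140 degF spike, and the latest value is published as about 40.56 degC, flagged as inside the assumed operating band. A consumer on the IT side then receives the number together with the context needed to interpret it.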

Paine adds that, to improve data context and build better models, users should pick a measurement that's important to their process or organization, such as tank performance, to guide their search for the most appropriate solution. "By defining a specific problem they want to solve, users focus the scope of the solution and can provide value to the business quickly," says Paine. "They must first settle on one end result, break it into approachable blocks, evaluate data requirements, and determine how the data will be used. By starting small and providing immediate value, they let the solution scale naturally to meet the needs of the business. In the end, DataOps must bridge the needs of IT and OT, and solve problems across their shared spectrum from onsite to the cloud. Users need to leverage a piece of technology that meets the needs of both sides."