What’s so big about big data? Isn’t it just more data: more of the same old information from the same places, which many users are finally waking up to and using? Well, yes. Many of the usual data handling methods, software and devices are being trotted out again under the “big data” buzzword, rebranded as something new even when they’re not.
However, despite the hype and distractions, there are persistent differences between traditional data and big data that can’t be ignored. These differences include information sources not accessed before, data types not analyzed previously, and new management and storage technologies. These distinguishing features are summarized in big data’s oft-cited “four Vs”: volume, variety, velocity and value.
The challenges presented by big data are being met by augmented and new analysis tools, networking pathways and flexible cloud computing services. Most come from IT, and thanks to ever-lower microprocessor, software and computing costs, they’re now arriving in force in process control applications, on plant floors and in the field.
Dig deep, find treasure
Consequently, though cautious end users remain reluctant to migrate, others are finding ways to cope with all the new data streams coming from newly connected, Internet-enabled devices, identify previously unseen correlations and trends, and achieve unprecedented operating gains.
For example, Avangrid Renewables in Portland, Ore., collects lots of time series and other information from its 3,000 U.S. wind turbines and other generating assets, and seeks to coordinate it with related operations, independent system operator (ISO), weather, market and pricing data (Figure 1). Main sources include OSIsoft PI, SCADA, SQL databases and SAP. Avangrid wanted to examine and better visualize its existing OSIsoft content, so it could understand operations better and improve decisions.
Avangrid especially wanted to more accurately report and get paid by the ISO for lost generating capacity during required curtailment periods, but it needed deeper turbine ramp-down cost data to prove its economic losses. “We knew we were losing money, but determining the actual impact required investigating years of turbine data,” says Brandon Lake, senior business systems analyst at Avangrid Renewables.
To that end, Avangrid enlisted Seeq Corp. and its data investigation and discovery software, which integrates information from historians, databases and analyzers without altering existing systems. The software uses a property-graph database, which is geared toward querying relationships across nodes, to work with data and the relationships between data. It organizes these in objects called “capsules,” which store “time periods of interest” and related data, and are used to compare machine and process states, save data annotations, enable calculations, and perform other tasks.
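To make the idea of a “time period of interest” concrete, here is a minimal Python sketch that scans a time series and records each contiguous stretch where output falls below a threshold. It is not Seeq’s API or data model; the `Capsule` class, the `find_capsules` function, the threshold and the sample readings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Capsule:
    """Hypothetical stand-in for a labeled time period of interest."""
    start: datetime
    end: datetime
    label: str


def find_capsules(samples, threshold_kw, label="low output"):
    """Return one Capsule per contiguous run of samples below threshold_kw.

    samples: list of (timestamp, power_kw) tuples sorted by timestamp.
    """
    capsules = []
    run_start = None   # timestamp where the current low-output run began
    last_low = None    # most recent timestamp seen inside the current run
    for ts, power in samples:
        if power < threshold_kw:
            if run_start is None:
                run_start = ts
            last_low = ts
        elif run_start is not None:
            # Output recovered, so close the run that just ended.
            capsules.append(Capsule(run_start, last_low, label))
            run_start = None
    if run_start is not None:
        # The series ended while still below the threshold.
        capsules.append(Capsule(run_start, last_low, label))
    return capsules


# Made-up readings at 10-minute intervals (illustrative values only).
t0 = datetime(2020, 1, 1)
readings = [1500, 1480, 300, 250, 280, 1490, 1510, 200, 1500]
samples = [(t0 + timedelta(minutes=10 * i), kw) for i, kw in enumerate(readings)]

capsules = find_capsules(samples, threshold_kw=500)
for c in capsules:
    print(c.label, c.start, c.end)
```

Once periods like these are stored as objects, they can be annotated, compared with one another, or joined to other data by time range, which is the kind of work the capsule concept is described as supporting.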
Lake reports that Avangrid had previously tried to compile ramp-down data using Excel, but it took too much time and labor. “With Seeq’s software, we were able to isolate shutdown events, add analytics and determine what was happening in just hours,” says Lake. “In the past, this would have taken days or weeks.”
Once its participating wind farms isolated shutdowns and ramp-down events, established curtailment times, added pricing and other setpoints, and compared differential power-generation scenarios to calculate losses, Seeq could export the data to Excel and identify revenue the wind farms could claim. Depending on its ISO contracts and wind availability or curtailment, Lake reports that Avangrid saves $30,000 to $100,000 per year.
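The arithmetic behind a curtailment claim can be sketched in a few lines of Python. This is only an illustration of the general idea (energy not generated, multiplied by the price in effect), not Avangrid’s or Seeq’s actual method; the function name, the dictionary keys and every number below are assumptions invented for the example.

```python
def curtailment_loss(intervals):
    """Sum estimated lost revenue over curtailed intervals.

    Each interval is a dict with hypothetical keys:
      hours         - duration of the interval
      potential_mw  - output the available wind would have supported
      actual_mw     - output actually delivered during curtailment
      price_per_mwh - price in effect during the interval
    """
    total = 0.0
    for iv in intervals:
        # Energy that could have been generated but was not (never negative).
        lost_mwh = max(iv["potential_mw"] - iv["actual_mw"], 0.0) * iv["hours"]
        total += lost_mwh * iv["price_per_mwh"]
    return total


# Made-up intervals for illustration; not real operating or market data.
intervals = [
    {"hours": 2.0, "potential_mw": 10.0, "actual_mw": 4.0, "price_per_mwh": 25.0},
    {"hours": 0.5, "potential_mw": 8.0, "actual_mw": 8.0, "price_per_mwh": 30.0},
    {"hours": 1.0, "potential_mw": 12.0, "actual_mw": 2.0, "price_per_mwh": 20.0},
]

loss = curtailment_loss(intervals)
print(loss)  # 500.0
```

In practice, the hard part is not this sum but assembling trustworthy inputs for it, such as when curtailment started and ended and what the turbines would otherwise have produced.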
What do you have? What do you want?
Despite its obvious advantages, big data is still a hard sell for many users because they must shift their data-gathering gears not just to new tools, but to new ways of thinking—mostly to understanding what big data is and how it can serve their applications and goals.
Traditional data architectures move structured information through an integration process before warehousing and analyzing it. By contrast, Oracle Corp. reports in its “Enterprise Architect’s Guide to Big Data” that big data uses distributed, multi-mode, parallel data processing to handle its larger, unstructured data sets, and employs different strategies, such as index-based retrieval for real-time storage needs and map-reduce filtering for batch processing storage. Once filtered data is discovered, it can be analyzed directly, loaded onto other unstructured or semi-structured databases, sent to mobile devices, or merged with regular data warehousing (Figure 2).
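For readers unfamiliar with map-reduce, the pattern can be shown on a single machine with plain Python. This is a toy sketch, not Oracle’s architecture or any specific framework: the record layout, device names and values are invented, and a real deployment would distribute the map and reduce phases across many nodes.

```python
from collections import defaultdict
from functools import reduce

# Made-up batch of records (illustrative only).
records = [
    {"device": "A", "kwh": 120},
    {"device": "B", "kwh": 95},
    {"device": "A", "kwh": 130},
    {"device": "C", "kwh": 0},
    {"device": "B", "kwh": 105},
]


def map_phase(record):
    """Emit a (key, value) pair; filter out records with no energy."""
    if record["kwh"] > 0:
        yield record["device"], record["kwh"]


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Collapse all values for one key into a single total."""
    return reduce(lambda a, b: a + b, values)


pairs = [pair for record in records for pair in map_phase(record)]
totals = {key: reduce_phase(values) for key, values in shuffle(pairs).items()}
print(totals)  # {'A': 250, 'B': 200}
```

Because each record is mapped independently and each key is reduced independently, both phases can run in parallel, which is what makes the pattern suit large batch jobs.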
“We’ve always handled many forms of information, and to us, big data begins with multiple streams and events produced by people, machines and processes. However, big data ties these streams together with heuristics and analysis, so users can relate what couldn’t be related before, and find new links and efficiencies,” says Ian Tooke, consulting director at Grantek Systems Integration in Oak Brook, Ill. “We do environmental controls for pharmaceutical and warehouse applications, and we can use big data to tie changes in environmental factors to the state of drugs in storage. For example, humidity can affect glue viscosity when we’re making corrugated substrates, but now we can use this new data to make adjustments, which can help improve the shelf life and effectiveness of some drugs. Similarly, there are many regulations for manufacturers about keeping pharmaceuticals cool in storage, but fewer rules for trucks and distribution warehouses, so we provide environmental monitoring in warehouses and on trucks.”
Jim Toman, lead consultant for manufacturing IT at Grantek, adds that food manufacturers are also looking to their control and support systems for more traceability by gathering environmental measurements and applying statistics to help meet production line setpoints. “Food manufacturers are shifting from testing for poor quality to preventing it by ensuring that their process controls and documentation give them better traceability, and then maintaining that genealogy through manufacturing and distribution,” he says.
“While traditional process data is stored in historians and crunched later for standard deviations, big data can also participate in more advanced statistical analyses, such as clustering, regression and multivariate modeling, and consider correlations among many more variables,” says Mike Boudreaux, connected services director, Emerson Automation Solutions. “The process control industries already do process optimization and predictive maintenance, and big data can enable predictive analytics and machine learning as open-loop advanced process control (APC). This is a gross generalization, but it puts these concepts in context because predictive analytics and machine learning use the same mathematics as APC, such as neural networks to model relations between data parameters.”
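As a small illustration of the step from summary statistics to regression, the Python sketch below fits a straight line to two process variables using ordinary least squares. The `fit_line` function and the humidity and viscosity values are synthetic and hypothetical, made up for the example; real analyses would typically involve many more variables and a statistics library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic readings, not measured data: relative humidity (%) and a
# hypothetical viscosity measurement in arbitrary units.
humidity = [30, 40, 50, 60, 70]
viscosity = [210, 198, 191, 179, 172]

slope, intercept = fit_line(humidity, viscosity)
print(f"viscosity changes by about {slope:.2f} units per humidity point")
```

A fitted slope like this is the simplest form of the relationship modeling described in the quote; clustering and multivariate models extend the same idea to many variables at once.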