What’s so big about big data? Isn’t it just more data—more of the same old information from the same places, which many users are finally waking up to and using? Well, yes, many of the usual data handling methods, software and devices are being trotted out again under the “Big Data” buzzword, and so they must have a new banner and be called something new even if they’re not.
However, despite the hype and distractions, there are persistent differences between traditional data and big data that can’t be ignored. These include information sources not accessed before, data types not analyzed previously, and new management and storage technologies. These distinguishing features are summarized in big data’s oft-cited “four Vs”: volume, variety, velocity and value.
The challenges presented by big data are being met by augmented and new analysis tools, networking pathways and flexible cloud computing services. Most come from IT, and thanks to ever-lower microprocessor, software and computing costs, they’re now arriving in force in process control applications, on plant floors and in the field.
Dig deep, find treasure
Consequently, though cautious end users remain reluctant to migrate, others are finding ways to cope with all the new data streams coming from newly connected, Internet-enabled devices, identify previously unseen correlations and trends, and achieve unprecedented operating gains.
For example, Avangrid Renewables in Portland, Ore., collects lots of time series and other information from its 3,000 U.S. wind turbines and other generating assets, and seeks to coordinate it with related operations, independent system operator (ISO), weather, market and pricing data (Figure 1). Main sources include OSIsoft PI, SCADA, SQL databases and SAP. Avangrid wanted to examine and better visualize its existing OSIsoft content, so it could understand operations better and improve decisions.
[sidebar id =1]
Avangrid especially wanted to more accurately report and get paid by the ISO for lost generating capacity during required curtailment periods, but it needed deeper turbine ramp-down cost data to prove its economic losses. “We knew we were losing money, but determining the actual impact required investigating years of turbine data,” says Brandon Lake, senior business systems analyst at Avangrid Renewables.
To that end, Avangrid enlisted Seeq Corp. and its data investigation and discovery software, which integrates information from historians, databases and analyzers without altering existing systems. The software uses a property-graph database, which is geared toward querying relationships across nodes, to work with data and the relationships between data. It holds these in objects called “capsules,” which store “time periods of interest” and related data used to compare machine and process states, save data annotations, enable calculations and perform other tasks.
Lake reports that Avangrid had previously tried to compile ramp-down data using Excel, but it took too much time and labor. “With Seeq’s software, we were able to isolate shutdown events, add analytics and determine what was happening in just hours,” says Lake. “In the past, this would have taken days or weeks.”
Once its participating wind farms isolated shutdown and ramp-down events, established curtailment times, added pricing and other setpoints, and worked out differential power-generation scenarios to calculate losses, Seeq could export the data to Excel and identify revenue the wind farms could claim. Depending on its ISO contracts and wind availability or curtailment, Lake reports that Avangrid saves $30,000 to $100,000 per year.
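The kind of loss calculation described here can be sketched in a few lines of Python. This is a hypothetical illustration only, not Seeq’s software or Avangrid’s actual method: the sample readings, the `curtailed` flag, the `potential_mw` estimate and the price are all assumed values. The sketch groups consecutive curtailed readings into events, loosely analogous to “time periods of interest,” and prices the energy that was not generated during each one.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Sample:
    hour: int            # hour index of the reading
    actual_mw: float     # power actually generated
    potential_mw: float  # assumed estimate of what the wind would have allowed
    curtailed: bool      # assumed flag marking a required curtailment period


def curtailment_events(samples):
    """Group consecutive curtailed samples into (start_hour, end_hour, lost_mwh)."""
    events = []
    for is_curtailed, group in groupby(samples, key=lambda s: s.curtailed):
        if not is_curtailed:
            continue
        run = list(group)
        # Readings are hourly, so each MW of shortfall is one MWh of lost energy.
        lost_mwh = sum(max(s.potential_mw - s.actual_mw, 0.0) for s in run)
        events.append((run[0].hour, run[-1].hour, lost_mwh))
    return events


def lost_revenue(events, price_per_mwh):
    """Price the total lost energy at a single, assumed flat rate."""
    return sum(lost for _, _, lost in events) * price_per_mwh


# Made-up hourly readings for one turbine.
samples = [
    Sample(0, 2.0, 2.0, False),
    Sample(1, 0.5, 2.0, True),
    Sample(2, 0.0, 2.0, True),
    Sample(3, 2.0, 2.0, False),
    Sample(4, 1.0, 1.5, True),
]
events = curtailment_events(samples)
print(events)
print(lost_revenue(events, price_per_mwh=30.0))
```

A real analysis would draw on years of historian data and time-varying market prices rather than a single flat rate, but the shape of the work is the same: find the events first, then attach economics to them.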
What do you have? What do you want?
Despite its obvious advantages, big data is still a hard sell for many users because they must shift their data-gathering gears not just to new tools, but to new ways of thinking—mostly to understanding what big data is and how it can serve their applications and goals.
While traditional data architectures move structured information through an integration process before warehousing and analyzing it, Oracle Corp. reports in its “Enterprise Architect’s Guide to Big Data” that big data uses distributed, multi-mode, parallel data processing to handle its larger, unstructured data sets, and employs different strategies, such as index-based retrieval for real-time storage needs and map-reduce filtering for batch processing storage. Once filtered data is discovered, it can be analyzed directly, loaded onto other unstructured or semi-structured databases, sent to mobile devices, or merged with regular data warehousing (Figure 2).
[sidebar id =2]
“We’ve always handled many forms of information, and to us, big data begins with multiple streams and events produced by people, machines and processes. However, big data ties these streams together with heuristics and analysis, so users can relate what couldn’t be related before, and find new links and efficiencies,” says Ian Tooke, consulting director at Grantek Systems Integration in Oak Brook, Ill. “We do environmental controls for pharmaceutical and warehouse applications, and we can use big data to tie changes in environmental factors to the state of drugs in storage. For example, humidity can affect glue viscosity when we’re making corrugated substrates, but now we can use this new data to make adjustments, which can help improve the shelf life and effectiveness of some drugs. Similarly, there are many regulations for manufacturers about keeping pharmaceuticals cool in storage, but fewer rules for trucks and distribution warehouses, so we provide environmental monitoring in warehouses and on trucks.”
Jim Toman, lead consultant for manufacturing IT at Grantek, adds that food manufacturers are also looking at their control and support systems for more traceability by gathering environmental measurements, and applying statistics to help meet production line setpoints. “Food manufacturers are shifting from testing for poor quality to preventing it by ensuring that their process controls and documentation give them better traceability, and then maintaining that genealogy through manufacturing and distribution,” he adds.
“While traditional process data is stored in historians and crunched later for standard deviations, big data can also participate in more advanced statistical analyses, such as clustering, regression and multivariate modeling, and consider correlations among many more variables,” says Mike Boudreaux, connected services director, Emerson Automation Solutions. “The process control industries already do process optimization and predictive maintenance, and big data can enable predictive analytics and machine learning as open-loop advanced process control (APC). This is a gross generalization, but it puts these concepts in context because predictive analytics and machine learning use the same mathematics as APC, such as neural networks to model relations between data parameters.”
Wide net, big haul
No doubt the best-known aspect of big data is the namesake amounts of information its servers take in, though the less glamorous chore is making sense of it all, and putting that intelligence to use.
“Manufacturing generates more data than any other sector of the economy, but only a little bit of it is used, and so there are huge opportunities to create value by using that information,” says Bill King, CTO of the Digital Manufacturing and Design Innovation Institute at University of Illinois Labs in Chicago. “This data is produced at every stage of the manufacturing lifecycle, including design, assembly, operations, shipping, maintenance and end of life. It’s a powerful idea that these digital threads can connect, and that any part of the lifecycle can reach the other stages to do useful things. If we can design knowing more about manufacturing—and get data from users to flow back to the fabricators—then we can make different decisions in digital manufacturing and achieve greater value, but it’s still hard to get those stages to work together.”
Scott Howard, regional sales manager, Statseeker, adds that, “Big data begins by collecting whole bunches of information because at first its users don’t know what matters, so they gather everything, and then try to find correlations and statistical threads, such as more closely matching equipment performance to effects on quality and end products.” Statseeker makes a network monitoring tool that checks participating ports every 60 seconds.
“We were dealing with big data before it was called big data. We created a big repository with tons of data in it, but the challenge was now that we’ve got it, what do we do with it? There’s no value in reams of data if it doesn’t help your operations or business,” adds Chris Hemric, P.E., technical services director, R.J. Reynolds Tobacco Co., Winston-Salem, N.C., who spoke during a panel discussion at Inductive Automation’s Ignition Community Conference 2016 in mid-September. “The lesson we learned is that you don’t want to save every point. You have to decide what’s the purpose in life for the points you want to save, so you don’t create clutter and information that’s too complex. Collecting absolutely everything is too hard for plant staff, so you need to decide, do I need that point or not?”
Hemric adds that an overall engineering and management team can talk about how to rationalize data, and decide what they need compared to what they’ve got. “We’re drowning in terabytes of data,” he explains. “However, ‘big data’ really is just more data, and it doesn’t necessarily add value. So, our process controls engineering group, control engineering group and manufacturing managers are working to decide. The control engineering group does factory automation and integration for the upper levels, while the process controls engineering group examines operating trends, OEE issues and other details. As a result, these three groups came up with five points for deciding what data is useful. These include: total product produced by the work cell, good product quality, rejects, work in process, and amount of work in intermediate work in process.
[sidebar id =5]
“We stumbled onto Ignition SCADA software, and added it to our new processes, including our largest manufacturing process with 70,000 tags, and we’re now connecting it to all our manufacturing, which includes 40 acres under roof. What’s really challenging is the pace of change, and how all the pieces of our SCADA and other systems are evolving in relation to each other. This is another way that Ignition helps because now we can put in standards for HMI and screen development, which reduces development time and cost and ultimately improves our product and quality.”
Big fish handling and filleting
Beyond its larger and more varied sources and the amounts of information it takes in, big data is also distinguished by how its information is handled and stored. While traditional data management involves gathering, compressing, simplifying and asking questions, and then slicing and dicing big chunks of information for analysis, big data is about shuffling many more small pieces of data through as fast as possible with database strategies like online transaction processing (OLTP) or online analytical processing (OLAP).
“Where we traditionally used RTUs to manage our wells and drilling pads, our construction schedules are so aggressive now, and bringing so many wells, controls and I/O into our central control pads, that it’s no longer efficient to use RTUs only,” says C. Kisha Herbert, PE, staff electrical engineer at QEP Resources, an independent crude oil and natural gas exploration and production firm in Denver. In addition, Herbert reports that QEP has built 32 drilling and production facilities since July 2012, and each has 160-220 I/O points. It also employs a variety of automated valves, manifolds and skid equipment.
To help automatically and quickly manage all the new data coming in from its new wells, pads, sensors, controls and other components, QEP recently adopted ControlLogix PLCs from Rockwell Automation. “On a typical QEP pad and production facility, well locations are protected through constant monitoring of protective shutdown devices; alarm and event logs are used to review and track specific information and recent events; and standardized ControlLogix PLCs and RSLogix software are helping us meet our aggressive schedules and maintain safe, standard process controls,” explains Herbert. “Understanding local regulations and requirements upfront and having good controls is a big help, but ControlLogix enables the remote I/O points at our remote pads to provide useful information to our central control pad. This is easier than using the former RTUs because they require a lot of linear programming to run their loops, routines and subroutines.”
Chirayu Shah, marketing manager, visualization and information software, Rockwell Automation, adds that, “Because information is coming in from so many more places, such as unstructured sources, video feeds and social media, users that can leverage this data can make more educated decisions. Process data is no longer isolated at sites where it’s generated, so it can also join with input from outside facilities such as business intelligence, gain a wider context, and function at higher levels in both large and small organizations.”
Though it also uses longstanding statistical tools, Emerson’s Boudreaux adds, big data’s innovation is that it applies them using web-based and cloud-computing services carried out by distributed computing and storage infrastructures, such as Hadoop, Cassandra, MongoDB and others. “These services are based on data storage models that use map-reduction to reduce information into usable forms,” he says. “They also build server clusters that share data across many servers, and process data in parallel to return results quickly, much like searching, clicking and getting immediate results via the web and Internet.”
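For readers who haven’t seen the map-reduce pattern up close, the following is a minimal, single-process Python sketch of the idea. It doesn’t use the API of Hadoop or any other platform named here. The tag names and readings are invented, and the three lists stand in for data that would normally live on separate servers. Each partition is first summarized on its own (the map step), and the partial summaries are then merged into one result (the reduce step).

```python
from collections import defaultdict
from functools import reduce

# Hypothetical (tag, value) readings, already split across three "nodes".
partitions = [
    [("TT-101", 20.0), ("PT-201", 5.0)],
    [("TT-101", 22.0), ("TT-101", 24.0)],
    [("PT-201", 7.0)],
]


def map_partition(readings):
    """Map step: summarize one partition as {tag: (count, total)}."""
    partial = defaultdict(lambda: (0, 0.0))
    for tag, value in readings:
        count, total = partial[tag]
        partial[tag] = (count + 1, total + value)
    return dict(partial)


def merge(left, right):
    """Reduce step: combine two partial summaries into one."""
    merged = dict(left)
    for tag, (count, total) in right.items():
        prev_count, prev_total = merged.get(tag, (0, 0.0))
        merged[tag] = (prev_count + count, prev_total + total)
    return merged


# On a real cluster, the map calls would run in parallel on separate servers.
partials = [map_partition(p) for p in partitions]
summary = reduce(merge, partials, {})
averages = {tag: total / count for tag, (count, total) in summary.items()}
print(averages)
```

Because each partition is summarized independently, adding servers lets more partitions be processed at once, which is where the quick turnaround Boudreaux describes comes from.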
For process control users, Boudreaux adds that big data is an opportunity to get more insights, actionable intelligence and value from information they’ve been collecting for decades. For example, Batch Analytics software embedded in Emerson’s DeltaV DCS has protected services for taking process equipment information from valves and gas chromatographs, and uses big data methods to visualize them; take transactional snapshots to collect their histories and identify trends; and employ machine learning to predict equipment failures and model complex fault scenarios. This is why Emerson recently announced that its Plantweb digital ecosystem and Connected Services will be powered by Microsoft’s Azure IoT Suite.
“Because we didn’t have access to big data sets before, the traditional approach was theoretical and used physics models and equations, which were time-limited and had to generalize among many applications,” explains Boudreaux. “Now, we’re using big data to develop empirical equations and models describing actual behaviors, which are more effective because they’re individualized. As computing gets faster and software costs go down, we get closer to continuous monitoring and management that’s more tailored to each user.”
Organize and analyze
Of course, big data only delivers on its even bigger promises when users can get useful nuggets they can turn into better decisions, efficiencies and profits. In process control applications, this often means better predictive analytics and maintenance, and/or improved remote monitoring and optimization.
[sidebar id =3]
For instance, Sierra Nevada Brewing Co. in Chico, Calif., recently enhanced critical temperature controls on its fermentation tanks by improving its data visualization tools with Ignition SCADA software, and efficiently sorting through reams of batch data to proactively identify failing resistance temperature detectors (RTDs) and other issues. The brewery has about 100 tanks at its plant, including two main brewing cellars, each with 10 to 14 beer fermentation tanks (Figure 3). Each of the 15-foot-wide tanks has layered zones and two or three RTDs that can generate two different data flows for the same three- or four-week batch. Temperature is controlled by the solenoids and glycol flowing to jackets on the tanks and by an overall chiller plant.
“We wanted, at a glance, to show there was either no problem or there was an item we needed to look at,” says David Lewis, technical services manager at Sierra Nevada. “We also wanted bar graphs showing number of batches per year per tank, as well as number of out-of-spec incidents. We also wanted to drill into data about the last several batches, so we could check the details of several out-of-spec incidents.”
Ignition let Sierra Nevada move batch data into tables from 200-300 tanks, brew kettles and supporting devices. “We capture data every five minutes, and we already had about 10 years’ worth of information in our database,” says Lewis. “In early alert attempts, we had to determine which RTD was indicating it was beginning to fail, for example, by causing the chiller to run on. We also needed to figure out which data tails to exclude, though this might mean missing some stuck solenoids, and we had to avoid email overload. So, we stepped back, prioritized our data, and tried to make it more proactive. We also required an appropriate context in which we could see everything together, so we’d know when operations were happening that were supposed to be happening, instead of looking tank by tank and batch by batch. This meant bringing in much more data, but then separating useful signals from noise.”
Because data is captured in five-minute intervals for the brewery’s more than 100 tanks, Lewis reports this generates about five software database rows of data per tank per batch. At about 2,000 batches annually, Sierra’s brewing operations produce a total of 2 million software database rows per year. To access this data and begin to improve decisions, he adds that he and his colleagues are using Dynamic SQL programming to build queries for their database. Sierra also uses Tableau data visualization software to view database results, which are displayed in conjunction with Ignition software.
“Our MES is located on one server and batch data is on the DAQ server. This means critical data was on different servers, but our initial cross-server queries weren’t working, and replicating data from one server to another was too cumbersome,” says Lewis. “We needed data from 2,000 batches with their own start and stop times, so we ran a query string using Dynamic SQL, joined one with another, did it 2,000 times, and it crashed. So, we threw a Hail Mary, and cut and pasted the 2,000 queries into Tableau, and five minutes later, we got the 2 million rows we needed.”
[sidebar id =6]
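One common alternative to issuing a separate query for every batch is a single range join between a table of batches and a table of readings. The sketch below shows the idea with Python’s built-in `sqlite3` module. It is not the approach Lewis describes and not Sierra Nevada’s schema: the table names, column names and values are invented, and it assumes both tables sit in one database, which was not the brewery’s situation.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batches  (batch_id INTEGER, tank TEXT, start_ts INTEGER, stop_ts INTEGER);
    CREATE TABLE readings (tank TEXT, ts INTEGER, temp_f REAL);
""")
conn.executemany("INSERT INTO batches VALUES (?, ?, ?, ?)", [
    (1, "FV-01", 0, 10),
    (2, "FV-01", 20, 30),
    (3, "FV-02", 5, 15),
])
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("FV-01", 5, 68.0), ("FV-01", 15, 40.0), ("FV-01", 25, 66.0),
    ("FV-02", 10, 67.0), ("FV-02", 20, 41.0),
])

# One query returns every reading that falls inside any batch's
# start/stop window, instead of one query per batch.
rows = conn.execute("""
    SELECT b.batch_id, r.ts, r.temp_f
    FROM batches AS b
    JOIN readings AS r
      ON r.tank = b.tank
     AND r.ts BETWEEN b.start_ts AND b.stop_ts
    ORDER BY b.batch_id, r.ts
""").fetchall()
print(rows)
```

Readings taken between batches, such as the 40.0 and 41.0 values above, drop out of the result automatically because they match no batch window.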
Lewis adds that all displays for its newly enabled database were built with Ignition, which allows users to click on each batch and see a profile for it. “Then we can use known and previous failure patterns to better determine when the next RTD is going to fail,” adds Lewis. “We can see drift and behavioral changes in the graph for a batch, and fix problems before they become failures.”
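As a rough illustration of spotting drift before it becomes a failure, the Python sketch below compares two RTDs assumed to sit in the same tank zone, and reports the first reading at which their rolling average disagreement exceeds a limit. This isn’t Sierra Nevada’s logic: the function name, window size, threshold and temperatures are all assumptions chosen for the example.

```python
from collections import deque


def first_drift_index(rtd_a, rtd_b, window=3, threshold=1.0):
    """Return the index where the rolling mean of |a - b| first exceeds
    the threshold, or None if it never does."""
    recent = deque(maxlen=window)
    for i, (a, b) in enumerate(zip(rtd_a, rtd_b)):
        recent.append(abs(a - b))
        # Only judge once a full window of readings is available.
        if len(recent) == window and sum(recent) / window > threshold:
            return i
    return None


# Made-up readings in degrees F: sensor B slowly drifts upward.
rtd_a = [68.0, 68.0, 68.0, 68.0, 68.0, 68.0]
rtd_b = [68.0, 68.5, 69.0, 69.5, 70.0, 70.5]
print(first_drift_index(rtd_a, rtd_b))
```

Averaging over a window rather than reacting to a single reading is one simple way to cut down on the kind of alert overload Lewis mentions, at the cost of a slightly later warning.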
Similarly, Nick Moceri, president of SCADA Solutions in Irvine, Calif., showed how his company is using IoT-based remote monitoring and control to let its client’s legacy wind turbines ramp electricity production up or down more quickly in response to fluctuating grid demand and to avoid negative pricing, which may require 200 turbines to be adjusted in 10 minutes or less. SCADA Solutions added its in-house software to Opto 22 controllers and other components.
“Many older wind farms weren’t built with the Internet in mind, but now they need to add Internet protocol (IP) switches that are addressable to a server, so they can be brought into a central location,” says Moceri. “IP-addressable switches, controls and servers allow us to get ahead of the game, optimize production, and perform predictive maintenance that extends turbine life. In fact, one wind farm in Palm Springs, Calif., went from flat results to a 16% production improvement and complete return on investment in just three months.”
To begin to implement a big data strategy, Grantek’s Toman suggests it can’t be done merely from the ground up, and instead requires an organization-wide strategic plan. “You can’t just look at the needs of individual silos. You have to reach out to the rest of the company, evaluate systems in place, and identify other data silos that can be leveraged,” says Toman. “Many times, organizations have blinders on, so they need to bring in someone who isn’t in the existing culture to poke around, and mediate between the engineering and IT sides on how they can adopt some best practices and standards for big data. This isn’t just converging data technologies; it’s about convincing process people to practice and benefit from them.”
[sidebar id =4]