Solutions spotlight: How cloud is reshaping the process data landscape

April 20, 2020

16 min read

Editor in chief Keith Larson guest hosts this Solutions Spotlight edition. In this podcast, Keith speaks with Megan Buntain, director of cloud partnerships for Seeq, about the rise of cloud architectures in the process manufacturing landscape and how it's reshaping the process data landscape.

Transcript

Keith: Hi, this is Keith Larson, editor in chief of Control magazine and ControlGlobal.com. Welcome to this Solutions Spotlight edition of our Control Amplified podcast. Today I'm joined by Megan Buntain, director of cloud partnerships for Seeq, and we're here to talk about the rise of cloud architectures in the process manufacturing landscape, and in particular about how the cloud is reshaping the process data landscape and how process engineers and other citizen data scientists do their work.

Welcome Megan, and thanks for joining us.

Megan: Thanks Keith for having me.

Keith: I think we'll dive right in. It seems more and more process manufacturers are turning to the cloud for really centralized storage of the process data that once used to be in local historians. What are some of the trends you see driving the increased use of cloud storage for process data, and what considerations go into an organization's decision to go full on-cloud, hybrid or stay on-premise with that data?

Megan: Yes, we are seeing manufacturing companies and process industries move beyond consideration of cloud technologies, even pilots of cloud and more driving into that initial adoption wave where IoT and process data, the brownfield process data, are being moved either for production purposes or analytics purposes to cloud platforms. And really, this effort starts with just a general acceptance or adoption of cloud overall and typically that starts outside the OT or process data realm.

So, an enterprise manufacturer may start with Microsoft Office for workers and then move critical ERP applications like SAP to the cloud. And once that happens, the value starts to build around financial data, customer data, systems data and the kind of scaling efficiency that can be gained by bringing this data together in the cloud. And then companies start to look at, well why not our process data? We have been storing potentially petabytes of process data with rich knowledge captured in that data of the performance of assets and processes over time.

So, it really starts outside the OT realm, that comfort level with cloud and moving beyond initial security or data privacy concerns. And then as process data is considered, it's really three things that we're seeing. The first is IoT. So the wave of IoT, the sensorization of new devices and assets brings in new data streams that more by their nature are sort of born in the cloud, that data streams in off of the sensor and goes straight to cloud.

And the second is in looking at that data that's been in historians, the brownfield data, the ability to aggregate that data to develop global operational KPIs. That's the key trend. So how do I look across manufacturing sites and facilities to drive more global operational KPIs and impact?

And then the last one is really machine learning. So, as machine learning's taken hold in the ability to predict outcomes with more fluency and capability, most manufacturers don't want to do that in the same realm as their process control network. They want to make copies or make that data available for advanced analytics and machine learning. So IoT, global operational reporting, and machine learning are driving this. But I did want to touch on one thing, it will be, we imagine, years and years before a manufacturer would consider an all cloud deployment. There are absolute reasons to have hybrid deployments where some data is as close to the site of manufacture and to the operator as possible and other data is more relevant for aggregation in the cloud.

Keith: Yeah, we definitely see that hybrid approach of some kind. I mean, it makes a ton of sense for some data to be close to where it's generated and close to where it's needed for decision making. So, that makes a ton of sense.

Megan: And one key difference there, Keith, is the concept of real-time. So real-time in a process control network is milliseconds, and the impact of being able to make a decision for an operator in a real-time mode is very impactful from a safety and health and a process optimization perspective. Whereas when we think about cloud and analytics, we're in the near-real-time realm where seconds or even minutes of data lag may be okay for the purposes of those analytics. So that's a key consideration is real-time and the impact of the decisions that you need to make in real-time versus near real-time.

Keith: Yeah, I always use the analogy of if you're in a self-driving car, do you really want your car querying the cloud before it puts on the brakes, you know what I mean? It's not exactly the ideal situation so that makes sense.

You obviously mentioned the analysis and doing the analytics on this data. Does moving the process data to cloud make analysis easier for using tools like Seeq's on that data once it's in the cloud versus on-premise installations in historians around the globe? I would imagine it's more straightforward in a lot of ways.

Megan: Well, it depends on the purpose and the outcome. And so, if you deploy analytics applications like Seeq in the cloud and connect that analytics application to on-premise data sources or cloud data sources, you want your process engineers, your subject matter experts, they don't care where the data is. They want to be productive, they want to easily access that data and dive into analytics so that they can diagnose issues and predict outcomes.

So, if you're serving that subject matter expert and helping them be productive as possible, the way the cloud really comes into it in deploying applications in the cloud is really about scale. So, as we think about how do you get hundreds to thousands of people across larger organizations to have access to modern analytics tools across dozens of data sources. So, scale is a key capability.

And the second piece in cloud is around the rate of innovation. So, if you're deploying cloud analytics, software companies have that ability to quickly update and upgrade. We're used to this in so many of the cloud applications we use on our phones every day, where the updates aren't held for major releases. And so, the rate of innovation that a software company like Seeq can deliver and that the process engineer or teams or managers or analysts can take advantage of is much, much quicker, and the change management process is much easier because you're just dripping out updates over time versus these large upgrades. So those are the two key things on cloud: scale and the rate of innovation.

Keith: Yeah, that makes a lot of sense. You mentioned machine learning as well. Obviously, that's kind of an important add-on to the whole concept of more traditional analytics I would say or at least how I think of it in terms of closing the loop on what those mean. Do you want to explain a little bit of how machine learning works together with data analytics and what new benefits or capabilities that's bringing to your customers?

Megan: Sure. So, the one thing I would caution is there's still so much value to be gained from core analytics before the jump to machine learning. So even when process engineers have access to process data from their historians, there are trending and visualization tools they've been using, traditional BI applications, we still see such a vast majority of the subject matter expertise engineers using Excel. Using Excel to wrangle data, using it to do diagnostics in that data.

And so before the jump to machine learning, the very first priority is how do we give modern analytics tools to those subject matter experts who know the assets and processes and the data very well, and how do we capture their knowledge and create workflows from that knowledge so that from raw materials in to finished goods out, we've connected more of that knowledge and those analytics to drive impact.

And we're seeing that right now in this environment around coronavirus just in terms of the remote working scenario. So now, we have engineers and operators in some cases who are working from home who need to monitor, diagnose and drive decisions on assets that they're not in front of and they're not in the same location. So, there's a lot of value that can be captured.

As we look to machine learning where machine learning comes in and in order for it to be successful, just really think about a couple of things here. The first is typically machine learning comes in for two or three reasons. One is obviously this idea of prediction. So preventative maintenance, being able to take this value from your historical and your present data and use machine learning tools to be able to predict equipment failure or batch quality or other places to optimize. So, prediction comes into play as well as flexibility.

So, one thing that occurs quite a bit that a lot of people don't think of with machine learning is the ability to create custom visualizations. So, not being bounded by the way you visualize and can gain insight from data by the visualizations that are within a particular tool, but creating your own through machine learning. But there's a couple of critical things there. One is that machine learning doesn't work unless there's context on the data and the subject matter expert knows that industrial data, knows those assets.

And so, just to spark up machine learning with the data science team as an example, you know, we run into the joke of, you know, we sent a set of data over to a data science team who ran some great algorithms only to tell us that the equipment was off, you know, on a Sunday instead of a week back and forth to have that kind of outcome.

So, context is really critical, and the ability to bring engineering teams and data science teams together to work in tandem is important. But the real excitement around machine learning is if these tools are open and transparent and this idea of bring your own algorithm, you can leverage machine learning capabilities to do learning at scale. And the excitement around process data is that, you know, in many other industries in order to take machine learning and really drive the impact of that, they don't have enough data. Well, there's a ton of data. There's usually a lot of data on the process side and so there's value to be added there. So as long as you don't jump over just the value of general analytics to machine learning and if you ensure there's context and collaboration when you bring machine learning into a process, you're more set up for success.

Keith: That makes a lot of sense. Take care of the foundation first and then look at next level of machine learning beyond that.

Megan: And part of that too, Keith is trusting the outcome. So, if there isn't transparency and subject matter experts engaged in the process, it's a huge change management challenge to convince someone with 10, 20, 30 years of experience that the data that's coming through a machine learning experiment is going to be valid. And so, having that collaboration early and often is very important from a change management side.

And then secondly, the capturing is, with machine learning there's, you know, obviously the idea or the emphasis is on the learning so that our data set, our models, can learn over time. And when that happens, you're actually capturing or codifying the knowledge of some of the most experienced engineers that you have in your facility and applying that at scale across dozens of assets or processes.

Keith: We talked a little bit about the importance of real-time execution. Where do you find normally that machine learning would reside because that certainly seems to me something that would be closer to real-time than more of the traditional offline or near real-time type of analytics? Does that sit close to the process control network typically or where have you found that that tends to be most useful when you're trying to close the loop on some of this stuff?

Megan: Well, it's early days but we're seeing both. We're seeing this concept of, let's just call it a center of excellence usually driven through an innovation team or a digital transformation team that is tasked with aggregating data from various data silos into potentially an industrial data lake in the cloud. And to have those digital teams really look for aggregate trends in their data.

And so usually the outcome here is to think about what can be learned when we look at large volumes of data across our processes, both upstream and downstream through the course of a process and across our supply chain. So, you see that willingness or need to sort of pull the data up and aggregate it and look for trends that could be inferred over a large set of data. That's one pattern.

And then in the real-time element that you mentioned, this is in the area of edge analytics, and so this idea of real-time decision support. So you have a scenario where there's an operator and let's imagine that they're in the quality side and so they're, either manually or with specific tools, checking for quality of part of the process of the manufacture of a product. So that machine learning edge analytics, first of all, we can leverage that interaction between the quality engineer or operator who's deciding what makes good product or bad and bring that data and train a model, a machine learning model and then over time have the model identify what's a good or a bad batch or what decision could be made.

And then the third wave of that is now we have an edge analytic where there's a screen that that engineer's looking at and you've got real-time decision support from that machine learning algorithm that's been applied to support that decision process. So that's one example and we see both patterns. So, to that aggregate, how do I get global operational value or value chain efficiencies across my organization through applying machine learning and analytics on different data types and then the real-time decision support with edge analytics for the operator engineer.

Keith: Yeah, that makes a lot of sense. You mentioned that we're still kind of early in the application of analytics in a lot of areas. I mean, how do you see the preparedness of process engineers for creating their own algorithms in machine learning? Is that something that they're ready to take on? Citizen data scientist turned citizen AI implementer? Is that something people are asking for? Are they ready to do that?

Megan: Yeah, I think there's a range and I think the first piece is that if you can get an engineer on board first and foremost with tools that take away some of the challenges of this, the data wrangling. No engineer wants to spend all of their time or 80% or 70% or 50% of their time wrangling the data. They want to apply their knowledge and skills in the highest value way.

And so the more we can make advanced analytics tools and machine learning available easily without significant amounts of training, leveraging first principle capabilities, that's the first job right is just to drive that productivity higher and to not dramatically change the way that an engineer works. Just give them better tools.

However, we are also seeing a wave of engineers coming out of school diving into industry who are keen to learn Python, who are keen to learn some of these programming languages and they don't intend or expect to be data scientists, but with some basic Python skills and with a set of tools, they're pretty agile in terms of taking a work effort that may have taken, you know, days or weeks in a previous working style and then applying some of these capabilities and leveraging machine learning models that are open source and collaborating as an engineering community on these models so that you can take advantage of some other team's great work.

So, the productivity's really the goal and there's no expectation that engineers will become data scientists, but there's certainly an eagerness for engineers to build on their skills to drive more value.

Keith: Interesting. So, even though we're losing some of the seasoned old subject matter experts, we do have young engineers coming in with new skills that are relevant as well.

Megan: Yeah, and it's the balance. How do you capture knowledge and collaborate not just within engineering teams, but across different functional teams? And first of all, capture that experience and that knowledge of someone who's been in the plant for 10, 20, 30 years and just understands how systems work, all the way through to working with an engineer who is keen to try some new tools and new capabilities and apply them. So, it's a balance in the best possible world. It's a balance of those teams working together.

Keith: Great. It sounds like we all have our work cut out for us, so really appreciate you taking the time to chat, Megan. I really appreciate you sharing your insights and joining us today. So, stay well and stay safe.

I'm Keith Larson and you've been listening to the Control Amplified podcast. Thanks for joining us, and if you've enjoyed this episode, you can subscribe to future episodes at the iTunes store and at Google Play. Plus, you can find the full archive of past episodes at ControlGlobal.com. Thanks again, Megan. I appreciate it. And we're signing off until next time. Thank you.

For more, tune in to Control Amplified: The Process Automation Podcast

About the Author

Control Amplified: The Process Automation Podcast

The Control Amplified Podcast offers in-depth interviews and discussions with industry experts about important topics in the process control and automation field, and goes beyond Control's print and online coverage to explore underlying issues affecting users, system integrators, suppliers and others in these industries.