Exploring historians & data analytics – Part 2

Sam Russem expands on the first discussion about the capabilities of today's modern historians and data analytics

April 4, 2019

Building on Part 1, executive editor Jim Montague speaks with Sam Russem, director of the Smart Manufacturing practice at Grantek Systems Integration, a CSIA-certified system integrator in Philadelphia. Russem expands on the first discussion about the capabilities of today’s modern historians and data analytics.

Transcript

Jim Montague: Hi, this is Jim Montague, executive editor of Control magazine and ControlGlobal.com, and this is the fifth in our new Control Amplified podcast series. In these recordings, we’re talking with expert sources about important topics in the process control and automation field, and trying to get beyond our print and online coverage to explore some of the underlying issues affecting users, system integrators, suppliers, and other people and organizations in these industries.

For our fifth podcast, we’re talking with Sam Russem, director of the Smart Manufacturing practice at Grantek Systems Integration, a system integrator with offices throughout the U.S. and Canada, and a certified member of the Control System Integrators Association (CSIA). Sam provided some excellent input for our two upcoming stories on historians and data analytics, and I thought this might translate well from our usual print and online venues into this audio format.

And since we’re covering two stories on two topics, I’m attempting to do two podcasts about them, so this will be Part 2. I think I can justify this plan because conducting multiple interviews for a print story is what eventually gives it depth, and I’m pretty sure that will be the case here as well.

Okay Sam, sorry for the long preamble, and thanks for joining us today.

Sam Russem: Yeah, thank you, Jim. I’m really happy to be here, and I think it’s great that Control is doing this podcast series. I definitely have a lot of coworkers and people I work with that have been looking for something in this space, so I’m happy you’re doing it and I get to contribute.

JM: Okay. Alright, let’s get started then. First off, if historians have evolved to the point that any device with a microprocessor and software can be set to collect and store long-term data, then what the heck characterizes a historian, quote-unquote, today?

SR: So, good question, Jim. I would really say that the core definition of what makes a historian really hasn’t changed, it’s more the platforms that can host that historian and the versatility of options that that opens up when you can put them in more flexible places.

So, let’s start with that main definition. The core definition of a historian is that it is generally concerned with the efficient collection and storage of time-series data. So, really the difference between what I would call a real historian and something like a data logger is the speed and the scale of it. A historian is going to be able to read manufacturing data at very high rates, usually once a second or faster. It can certainly sample at longer intervals, but it needs to be able to do that high-speed recording, and it usually uses some type of compression algorithm on the back end to efficiently store that data for the long term.
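
Editor's note: Russem does not name a specific compression algorithm. As a minimal illustration of the idea, the Python sketch below uses a simple deadband filter, which stores a sample only when the value has moved by more than a set tolerance since the last stored point. The function name, the `deadband` parameter, and the sample readings are all hypothetical, and commercial historians may use different and more sophisticated methods.

```python
from typing import Iterable, List, Tuple

# A sample is (timestamp in seconds, value). Both the type alias and the
# readings below are illustrative assumptions, not taken from the interview.
Sample = Tuple[float, float]


def deadband_compress(samples: Iterable[Sample], deadband: float) -> List[Sample]:
    """Keep a sample only when its value differs from the last stored
    value by more than `deadband`. The first and last samples are always
    kept so the stored trend covers the full time range."""
    stored: List[Sample] = []
    last_seen = None
    for sample in samples:
        last_seen = sample
        if not stored or abs(sample[1] - stored[-1][1]) > deadband:
            stored.append(sample)
    # Make sure the final reading is recorded even if it did not move much.
    if last_seen is not None and stored[-1] != last_seen:
        stored.append(last_seen)
    return stored


# One reading per second from a hypothetical temperature tag.
raw = [(0.0, 50.0), (1.0, 50.1), (2.0, 50.2), (3.0, 52.0), (4.0, 52.1), (5.0, 52.1)]
compressed = deadband_compress(raw, deadband=0.5)
print(f"stored {len(compressed)} of {len(raw)} samples: {compressed}")
```

In this toy run, six raw readings reduce to three stored points, which hints at how a historian can scan quickly while keeping long-term storage small.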

One of the other main features of a historian that we’re looking for is the ability to store and forward information, too, right? So, it’s great to even have a local historian on a machine or on a line that is responsible for that high-paced, local logging, and then it’s usually going to send that to a higher-level, enterprise-level historian for longer-term storage. Now, where the store and forward piece comes in is accommodating the situations where the local data-logger and your enterprise logger have become disconnected for some reason. You still want that local historian to continue to collect data and then seamlessly synchronize that back up to the master historian once that connection is re-established. So, really, to your question about microprocessors and being able to put a historian in more places: the core features are the same, but as we’ve increased our compute power and our storage capacity, the footprint required for a historian is a lot smaller than it used to be. You can put them in various places in your system architecture, as opposed to one big, single-site server.
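
Editor's note: The store and forward behavior Russem describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `StoreAndForwardBuffer`, the `send_to_central` function, and the `link` flag are hypothetical names, and the backlog is held in memory, whereas a real historian would presumably persist it to disk so it survives a restart.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value); illustrative only


class StoreAndForwardBuffer:
    """Hypothetical local historian buffer: every sample is stored first,
    then forwarded in order whenever the uplink accepts it."""

    def __init__(self, send: Callable[[Sample], bool]) -> None:
        self._send = send  # returns True if the central historian accepted the sample
        self._pending: Deque[Sample] = deque()

    def record(self, sample: Sample) -> None:
        self._pending.append(sample)  # always store locally first
        self.flush()                  # then try to forward the backlog

    def flush(self) -> int:
        """Forward queued samples oldest-first; stop at the first failure."""
        sent = 0
        while self._pending and self._send(self._pending[0]):
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def backlog(self) -> int:
        return len(self._pending)


# Simulated central historian and network link (both assumptions).
central: List[Sample] = []
link = {"up": False}


def send_to_central(sample: Sample) -> bool:
    if not link["up"]:
        return False
    central.append(sample)
    return True


local = StoreAndForwardBuffer(send_to_central)
local.record((0.0, 1.2))   # link is down, sample is held locally
local.record((1.0, 1.3))
backlog_while_down = local.backlog

link["up"] = True          # connection re-established
local.record((2.0, 1.4))   # backlog and new sample are forwarded in order
print(backlog_while_down, local.backlog, central)
```

The design choice worth noting is that samples are always queued before any send attempt, so a dropped connection never changes the recording path, only when the data reaches the central historian.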

JM: So, they’re kind of more function-based than a particular hardware-type item. Is that the mental shift people have to make, do you think?

SR: Yeah, definitely. If you’re thinking of a historian as a single-use piece of hardware that’s put in, it’s definitely a more flexible piece of software than that. So, it’s really a matter of, that core function is the same, if you need to be collecting data and storing that for a very long time at a very fast rate, a historian is your tool to do so, but there’s a lot of flexibility that you have in terms of analytics or machine learning or even just basic trending, in the way that you apply that tool to get the results you’re looking for.

JM: So then, given how much historians have changed, and become more software and less hardware, what kinds of new ways and places are they being applied, maybe where they couldn’t be used before, and how’s Grantek carrying that out?

SR: Yeah, sure, so let’s, actually, let’s take a look at some of the features I mentioned earlier, and I can kind of bring a certain example of those around. So, we talked about the store and forward, we talked about the local logging, and we talked about that price-point coming down, so you can put them in different spots in your architecture.

So, one really cool use-case that we were very happy to work on over at Grantek was for one of our customers that owned a number of renewable energy assets, and they had a problem around being able to collect and store and do analytics on data from a bunch of their solar power sites across the U.S. So, definitely an issue where their main center for actually using that data was in a centralized location, but the actual sites they needed to collect that data from were across the country.

So, at a high level, what the solution architecture there looks like is that Grantek was installing PLCs with historians in the chassis at each of these sites. That PLC was still responsible as the main data concentrator, and it can speak a lot of different protocols. That versatility was really important so that we could standardize on the historian hardware regardless of whatever hardware was installed at a particular site. It also facilitated some controls, HMI screens, and things like that, but one of the main things that was included was an in-chassis historian that would do all the monitoring for the important historical data. It could record all this, and it would send it to their centralized historian instance for more detailed analytics and reporting.

One of the most important features in this architecture was around that store and forward capability. So, solar sites are built all over the country, and not all of them always have the greatest and most reliable Internet connections, but losing data could really cause major problems with your analysis or even some regulatory issues if you weren’t able to show some of your generation metrics, right? So, the ability for that historian on site to run for as much as two weeks or longer without necessarily being able to report back to the mother ship, and once it did have that connection re-established that all the data would just flow like nothing ever happened, was a really big selling point for that and helped us solve a lot of problems in that particular instance.

One of the reasons I bring this up is we would want to compare that to the counterpoint, right? So, let’s say we didn’t have these cheap historians that you can put all over the place and easily implement an architecture like that. A couple years ago, if you wanted to do something similar, you’re talking about putting a historian server at every site, usually a Windows-based server that runs that communication software, runs your historian, and stores a lot of that data locally. That’s a pretty powerful set of hardware for a solar site, which generally is not enough of a revenue-generating asset to always justify that type of investment. So, really what this has done, especially for sites like this where you do need to have a record of that data, is that the entire footprint of a solar site can be smaller than it used to be, because your initial hardware investment isn’t quite what it would need to be if you did need to throw a full-blown historian into each one of these sites.

JM: So then, once all the data is gathered and stored, and since the only thing that changes faster than hardware is software, how are data analytics and those tools and technologies changing lately? I mean, I assume everything’s in the cloud, but what happens once everything gets there?

SR: So, let’s start with your thing about the cloud. So, as far as all the data going into the cloud, I would say that in one way, shape or form, usually, but not all the time. And actually, we like to think of the cloud as maybe three separate ways of approaching it sometimes. So, there’s kind of the big cloud and the big data analytics piece of it, there’s also enterprise or local-level clouds, and then sometimes there’s really justification to not put this data into a cloud at all.

Let’s break this down a little more. If I talk about those big data clouds, that’s where you don’t own your servers at all. You’ve completely offloaded most of that compute power to a Microsoft or an Amazon or something like that, and they have a lot of the responsibility for the maintenance of those servers and their uptime. So, not everyone, I think, is comfortable with that idea yet. We do still hear about things like data breaches and security concerns all the time, and we’re talking about some proprietary and potentially sensitive data that we could be storing in these systems, too. So, I’m not trying to steer you away from using those distributed cloud architectures. With a solid design and security in mind, that type of planning can go a long way to really mitigate a lot of those risks, but the hesitation is certainly there and I think that there is some understanding that maybe not everyone’s ready for that just yet.

However, most manufacturers are going to have some type of local or on-premise cloud or cloud within their enterprise. So, if you’re still using locally hosted SharePoint sites, or if you do have a situation where you’re taking data from individual plants and sending that to a larger enterprise instance for analysis, that is more of a local, on-premise cloud, and I think a lot of the technologies and advantages that come with cloud can still be realized in that type of model.

But we also do need to have solutions for people who aren’t ready for a cloud yet. Maybe their data is very, very sensitive and there are security reasons that it really can’t go outside of a site, or sometimes there’s just other infrastructure or pieces of software that aren’t quite ready for that yet, too. So, how do we bring analytics into those types of situations as well is always an interesting question.

For more, tune in to the Control Amplified podcast.