
When good data goes bad

March 18, 2021
Work is needed to better protect the integrity of historical data

All of us are familiar with historians and use them for a wide variety of reasons, from improving operations and troubleshooting our processes to reporting to regulators when required.

Because of the amount of data that can be generated in a single day at any facility, historians use a variety of techniques to reduce the number of points that actually need to be stored. At their simplest, they apply a deadband: a new point is stored only when the value changes by more than the configured deadband, and the historian interpolates between the stored points. (Configuring the deadband per tag lets you keep more “accuracy” on the points that matter most.) The assumption is that between any two stored points the data doesn’t change enough to be worth keeping.
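To make the idea concrete, here is a minimal Python sketch of deadband (exception) storage with straight-line interpolation between stored points. It is illustrative only; the tag values, the 0.5-unit deadband, and the function names are assumptions for the example, not any vendor’s actual compression algorithm.

```python
def compress_deadband(samples, deadband):
    """Keep a sample only when it moves more than `deadband` from the last stored value."""
    stored = []
    for t, value in samples:
        if not stored or abs(value - stored[-1][1]) > deadband:
            stored.append((t, value))
    return stored

def interpolate(stored, t):
    """Reconstruct a value at time t by straight-line interpolation between stored points."""
    for (t0, v0), (t1, v1) in zip(stored, stored[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the stored history")

# A slowly drifting reading sampled once a second, stored with a 0.5-unit deadband.
raw = [(t, 100.0 + 0.1 * t) for t in range(60)]
stored = compress_deadband(raw, deadband=0.5)
print(len(raw), "raw points ->", len(stored), "stored points")
print(interpolate(stored, 30))   # estimate a value the historian never stored
```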

The same effect and the same assumption are at work “upstream” of the historian as well. The controller’s resolution is only as good as its scan or update rate (two to five seconds for a typical DCS), and the “raw” data from the sensor is itself the result of manipulating the inputs to the transducer.

Some research has been done on raw data to see what it might offer in terms of diagnostics on the sensor, or even on the process itself: for example, detecting imminent cavitation or impeller wear from the harmonics on a pump. But because of the limitations in capturing this high-frequency data, the models haven’t progressed very far. In theory, though, with the advent of Ethernet-APL and 10-Mbit/s data rates, it could soon be possible to collect this data. (Is this potentially one of the APL killer apps I alluded to last month?) Clearly, conventional historian data-processing techniques can’t support the in-depth detail required for this sort of analysis and insight.
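As a purely illustrative sketch of the kind of analysis high-frequency data would enable, the snippet below checks the harmonics of a pump’s running speed in a synthetic vibration signal. The sample rate, running speed, and fault signature are all invented for the example; real diagnostic models are far more involved.

```python
import numpy as np

# Synthetic illustration: a 10 kHz capture of pump vibration with an elevated
# 5x running-speed harmonic injected to mimic a developing fault.
fs = 10_000                                    # samples per second
t = np.arange(0, 1.0, 1 / fs)
run_speed = 30.0                               # pump running speed, Hz (1,800 rpm)
signal = np.sin(2 * np.pi * run_speed * t)             # 1x fundamental
signal += 0.4 * np.sin(2 * np.pi * 5 * run_speed * t)   # suspicious 5x harmonic
signal += 0.05 * np.random.randn(t.size)                 # measurement noise

spectrum = np.abs(np.fft.rfft(signal)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

def harmonic_amplitude(n):
    """Spectral amplitude at the bin nearest the n-th multiple of running speed."""
    return spectrum[np.argmin(np.abs(freqs - n * run_speed))]

ratio = harmonic_amplitude(5) / harmonic_amplitude(1)
print(f"5x/1x harmonic ratio: {ratio:.2f}")   # trending this ratio could flag wear early
```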

The other challenge faced by users of historian data is the gaps typically flagged as bad data. Today’s intelligent sensors could at least provide some insight into the source of bad data, even if we were only willing to link the process variable (PV) status tag to each reading. Since PV status is binary (good or bad), the amount of memory required would be minimal, especially since the historian need only record when it changes state.
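As a rough sketch of how cheap this could be, the snippet below assumes a simple good/bad flag per reading and shows how recording only the state changes collapses a thousand readings into a handful of rows. The timings and the single bad stretch are hypothetical.

```python
def record_status_changes(status_samples):
    """Store PV status (True = good, False = bad) only when it changes state."""
    changes = []
    last = None
    for t, good in status_samples:
        if good != last:
            changes.append((t, good))
            last = good
    return changes

# 1,000 one-second readings with a single bad stretch from t=400 to t=450.
samples = [(t, not (400 <= t < 450)) for t in range(1000)]
print(record_status_changes(samples))
# -> [(0, True), (400, False), (450, True)] : three rows cover 1,000 readings
```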

For many industries, especially where data feeds reports to regulators or financial transactions, any time data “goes bad” someone is responsible for getting it back as soon as possible, and then for starting the investigation to explain what happened during the data gap. In some cases, as part of the investigation, the missing data can be calculated from other data points for reporting purposes. However, because there needs to be a single true source of data, there are limited ways in which the historian itself can “report” what are now accepted as the true values. I’m not a database expert, but I believe there must be a way to do this without too much overhead or effort. Something like a pointer to the corrected data, with a second pointer to an explanation of how the corrected data was determined, might do the trick, though with a lot of manual intervention.
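In that spirit, here is one hypothetical way such an annotation scheme might look, sketched in Python: the original (bad) record is never overwritten, it simply carries a pointer to the accepted substitute value and a second pointer to an explanation of how that substitute was derived. The record layout, IDs, and values are all invented for illustration, not taken from any actual historian.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HistorianRecord:
    timestamp: str
    value: Optional[float]                      # None while the reading is missing/bad
    corrected_value_id: Optional[int] = None    # pointer to the accepted substitute value
    explanation_id: Optional[int] = None        # pointer to how that substitute was derived

# Separate tables hold the corrections and their justifications, so the
# original (bad) record is only annotated, never overwritten.
corrected_values = {1: 42.7}
explanations = {7: "Back-calculated from the downstream flow meter during the outage."}

rec = HistorianRecord("2021-03-18T03:15:00Z", value=None,
                      corrected_value_id=1, explanation_id=7)

def effective_value(rec: HistorianRecord) -> Optional[float]:
    """Prefer the original reading; fall back to the accepted correction if one exists."""
    if rec.value is not None:
        return rec.value
    if rec.corrected_value_id is not None:
        return corrected_values[rec.corrected_value_id]
    return None

print(effective_value(rec), "-", explanations[rec.explanation_id])
```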

Automated methods that take advantage of the intelligence in digitally connected devices to give us advance warning and prevent the failure in the first place would be one good start. Another might be “mini historians” in the devices themselves, so that if the failure lies somewhere other than the device, that buffer can be accessed and read back into the gap. In theory, this could be done with MQTT Sparkplug B, which is lightweight enough, but again requires an IP address. The same concept could also be used to provide the native, high-frequency data if it were deemed important enough.
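A conceptual sketch of such a mini historian, again in Python: a small ring buffer in the device holds the most recent readings so the host can pull back whatever falls inside its own gap. The MiniHistorian class, buffer size, and timestamps are hypothetical; a real implementation would more likely ride on Sparkplug B’s store-and-forward mechanism than on anything like this.

```python
from collections import deque

class MiniHistorian:
    """Hypothetical on-device ring buffer: keeps the last `size` readings
    so a host can backfill its own gap after a communications failure."""
    def __init__(self, size=3600):
        self.buffer = deque(maxlen=size)   # oldest readings fall off automatically

    def record(self, timestamp, value):
        self.buffer.append((timestamp, value))

    def backfill(self, gap_start, gap_end):
        """Return buffered readings that fall inside the host's data gap."""
        return [(t, v) for t, v in self.buffer if gap_start <= t <= gap_end]

device = MiniHistorian(size=600)
for t in range(1000):
    device.record(t, 50.0 + 0.01 * t)

# The host lost everything between t=900 and t=950; pull it back from the device.
print(len(device.backfill(900, 950)), "readings recovered")
```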

Yes, good data will continue to go bad. However, at least if we make effective use of the tools available to us, we should be able to minimize the impact when it does, as well as determine the reasons why it went bad.

What other ways can you, our readers, think of to “fill in” missing data points and thus improve the quality of historical data? Please drop me an email with your thoughts.

About the author: Ian Verhappen
