Process Historians vs SQL Relational Databases

As everybody knows, I am a big fan of giving air time to contrarians. I don't always agree with what they say, but I am always willing to give them a pulpit. I received a very interesting "guest editorial" this morning from Steve Hechtman of Inductive Automation, in which he espouses a heretical point of view about the value of process historians versus relational databases. The vast majority of SCADA and DCS systems are designed to have a process historian sit above the control layer. Hechtman says that maybe this isn't necessary, and that it is maybe too expensive. Is he right?

Process Historians
Vs.
SQL Relational Databases

Steve Hechtman
President, Inductive Automation

I wish to register a complaint. There is a rumor that has been circulating for years that relational databases are too slow for fast process data and that only process historians are up to the job. Vendors of process historians will cite sluggish performance and the lack of data compression as the reasons standard off-the-shelf relational databases won’t work. Apparently the last time they used a SQL relational database was a few decades ago. While there may be some specialized domains where process historians have a niche, they are not a practical choice for most industrial applications. In effect, historian vendors are saying your Toyota Camry is inappropriate transportation because it is incapable of going 180mph or finishing the quarter mile in under ten seconds.

The rumor denigrating relational databases for poor throughput is baseless. A standard, off- the-shelf Microsoft SQL Server coupled with FactorySQL can log in excess of 100,000 tags per second using a desktop machine. In all likelihood, other factors such as your industrial network would become bottlenecks before the database does. Furthermore, today’s generation of SQL relational databases are designed to scale gracefully to power high-volume website traffic, whose load peaks dwarf those of industrial controls applications.

Data compression is an area where process historians do score a point. However, even this consideration can be handled with standard off-the-shelf SQL relational databases. Take a look at the MySQL 5.0 Archive Storage Engine which achieves on average a four to one compression ratio. Proprietary process historians may beat that, but let’s get back to the point of practicality. Hard disk space is so cheap these days that even considering this point is becoming an anachronism. For the rare application that demands it, table compression coupled with intelligent data logging allow databases to compete even in this regard.

One crucial question that process historian vendors omit is: what are IT departments willing to support? When I make initial contact with IT folks, I always ask which relational database they use. Then I assure them we’ll work with that. This generally makes them very happy. Believe me, you want IT on your side or your project will end up on a data island which is useless in an enterprise system. Think of it from their point view; they have the training and tools, generally, to support just one type of database. With these tools and training they can support the database with scheduled backups, tuning and other maintenance.

Okay, we’ve heard process historian rants about relational databases; let’s talk about the downside of process historians. Let’s start with support. Just check the Amazon bookstore for any one of the proprietary process historians and you’re likely to come up empty handed. On the other hand, check for “SQL configuration” and you’ll come up with hundreds of books. How about finding people to support these proprietary systems? Good luck.

Then there is the concern about supporting relational data with a process historian. Frankly, the middleware layer is all about relational data. Time-series data, which is what process historians deal with, is just a fraction of what is needed in the middleware layer. Correlating batches, shifts, inventory, orders, downtime, quality, etc., is purely relational in nature, and these are the features that today’s enterprise integration projects demand.

What about a cost comparison? The process historian is going to be ten to thirty times the cost of a relational database using a driver like FactorySQL depending on the number of tags required. The controls industry is still backwards on this point and prefers to price its software per tag as though the extra tags cost money to manufacture.

In summary, we’re talking about practical choices. The Ferrari may be great fun, but do you need a $500,000 vehicle to drive the kids to school or would the Camry suffice? Likewise, do you need a $60,000 process historian to log data? A relational database makes a great historian, but the reverse isn’t true. A process historian cannot process relational data. For the vast majority of systems, a relational database has more than enough power to service the historical and relational data requirements, making it not just the practical, but the wise choice.