Process Historians vs SQL Relational Databases

As everybody knows, I am a big fan of giving air time to contrarians. I don't always agree with what they say, but I am always willing to give them a pulpit. I received a very interesting "guest editorial" this morning from Steve Hechtman of Inductive Automation, in which he espouses a heretical point of view about the value of process historians versus relational databases. The vast majority of SCADA and DCS systems are designed to have a process historian sit above the control layer. Hechtman says that maybe this isn't necessary, and that it is maybe too expensive. Is he right?

 

Process Historians
Vs.
SQL Relational Databases

Steve Hechtman
President, Inductive Automation


I wish to register a complaint.  There is a rumor that has been circulating for years that relational databases are too slow for fast process data and that only process historians are up to the job.  Vendors of process historians will cite sluggish performance and the lack of data compression as the reasons standard off-the-shelf relational databases won’t work.  Apparently the last time they used a SQL relational database was a few decades ago. While there may be some specialized domains where process historians have a niche, they are not a practical choice for most industrial applications.  In effect, historian vendors are saying your Toyota Camry is inappropriate transportation because it is incapable of going 180mph or finishing the quarter mile in under ten seconds.

The rumor denigrating relational databases for poor throughput is baseless.  A standard, off- the-shelf Microsoft SQL Server coupled with FactorySQL can log in excess of 100,000 tags per second using a desktop machine.  In all likelihood, other factors such as your industrial network would become bottlenecks before the database does. Furthermore, today’s generation of SQL relational databases are designed to scale gracefully to power high-volume website traffic, whose load peaks dwarf those of industrial controls applications.

Data compression is an area where process historians do score a point. However, even this consideration can be handled with standard off-the-shelf SQL relational databases.  Take a look at the MySQL 5.0 Archive Storage Engine which achieves on average a four to one compression ratio.  Proprietary process historians may beat that, but let’s get back to the point of practicality. Hard disk space is so cheap these days that even considering this point is becoming an anachronism. For the rare application that demands it, table compression coupled with intelligent data logging allow databases to compete even in this regard.

One crucial question that process historian vendors omit is: what are IT departments willing to support?  When I make initial contact with IT folks, I always ask which relational database they use.  Then I assure them we’ll work with that.  This generally makes them very happy.  Believe me, you want IT on your side or your project will end up on a data island which is useless in an enterprise system.  Think of it from their point view; they have the training and tools, generally, to support just one type of database.  With these tools and training they can support the database with scheduled backups, tuning and other maintenance. 

Okay, we’ve heard process historian rants about relational databases; let’s talk about the downside of process historians.  Let’s start with support.  Just check the Amazon bookstore for any one of the proprietary process historians and you’re likely to come up empty handed.  On the other hand, check for “SQL configuration” and you’ll come up with hundreds of books.  How about finding people to support these proprietary systems?  Good luck.

Then there is the concern about supporting relational data with a process historian.  Frankly, the middleware layer is all about relational data.  Time-series data, which is what process historians deal with, is just a fraction of what is needed in the middleware layer.  Correlating batches, shifts, inventory, orders, downtime, quality, etc., is purely relational in nature, and these are the features that today’s enterprise integration projects demand.

What about a cost comparison?  The process historian is going to be ten to thirty times the cost of a relational database using a driver like FactorySQL depending on the number of tags required.  The controls industry is still backwards on this point and prefers to price its software per tag as though the extra tags cost money to manufacture.

In summary, we’re talking about practical choices.  The Ferrari may be great fun, but do you need a $500,000 vehicle to drive the kids to school or would the Camry suffice?  Likewise, do you need a $60,000 process historian to log data?   A relational database makes a great historian, but the reverse isn’t true.  A process historian cannot process relational data.  For the vast majority of systems, a relational database has more than enough power to service the historical and relational data requirements, making it not just the practical, but the wise choice.

What are your comments?

You cannot post comments until you have logged in. Login Here.

Comments

  • It's TIME SERIES DATA!

    I don't care what the pH was at 3:02:04 PM July 24, 2007. I care about what the pH trend was from 1 to 7 PM of July 24, and compare it to the previous similar weekday, holiday, after it rained, whatever...

    Many years ago, I asked our sage consultants why they chose to use SQL databases to store our data instead of a statistics analysis package. The answer? We know SQL.

    When the only tool you know how to use is a hammer, everything looks like a nail.

    Reply

  • Your right, its not USUALLY relational data. So? You said When the only tool you know how to use is a hammer, everything looks like a nail? Most of the automation industry knows process historians, but know one knows why... if they are better then a SQL database.

     

    I would really like the answer to this question, especially seeing how most process historians are just a front end to a relational database anyways.  I have yet to see any performance delta between the 2.

     

    Another thing to consider is that with a relational database, you can make your historical data relational.  For example, I have created a bunch of scripts to analyze our alarm history, and provide insight into problematic alarms, alarm rates, related alarms, stuff that really does make a big difference, but rarely gets any thought.  

     

    For my company, a relational database is a better fit for what we offer.  The system is more flexible, allowing integration into business data, financial data, etc.  Not to mention the fact that a good process historian is limited by tag, which doesnt make much sense, and also has a very high cost.   Why reinvent the wheel creating another layer of expensive hardware, when a software "stack" of low cost or free software will suffice.  For 99.9% of the customers out there, a community edition of PostgreSQL or MySQL and a OPC -> DB bridge will suffice.  At most, $3000.  Why would anyone want to pay more??

    Reply

RSS feed for comments on this page | RSS feed for all comments