By Greg McMillan and Stan Weiner
Stan: This is the first of a four-part series on past, present and future challenges and opportunities presented by the deluge of data now available to automation professionals. We start out with our experiences with expert systems and neural networks, introduce a radical perspective on the end of science, and then move into interviews with Randy Reiss, a data analytics consultant, and Brian Hrankowsky, a specialist in modeling and control at a major pharmaceutical company, to get perceptive views on the use of data. Written reports and spreadsheet results now reside in computer databases, but there is no comprehensive approach to their integration, visualization and analysis.
Greg: I remember looking at trend recordings and trying to figure out what was limiting production in a stressed-out production unit pushed way beyond its nameplate rating (see Figures 3-6, p. 107, Advanced Control Unleashed, ISA, 2004). The production unit had multiple stages of parallel processing trains in a continual state of flux from coatings, undersized surge tanks and recycle streams. There were thousands of process variables. Most of the controller outputs were at their limit because pumps and piping were too small for the higher production demand. Process debottlenecking projects often failed to deliver extra production. The world's best model-predictive control (MPC) specialist said what was needed more than MPC on unit operations was a process model and real-time optimization. A neural network specialist worked in the unit for a decade with no significant resulting benefits.
Stan: An initiative using state-of-the-art, real-time expert systems by a dozen specialists yielded a package for instrument fault analysis and one on-line expert system of some value. The plug was pulled on the other systems as the specialists moved on out the door with more than half a million dollars in software and several million dollars in engineering expense down the drain.
Greg: We learned that trying to model flows from controller output signals to valves that lacked positioners and saw variable pressure drops was absurd. The expert system messages were an incessant reminder that the plant needed more measurements and positioners. We realized that any sort of intelligent alarm system should be thoroughly verified on-line in the privacy of an office before being put into the control room. We also recognized that you could not readily track down the contributions to a given conclusion. Rules could be thrown in haphazardly. The same problem was rampant in fuzzy logic systems.
Stan: In the 1990s, data mining took off. Artificial neural networks (ANN) were sold on the basis that your plant had a wealth of data, and that all you needed to do was feed all of it to an ANN, press a button and get wonderful predictions of important process variables not measured on-line. Some, caught up in the hype, were even asking the ANN to predict variables that weren't measured anywhere. You need at least some lab measurements to train, test and correct the ANN. On-line feedback correction of an ANN is normally needed for predictions in chemical and biochemical processes because of sensor sensitivity and repeatability limitations and unmeasured disturbances. Most often, a lab analysis provides this. In other cases, it is an at-line or on-line analyzer with a significant delay.
Greg: For predictions in continuous processes or during a batch process, the ANN output is time-synchronized with an actual analysis, usually by the addition of an individually adjustable delay on each input. For the prediction of batch end points, the insertion of delays is not necessary. For predicting batch composition profiles, the prediction may be the slope of the profile that is a formation rate or reaction rate. The rate is then integrated as part of a simple material balance to give a composition. The delay is often selected to be the dead time plus the time constant or time to 63% of the final value. However, the ANN prediction is used without the delays to provide faster recognition. Thus, an ANN can predict a product composition or quality before it is measured.
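The delay-and-bias scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any vendor package; the class name, the delay length (dead time plus time constant, expressed in sample periods) and the filter factor are assumed values for the example:

```python
from collections import deque

# Illustrative sketch: time-synchronize an on-line prediction with a
# delayed lab analysis and filter the error into a bias correction.
# DELAY_SAMPLES and FILTER_FACTOR are assumed values, not from the article.

DELAY_SAMPLES = 5     # assumed: dead time plus time constant, in sample periods
FILTER_FACTOR = 0.3   # assumed: fraction of each new error folded into the bias

class BiasCorrectedPredictor:
    def __init__(self):
        self.history = deque(maxlen=DELAY_SAMPLES)  # past raw predictions
        self.bias = 0.0                             # running feedback correction

    def predict(self, raw_prediction):
        """Store the raw prediction and return the bias-corrected value."""
        self.history.append(raw_prediction)
        return raw_prediction + self.bias

    def update_from_lab(self, lab_value):
        """When a lab result arrives, compare it to the corrected prediction
        made one delay ago and fold the filtered error into the bias."""
        if len(self.history) == self.history.maxlen:
            delayed = self.history[0]
            error = lab_value - (delayed + self.bias)
            self.bias += FILTER_FACTOR * error
```

As noted above, the undelayed prediction can still be used for faster recognition; the bias simply removes the slow offset revealed by each lab result.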
Stan: It's important to recognize that all models are wrong. The error always exists and is just a matter of degree. Instead of recognizing the need for feedback correction, this realization is used to condemn all models and as an argument for a pure data-driven approach.
Greg: "The End of Theory—The Data Deluge Makes the Scientific Method Obsolete" by Chris Anderson (www.wired.com/science/discoveries/magazine/16-07/pb_theory) argues that the proliferation of data means the end of science, and that correlation rather than causation is of greatest practical value. This doesn't sit well with me, being a physicist and an engineer. I have always wanted to understand "why," and develop a concept that would help me deal with new situations by a combination of inductive and deductive reasoning. However, I do acknowledge that statistical physics is important for sub-atomic particle analysis, and that the adaptive nature of biological systems and human free will make statistical methods essential in cellular- and consumer-response analysis. I think the future is a combination of the best of first-principle, neural network and statistical modeling.
Stan: Our experience with ANNs revealed that they helped focus, but did not replace, process understanding. Just dumping data into the ANN resulted in erroneous predictions because the ANN keyed on extraneous or coincident non-causal inputs. Some inputs also were dependent on each other and not truly independent variables. The prediction may have looked good in the off-line mode, but quickly deteriorated on-line.
Greg: A successful ANN application depends on a person who understands the possible cause-and-effect relationships in the process to eliminate inputs that cause erroneous outputs. The inputs also must have variation that ideally is distributed evenly between the possible minimum and maximum values. The ANN has the problems seen long ago in polynomial and exponential fits. The predictions can take off to bizarre values outside the input data set used, and can give humps (reversals in slope) that are disastrous for feedback control because they correspond to a reversal of the process gain. Training sets also should have a uniform distribution of the measured output. Properly developed neural networks can identify correlations that were understood to be causal, but not considered important. This "knowledge discovery," involving nomination and confirmation of causal relationships, was an important benefit more prevalent than successful on-line ANN predictions.
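One practical guard against the extrapolation and slope-reversal hazards above is to clamp every model input to the range seen in training and flag any clamped result as suspect. A minimal sketch for a single-input empirical model; the function names and the stand-in quadratic are illustrative assumptions, not a real ANN:

```python
# Illustrative guard against extrapolation: clamp an empirical model's
# input to the training range and flag any clamped (suspect) result.

def make_guarded_model(model, train_min, train_max):
    """Wrap an empirical model so it never evaluates outside its training
    range; returns (prediction, extrapolating_flag)."""
    def guarded(x):
        clamped = min(max(x, train_min), train_max)
        return model(clamped), clamped != x
    return guarded

# Stand-in fitted curve: its slope is positive on the training range
# [0, 10], but it would hump (reverse slope) well beyond the data at x = 20.
fitted = lambda x: -0.05 * x**2 + 2.0 * x

guarded = make_guarded_model(fitted, 0.0, 10.0)
```

The flag lets the control system discard or distrust a prediction instead of acting on a reversed process gain.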
Stan: Real-time expert systems and ANNs would give erratic predictions for inputs that were dependent on each other or outside the range used to develop the system. Furthermore, one could not drill down to determine the major contributors to a strange result. These systems fell into disuse as soon as the developer left the scene.
Greg: There are so many failures of expert systems, it's difficult to keep track of all of them. Here are some of the failures that plague my dreams.
Top 10 Failures of Expert Systems
10. Failure to say you should have bought control valves instead of those cheap on-off valves
9. Failure to say you should have bought Coriolis meters instead of those cheap rotameters
8. Failure to explain why expert systems failed
7. Failure to explain what engineers will do when all the manufacturing is offshore
6. Failure to predict the next layoff
5. Failure to predict the last and next economic crises
4. Failure to explain what is really said in congressional bills
3. Failure to predict your drug costs under the Medicare prescription plan
2. Failure to predict what the cost of medical care will be under the new healthcare plan
1. Failure to figure out where the governor of South Carolina was last June.