In Australia, Esso's Longford gas plant experienced a catastrophic failure several years after the Pembroke incident. At the heart of this accident was a simple instrumentation failure involving a level, an alarm and an expected operator action. The system failed, condensate overflowed the column and froze a heat exchanger. The incident may have been recoverable had the operations and maintenance teams had basic training and understood the dangers of adding heat and how brittle fractures are caused. The designers failed to consider the reliability of the operator and the consequences of "no response" by the operator.
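The failure mode described above -- a protection layer that works only if the operator acts -- can be sketched in a few lines. This is a hypothetical illustration, not the actual Longford logic; the thresholds and function names are invented for the example:

```python
# Hypothetical sketch: a protection layer built from a level reading,
# an alarm and an expected operator response. If the operator does not
# respond, the whole layer fails -- the designers' "no response" case.

HIGH_LEVEL_ALARM = 0.85   # assumed alarm threshold, fraction of span
OVERFLOW_LEVEL = 1.0      # level at which condensate carries over

def protection_layer(level_readings, operator_responds):
    """Walk through successive level readings; return 'safe' if the
    alarm is raised and the operator acts, else 'overflow'."""
    for level in level_readings:
        if level >= HIGH_LEVEL_ALARM and operator_responds:
            return "safe"        # operator cuts feed, level recovers
        if level >= OVERFLOW_LEVEL:
            return "overflow"    # no response: condensate carries over
    return "safe"

# The same upset with and without the expected operator action:
rising = [0.6, 0.8, 0.9, 1.0]
print(protection_layer(rising, operator_responds=True))   # safe
print(protection_layer(rising, operator_responds=False))  # overflow
```

The point of the sketch is that the operator is a component of the system: its reliability is the product of the instrument's reliability and the probability of a timely response, which the designers left out of their analysis.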
The Australian Royal Commission that investigated the accident again made some important discoveries. It stated that the failure to undertake ongoing analysis and evaluate process trends within the gas plant diminished the likelihood that upsets such as those which contributed to the accident of September 25, 1998 -- operating conditions in the absorbers or condensate carryover -- would be detected and avoided by appropriate responsive action. It also stated that it was evident that, well before the accident, panel operators had become accustomed to the frequent occurrence of alarm conditions at the base of the absorbers. High-level alarms had become frequent enough to be regarded as a nuisance rather than a warning of process upsets requiring action. This goes some way toward explaining the insensitivity of operators to such alarms in the lead-up to the accident.
The practice of operating the absorbers in an alarm state had a bearing upon the loss of lean oil circulation. Excessive condensate carryover could not have occurred if operators had responded to the alarm warnings in the control room in the period leading up to the accident.
Operators would, no doubt, have reacted more appropriately to high levels in the absorbers had they appreciated the potential for condensate carryover and the dangers associated with cold temperatures. Even without that appreciation, however, the operators did know that operating the plant in an alarm state for any length of time generally carried risks.
There was no evidence of any system to give priority to important alarms (even after the lessons of Pembroke). Good operating practice would have dictated that critical alarms be identified and given priority over other alarms. It would also have dictated that operators be informed of the correct way to respond to process upsets identified by the occurrence of critical alarms.
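The missing priority scheme the Commission described can be illustrated with a short sketch. This is a hypothetical example, not any vendor's alarm system; the tags, messages and priority bands are invented:

```python
# Hypothetical sketch of alarm prioritization: classify alarms so
# critical ones are never buried in a flood of nuisance alarms.
from dataclasses import dataclass

@dataclass
class Alarm:
    tag: str
    message: str
    priority: int  # 1 = critical, 2 = high, 3 = advisory (assumed bands)

def annunciate(alarms, max_shown=5):
    """Return the alarms an operator should see first: critical
    alarms surface ahead of advisory ones, which are deferred."""
    return sorted(alarms, key=lambda a: a.priority)[:max_shown]

# A flood of four alarms, two of them critical high-level alarms:
flood = [
    Alarm("TI-1203", "Lean oil temperature low", 3),
    Alarm("LAH-122", "Absorber B high level", 1),
    Alarm("PI-3301", "Fuel gas pressure drift", 3),
    Alarm("LAH-121", "Absorber A high level", 1),
]

for alarm in annunciate(flood, max_shown=2):
    print(alarm.tag, alarm.message)   # the two high-level alarms
```

Even a scheme this simple ensures a critical absorber high-level alarm is presented ahead of routine advisories; pairing each critical alarm with a documented operator response closes the loop the Commission found open.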
The lack of a system of priority for critical alarms explains why the operator failed to respond promptly or adequately to the activation of the alarm on the morning of the accident. Many lessons were learned at Longford, and again poor situation awareness can be seen as a major contributing causal factor in the accident. It is fair to state that the control room had poor ergonomics and a very poor HMI. Operators did not use trends to predict process upsets or evaluate long-term process trends or plant performance.
The lessons from Piper Alpha and Pembroke were ignored by Esso Longford, a mistake which surely the industry would not repeat -- or would it?
Unfortunately, within a few years the BP Texas City disaster occurred in very similar circumstances. This time the trigger was not an electrical storm but a plant start-up after a turnaround. Once again, a simple instrumentation system consisting of a level transmitter, an alarm and an expected operator response failed; hydrocarbons overflowed, and a catastrophic explosion was the result. The lessons from Piper, Pembroke and Longford were not learned. Without oversimplifying this accident, which like many other organizational accidents had many contributing factors, the same pattern is visible.
However, had Piper Alpha's lessons for safe operation been applied, this accident could have been avoided. It is apparent that the plant had poor alarm management practices and that the HMIs did not provide adequate situation awareness. Again, a breakdown of the technological system and its human elements can be seen.
Examining several major accidents over the last 20 years, we can conclude that these incidents have multiple causal factors, but at the center of each are reports of console operators missing important information. Recent analyses have identified that basic skills for monitoring the DCS have been compromised by data overload, misplaced salience, workload and other stress factors, such as fatigue.
If the subject of "situation awareness" is to be understood, years could be spent on research only to conclude that it must be treated as a system -- one whose elements include alarms, HMI, trends and ergonomically designed control rooms that reduce distractions and fatigue and allow operators to perform at their best.

The aircraft industry offers some sound definitions of "situation awareness" and some sound engineering principles that have resolved many of its human error issues. We have also learned from aviation's experience that trying to automate a way out of this problem is fraught with new problems, such as increased system complexity, loss of situation awareness, system brittleness and workload increases at inopportune times. The attempt to automate a way out of so-called human error has only led to more complexity, more cognitive load and catastrophic errors associated with losses of situation awareness (Endsley & Kiris, 1995).