Given the advances in technologies and work practices in oil refineries over the past 30 or 40 years, one would expect to see a decline in the number of major safety incidents and associated losses. That expectation would be wrong.
According to a 2012 report by the Energy Practice of Marsh Ltd., a division of insurer Marsh & McLennan, the five-year loss rate (adjusted for inflation) in the refinery industry continued to rise over the period 1977 to 2011, with incidents during start-ups and shutdowns remaining a significant factor (Figure 1). Why is this happening?
Consider the realities of present-day refinery operations:
- Modern control systems are highly effective and make plants safer;
- Modern control rooms provide protected havens for operators to perform their work;
- One operator can control substantially more equipment than was previously possible;
- Each operator has access to as much or as little information as wanted or needed.
Given all these benefits, why are there still problems? Some of these technological advantages can create their own hazards if care is not taken. For example, it is now possible to use hundreds of different colors on an operator’s screens and to alarm in multiple ways. As a result, systems are often configured with displays or alarms that are inactive under normal conditions but become active unnecessarily under abnormal operations, causing operator confusion and possibly errors.
Plant operators need a clear picture of what’s happening in periods of crisis to avoid escalation of incidents. What often happens instead is that, during an incident, operators get too much information and become confused, especially when they’re under stress. For example, the incident report examining an explosion in 1994 at the Chevron Milford Haven, U.K., refinery said, “In the last 11 minutes before the explosion, the two operators had to recognize, acknowledge and act on 275 alarms.”
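To put the Milford Haven figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `alarm_load` is hypothetical, and the assumption that the alarms were shared evenly between the two operators is ours, not the incident report’s.

```python
def alarm_load(total_alarms, minutes, operators):
    """Return (alarms per minute, alarms per operator per minute,
    seconds available per alarm for each operator).

    Assumes alarms are shared evenly among operators, which is a
    simplification made for illustration only.
    """
    per_minute = total_alarms / minutes
    per_operator_per_minute = per_minute / operators
    seconds_per_alarm = 60 / per_operator_per_minute
    return per_minute, per_operator_per_minute, seconds_per_alarm


# Figures quoted from the incident report: 275 alarms, 11 minutes, 2 operators.
rate, per_op, seconds = alarm_load(275, 11, 2)
print(f"{rate:.1f} alarms/min overall")
print(f"{per_op:.1f} alarms/min per operator")
print(f"{seconds:.1f} s per alarm per operator")
```

Under that even-split assumption, each operator would have had under five seconds to recognize, acknowledge and act on each alarm.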
Start-ups and shutdowns of process units, grade changes and other operating transitions are considered normal operations, but they put operators under increased stress because operators know these are the times when things are most likely to go wrong. The stress is compounded when abnormal conditions exist, and it often leads to inappropriate decisions that can cause production incidents, injuries or even deaths.
A poorly configured system can confuse operators, but a well-designed system can help them. Creating the right balance of human and automation system capabilities depends on understanding how these two entities differ and what strengths each has in crisis situations.
Creating Disaster: BP Texas City Refinery
On March 23, 2005, an explosion and fire in the isomerization unit of BP’s Texas City refinery, which at the time was the company’s largest, killed 15 people and injured 170.
The incident report identified several operator-related issues, such as a level alarm acknowledged in error, a heating ramp rate that was too fast, and the operators’ attempt to start the unit in manual when procedures indicated that, during early start-up, the only way to control splitter level was in automatic. These mistakes may have stemmed from a lack of vigilance or a poorly designed control system, and they quickly compounded as the incident unfolded.
The report cited issues with procedures as one of the causes of the incident:
- “…Failure to follow many established policies and procedures. Supervisors assigned to the unit were not present to ensure conformance with established procedures, which had become custom practice on what was viewed as a routine operation.”
- “The team found many areas where procedures, policies and expected behaviors were not met.”
The report recommended changes to start-up and shutdown procedures, but did not recommend additional training or procedure support from the control system. The incident could possibly have been avoided by the correct use of instrumentation, control and procedures, which we will consider in more detail later.
Humans Do Count
Having several very skilled “operators” probably saved Qantas Flight 32 on Nov. 4, 2010. The aircraft was an Airbus A380, the largest and most technically advanced passenger aircraft in the world at the time. It had left Singapore for Sydney and was flying over Indonesia when one of the engines blew apart. (Figure 2 shows the extent of the damage.)
The pilots were inundated with alarm messages: 54 came in to alert them of system failures or impending failures, but only 10 could fit onto the screen at a time. Luckily, there were five experienced pilots on that flight, including three captains who were on check flights. Even with that much experience on board, it took 50 minutes for the pilots to work through the messages to find the status of the plane.