By Ian Nimmo
Today technology has pushed industrial processes to the limit while optimizing them for maximum efficiency and the most cost-effective use of energy. State-of-the-art plants are monitored by sophisticated “smart” sensors that even calibrate themselves. They employ process control devices that tune themselves based on plant situations, and computers that can predict and optimize based on market requirements. But they have an Achilles heel.
I don’t believe that technology has gone beyond the capabilities of our current operators, but I do think we neglect what it takes to ensure safe production. Omissions in the design of the interface between the human and the machine leave holes in our defenses, allowing human error to steal the very profits we worked so hard to gain through this technology.
The work of the Honeywell-backed Abnormal Situation Management Consortium shows that abnormal events still cost the petrochemical industry alone $20 billion per year. The petrochemical industry experiences roughly 7.3 days of unplanned, incident-related shutdowns per year, costing on average $250,000 per hour. The total cost of 7.3 days of lost productivity is about $43 million, not all of which is preventable. A 5% improvement in these figures would produce an extra $2,150,000 per year.
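As a rough back-of-the-envelope check of those figures (a sketch only, assuming a full 24 hours of lost production for each unplanned shutdown day):

```python
# Back-of-the-envelope check of the downtime figures cited above.
# Assumes a full 24 hours of lost production per unplanned shutdown day.
days_down_per_year = 7.3          # unplanned shutdown days per year
cost_per_hour = 250_000           # average cost of lost production, $/hour

annual_loss = days_down_per_year * 24 * cost_per_hour
improvement = 0.05 * annual_loss  # value of a 5% reduction in downtime

print(f"Annual loss:    ${annual_loss:,.0f}")   # roughly $43.8 million
print(f"5% improvement: ${improvement:,.0f}")   # roughly $2.2 million
```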
The consortium found that in most plants not meeting their production targets, 8% to 12% of the loss was due to preventable abnormal situations. It is easy to see losses that take away from the bottom line, but a hidden cost lurks as well: operators back the process away from its full potential so they can operate at a more comfortable and less stressful level.
Automation Ironies
Operators struggle to work in badly designed central control rooms. The number-one issue is stress, which impacts the operator’s ability to perform, often leads to shift-work related illnesses and is the primary reason operators give when declining jobs as console operators.
What are the missing control-room design elements, and why do we neglect them? British engineering psychologist Lisanne Bainbridge outlined them in her article “Ironies of Automation,” in New Technology and Human Error, J. Rasmussen, K. Duncan and J. Leplat, eds. (Wiley, 1987).
The first irony, Bainbridge says, is that by taking away the easy parts of the operator’s task, automation can make the difficult parts of the job even more difficult.
Second, while many system designers regard human beings as unreliable and inefficient, they still leave people to cope with those tasks the designer could not think how to automate—most especially, the job of restoring the system to a safe state after some unforeseen failure.
Third, in highly automated systems, the task of the human operator is to monitor the system to ensure the “automatics” are working as they should. But even the best motivated people have trouble maintaining vigilance for long periods of time—say, 12-hour shifts. They are thus ill-suited to watch out for these rarer abnormal conditions.
Fourth, skills need to be practiced continuously to be kept sharp. Yet an automatic system that fails only very occasionally denies human operators the opportunity to practice the skills they will need in an emergency. Thus, they can become de-skilled in just those abilities that justify their marginalized existence.
Bainbridge concludes, “Perhaps the final irony is that it is the most successful automated systems with rare need for manual intervention which may need the greatest investment in operator training.”
The Real Cost of Bad Design
Texaco Pembroke had a serious incident and explosion in 1994 that affected hundreds of people, seriously injured twenty-six employees and caused damage of nearly $100 million (£48 million). According to the Health & Safety Executive’s investigation, the major factors that contributed to this incident were:
- Too many alarms that were poorly prioritized;
- Control room displays did not help the operator understand what was happening;
- Operators inadequately trained to deal with a stressful and sustained plant upset;
- A work environment that contributed to disruptions and stress.
In an incident at the Esso Longford Gas Plant in September 1998, 10 tonnes of flashing hydrocarbon were released and exploded. The explosion killed two, injured eight, and dug a crater 1.5 meters deep. The control room was evacuated, restricting the operators’ ability to safely shut the unit down. The estimated loss to the industry was $1.3 billion.
Bainbridge and others studying these incidents concluded that training and continuously practicing skills are critical to success. The design of the operator workspace has a direct bearing on the operator’s ability to perform to the standards required for successful intervention in problems. However, management continues to neglect training, giving it a low priority, and continues to favor equipment shelters rather than a building that supports the multiple activities the operators perform throughout a 24-hour operation.
Four Stages of Intervention
Successful intervention involves four stages: orienting, evaluating, acting and assessing.
Orienting involves perceiving the exact problem. During a process disturbance, operators can receive an avalanche of poorly prioritized alarms, which makes this task difficult. They must process the data by working with the human-computer interface (HCI) to achieve the next goal, evaluation.
This process has been named “situation awareness.” To achieve it, designers must focus on the control room’s functional layout and the consoles, ensuring adjacency for good communications and collaboration, and on the HCI, its detail and navigation method.
This stage can be dramatically improved by following the alarm-management practices outlined in EEMUA Publication 191, the style guide for HCIs outlined in EEMUA Publication 201, and good workspace design practices as outlined in the ISO 11064 standard for ergonomic design of control centres.
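EEMUA 191 is often summarized by two commonly cited benchmarks: an average of roughly one alarm per ten minutes in steady operation, with more than ten alarms in a ten-minute window treated as a flood the operator cannot reasonably handle. The sketch below, which is illustrative only (the function name, log format and timestamps are assumptions, not from the guideline), shows how a DCS alarm journal might be screened against the flood threshold:

```python
# Minimal sketch: screen an alarm journal for 10-minute windows that
# exceed a flood threshold (a commonly cited EEMUA 191-style benchmark).
from datetime import datetime, timedelta

def alarm_flood_windows(alarm_times, window=timedelta(minutes=10), flood_limit=10):
    """Return start times of windows holding more alarms than the flood limit."""
    times = sorted(alarm_times)
    floods = []
    for i, start in enumerate(times):
        count = sum(1 for t in times[i:] if t < start + window)
        if count > flood_limit:
            floods.append(start)
    return floods

# Hypothetical timestamps pulled from a DCS event journal during an upset.
log = [datetime(2005, 3, 1, 2, 14) + timedelta(seconds=30 * i) for i in range(25)]
print(alarm_flood_windows(log))  # windows where the operator is likely overloaded
```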
Evaluation includes developing a hypothesis regarding the cause of the plant problem. This stage can be improved with good problem-solving training, such as the Kepner-Tregoe Problem Solving & Decision Making methodology. Having the ability to test a hypothesis is recognized as a best practice. This means the operator first has to have time to respond and study the problem; waiting for late alarms will not provide this. If either a system (state-estimator technology) or an operator (tracking trends and displays) facilitates early detection of an impending failure, operators should have a training room close to the control room that allows them to track the unfolding event while testing theories on a real-time dynamic simulator.
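As one illustrative example of what trend-based early detection can look like (the tag, limits and filter constant below are hypothetical, not taken from any particular system), a simple smoothed-trend watchdog can flag a drifting variable well before it reaches its alarm limit:

```python
# Illustrative sketch: flag a slowly drifting process variable early by
# watching a smoothed (EWMA) trend rather than waiting for the alarm limit.
def ewma_drift_alert(readings, setpoint, alert_band, alpha=0.1):
    """Yield (sample index, filtered value) when the smoothed trend leaves the band."""
    filtered = readings[0]
    for i, x in enumerate(readings):
        filtered = alpha * x + (1 - alpha) * filtered
        if abs(filtered - setpoint) > alert_band:
            yield i, filtered

# Example: a level reading creeping upward toward its trip point (% of span).
level = [50 + 0.2 * i for i in range(120)]
for i, value in ewma_drift_alert(level, setpoint=50, alert_band=10):
    print(f"early warning at sample {i}: filtered level {value:.1f}%")
    break  # the first warning is enough to prompt the operator to investigate
```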
Acting. The operations and technical support team must take corrective action, which may entail the use of the automation control system or the control operator verbally instructing a field operator to make adjustments to process equipment. An inappropriate action by the operations team, however, can initiate or escalate disturbances. Here is where good training (based on competency models) and continuous practice of skills, ideally on simulators, can make a big impact on success.
Assessing is another place where we encounter many failures. The operations team makes a move and, instead of assessing and continuously monitoring the actions taken until success is confirmed and the process is stable, leaves the monitoring to the automation system and waits for the next wake-up call, e.g., another alarm. This is how operators miss the knock-on or domino effects of a failure, orient themselves late in the process event, and get into the continuous fire-fighting mode of operating.
Training and Environment
Development of a mental model of the process has been recognized as a best practice. Current practice requires console operators to be competent in each of the existing field operator positions, a competence achieved by learning and working those jobs.
Development of staffing workload assessments for each console operator position, and balancing workload based on experience and competence, will allow a formal training system to deliver the performance required. Continuous improvement and tracking of performance, using quantitative measures obtained by observation and qualitative analysis of DCS data, should be the role of supervision.
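A minimal sketch of what such a quantitative measure could look like, assuming DCS journal records of the form (console position, timestamp, event type); the record layout and numbers are illustrative only:

```python
# Sketch: average alarms and operator moves per hour for each console
# position, derived from an assumed DCS journal format, as one simple
# workload measure for comparing and balancing positions.
from collections import Counter
from datetime import datetime

events = [
    ("crude unit", datetime(2005, 3, 1, 2, 14), "alarm"),
    ("crude unit", datetime(2005, 3, 1, 2, 15), "operator_change"),
    ("vacuum unit", datetime(2005, 3, 1, 2, 16), "alarm"),
]

def events_per_hour(events, hours_covered):
    """Return (position, event type) -> average events per hour."""
    counts = Counter((position, kind) for position, _, kind in events)
    return {key: n / hours_covered for key, n in counts.items()}

print(events_per_hour(events, hours_covered=12))
```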
Operator training should be based on competency models and include the opportunity to practice skills on a regular basis, rehearsing critical responses, standard operating procedures and emergency response actions. The console operators function as process control specialists who control, optimize and troubleshoot the process to meet production goals. They should be able to successfully manage abnormal situations, directing outside operators in routine and abnormal responses. The control room and adjacent support rooms should create a continuous learning environment.
Workspace design is usually addressed only if a new control room is being proposed or built, or if significant change is occurring due to a technology upgrade or a reduction in operations personnel, and it is generally focused on the control room itself. Control rooms have evolved from equipment shelters to sophisticated computer rooms to control centers for people, with equipment and computers taking second place.
Control room designs vary from the “functional design,” where console layout is the primary focus and communication and collaboration drive the layout, to “theater style,” where all consoles face a video wall and the off-workstation displays provide a panoramic view of the whole plant. Each console has its scope of control on one or two of these off-workstation displays, while the operator sits at the console able to see adjacent units and detailed information about units, sub-units, trends, procedures and alarms.
The control room should address operator vigilance and other performance-shaping issues and should be specified as part of the Console Operator Performance Standard. One of our customers has adopted a fatigue countermeasure program that involves education and workspace design issues such as circadian lighting, nap rooms, noise reduction, HVAC systems that have people-zoned areas, ergonomic furniture and software that addresses common human factor issues and stressors.
Although technology can produce impressive profits even with a poor work environment, lack of concern for people and for the environment and systems that support them can blow away these and future profits for many years.
Ian Nimmo is President and founder of User Centered Design Services, an ASM Consortium affiliate member and an ASM service provider. He is a member of the IEEE and a senior member of ISA.