Human error in instrumentation systems
THIS ARTICLE is the first of a two-part series that will explore the classifications of human errors; why humans make errors; how errors occur in instrument design, construction, operation, and maintenance; methods to minimize human errors; and some of the methods used to quantify human errors.
Human error plays a role in all human activities. We all make mistakes. Not all mistakes are harmful: some cause no problems, and some may actually benefit mankind. But mistakes in the design, construction, operation, and maintenance of chemical processes can sometimes have costly if not disastrous effects.
Human errors can be classified in a number of ways. One is to classify them as errors of commission or omission. An error of commission means someone performed an act that resulted in an error, while an error of omission means someone failed to do something and that failure created an error.
Errors can also be classified as active or latent. In an active error, the error is immediately apparent or the consequence is immediate, while a latent error's consequence is not. A latent error may require time, conditions, or another action before its consequence becomes apparent.
Errors can also be classified as random human errors or errors in which human factors are involved. Random human errors are those that can only be predicted using statistics. A human error due to human factors means that a procedural factor, management factor, design factor, or some human characteristic facilitated the error. A study of 136 refinery incidents by the Battelle Memorial Institute indicated that human error was involved 47% of the time. Of these, 19% of the errors were random human error while 81% involved human factors.
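To put the Battelle percentages in terms of the whole incident population, the arithmetic can be worked through in a few lines. This is only a sketch, and it assumes that the 47% applies to the 136 incidents and that the 19%/81% split applies to the human-error subset, as the wording "of these" suggests. The variable names are illustrative.

```python
# Figures as quoted in the text from the Battelle study.
incidents = 136
human_error_share = 0.47      # share of incidents involving human error
random_share = 0.19           # share of those errors that were random
human_factors_share = 0.81    # share of those errors involving human factors

# Assumption: the 19%/81% split applies to the human-error subset only.
human_error_incidents = incidents * human_error_share
random_of_all = human_error_share * random_share
factors_of_all = human_error_share * human_factors_share

print(f"Incidents involving human error: about {human_error_incidents:.0f}")
print(f"Random human error, share of all incidents: {random_of_all:.1%}")
print(f"Human-factors error, share of all incidents: {factors_of_all:.1%}")
```

Read this way, roughly 64 of the 136 incidents involved human error, and human-factors-facilitated errors alone account for well over a third of all incidents.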
Human errors can also be classified as to the reason the error was made.
Some feel it is human nature to make errors. This may be true, but there are reasons why people make mistakes, some of which are not under the direct control of the person making the mistake. Understanding these reasons may prevent some mistakes. I've grouped them into three broad categories.
FIGURE 1: TAGGED OUT
Maintenance tags covering one of the emergency feedwater valve indicator lights may have contributed to human error at Three Mile Island nuclear plant.
1. People-Oriented Errors
2. Situation-Oriented Errors
3. System-Oriented Errors
Human Errors in System Design
Human errors enter the instrument design process in many ways. Some of the ways are mistakes, specification errors, failure to communicate, lack of competency, and functional errors.
Mistakes--Probably the most common human errors in instrument designs are mistakes. A common form of mistake is the slip (a lapse or execution error), where the intention is correct but the execution is not. Mistakes can also be due to lack of competency, or they can be facilitated by the design system itself. Some of the errors that can be facilitated by the design system methodology are data errors, drawing errors, informational errors, and change errors.
Since there are many tasks and details involved in an instrument system design, there are many opportunities for mistakes. The saying "the devil is in the details" is very applicable. Design document review processes as well as self-checking methods can help. People tend to make the same errors when dealing with details or small matters (particularly if there is no large negative result), e.g., misspelling the same word, entering data incorrectly, etc. Knowing your own pet errors can improve your self-checking methods. Some of these errors result from a short-term memory error process--you think the entry is correct because you "know" you entered the "correct" data (a check at a later time typically reveals this type of error). Larger errors or errors of significant negative impact should be treated as learning experiences and analyzed to prevent the error from happening again (rather than just being rationalized).
Data errors are mistakes that result from improperly entering data or from the propagation of data across design documents. Instrument designs contain a tremendous amount of data, but much of it is duplicated. The more often you enter the same data, the greater the opportunity for error. The common means of reducing this error are a time-delayed checking process and multiple reviews. Designing the engineering process to minimize duplicate data entry also reduces this type of error. Data must flow from document to document in a design. If the data paths are tortuous or complex, data may not arrive correctly where it is needed. Designing efficient data flows is another means of reducing this type of error.
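One way to picture the duplicated-data problem is a simple cross-document consistency check: collect every copy of a data item and flag the items whose copies disagree. The sketch below is illustrative only. The document names, instrument tags, and values are hypothetical, and a real design database would supply the rows.

```python
from collections import defaultdict

# Hypothetical extracts: (document, instrument tag, field, value).
entries = [
    ("instrument index", "FT-101", "range", "0-100 gpm"),
    ("spec sheet", "FT-101", "range", "0-100 gpm"),
    ("loop diagram", "FT-101", "range", "0-150 gpm"),
    ("instrument index", "PT-205", "range", "0-300 psig"),
    ("spec sheet", "PT-205", "range", "0-300 psig"),
]

def find_mismatches(rows):
    """Return {(tag, field): {value: [documents]}} for items whose copies disagree."""
    seen = defaultdict(lambda: defaultdict(list))
    for document, tag, field, value in rows:
        seen[(tag, field)][value].append(document)
    # Keep only items that appear with more than one distinct value.
    return {
        key: dict(values)
        for key, values in seen.items()
        if len(values) > 1
    }

mismatches = find_mismatches(entries)
for (tag, field), values in mismatches.items():
    print(f"{tag} {field} disagrees across documents:")
    for value, docs in values.items():
        print(f"  {value}: {', '.join(docs)}")
```

A check like this only detects disagreement; it cannot say which copy is right. Entering each value once and generating the other documents from it removes the opportunity for the error altogether.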
Drawing errors can come from errors on existing drawings that are used in a design. These spring from uncaught errors in previous designs, as-builts that have not been picked up, and changes by field forces that are not relayed to engineering. Field verification is the primary method of minimizing these errors. Drawing errors also can come from using computerized functions such as cut and paste where the pasted function is not updated with the new drawing's data. With CAD comes increased speed in doing drawings, but with increased speed also comes more potential for errors.
The normal review process for detecting mistakes at an operating company may have included reviews by the designer, a drafting checker, the instrument engineer, the instrument engineering supervisor, instrument maintenance, and operations. Unfortunately, many companies have reduced their personnel and restructured their organizations. This has in some cases resulted in fewer reviewers and sometimes less qualified reviewers. An engineering and construction (E&C) firm will have an equivalent review process, but the increasingly competitive E&C market can lead to tighter bids with shorter, less extensive review processes.
Specification Errors--One of the major ways errors enter the design process is in the specification phase. This error often occurs because the initial scope of a project was not done properly. A well-known Health and Safety Executive (HSE-UK) study that investigated accidents involving computerized systems concluded that 44% of accidents were caused by a failure in specification. In a process plant this can occur due to failure of engineering to scope the project or failure of the requesting party to properly scope the project.
Specification errors often result from not communicating the design specifications from the conceptual design stage to the detail design stage. Communication lines are sometimes strained in plants due to poor relationships between engineering, operations, and/or maintenance, leading to communication breakdowns and errors. Internal cultural differences or even age differences can also contribute to the failure to properly communicate a project scope.
Time also plays a part in the specification breakdown. Due to workload, priorities, and reduced manpower in many plants, people may not allocate sufficient time to the initial scoping of the project. They assume the details will be filled in later, thus leading to a poorly defined and communicated scope and downstream changes leading to potential errors in the design.
Getting the specification right can significantly reduce the amount of potential human error in the design process. Some companies have gone so far as to have significant "front-end loading" on large projects to minimize the potential for specification error. Front-end loading can be a significant value-added process, even on small projects.
Communication Breakdown--The failure to communicate information or changes can lead to errors. If the correct information does not flow smoothly, to the right spots, and at the right time, errors may result. Both individuals and the design system affect the flow of information. Analysis of data flows to locate bottlenecks, kinks in the data flow, and error sources is a method of minimizing this kind of error source.
Changes in design open the door to errors. Changes are common in engineering designs but are typically poorly managed. Changes typically can come from many sources and must propagate to many places. Many times, changes are not well thought-out because the appropriate people are not consulted. Change management is a common practice in the process industry, but it is seldom applied at the design level.
Lack of Competency--These errors range from inexperience to outright incompetence. This type of error can come from a lack of the ability, skills, knowledge, and/or experience needed for the instrument design. Morale also affects this. In a low-morale environment, even competent people's skill levels can waver. This is becoming more of an issue with the "brain drain" that is resulting from downsizing. Motivated and competent people are the mainstay against this error source.
Functional Errors--Sometimes, errors in design lead to errors in function or operation. The most obvious error is that the instrument design does not work. Some less obvious ones are an instrument design that works but is mediocre (does not perform to its potential), or one that does not have the full expected range of operation or does not meet all the specifications.
Another type of functional error is a design that facilitates people making mistakes in the operation or maintenance of the design. If your design does not meet the operational expectations of the customer, your design may facilitate human errors. Operation from right to left rather than left to right, colors that don't agree with standard colors in the facility, backwards order of operation, things of different function but similar in appearance or arrangement, not following existing plant practices or standards, etc., are examples. This is one of the reasons behind standardization.
It is important to understand that no matter how good a design is, people will still make mistakes in operating and maintaining the equipment. Designers often design for what they believe is normal operation, with no expectation that the system will be operated in any other manner or that errors will be made in its operation or maintenance. Failure to consult with the operators and maintenance personnel can lead to systems that are difficult to operate and maintain. If we anticipate other reasonable operational modes or possible errors, we can adjust our designs in regard to human factors, error recoverability, error tolerance, etc. Checklists are a good practice to minimize these kinds of errors.
Common Errors in Instrument Design--Some of these are improper grounding, improper shielding, failure to provide isolation, improperly sized equipment, wrong range or trip point for an instrument, failure to consider ambient temperature range (particularly the low end), wrong materials of construction, equipment not properly rated for hazardous area, failure to consider power quality issues, no spare parts, incorrect wiring, incorrect tagging, failure to tag wires properly, poor location, poor maintainability, etc. Again, checklists are a common method of minimizing errors in designs.
Errors in Construction, Operation, Maintenance
Errors can occur in the construction phase of a project. Errors that affect the operation of the equipment are typically caught during commissioning or startup. However, other types of errors may not be. Some examples are wrongly identified or tagged equipment, equipment not properly installed in a hazardous area, improperly calibrated or ranged instruments, loose terminals, improper grounding, improper shielding, etc. Upfront constructability consideration can reduce the number of difficult installations, which in turn helps minimize installation errors.
The common protections against installation errors are a competent installation crew and supervision, installation inspections, checklists, punchlists, and planned commissioning tests.
Humans also make mistakes in operating equipment and processes. Some of these are simple slips, but others may be facilitated by the design of the system or by the training, procedures, and practices. Sometimes the training, procedures, and practices are designed only with normal conditions in mind, and under abnormal conditions they may facilitate errors. Complex procedures are also prone to errors. Complicated or confusing tagging of equipment can lead to operational errors. Stress of the operating environment can contribute significantly. In one plant, the stress of keeping the plant operating led operators to delay a decision to shut down, which led to an accident.
Some common human errors in operations are misunderstanding instructions, writing down or entering the wrong value, missing a step in a procedure or task, misidentification, mis-estimation of quantity or extent, failure to communicate situations or conditions to other people (particularly across shifts), failure to properly lock out and tag out equipment, and lack of situational awareness.
Training is the most common solution. Training must include both normal and abnormal conditions. While well-designed systems and processes significantly contribute to error reduction, there is no substitute for motivated, good quality, experienced people.
Finally, mistakes happen in maintenance too. The wrong loop or equipment gets worked on, a transmitter gets calibrated to the wrong value, or a repair is botched. Some of these are due to slips, but others can be due to inexperience, lack of knowledge, poor maintainability, lack of motivation, poor supervision, or incompetence.
Downsizing is also reducing the experience level in maintenance departments. Meanwhile, technology changes rapidly, making it harder and harder to keep up with fewer and fewer people that have less and less experience.
Up-to-date drawings are a must. Out-of-date drawings can lead to errors when troubleshooting or repairing. Manually marked-up drawings can lead to errors. All drawings should go back to drafting and then be field-verified. Out-of-date or missing vendor documentation can also be a source.
Design for maintainability is a big issue in minimizing errors. If something is hard to work on, it may also be error-prone. Poor identification (tagging), poor access or work space, poor location, lack of or poor quality documentation, and poor lighting are contributors. A good working relationship with engineering that allows maintenance input into designs is a must. A maintainability checklist is another method to help assure the maintainability of instrument designs.
Maintenance procedures are also a means to help minimize errors. Standardization is another means to help minimize errors. The less variance there is in a system, the less potential for error.
Human error is inevitable. Failure to take this into account is courting disaster. The keys are to realize how and where errors can occur in your system, and then to adopt methods that lessen their occurrence, catch those that do occur, and mitigate their impact.
The Role of Human Factors in Control System Errors, Part 2
MANAGEMENT systems, procedures, ergonomics, organization, and facility design are factors that can cause human error. Human error is blamed for many things. It is not uncommon to read in the paper that an airplane crash was blamed on “pilot error,” or, in the case of a recent explosion in a process plant, that it was blamed on operator error for failing to make the proper valve lineup. But are these really human errors, or are they system errors in which the actual human error was just part of a larger problem that facilitated the human error?
Human factors is a term that is often seen when errors are analyzed. Human factors is actually a very broad subject of which human error is only a part. There is a variety of definitions for human factors. One of the definitions I like is:
“The human factors are the application of relevant information about human characteristics and behavior to the environment humans are operating in to maximize the benefit to the humans.”
Another definition could be:
“Human factors are the characteristics of human beings that are applicable to humans interacting with systems and devices.”
The terms human factors and ergonomics are often used interchangeably. Both can describe interactions between the worker and the job demands or work domain. Generally, the difference between them is that ergonomics focuses on the physical interaction of humans with their work, while human factors emphasizes the broader scope of improving human performance in work tasks and reducing the potential for human error. One might say that ergonomics is really a subset of human factors.
Some human factors are based on human limitations or inherent behavior. Others can be based on psychological or sociological factors. Human limitations are based on the physical and mental capabilities of humans, while human-to-human interaction is based on human psychology. Sociological factors are group dynamics or culturally or ethnically based.
FIGURE 1: DESTROYED BY DESIGN
EPA and OSHA determined the 1997 explosion and fire that caused this damage to Shell's Deer Park, Texas, chemical plant was caused in part by a check valve that was not designed for the application. The process hazards analysis did not address the risk of valve shaft blowout. Lack of indication of a hydrocarbon leak in the control room and inadequate communication during the accident contributed to its severity.
Culture-based human factors can stem from a local culture (for example, the culture of a plant or local area) or from an ethnic or societal culture. While many people are not aware of their plant culture (can’t see the forest for the trees), each plant has one: essentially a way of doing things and of responding to events.
Plant cultures result from the plant’s political and power structure, management and supervision style and attitude, procedures and practices, local area culture, plant experience, etc. Examples of plant culture extremes are the “not invented here syndrome,” isolation (no interest in how other people are doing things), and “it can’t happen here” syndrome.
An ethnic or society-based culture is based on the norms for the ethnic or society group. A cultural human factor, for example, might be that people in the U.S. read from left to right while some other cultures read from right to left. Another one is that some societies prefer group consensus over individual action.
Human factors also can be situational, that is, how humans interact with a particular situation or set of conditions. For example, in one plant an arrangement of process equipment may be a particular way, while in a very similar plant the arrangement may be slightly different. Or simply, the number of people involved in a situation at a given instant might be different, which could affect how a situation is reacted to.
Not All Human Factors Are Bad
There are bad human factors and good human factors. Human factors that facilitate error or poor performance are bad ones, while human factors that minimize errors or improve performance are good human factors.
Some common human factors to consider that can cause human errors are management systems (communication, training, scheduling, culture, style, work load, etc.), procedures (response to upset, operational procedures, plant practices, etc.), physical factors (ergonomics), organization (presentation, order, structure, etc.), and facility design (equipment, controls, environment, etc.).
Some other human factors to consider are how humans process information--how much information a human can process at a time, how fast we can process information, short-term and long-term memory, how humans handle complex situations, mindset, human interaction, and groupthink (the opposite of synergy, i.e., the whole is less than the sum of its parts).
Human factors exist everywhere in the lifecycle of an instrument system. Anything that makes things difficult in the implementation, operation, and maintenance of instrument systems can lead to human factor-facilitated errors. Design, operation, and maintenance procedures, practices, and systems that ignore how people really work can facilitate errors and poor performance. Examples are failure to do the upfront design properly, a poor change management system, overly complex instrument operation, complex procedures, an instrument location that makes it hard to work on, overly complex work procedures, poor supervision, etc.
Many human errors are due to or facilitated by human factors. The Battelle study of incidents in refineries indicated that, of the incidents involving human error, 19% involved random human error while 81% involved human factors. Many times there is a rush to blame “human error” (meaning a particular individual making an error) because it is an easy out and does not place any blame on the company, while underlying human factors are ignored. This is sometimes known as the scapegoat syndrome.
Human factors are a significant consideration when designing, operating, and maintaining instrument systems to minimize human errors. Evaluating their effect should be part of the lifecycle of any instrument system.
Human factors should be considered in all designs, procedures, and practices as a value-added practice and in some cases a matter of law. In fact, the importance of this is recognized by OSHA regulation 29 CFR 1910.119, Process Safety Management (PSM), which requires that the process hazards analysis (PHA) address human factors.
Human Errors In Safety Systems
Errors may also be classified on the basis of safety or not. Safety errors may result in an accident, a near miss or an accident waiting to happen. Safety errors caused by humans in safety instrumented systems (SIS) are called “systematic” errors.
In a commonly reported Health and Safety Executive (U.K.) study on failure of control and safety systems, 85% of the failures were attributed to failure to get the proper specification, changes after commissioning, installation and commissioning, and design and implementation, while only 15% were associated with operation and maintenance.
Approaches to Human Error Reduction
There are a number of approaches for dealing with human error. Some of the main ones are prevention, anticipation, tolerance, mitigation, and lifecycle approach.
1. Prevention: Errors can be viewed in different contexts. One context is: an error is an error when the error is made; while another context is: an error is not really an error until the error causes an effect. These two contexts lead to two different approaches to error prevention.
In the first context, we can only prevent error if the error is not made at all. This is obviously an efficient (but often difficult) means of keeping errors out of a system; it falls under the often-used quality statement “Do it right the first time, every time.” This is a front-end process. Primary methods used to reduce this type of error are highly motivated, competent people and the reduction of human factors-facilitated errors.
In the second context, we assume errors will enter the system but we wish to catch the error before it has a negative effect. For example, an error was made in specifying the range of a transmitter, but it does not have any effect until the transmitter is put into service. If we catch the error before the transmitter is put into service, it never has a negative effect. This is a back-end process and is less efficient than the first context. Review and supervision processes are commonly used to reduce this type of error. Unfortunately, these processes are often somewhat informal, have no organized methodology for reducing errors, and seldom consider human factors.
2. Anticipation: This is where a potential error is identified and the opportunity for the error is minimized or eliminated. Human factors that facilitate an error can be changed to minimize the possibility of the error occurring. An administrative or engineering control can be used to minimize the potential for error.
Some examples of this are revising an overly complex procedure to a simpler one; creating a procedure to control safety system bypasses to assure that a bypass is not inadvertently left engaged; placing an interlock where an operator is prevented from taking an action unless some condition is satisfied; and high-level shutdown to prevent an operator from overfilling a tank.
Trevor Kletz, an author on this subject, said, “Some errors can be prevented by better training, or increased supervision, but the most effective action we can take is to design our plants and methods so as to reduce the opportunities for error or minimize their effects.”
3. Tolerance: This is where errors are expected but the system is tolerant of them. An example could be the entry of a wrong number into a human-machine interface but with an operator prompt to verify the number. Another example of this could be back-up systems that protect against a human error.
4. Mitigation: This is where there are systems in place that can mitigate the consequences of an error. Examples are a dike around a vessel to contain the liquid if an operator overfills it, or a deluge system.
5. Lifecycle Approach: Obviously, it’s best if we can prevent human error altogether, but in general that is an unobtainable goal. However, many errors can be prevented or minimized with proper design of the system, engineering controls, administrative controls, training, and consideration of human factors.
One of the methods of minimizing errors in instrument systems is the lifecycle approach. This is where there is a formal lifecycle for the design, installation, operation, and maintenance of instrument systems. This type of approach can use all the methods to reduce or minimize human error already discussed (and more) but formalizes their usage. An example of a lifecycle approach to a system is given in ISA 84.01, “Application of Safety Instrumented Systems for the Process Industry.”
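The anticipation and tolerance approaches described above (an interlock that blocks an action, and an operator prompt to verify an entered number) can be sketched as a simple entry check. This is a minimal illustration, not a description of any particular control system: the function name, the return values, and the 10%-of-span threshold are all assumptions chosen for the example.

```python
def check_setpoint_entry(current, entered, low, high, max_step_fraction=0.10):
    """Classify an operator entry as 'accept', 'confirm', or 'reject'.

    low/high are the instrument range limits. max_step_fraction is a
    hypothetical threshold (fraction of span) above which the operator
    is prompted to verify the number before it is applied.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    if not (low <= entered <= high):
        return "reject"      # outside the range: the entry is blocked
    span = high - low
    if abs(entered - current) > max_step_fraction * span:
        return "confirm"     # plausible but large change: prompt to verify
    return "accept"

# Tank level setpoint on a 0-100 % range, currently at 50 %.
print(check_setpoint_entry(50.0, 52.0, 0.0, 100.0))   # small change
print(check_setpoint_entry(50.0, 5.0, 0.0, 100.0))    # possibly a dropped digit
print(check_setpoint_entry(50.0, 520.0, 0.0, 100.0))  # outside the range
```

The three calls return "accept", "confirm", and "reject" respectively. The "reject" branch acts like an interlock (the error cannot be made), while the "confirm" branch tolerates the error by giving the operator a chance to catch it.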
Cost of Errors
It is difficult to estimate the cost of errors in instrumentation systems. While we often assign blame for obvious errors, we seldom evaluate the overall cost of errors unless the errors come to the attention of upper management. This is in part because errors are “hidden” by limiting who knows about them, in order to avoid negative consequences.
Also, management and supervision often consider a certain amount of errors unavoidable, treated as sort of a cost of doing business, but not something they can really control. People are yelled at, chastised, supervised, punished, criticized, etc., but there are many companies that have essentially no quality system to work at reducing errors for instrument design, installation, operation, and maintenance. It is assumed that the supervisor and the normal managerial system will minimize these errors, but that is seldom efficient in the long-term reduction of errors.
In order to quantify the probability of human error, we must somehow quantify the propensity of humans to make errors under the conditions of interest. Since we are dealing with the complexity of human actions, this is somewhat difficult.
Several methods have been developed. Some are Human Error Assessment and Reduction Technique (HEART), Technique for Human Error Rate Prediction (THERP), and Empirical Technique to Estimate Operator Errors (TESEO). A discussion of these methods can be found in Reference 6.
HEART, as an example, is deterministic and fairly straightforward. It was developed by J.C. Williams in the early 1980s. HEART quantifies human error using a nominal probability of the error, an error-producing condition multiplier, and a proportioning effect. The first two of these are provided in tables, while the proportioning effect is determined by the experience of the person doing the analysis.
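As a rough sketch of how the three HEART quantities combine, the commonly published form of the calculation multiplies the nominal error probability by a factor of ((EPC - 1) x proportion + 1) for each error-producing condition (EPC). The numbers below are placeholders chosen for easy arithmetic; they are not taken from the HEART tables, and the function name is illustrative.

```python
def heart_probability(nominal, conditions):
    """Estimate a human error probability in the HEART style.

    nominal    -- nominal error probability for the generic task type
    conditions -- iterable of (epc_multiplier, proportion) pairs, where
                  proportion (0 to 1) is the analyst's judgment of how
                  much of the condition's full effect applies
    """
    probability = nominal
    for multiplier, proportion in conditions:
        if not 0.0 <= proportion <= 1.0:
            raise ValueError("proportion must be between 0 and 1")
        probability *= (multiplier - 1.0) * proportion + 1.0
    return min(probability, 1.0)   # a probability cannot exceed 1

# Placeholder inputs (NOT values from the HEART tables):
nominal = 0.003
conditions = [
    (11.0, 0.4),   # strong condition, judged to apply at 40%
    (3.0, 0.5),    # weaker condition, judged to apply at 50%
]
hep = heart_probability(nominal, conditions)
print(f"Estimated human error probability: {hep:.3f}")
```

With these placeholder inputs the two conditions contribute factors of 5 and 2, raising the nominal 0.003 to 0.03. The proportion values carry the analyst's judgment, which is why the text notes that the result depends on the experience of the person doing the analysis.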
There have been reports that the introduction of automatic protections has actually raised the amount of human error. One conclusion is that with known automatic protections in place, operators may be prone to more risk taking either individually or in their operating philosophy. If true, this merits close evaluation of the human factors involved.
In conclusion, human error occurs in instrument systems all the time. And human factors play a large part in facilitating human error. The cost of human error can be high and there can be a substantial impact on safety.
It cannot be assumed that normal management or supervisory systems will reduce or minimize human errors. Indeed, they may create human factors that actually facilitate human error, and they may not have any formal methods to reduce errors.
About the Author
ControlGlobal.com is exclusively dedicated to the global process automation market. We report on developing industry trends, illustrate successful industry applications, and update the basic skills and knowledge base that provide the profession's foundation.