How to succeed at alarm management

April 13, 2005
Find out how you can improve the effectiveness of your distributed control alarm systems by better managing the techniques, tools, standards, and procedures you use in your process plant.
The introduction of the DCS has made it possible to create alarms more easily and at a lower cost. Although software alarms are convenient, the ease with which they can be created removed the incentive to limit them. As a result, operators today face more alarms than they can effectively monitor. Alarm management seeks to identify unnecessary alarms and alarms set at the wrong value, and to find where the current procedures for dealing with alarms can be improved.

What is Alarm Management, and why do I need it?
Alarm Management comprises a collection of techniques, tools, standards, and procedures that improve the operations of process plants by improving the effectiveness of alarm systems.

Nuisance Alarms
When alarms are functioning well, they perform three important tasks:
  • Alerting the operator that a change has occurred
  • Informing the operator of the nature of the change
  • Guiding the operator to take the correct action

Without a disciplined Alarm Management program, the process alarms in a plant become less and less functional over time. The number of nuisance alarms increases to a point where the alarms distract the operator from the plant, rather than doing what they are designed to do.

Alarm Proliferation
Alarm management has existed as long as there have been alarm systems, but it has become more important since the distributed control system (DCS) was introduced in the 1970s. The DCS introduced software alarms: alarms that are created or changed by configuring a setting in a computer, rather than by wiring a signal to a panel. As a result, more alarms could be configured at lower cost. This made operations safer, but it also relaxed the engineering controls on the creation of alarms. Since an additional alarm now cost almost nothing to implement, there was little incentive to limit the number of alarms.

Naturally, some of the alarms that were configured were not really necessary, or were set at the wrong value. This leads to "nuisance alarms": alarms that do not tell the operator anything he does not already know, or that require no action.

Since the 1970s, as distributed control systems have become more sophisticated, more and more process units have been centralized under the control of fewer panel operators, so each operator has become responsible for more and more alarms. In addition, the number of alarms that can be configured on a single measurement has ballooned. Alarms are now commonly configured for Low, High, Low-Low, High-High, Deviation, and Bad Value conditions, and sometimes others.
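The list of alarm types above explains how one measurement can produce several alarms during a single excursion. The following Python sketch evaluates one process value (PV) against a limit set; the class name, field names, and the deviation check against a setpoint (SP) are hypothetical and for illustration only, not any DCS vendor's configuration model.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmLimits:
    """Hypothetical limit set for one measurement; None means not configured."""
    low_low: Optional[float] = None
    low: Optional[float] = None
    high: Optional[float] = None
    high_high: Optional[float] = None
    max_deviation: Optional[float] = None  # allowed |PV - SP| before a deviation alarm


def active_alarms(pv: Optional[float], setpoint: float, limits: AlarmLimits) -> list[str]:
    """Return every alarm type that would be active for this measurement."""
    if pv is None or math.isnan(pv):
        # No valid reading: limit checks are meaningless, so report only bad value.
        return ["BAD_VALUE"]
    active = []
    if limits.low_low is not None and pv <= limits.low_low:
        active.append("LOW_LOW")
    if limits.low is not None and pv <= limits.low:
        active.append("LOW")
    if limits.high is not None and pv >= limits.high:
        active.append("HIGH")
    if limits.high_high is not None and pv >= limits.high_high:
        active.append("HIGH_HIGH")
    if limits.max_deviation is not None and abs(pv - setpoint) > limits.max_deviation:
        active.append("DEVIATION")
    return active
```

With limits of 10/20/80/90 and a 15-unit deviation band, a reading of 92 against a setpoint of 50 raises HIGH, HIGH_HIGH and DEVIATION at once: three alarms for one underlying event.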

An individual operator is now confronted with tens of thousands of configured alarms within his area of the plant. During upsets, hundreds or thousands of these alarms can occur in a very short period. To make matters even worse, regulators such as OSHA and the EPA, and voluntary programs such as Responsible Care®, ISO 9000 and ISO 14000, require periodic process hazard analyses or process assessments, all of which result in the creation of additional alarms.

When there are too many alarms during an upset, they distract the operator and conceal the actual nature of secondary problems, instead of alerting the operator to real problems. So, in the absence of an alarm management program, manufacturing plants become less safe rather than safer; incidents become worse rather than better, and production losses increase.

What Does Alarm Management Include?
Alarm Management consists of the set of procedures, practices, tools and systems that jointly ensure that the alarm system in a plant is as effective as possible throughout the life of the plant. These can include:

  • Operator training on how to respond to alarms
  • Reducing the number of systems that can generate alarms
  • Documentation of the steps to be taken when an alarm occurs
  • Operator assistance in finding information regarding an alarm
  • Standardization and enforcement of the criteria for determining alarm priority
  • Risk-based evaluation of all alarms, to reconfigure or eliminate unnecessary alarms
  • Identification and elimination of nuisance alarms using alarm history
  • Alarm enforcement (ensuring that the alarm settings in the control system match their approved values)
  • Alarm change management
  • Automatic suppression of alarms
  • Definition of different alarm limits for different operating conditions or modes
  • Replacing "early-warning" alarms with continuous monitoring
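Automatic suppression and mode-dependent alarm limits can be sketched together. In this minimal Python example, a hypothetical low-flow alarm has a different limit in each operating mode, and a limit of None suppresses the alarm in a mode where low flow is expected; the mode names, values, and function name are all assumptions.

```python
from typing import Optional

# Hypothetical low-flow alarm limits (same units as the flow measurement) per
# operating mode. None suppresses the alarm in that mode because low flow is
# expected there, for example during a planned shutdown.
LOW_FLOW_LIMIT_BY_MODE: dict[str, Optional[float]] = {
    "startup": 5.0,
    "normal": 20.0,
    "shutdown": None,
}


def low_flow_alarm(flow: float, mode: str,
                   limits_by_mode: dict[str, Optional[float]] = LOW_FLOW_LIMIT_BY_MODE) -> bool:
    """Return True if the low-flow alarm should be active in the current mode."""
    if mode not in limits_by_mode:
        # An undefined mode is a configuration error, not a reason to stay silent.
        raise ValueError(f"no alarm limit defined for mode {mode!r}")
    limit = limits_by_mode[mode]
    if limit is None:
        return False  # suppressed: the condition is expected in this mode
    return flow <= limit
```

A flow of 10 stays quiet during startup but alarms in normal operation. Raising an error for an unknown mode is a deliberate choice: silently suppressing alarms in an unplanned state would hide exactly the situations alarms exist for.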

All Plants Need Alarm Management
The real question is: "Is my existing Alarm Management sufficient?" Ask yourself the following questions:

  • Do you need to know how many alarms you get in a day, how often alarm floods occur, or which tags are causing the most problems?
  • When you walk into the control room on a quiet day, do more than 10 to 15 alarms show up on the alarm summary screen for a single operating area?
  • Are there more than two systems that generate alarms, for example, more than a DCS and a single emergency system panel?

When there is an upset,

  • Does the alarm annunciator sound continuously?
  • Does one operator lean on the alarm silencer button continuously?
  • Is the alarm annunciator disconnected?
  • Have there been any operating incidents where the operator missed an alarm that did occur?
  • Do more than 1000 alarms occur in a given operating area in a day?
  • Do operators change alarm limits?
  • Do operators use alarms to inform them of non-critical changes in the plant, such as end-of-batch?
  • Are there alarms that operators do not know how to respond to?
  • Are there operators who do not understand why each alarm has the priority it does?

If the answer to more than one of these questions is "I don’t know" or "yes", then your Alarm Management system needs improvement, or at least an assessment. Poorly managed alarms are a disaster waiting to happen. At best, they distract the operator from important events and slow the response to an upset. At worst, the alarms will not be noticed, will not occur at all, or will not be understood, which is not a situation you want to face in an EPA, OSHA, or insurer's audit.

What will Alarm Management do for me?
Without effective alarm management, you cannot be certain that your operators will respond effectively when there is an upset. Alarms should define the boundary between normal operation and abnormal operation.

The following figure shows this relationship:

  • Continuous monitoring detects disturbances that the operator must respond to in order to help the automatic control system regulate the plant
  • DCS alarms alert the operator to significant upsets that must be corrected
  • Safety system alarms indicate the onset of critical events that require fast, sometimes radical, action, such as shutting down the plant.

A properly applied Alarm Management program ensures that the alarms are:

  • Configured properly and enforced, so they are set at the right operating conditions
  • Understood by the operator, and so are effective
  • Practical, so the operator has enough time to respond to the alarms before a significant loss occurs
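One way to check the last point is to estimate how much time an alarm leaves the operator. The Python sketch below assumes a rising process value that moves toward a consequence limit at a constant rate, which is a simplification; the function names and the comparison against a fixed operator response time are illustrative assumptions, not a method taken from this article.

```python
def time_available_s(alarm_setpoint: float, consequence_limit: float,
                     rate_per_s: float) -> float:
    """Seconds from alarm activation until the consequence limit is reached.

    Assumes a high alarm on a variable that ramps linearly toward the limit.
    """
    if rate_per_s <= 0:
        raise ValueError("rate must be positive, i.e. moving toward the limit")
    margin = consequence_limit - alarm_setpoint
    return max(margin, 0.0) / rate_per_s


def alarm_is_practical(alarm_setpoint: float, consequence_limit: float,
                       rate_per_s: float, operator_response_s: float) -> bool:
    """True if the alarm leaves at least the required operator response time."""
    return time_available_s(alarm_setpoint, consequence_limit, rate_per_s) >= operator_response_s
```

A high alarm at 80 with a consequence at 100 and a rise of 0.125 units per second leaves 160 seconds: enough for a two-minute response but not a five-minute one. Lowering the setpoint buys time, but a setpoint too close to normal operation trips during ordinary variation and becomes a nuisance alarm.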

How do I do it?
There are essentially two routes to improved Alarm Management:

  1. Continuous improvement of normal operations
  2. A new Alarm Management program

Both approaches start with an assessment.

Assessment
The Alarm Management Assessment (AMA) is an evaluation of your current operation by trained, experienced personnel, who examine all aspects of your existing alarm management systems. The AMA evaluates your current exposure to risk as a result of poor alarm management, and provides you with a grade in each of the different areas of Alarm Management, as well as an overall grade for your plant. Depending on the grade, the AMA recommends one of two different courses in each area.

The areas of Alarm Management evaluation are:

  • Alarm Definition: the existing procedures for defining and configuring alarms, alarm settings and priorities; the documentation of alarms and alarm-related procedures.
  • Management of Change: the effectiveness of the plant organization in maintaining the alarm settings once defined, and in ensuring that alarm settings are updated when physical equipment, procedures and plant operating parameters change.
  • Operator Readiness: the current state of operator training in response to alarms, and the systems in place to ensure operators are trained, qualified, and able to respond to alarms.
  • Alarm System Effectiveness: the actual performance of the alarm system during normal operation and upsets; the frequency of nuisance alarms and alarm floods; the availability of the alarm documentation as and when needed; and the effectiveness of the alarm system during significant operating incidents.

If the existing practice in an area receives a grade of B or A, then there is no urgent need for improvement in that area. You may need some management systems to ensure that performance is maintained or improves continuously over time; however, large amounts of effort are unnecessary in that area.

If the existing practice receives a grade of C or lower, then there is an organizational need to improve in that area, as described in the following sections.

Approaches to Alarm Improvement
There are two alternative approaches to alarm improvement. The appropriate approach depends on the grade from the Alarm Management Assessment. If the existing Definition, Management of Change and Operator Readiness processes are effective, then the alarm system can be improved by relatively simple continuous monitoring and continuous improvement. If the support processes are not in place, a more thorough approach is necessary.
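The decision rule in the assessment sections can be written down as a small Python function. The grade letters come from the article; the area names used as dictionary keys, the handling of plus/minus modifiers (only the letter counts), and the function names are assumptions for illustration.

```python
PASSING_LETTERS = {"A", "B"}  # B or better: no urgent need to improve the area
SUPPORT_AREAS = ("Alarm Definition", "Management of Change", "Operator Readiness")


def areas_needing_improvement(grades: dict[str, str]) -> list[str]:
    """Areas graded C or lower; '+'/'-' modifiers are ignored (an assumption)."""
    return [area for area, grade in grades.items()
            if grade.strip().upper()[:1] not in PASSING_LETTERS]


def recommended_approach(grades: dict[str, str]) -> str:
    """Continuous improvement only if every support process passes; else a full program."""
    missing = [area for area in SUPPORT_AREAS if area not in grades]
    if missing:
        raise ValueError(f"support areas not assessed: {missing}")
    weak = set(areas_needing_improvement(grades))
    if weak.intersection(SUPPORT_AREAS):
        return "full alarm management program"
    return "continuous improvement"
```

A plant whose support processes pass but whose Alarm System Effectiveness grade is D gets "continuous improvement": the alarm system itself is weak, but the processes needed to fix it steadily are in place.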

Continuous Improvement
If all is well, or if there are other organizational priorities, then you can apply a continuous improvement philosophy. This method can greatly reduce nuisance alarms during normal operation; however, it is effective only at improving and maintaining an already well-managed alarm system.

  1. Collect data. Collect alarm history for at least a month, including all alarms for all consoles along with all operator actions. This data will provide both an original benchmark and the basis for alarm analysis.
  2. Analyze. Typically, most nuisance alarm occurrences result from a surprisingly small number of actual configured alarms. The initial alarm analysis will quickly identify those alarms most in need of reconfiguring. It will also provide insight into the severity of the current problem.
  3. Benchmark. Analyze the alarm history to determine whether the current rate of occurrence is within the EEMUA guidelines for standing alarms, alarm occurrences per shift, and for alarm bursts. Analyze alarms area-by-area, since some operating units or areas are more prone to alarms than others. Using the alarm history, measure the original performance as a benchmark for comparison to show improvement in nuisance alarms over time.
  4. Spend 1 or 2 days per month doing Alarm Management. As more alarm history is collected, a senior operator and an engineer should spend a day or two a month to find the worst nuisance alarms and reconfigure them in the DCS. Over the long term this can have a significant effect.
  5. Measure to confirm that improvements are being maintained. The monthly alarm occurrence statistics will show the frequency of alarms and the number of standing alarms decreasing over time. This can be very powerful evidence of an effective alarm management program.
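Steps 1 through 3 can be prototyped on an exported alarm log. This Python sketch assumes one record per alarm activation with a naive timestamp, tag and area (the export format is an assumption; real DCS and historian exports differ). It counts alarms per day, finds the few tags that produce most occurrences, and flags fixed 10-minute intervals that exceed a flood threshold. The 10-alarms-per-10-minutes default is an assumed placeholder; substitute the figures from the EEMUA guidance your site follows. Standing alarms also need alarm-return events, so they are not covered here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class AlarmEvent:
    """One alarm activation from an exported log (assumed format, naive timestamps)."""
    time: datetime
    tag: str
    area: str


def alarms_per_day(events: list[AlarmEvent], area: Optional[str] = None) -> Counter:
    """Count activations per calendar day, optionally for a single operating area."""
    return Counter(e.time.date() for e in events if area is None or e.area == area)


def bad_actors(events: list[AlarmEvent], share: float = 0.8) -> list[tuple[str, int]]:
    """Most frequent tags, in order, until they account for `share` of all activations."""
    counts = Counter(e.tag for e in events)
    total = sum(counts.values())
    selected: list[tuple[str, int]] = []
    running = 0
    for tag, n in counts.most_common():
        if running >= share * total:
            break
        selected.append((tag, n))
        running += n
    return selected


def flood_intervals(events: list[AlarmEvent],
                    interval: timedelta = timedelta(minutes=10),
                    threshold: int = 10) -> list[datetime]:
    """Start times of fixed, clock-aligned intervals with more than `threshold` alarms."""
    epoch = datetime(1970, 1, 1)
    buckets: Counter = Counter()
    for e in events:
        index = (e.time - epoch) // interval  # whole intervals since the epoch
        buckets[epoch + index * interval] += 1
    return sorted(start for start, n in buckets.items() if n > threshold)
```

Fixed intervals are simpler than a sliding window but can split one flood across two intervals; a sliding window is the stricter check. Running these functions area by area on each month's history gives the benchmark and the trend described in steps 3 and 5.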

An Alarm Management Program for the 21st Century
This approach resembles the Six Sigma DMAIC cycle: first the problem and objective are defined, then the current state is measured and analyzed, action is taken to improve the situation, and finally the situation is monitored automatically to control, that is, sustain, the improvement.

The objective of this approach is to produce a self-sustaining alarm management competency within the organization, and to bring the initial benefits of alarm management as quickly and effectively as possible.

  1. Create an alarm philosophy. Define the purpose of each type of alarm or message that is issued by an automated system. Specify the criteria for each priority, for hardware alarms and for when an alarm is needed at all. Best practices in this area include Integrated Risk measures, addressing human factors, and cognitive limitations of human operators. Define other alarm management policies, such as the business rules governing alarm changes, the inhibiting or suppressing of alarms, expectations of operators, and so on. This step produces a document that clearly defines the objectives of the alarm system and the ground rules for its implementation.
  2. Benchmark alarm performance. Assess the actual current state. This provides useful data for the following analyses, and also provides a benchmark of the original state of the organization. Alarm configurations and history are collected from the DCS and analyzed with ProcessGuard.
  3. Decide which areas require rationalization. Alarm Rationalization, or Alarm Objectives Analysis, is a procedure where each alarm is examined to ensure it conforms to the alarm philosophy. During this process, many alarms are eliminated, others have their priorities changed, and still others have their trip points altered. The process can be time-consuming, and so it is best to start in those parts of the plant that have the worst problems with alarms.
  4. Implement Alarm Configuration System. After choosing a starting point, implement a database to collect, retain, control, and present the information collected during the Alarm Objectives Analysis process. This database will form both the change management system and the online operator assistance.
  5. Commit Resources. Form a team consisting of the following people, and free them of most of their other responsibilities for the duration of the analysis, since they will need to work together for several weeks:
    - A facilitator, familiar with the AOA process and with alarm analysis
    - At least one area operator, preferably two
    - The area process engineer
    - The area production supervisor
    - An area instrumentation technician
  6. Conduct AOA Meetings. Use analysis reports as evidence of problems or situations. Review the configured alarms and answer several questions regarding actual alarm occurrence and the purpose of the alarm. For example:
    - Possible causes for the alarm, whether legitimate (providing information), spurious (misleading or nuisance) or redundant (telling the operator something he already knows)
    - The procedure to identify the cause and thus validate the alarm
    - The procedure to mitigate the different causes
    - The time frame the operator has to respond in, before some undesirable consequence occurs
    - The follow-up action required to verify that the procedure was effective
    - The historical frequency at which the alarm occurred, and how many occurrences were spurious or redundant
    - The final alarm trip point and priority
  7. Consider missing alarms. During the AOA process, consider alarms that do not exist but should, in addition to reviewing existing alarms.
  8. Re-configure DCS. Implement the changes from an AOA carefully and as a complete set, not piecemeal. An AOA results in an alarm system that is designed to alert, inform, and guide the operator. The alarm configurations work together to define the boundaries of normal operation; if AOA changes are made piecemeal, some alarms may be removed before the alarms intended to replace them have been configured, and in that case the plant will be less safe. It is also important to introduce the continuous monitoring applications identified during the AOA. Continuous monitoring applications, such as multivariate trends, give the operator improved visibility into the process without relying on alarms as the operator’s "eyes and ears." Much of the barrage of alarms during upsets results from the misuse of alarms to tell the operator that an expected event has occurred. Continuous monitoring restores alarms to their original function of identifying unexpected events that require a response.
  9. Human factors. Ensure that human factors are addressed, from the design of the DCS operator interface to the number of independent systems that emit alarms, the audible alarm itself, and when (and whether) it can be silenced.
  10. Continue to measure and sustain alarm system effectiveness. ProcessGuard can monitor the actual alarm occurrences, fine-tune the alarm settings, and identify errors made in the AOA. The plant personnel who participated in the original AOA will be able to maintain it as changes are made to the plant.
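The Alarm Configuration System in step 4 and the piecemeal-change warning in step 8 can be sketched with a simple record per rationalized alarm. In this Python example, the AOARecord fields loosely mirror the meeting questions in step 6; the field names, the "tag.TYPE" key convention, and the priority matrix (three severity levels and a 10-minute response threshold) are hypothetical stand-ins for whatever your alarm philosophy defines.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical priority matrix from an alarm philosophy: a more severe
# consequence and less time to respond both push the priority up.
SEVERITY_RANK = {"minor": 0, "major": 1, "severe": 2}
PRIORITIES = ("low", "medium", "high", "high")


def assign_priority(severity: str, response_time_min: float) -> str:
    urgent = 1 if response_time_min < 10 else 0
    return PRIORITIES[SEVERITY_RANK[severity] + urgent]


@dataclass
class AOARecord:
    """One alarm as documented during an Alarm Objectives Analysis meeting."""
    tag: str
    alarm_type: str
    causes: list[str] = field(default_factory=list)
    verification_procedure: str = ""
    mitigation_procedure: str = ""
    response_time_min: Optional[float] = None
    trip_point: Optional[float] = None
    priority: Optional[str] = None
    remove: bool = False                # rationalized away
    replaced_by: Optional[str] = None   # key of the alarm that takes over its role

    @property
    def key(self) -> str:
        return f"{self.tag}.{self.alarm_type}"


def piecemeal_problems(records: list[AOARecord], configured: set[str]) -> list[str]:
    """Removals whose replacement alarm is not in the configuration being applied."""
    return [
        f"{r.key} is removed but its replacement {r.replaced_by} is not configured"
        for r in records
        if r.remove and r.replaced_by is not None and r.replaced_by not in configured
    ]
```

Applying the whole rationalized set at once, and refusing any change set that reports problems, is one way to honor the warning in step 8 that piecemeal changes can leave the plant less safe.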

Conclusions

  • Alarm management is just good process plant management.
  • Advances in technology and increases in government regulation have jointly made alarm management more important and more difficult.
  • Poor alarm systems contribute to production losses, equipment damage and injuries during critical incidents.
  • Many plants can improve their alarm management without major costs.
  • Matrikon can provide Alarm Management Assessments, Continuous Improvement Programs and full Alarm Management Programs.

Appendix:

Management Responsibilities
OSHA APPENDIX C TO 1910.119 - COMPLIANCE GUIDELINES AND RECOMMENDATIONS FOR PROCESS SAFETY MANAGEMENT

In addition, various engineering societies issue technical reports that affect process design.

For example, the American Institute of Chemical Engineers has published technical reports on topics such as two-phase flow for venting devices. This type of technically recognized report would constitute good engineering practice. Operating procedures addressing operating parameters will contain operating instructions about pressure limits, temperature ranges, flow rates, what to do when an upset condition occurs, what alarms and instruments are pertinent if an upset condition occurs, and other subjects.

Computerized process control systems add complexity to operating instructions. These operating instructions need to describe the logic of the software as well as the relationship between the equipment and the control system; otherwise, it may not be apparent to the operator.

Training in how to handle upset conditions must be accomplished as well as what operating personnel are to do in emergencies such as when a pump seal fails or a pipeline ruptures.

Our Interpretation
Now that organizations like the EEMUA have defined recommended practices for Alarm Management, OSHA will soon compare your plants to these practices.

National Safety Council

Do You Know What Is Really Critical?
By Dennis C. Hendershot

In general, a more reliable plant is a safer plant. Unplanned shutdowns, with equipment in modes of operation not anticipated by the designer, can create significant risks. And, data shows that for most continuous chemical plants, the highest risk phases of operation are startup and shutdown.

Our Interpretation
An effective alarm system is important in terms of production, safety and equipment damage. Alarms need to occur early enough to give the operator time to respond, but not so early that they become nuisance alarms.

Responsible Care Manufacturing Code of Practice
Each company shall have written operating, engineering and maintenance procedures which specify conditions for the responsible operation of any facility during normal or abnormal circumstances. The company shall:

  • Perform and document a regular hazard analysis and risk assessment of the operating facility and take action to minimize identified risk
  • Have written and up-to-date procedures which cover all phases of operation, including start-up and shutdown
  • Have written and up-to-date procedures which protect personnel during the maintenance of the facilities
  • Take action to prevent injury, damage and harm to people and the environment from explosion, fire or uncontrolled releases
  • Have a management system to control and record changes and modifications to equipment, processes, materials and associated computer hardware and software
  • Institute security procedures and systems which protect the facilities and address possible security threats
  • Maintain systems and procedures to minimize risks to safety, health and the environment during the handling and storage of all materials used and produced
  • Audit and update these procedures on a regular basis.

Our Interpretation
You need to conduct process hazard analyses, and have comprehensive procedures to address how the operator must respond to alarms.
