By Doug Metzger
Since the early to mid '90s there has been an increasing recognition within the process industries that alarm systems, particularly in distributed control systems (DCSs), have gotten out of hand. Results of studies by the ASM Consortium (www.asmconsortium.com) documented in 1995 (Nimmo, 1995) concluded at that time that better handling of abnormal situations in the U.S. petrochemical industry alone could save up to $10 billion per year―a figure that has been repeatedly quoted in literature. Alarm systems have often been identified as part of the problem, rather than the solution. Because of the ease of adding alarms in a DCS, and the apparent added security of doing so, the numbers of alarms an operator must deal with have grown over the years from a few hundred to many hundreds or thousands (Andow, 2000). This often presents the operator with excessive, redundant and unnecessary alarms, overloading the operator and obscuring real problems. As Bransby observed, during plant upsets "there are real risks of important alarms being missed by the operator―with potentially severe consequences." (Bransby & Jenkinson, 1998, p. 62). Unfortunately, experience has shown that there is not a single quick and easy solution to the problem.
Work by the Engineering Equipment and Materials Users Association (EEMUA, www.eemua.org) in the 1990s resulted in the 1999 publication of its guidelines, EEMUA Publication No. 191, Alarm Systems: A Guide to Design, Management and Procurement. This document and its update in 2007 (EEMUA, 2007) have been the premier guide for the emerging worldwide consensus on principles and practices for alarm management in industrial process control. Along with documenting the basic principles of good alarm system management and successful techniques in use, EEMUA Publication No. 191 set some clear alarm system performance targets, among them fewer than one alarm per 10 minutes as "very likely to be acceptable" and fewer than ten alarms in 10 minutes following an upset as "should be manageable, but may be difficult" for the operator.
The EEMUA performance targets, though aggressive, were based on a study of industry norms (Bransby & Jenkinson, 1998). Together with the core principles of alarm management established in Publication 191, they have had a normalizing influence on the field of alarm management in the process industries.
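As a rough illustration of how the two EEMUA figures quoted above might be checked against an alarm log, here is a minimal Python sketch. The function names, the list-of-timestamps input and the sample data are assumptions made for illustration; they are not taken from EEMUA Publication No. 191 or from any particular alarm system.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)

# Figures quoted from EEMUA Publication No. 191 in the text above.
STEADY_STATE_LIMIT = 1   # fewer than 1 alarm per 10 minutes on average
UPSET_LIMIT = 10         # fewer than 10 alarms in the 10 minutes after an upset


def average_alarms_per_10_min(alarm_times, start, end):
    """Average alarm count per 10-minute period over [start, end)."""
    count = sum(1 for t in alarm_times if start <= t < end)
    periods = (end - start) / WINDOW  # timedelta / timedelta gives a float
    return count / periods


def alarms_after_upset(alarm_times, upset_time):
    """Alarm count in the first 10 minutes following an upset."""
    return sum(1 for t in alarm_times if upset_time <= t < upset_time + WINDOW)


# Hypothetical data: 30 alarms spread over an 8-hour shift,
# then a burst of 12 alarms starting at a made-up upset time.
shift_start = datetime(2009, 1, 1, 6, 0)
shift_end = shift_start + timedelta(hours=8)
routine = [shift_start + timedelta(minutes=16 * i) for i in range(30)]

upset = shift_start + timedelta(hours=9)
burst = [upset + timedelta(seconds=30 * i) for i in range(12)]

rate = average_alarms_per_10_min(routine, shift_start, shift_end)
print(rate < STEADY_STATE_LIMIT)                       # steady-state check
print(alarms_after_upset(burst, upset) < UPSET_LIMIT)  # post-upset check
```

With these made-up numbers, the shift averages 0.625 alarms per 10 minutes, which is within the first target, while the burst of 12 alarms exceeds the second.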
The ASM Consortium
The Abnormal Situation Management Consortium (ASM Consortium) began in the early 1990s as an outgrowth of issues around alarm management in distributed control systems. The Consortium's focus on "abnormal situations" is broader than just alarms, but alarm management remains a central theme.
As part of its activities, the ASM Consortium supported EEMUA's efforts to produce and publish its guidelines in 1999 (and later in 2007). Realizing early in this decade that the EEMUA performance targets were quite aggressive, the Consortium conducted human factors research, which showed that the EEMUA targets were, in fact, appropriate from an operator loading standpoint (Reising, Downs, & Bayn, 2004). Meanwhile, a study of Consortium member sites (Reising & Montgomery, 2005) showed that (a) such averages can be achieved, but that (b) most operating plants were not meeting these numbers, with some off by an order of magnitude or more.
The EEMUA Publication 191 provided a very good basis for alarm management practices; but in a desire to achieve the EEMUA targets, the Consortium members felt the need for more specific recommendations and examples drawn from the collective experiences at member company sites. Like EEMUA, Consortium members wanted to produce guidelines and not standards. The ASM Consortium guidelines on effective alarm management practices were first created in 2003 for use within the Consortium membership, and have undergone six revisions since the first release. The current release, completed in 2009 (Errington, Reising, & Burns, 2009), was made available for purchase by the public on June 15.
The ISA18 Committee
Unlike EEMUA and the ASM Consortium, production of standards documents is one of the missions of the International Society of Automation (ISA, www.isa.org). In June of 2003, the ISA Standards and Practices Board reactivated its SP18 Committee (which had produced a standard in 1979 on Annunciator Sequences and Specifications―ISA-18.1-1979) to create standards for computer-based alarm systems for the process industries.
The first full committee meeting was held in June of 2004 (a skeleton group had met in October of 2003). Since then, the committee has been made up of a cross section of experienced personnel from process control and alarm systems producers, users, consultants and engineering contractors. There are currently around 90 members on the committee, 25 of whom are voting members. Members meet face-to-face twice yearly and participate in the interim by editing and commenting on the drafts, both formally and informally. According to Nicholas Sands, co-chairman of the committee, there have been eight interim drafts, and since May of 2006, over 8,000 formal comments have been received and tracked to resolution by the committee. The standard was approved by committee vote in April 2009, the ANSI/ISA approval process was completed on June 23, and it was made available for purchase in July of this year (ANSI/ISA-18.00.02-2009).
The ISA18 committee was intentionally comprised of known experts in the field, including some with ties to the earlier work by the ASM Consortium and EEMUA, so that the results would be consistent with and a part of the emerging worldwide consensus on alarm management principles and practices.
Different Approaches, Same Target Problem
ASM Guideline Document Structure
The structure and content of the ASM guideline document were driven by a number of factors:
- The audience was to be Consortium members, already relatively knowledgeable in alarm management and familiar with the EEMUA guidelines and goals.
- But as stated earlier, Consortium members desired to capture the lessons learned from site experiences to help achieve the EEMUA targets, which ASM research had determined were valid and important to achieve.
- In addition, although the document is not a standard, there was a desire to provide a prioritized listing of guidelines for site implementations, as had been done with the ASM Consortium guidelines on effective operator display design, also originally published internally to the Consortium and now available publicly (Bullemer, Reising, Burns, Hajdukiewicz, & Andrzejewski, 2009).
The structure of the ASM guideline document is made up of three main guideline sections:
- Management Practices, which contains 14 guidelines directed at effectively managing and continuously improving the alarm system.
- Alarm System Design and Implementation, which contains 22 guidelines directed at processes and methods for creating an effective alarm system.
- Training, which contains 7 guidelines directed at training practices for not only the design teams that will configure the alarm system but also the operators that will have to respond to the alarms.
Each of the 43 guidelines follows a common structure, made up of four components:
- The numbered guideline statement – a succinct statement of the guideline, e.g., Guideline 1.4, "Use a Management of Change (MOC) Process for Alarm Changes".
- "Why?" – A brief description of the rationale – why this is important.
- "How It Works" – A longer discussion of "what" should be done, often including when and where it should relate to other plant work processes. The attempt is to describe "what" and not "how," despite the use of the word "how" in the title of this segment. This section often uses the "should" style of statement, similar to ISA-18.2 wording, but it is less formally done than in ISA-18.2.
- "Examples" – A discussion of one or more examples (i.e., success cases) from ASM Consortium member sites―examples of how it could be done, without requiring that it be done in this way.
Each of the 43 guidelines is also given a priority to help sites evaluate where to focus resources. There are three priorities that are meant to represent a tiered or hierarchical approach to continuous improvement for the alarm system:
- Priority 1 – minimum guidelines for achieving good-quality practice.
- Priority 2 – the comprehensive set of guidelines for achieving high-quality practice.
- Priority 3 – the advanced set of guidelines for achieving ASM best practice.
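For a site that wants to track the guidelines in a spreadsheet or script, the section counts and priority tiers described above could be represented roughly as follows. This is only a sketch: the `Guideline` class and `guidelines_for_tier` function are hypothetical names, the reading of the tiers as cumulative (a Priority 2 program also includes the Priority 1 guidelines) is an assumption, and the priority values attached to the sample guidelines are placeholders, not the priorities assigned in the ASM document.

```python
from dataclasses import dataclass

# Guideline counts per section, as given in the text above.
SECTION_COUNTS = {
    "Management Practices": 14,
    "Alarm System Design and Implementation": 22,
    "Training": 7,
}


@dataclass(frozen=True)
class Guideline:
    number: str    # e.g. "1.4"
    title: str
    priority: int  # 1, 2 or 3


def guidelines_for_tier(guidelines, tier):
    """Guidelines in scope for a tier, assuming tiers are cumulative."""
    if tier not in (1, 2, 3):
        raise ValueError("tier must be 1, 2 or 3")
    return [g for g in guidelines if g.priority <= tier]


# Titles are quoted in the article; the priority values are PLACEHOLDERS,
# not the priorities assigned in the ASM document.
SAMPLE = [
    Guideline("1.1", "Establish company management support for alarm management", 1),
    Guideline("1.3", "Establish an owner of the alarm system and ensure adequate staffing", 2),
    Guideline("1.4", "Use a Management of Change (MOC) Process for Alarm Changes", 3),
]

print(sum(SECTION_COUNTS.values()))                         # total guideline count
print([g.number for g in guidelines_for_tier(SAMPLE, 2)])   # tiers 1 and 2 only
```

The section counts sum to the 43 guidelines mentioned in the text, which gives a quick consistency check when such a list is maintained by hand.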
Besides the guidelines themselves, the ASM guidelines document provides other useful supporting information, including:
- Business drivers for alarm management projects―why it is important for sites to do this.
- A summary of some related ASM work on situational awareness.
- A summary of key findings from recent ASM Consortium research in alarm management.
ISA-18.2 Document Structure
When the ISA18 Committee met in 2003 and 2004, it was clear that one of the most important keys to success was the need for alarm management to be treated formally and as an ongoing activity, not just a one-time clean-up effort at a site. Further, successful alarm management must involve multiple parts of the operating team and be integrated into a site's operating work processes. These success factors led the committee to begin by documenting the alarm management work process, and then organizing the standards document around the work process. The result provides a "blueprint" to help site people plan, fund and execute their efforts. Figure 1 shows the ISA-18.2 alarm management life cycle, which forms the structure of the ISA-18.2 standard. With some minor exceptions, each life cycle stage has a major section, or clause, dedicated to it in the standard.
Another early thrust of the ISA18 Committee was definitions. The committee created a starting list from EEMUA, NAMUR (www.namur.de/), ISA84, and other places, for a consistent starting point; then reviewed and re-reviewed them with each draft cycle for conciseness and relevance to the standard. An example of this process is the fate of the term "disable," which was extensively discussed, then ultimately removed from the standard because different DCSs use it in different ways. As this term was removed, the term "out-of-service" was added and defined to relate more directly to the Maintenance Stage of the life cycle. This leaves it up to the implementer to utilize the tools of the chosen DCS or alarm system and to adapt the site work processes to address the "out-of-service" condition to meet the ISA standard.
Because ISA-18.2 is a standards document, the team took care to provide concise requirements ("shall" statements) and recommendations ("should" statements) that identify the key components of each life-cycle stage shown in Figure 1. Extensive background, rationale and examples present in earlier drafts were pushed over time to an appendix, and finally out of the standard document itself. These will resurface in technical reports that are to be published by the committee starting later this year. Nevertheless, the standard does still contain a number of items of supporting information, including the following:
- The alarm management life cycle – diagram and discussion.
- An alarm state transition diagram and operator response time line – diagrams and discussion.
- Definitions, which in the author's opinion, are the best definitions yet produced for such terms as "alarm," "alert," "standing/stale," "suppression," "nuisance alarms" and others.
Hence the two documents are very complementary: same problem, same goals, but from two different directions and perspectives. The alignment of ASM guidelines with ISA stages/clauses is shown in Table 1 and Table 2. Table 1 takes the topical orientation of the ASM guidelines and identifies the ISA-18.2 life-cycle stages in which the ASM guidelines are covered; where applicable, it lists specific related ISA-18.2 subclauses. Table 2 inverts this, taking the ISA-18.2 life-cycle orientation and identifying the ASM guidelines that will give the practitioner rationale, further considerations and examples when addressing each of the ISA-18.2 life-cycle stages. (The tables show the author's opinion on the primary relationships; there may be other relationships not shown.)
Note that many of the ASM guidelines relate to multiple ISA-18.2 stages, as noted in the right-most column in each table.
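The relationship between the two tables is a simple inversion of a lookup, which can be sketched in Python as follows. The pairings here are limited to guideline and clause combinations named in the text of this article; they are not a reproduction of Table 1 or Table 2, and the function and variable names are hypothetical.

```python
from collections import defaultdict

# ASM guideline -> related ISA-18.2 clauses. Only pairings mentioned in the
# article text are listed; the full tables contain more relationships.
GUIDELINE_TO_CLAUSES = {
    "1.1": ["5"],    # life-cycle organization
    "1.3": ["6"],    # philosophy (Clause 6.2.4)
    "1.4": ["17"],   # management of change
    "2.2": ["9"],    # rationalization
    "2.3": ["9"],
    "2.12": ["9"],
    "2.6": ["11"],   # human machine interface
    "2.7": ["11"],
}


def guideline_sort_key(number):
    """Sort '2.12' after '2.3' by comparing the numeric parts."""
    return tuple(int(part) for part in number.split("."))


def invert(guideline_to_clauses):
    """Build the clause -> guidelines view from the guideline -> clauses view."""
    clause_to_guidelines = defaultdict(list)
    for guideline, clauses in guideline_to_clauses.items():
        for clause in clauses:
            clause_to_guidelines[clause].append(guideline)
    return {
        clause: sorted(guidelines, key=guideline_sort_key)
        for clause, guidelines in clause_to_guidelines.items()
    }


CLAUSE_TO_GUIDELINES = invert(GUIDELINE_TO_CLAUSES)
print(CLAUSE_TO_GUIDELINES["9"])
```

Because each guideline maps to a list of clauses, a guideline that relates to several life-cycle stages simply appears under each of them in the inverted view.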
It would be easy to go through the details and fault each document for not covering everything that the other one did. But that would miss the point. These are different documents, approaching the same subject from different viewpoints. Each editing team did a good job of staying true to its own approach and not trying to be all things to all people. The result is two documents that are very useful together. Here are some examples.
The very first ASM guideline shown in Table 1, Guideline 1.1, is "Establish company management support for alarm management." The ISA-18.2 document does not have a specific clause that says this precisely. However, as discussed earlier, the entire life-cycle-based organization, described in Clause 5, Subclause 5.2, provides a concrete basis for selling such a program to management. ASM Guideline 1.3 says, "Establish an owner of the alarm system and ensure adequate staffing". ISA-18.2 Clause 6.2.4 in the Philosophy section of the ISA document identifies four specific roles and responsibilities that must be made clear in the site's Alarm Philosophy document, including "ownership of the alarm system, the philosophy and related documents," as well as alarm system configuration and maintenance, resolution of technical problems, and ensuring that the alarm philosophy is followed.
ASM Guideline 1.4 is "Use a Management of Change (MOC) Process for Alarm Changes". ISA-18.2 dedicates a stage to MOC, which is documented in Clause 17, listing specific items that should be documented in the MOC process and stating that the MOC process must assure that "the appropriate stages of the alarm management life cycle are applied to alarm system changes."
ASM Guideline 2.12 and ISA-18.2 Clause 9 (Rationalization) provide very similar basic recommendations. Both documents state that an alarm must be something that requires operator action. Table 2, under ISA-18.2 Clause 9 shows this and other ASM Guidelines that relate to rationalization. Each document discusses the team approach, documentation needed, and alarm prioritization. ISA-18.2 adds a discussion on "classification" at this and other points in the standard. Among those listed, ASM Guideline 2.12 discusses several site experiences, providing information like specific types of personnel used, amount of time taken and results achieved. Also, ASM Guidelines 2.2 and 2.3 provide guidance, with examples, for making the rationalization effort more efficient and effective.
In the Human Machine Interface clause (see Clause 11 in Table 2), ISA-18.2 details specifics about alarm state presentations, such as using distinctive color, blink status and audible indications (Clause 11.3); and discusses specific items that should be on alarm displays. The ASM Guideline 2.7―"Provide effective alarm annunciation"―has few words on such specifics in the "How it Works" section, but references more details in the ASM operator interface guidelines, which were made available to the public in 2009 (Bullemer, Reising, Burns, Hajdukiewicz, & Andrzejewski, 2009). Moreover, ASM Guideline 2.7 provides more specific guidance through examples on effective visual and audible alarm annunciation in challenging environments such as multi-console control rooms and remotely located field operations.
Another case where the ASM guidelines go a bit farther than the ISA standard is with respect to alarms on process graphics. ISA-18.2 identifies that alarm information should be on process graphics (Clause 11.6.5). ASM Guideline 2.6 has a more extensive guideline on usage of alarms in process graphics, including rationale and discussion of example techniques.
Integrating safety systems
Both documents recognize that there may be cases in which the safety system, including the operator interface, must be independent of the DCS―each referencing the same standard (ANSI/ISA-84.00.01-2004). Each document is careful not to specify design of safety systems or to overlap the ANSI/ISA-84 standard. But each discusses situations in which both safety systems and basic process control systems deliver alarms to the operator.
ISA-18.2 specifically includes SIS input (e.g., SIL assessments and LOPA analysis) as part of its Identification Stage (Clause 8), lists safety procedures in its list of "other site procedures" to integrate with (Clause 6.2.19), and notes in the Human Machine Interface clause (Clause 11.10) that user interfaces may need to be separate. ASM Guidelines 1.9 and 2.15 make the general statements of intent―"Ensure that alarm management is a part of an integrated safety program," and "Provide safety system design and safety-related alarm handling." The guidelines provide some additional background, rationale and examples; then they arguably go a bit further than the ISA standard, by asserting with examples in Guideline 2.9 that multiple separate alarm systems, including safety systems, should be brought to the operator in a consistent way.
ISA-18.2 includes training for operators in the Implementation Stage (Clause 13) and refresher training for operators in the Operation Stage (Clause 14), and training for advanced alarm systems (Clause 12). Clauses 13 and 14 list specific items that should be included in the training.
The ASM document has seven guidelines on training listed in the last section of Table 1. These seven are more general in nature, but encourage a broader operator training experience through the use of "what if" training (Guideline 3.3), situation support tools (3.4) and the use of dynamic simulators (3.7). Additionally the guidelines specifically cover training of the design personnel (Guideline 3.6) and the rationalization team (Guideline 3.2).
Summary and Conclusions
Generally, each ASM guideline is addressed in some way by one or more ISA stages, though ISA-18.2 describes the specific life-cycle stages in more detail than appears in the ASM guidelines document. Taking the other view, each ISA-18.2 stage has one or more ASM guidelines that relate to that stage, though ASM guidelines often make recommendations beyond those found in the ISA standard.
Thus, in the author's opinion, the ISA-18.2 document is the better reference for definitions and for work process details; in some ways it is the more succinct document for analyzing the impact on your plant, i.e., to plan, fund and execute. The ASM Consortium document is the better reference for background, rationale, business justification and site examples and, being less constrained, has a slightly broader scope in its recommendations.
Both documents have their recommendations prioritized―ASM using its three numbered priorities, and ISA-18.2 effectively using two―with its "shall" statements and "should" statements. The practitioner can take each document's essential practices and work from there. For example, a good starting point is the ISA-18.2 "shalls," along with the background and added considerations of the ASM priority 1 guidelines.
In conclusion, these two works, developed somewhat independently, but with common motivations and goals, complement each other nicely because of their different perspectives and approaches. Both documents will provide control system engineers and the operating team good guidance and structure to work toward the objectives and targets laid out by EEMUA Publication 191.
Andow, P. (2000). "Alarm Performance Improvement During Abnormal Situations." HAZARDS XV: The Process, Its Safety, and the Environment: Getting it Right. Institution of Chemical Engineers. Manchester, UK. April 2000.
ANSI/ISA-18.00.02-2009. Management of Alarm Systems for the Process Industries, ISA, Research Triangle Park, NC, 27709, 2009.
ANSI/ISA-84.00.01-2004, Part 1(IEC 61511-1 Mod) (2004). Functional Safety: Safety Instrumented Systems for the Process Industry Sector — Part 1: Framework, Definitions, System, Hardware and Software Requirements. ISA, Research Triangle Park, NC, 27709, September 2004.
Bransby, M., & Jenkinson, J. (1998). "Alarming Performance," Computing & Control Engineering Journal, April, 1998.
Bullemer, P., Reising, D., Burns, C., Hajdukiewicz, J., & Andrzejewski, J. (2009). ASM Consortium Guidelines: Effective Operator Display Design. Phoenix, AZ: ASM Consortium.
EEMUA (2007). Alarm Systems: A Guide to Design, Management and Procurement, Publication 191, Edition 2, The Engineering Equipment and Materials Users Association. London: EEMUA.
Errington, J., Reising, D., & Burns, C. (2009). ASM Consortium Guidelines: Effective Alarm Management Practices. Phoenix, AZ: ASM Consortium.
Nimmo, I. (1995). "Abnormal Situation Management – Adequately Address Abnormal Operations," Chemical Engineering Progress. September, 1995.
Reising, D., Downs, J., & Bayn, D. (2004). "Human Performance Models for Response to Alarm Notifications in the Process Industries: An Industrial Case Study." In Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting (pp. 1189-1193). Santa Monica, CA: Human Factors and Ergonomics Society.
Reising, D. & Montgomery, T. (2005). "Achieving Effective Alarm System Performance: Results of ASM Consortium Benchmarking against the EEMUA Guide for Alarm Systems," Proceedings of the 20th Annual CCPS International Conference, Atlanta, GA, 11-13 April 2005.
Thanks are due to Nicholas Sands, co-chairman of the ISA18 Committee, to Dal Vernon Reising and Jamie Errington, co-authors of the ASM Alarm Management Guidelines, and to Peggy Hewitt, Director of the ASM Consortium, for encouragement and help in reviewing this paper.