The dynamic world of alarms

Stan: Last month, Nick Sands, co-chair of the ANSI/ISA-18.2-2016 standard, “Management of Alarm Systems for the Process Industries,” and DuPont’s global alarm management leader, gave us an insightful perspective on what’s really needed. Here we benefit from examples and an accounting of what works and what doesn’t, courtesy of Darwin Logerot, alarm management specialist and senior consulting engineer at ProSys and member of the ISA-18.2 committee.

Greg: What were the results of a recent project?

Darwin: A refinery had 12 consoles with 1,000 standing alarms. In less than the expected project time frame of 1½ years, we were able to reduce the number of standing alarms to fewer than 55, beating the goal of fewer than five alarms per console. This reduced the standing alarm summary from five to six pages per console to less than a quarter of a page. Furthermore, and most importantly, all of the alarms in the alarm summary were relevant.

Stan: What makes an alarm relevant?

Darwin: A relevant alarm must give notification of the action required by the operator to prevent or correct a detrimental consequence in terms of safety and/or process performance. The alarm must be unique (only one per situation), and the operator must be able to understand what’s needed and have the time to make the correction. When the fix is done, the alarm clears.

We have additional metrics of average alarm rate and peak alarm rate. While metrics are generally useful in providing feedback, there’s a danger in focusing on alarm rates and not on relevance. Rates can be reduced by just deleting alarms. What’s needed is to make sure each and every operator-preventable consequence has one alarm with the right priority and a clear understanding of the correction required. The alarm is only activated when the problem occurs, and clears when the problem is solved. It just so happens that software features and techniques that improve alarm relevance coincidentally reduce the number of standing alarms and alarm rates.
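
To make the two rate metrics concrete, here is a minimal Python sketch that computes an average and a peak alarm rate from a list of alarm timestamps. The function name `alarm_rates`, the fixed 10-minute window, and the choice to start windows at the first alarm are all assumptions made for illustration, not a method described in the interview.

```python
from collections import Counter
from datetime import datetime, timedelta


def alarm_rates(timestamps, window=timedelta(minutes=10)):
    """Return (average alarms per window, peak alarms in any one window).

    Windows are fixed, back-to-back intervals starting at the first alarm.
    This is an illustrative convention, not a standard-mandated one.
    """
    if not timestamps:
        return 0.0, 0
    ordered = sorted(timestamps)
    start = ordered[0]
    # Assign each alarm to the index of the window it falls in.
    counts = Counter((t - start) // window for t in ordered)
    n_windows = (ordered[-1] - start) // window + 1
    average = len(ordered) / n_windows
    peak = max(counts.values())
    return average, peak
```

Note that deleting alarms lowers both numbers without making any remaining alarm more relevant, which is why the rates are treated here as feedback rather than as the goal.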

Greg: What is the key functionality?

Darwin: Alarm rationalization must be in agreement with ISA-18.2 and the customer’s alarm philosophy. A key question to ask is, “What if the operator does nothing?” If there is no appreciable detrimental consequence, there should be no alarm. In the analysis process, we document causes, consequences and needed operator responses that can be shown on a help screen.
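
One way to picture the documentation step is as a small record per candidate alarm. The sketch below is hypothetical: the class `RationalizationRecord`, its fields, and the example tag are invented for illustration and do not represent any vendor's data model.

```python
from dataclasses import dataclass


@dataclass
class RationalizationRecord:
    tag: str                # hypothetical instrument tag
    cause: str
    consequence: str        # what happens if the operator does nothing
    operator_response: str

    def justifies_alarm(self) -> bool:
        # No appreciable consequence, or no defined response, means no alarm.
        return bool(self.consequence.strip()) and bool(self.operator_response.strip())

    def help_text(self) -> str:
        # Text that could be shown on an operator help screen.
        return (
            f"{self.tag}\n"
            f"Cause: {self.cause}\n"
            f"Consequence: {self.consequence}\n"
            f"Response: {self.operator_response}"
        )
```

A record with an empty consequence fails `justifies_alarm()`, mirroring the "What if the operator does nothing?" test.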

Achieving this in a way consistent with the process requires “dynamic alarms,” and more specifically “state-based” alarms. Conventional alarm systems tend to have “static alarms,” but industrial process conditions are not static or independent of operating state, especially when there is an impending operator-preventable detrimental consequence.

Greg: I can appreciate this underlying principle. Throughout my career, my focus has been on process dynamics and on incorporating that and other process knowledge, including the cause-and-effect interactions between unit operations and operating states, to make control loops smarter.

Darwin: We can detect the operating state to determine what alarms are valid. For example, if the low-flow alarm was only needed to protect the pump and the pump is not running, you turn off the alarm.
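
The pump example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class `LowFlowAlarm`, the `pump_running` flag, and the numeric values are hypothetical, and a real system would derive the operating state from plant signals.

```python
from dataclasses import dataclass


@dataclass
class LowFlowAlarm:
    setpoint: float  # low-flow limit in engineering units (hypothetical value)

    def is_active(self, flow: float, pump_running: bool) -> bool:
        # The alarm exists only to protect the pump, so it is valid
        # only in the "pump running" state.
        if not pump_running:
            return False
        return flow < self.setpoint


alarm = LowFlowAlarm(setpoint=20.0)
print(alarm.is_active(flow=5.0, pump_running=True))   # True: pump at risk
print(alarm.is_active(flow=0.0, pump_running=False))  # False: pump is off
```

A static alarm would be the same comparison without the state check, and would stand in the summary whenever the pump is stopped.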

Boiler systems offer more complex examples. Impending shutdown alarms (e.g., low steam drum level or low fuel pressure) are used if the operator can take quick action to raise other boiler steam rates, giving time to determine the cause and make the fix.

If a boiler shutdown is planned and an operator simply presses the final shutdown button, this is not alarmed. Alarming it, even in a sophisticated alarm management system, invites an operator remark such as, “Really, your alarm system is telling me I just pressed a button?” Likewise, as a boiler cools down, drafts are turned off, oxygen is high, drum level is low and pressure is low, yet none of these should alarm during a planned shutdown. While you can catch operator mistakes such as closing or opening the wrong valve, it’s sometimes difficult to distinguish between an operator error and a planned shutdown.

We have another tool to help operators deal with problem alarms: the shelving tool. It allows operators to temporarily shelve (suppress or silence) alarms that are out of service or that have transmitter or signal problems. Use of alarm shelving is another way to reduce the standing alarm count.

Stan: How do you ensure alarm suppression is effective and safe?

Darwin: We have a shelving list with timers to re-annunciate. When the shelving is done for maintenance, creating an “out-of-service” alarm, there is generally no timer. There must be clear communication between operations and maintenance on the time requirements and the service completion schedule and status. You must be sure the operator has other ways of detecting potential problems, and plant procedures must be enforced for timely maintenance and for putting the alarm back in service.
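
A shelving list of this kind might be sketched as follows. Everything here is an assumption for illustration: the names `ShelvingList` and `ShelfEntry`, the use of plain seconds for time, and the rule that an out-of-service entry stays shelved until someone explicitly returns it to service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ShelfEntry:
    reason: str
    expires_at: Optional[float]  # None means out of service, no timer


@dataclass
class ShelvingList:
    entries: Dict[str, ShelfEntry] = field(default_factory=dict)

    def shelve(self, tag: str, reason: str, now: float, duration_s: float) -> None:
        # Timed shelving: the alarm comes back when the timer runs out.
        self.entries[tag] = ShelfEntry(reason, now + duration_s)

    def take_out_of_service(self, tag: str, reason: str) -> None:
        # Maintenance case: no timer, so procedures must bring it back.
        self.entries[tag] = ShelfEntry(reason, None)

    def return_to_service(self, tag: str) -> None:
        self.entries.pop(tag, None)

    def is_suppressed(self, tag: str) -> bool:
        return tag in self.entries

    def expire(self, now: float) -> List[str]:
        """Unshelve timed entries that are due and return their tags
        so the caller can re-annunciate them."""
        due = [
            tag for tag, entry in self.entries.items()
            if entry.expires_at is not None and entry.expires_at <= now
        ]
        for tag in due:
            del self.entries[tag]
        return sorted(due)
```

The untimed branch is the one that depends on communication between operations and maintenance, since nothing in the software will restore the alarm on its own.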

Greg: How do you maximize performance and reliability?

Darwin: We have screens that show the logic and process conditions being monitored, as well as the current state and results of the logic. We also have screens that show alarms, settings, priorities and suppression. We have different views for operators and engineers. We have redundant servers and redundant object linking and embedding (OLE) for process control (OPC) connections, with watchdog timers and checks for valid configuration that will generate all necessary alerts.

Stan: How do you use alerts?

Darwin: We use alerts to make the operator aware of a problem that the control system is already trying to correct. For example, if a level controller is in automatic, an unusually high or low manipulated flow would first generate just an alert, giving the control system time to rectify the problem. If the level subsequently becomes excessively high while the manipulated discharge flow is high, or excessively low while it is low, an alarm is then activated with the idea that the operator can change one or more of the flows into the vessel.
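
The alert-then-alarm escalation for the level controller could be sketched like this. The function `classify`, the `Notification` enum, and all limit values are hypothetical, and the sketch covers only the pairing described above; it makes no claim about how manual mode or other combinations are handled in practice.

```python
from enum import Enum


class Notification(Enum):
    NONE = "none"
    ALERT = "alert"
    ALARM = "alarm"


def classify(level, flow, in_auto, *, flow_lo, flow_hi, level_lo, level_hi):
    """Classify a level loop that manipulates the discharge flow."""
    flow_high = flow > flow_hi
    flow_low = flow < flow_lo
    # The controller has run out of room: the level is excessive in the
    # same direction as the discharge flow, so the operator must act.
    if (level > level_hi and flow_high) or (level < level_lo and flow_low):
        return Notification.ALARM
    # The controller is in automatic and still working the problem.
    if in_auto and (flow_high or flow_low):
        return Notification.ALERT
    return Notification.NONE
```

With assumed limits of 10 to 90 for flow and 20 to 80 for level, a flow of 95 at a level of 50 yields an alert, while the same flow at a level of 85 yields an alarm.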

Greg: What are some common alarm management mistakes?

Darwin: Some examples are:

Making alarm reduction the goal,
Using “check the box” mentality in project execution,
Not realizing the operator is the customer,
No consequences for an alarm,
Alarming normal events or messages,
Multiple alarms for a single event,
Unclear or irrelevant alarm messages,
Using alarm settings to trigger safety instrumented functions (SIF) or other automatic actions,
Only focusing on bad actors,
Ignoring dynamic behavior,
Eliminating startup and shutdown from alarm metrics, and
Overly relying on metrics.

Stan: How do you assess your performance?

Darwin: Our Event KPI module provides a report on alarm floods. We use this report in particular to measure the effectiveness of dynamic alarm management.
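
For readers who want a feel for what a flood report contains, here is a minimal Python sketch that finds flood periods in a list of alarm timestamps. It is not the Event KPI module's method. The function `find_floods` is hypothetical, and the default of more than 10 alarms in a 10-minute window is an assumed threshold based on a commonly cited benchmark; treat both as adjustable parameters.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_floods(timestamps, window=timedelta(minutes=10), threshold=10):
    """Return (start, end, alarm_count) for each run of consecutive
    fixed windows whose alarm count exceeds the threshold."""
    if not timestamps:
        return []
    ordered = sorted(timestamps)
    origin = ordered[0]
    # Count alarms per fixed window, indexed from the first alarm.
    counts = Counter((t - origin) // window for t in ordered)
    floods = []
    run_start = None
    run_total = 0
    # Go one index past the last window so a trailing run gets closed.
    for i in range(max(counts) + 2):
        if counts.get(i, 0) > threshold:
            if run_start is None:
                run_start = i
            run_total += counts[i]
        elif run_start is not None:
            floods.append((origin + run_start * window, origin + i * window, run_total))
            run_start, run_total = None, 0
    return floods
```

Comparing the number and length of flood periods before and after a change is one simple way to see whether state-based logic is suppressing the alarms that used to pile up during upsets.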

We’ve been lightheartedly accused of not liking alarms. Actually, we really love quality alarms, and enthusiastically seek them out to increase process safety, capacity and efficiency. Quality alarms are relevant, unique and instructive, giving a defined practical operator response that mitigates the consequence. Our objective is to increase the number of quality alarms. It just so happens that doing so almost always brings the side benefit of significantly reducing the alarm rate and the alarm summary list.

For more details, see the blog and ISA presentation, "Common Misconceptions and Pitfalls in Alarm Management."
