Getting back to the user group meeting...

April 17, 2007

A New Paradigm for Process Safety Metrics for Major Loss Prevention

This is the third book of the Gospel on Alarm Management, presented by Ed Marszal of Kenexis Consulting. Ed admitted that his expertise is SIS rather than ASM or Alarm Management, but the three areas are intimately interlinked.
The New Process Safety KPI
"Current best practice for managers in the process industries has relied on measurement of injury accident rates as a proxy for all safety performance, inclusive of major losses such as catastrophic explosions, fires, and toxic releases. These measures are not well correlated with actual performance against major hazards," Marszal revealed. "As a result, some proxies must be developed for use as key performance indicators for major losses that occur frequently enough to be measured which, if neglected, will ultimately result in a major loss."

Process industry safety performance CAN be improved, but major hazards are still a problem. The process industry isn't the most unsafe industry; that would be farming, followed by mining. But upper management is disconnected from operations. They have no feel for day-to-day operation, and important information isn't available to them or can be hidden. Actionable metrics would allow oversight.

Typical safety metrics are not effective. "Slips, trips and falls" are just not well correlated to major losses. Better metrics are essential. Predictive information can be developed from process history, and collection and presentation are possible with existing tools. Here's "Management 101": improved major loss prevention through improved management.

The "Baker Report" (on the BP Texas City disaster) slammed the current "hard hat" safety metrics. Those metrics are just not going to tell you if your plant is ready to blow up. The only one that comes close is the "near miss report rate," but it requires people to make subjective reports. It is hard to tell whether a high near miss rate means that the plant is more dangerous, or just that people at that plant are better at filling out reports. The same thing is true of "unsafe activity reports."

What metrics should we have? New metrics are essential to improvement. New metrics are foreshadowed in recent standards, like ISA84 and, hopefully, ISA18.
  • Must predict major loss issues, meaning events that could lead to a major loss but stop short of one.
  • Must be clearly defined and consistently applied.
  • Must be relatively frequent events.
  • Automatic collection and reporting is beneficial.
Here's the Accident Causation Model. Hypothesis: most major accidents happen because multiple failures occur, starting with an initiating event:

Failure: Initiating event --> Failure: Propagating event --> Failure: Propagating event --> Accident

Ed's New KPIs:

Scaled Demand Rate: the actual demand rate divided by the expected frequency of occurrence. The target is 1.0, and higher is unsafe. The expected demand rate can be obtained from your existing PHA, because LOPA requires this data. So you should be able to get the data from your Layer of Protection Analysis. These demands can be automatically logged, tracked and reported. Demands are typically historized; they generally represent critical alarm functions.

Safeguard Unavailability: the probability that your safeguards will fail to work on demand. The KPI is the actual unavailability divided by the target unavailability, with a ratio greater than 1.0 being unsafe. Unavailability is defined as the fraction of time a safeguard cannot perform its intended function, either because it is in a failed state or because it is bypassed. The required safeguards should be listed in a good Process Hazard Analysis (LOPA). You need to log or historize the functional tests of equipment and the time spent in bypass.

These metrics can be applied on all levels and allow "drill down." So a CEO receives a set of safety KPIs as part of a monthly briefing, and he sees that the Scaled Demand Rate of his enterprise is over 1.0. Drilling down, he finds which plant has the highest contribution. The CEO calls the plant manager, who drills down further and can find out which process unit is the problem. The operations supervisor and the unit engineer drill down even further and find out that the problem is located in a particular process system, say, a "separator low level shutoff." Then they can look at the design basis, find out why the problem exists, repair it, and thus report a new Scaled Demand Rate for the enterprise of less than 1.0 back to the CEO.
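To make the two KPIs and the drill-down concrete, here is a minimal Python sketch. It is not from Ed's presentation. The record layout, the function names, the plant and unit names, and every number are hypothetical, and the roll-up rule (total actual demands divided by total expected demands for each group) is an assumption about how the KPI might be aggregated from safeguard level up to the enterprise.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DemandRecord:
    """Demand counts for one safeguard over one reporting period (hypothetical schema)."""
    plant: str
    unit: str
    safeguard: str
    actual_demands: float    # demands logged by the historian during the period
    expected_demands: float  # demands the PHA/LOPA predicted for the same period


def scaled_demand_rate(actual: float, expected: float) -> float:
    """Actual demand rate divided by expected demand rate; above 1.0 is unsafe."""
    if expected <= 0:
        raise ValueError("expected demand rate must be positive")
    return actual / expected


def scaled_unavailability(hours_failed: float, hours_bypassed: float,
                          period_hours: float, target_unavailability: float) -> float:
    """Actual unavailability divided by target unavailability; above 1.0 is unsafe."""
    if period_hours <= 0 or target_unavailability <= 0:
        raise ValueError("period and target unavailability must be positive")
    # Unavailability: fraction of time the safeguard was failed or bypassed.
    actual = (hours_failed + hours_bypassed) / period_hours
    return actual / target_unavailability


def roll_up(records, key):
    """Scaled demand rate per group. Assumed rule: sum(actual) / sum(expected)."""
    actual = defaultdict(float)
    expected = defaultdict(float)
    for record in records:
        group = key(record)
        actual[group] += record.actual_demands
        expected[group] += record.expected_demands
    return {group: scaled_demand_rate(actual[group], expected[group]) for group in actual}


# Made-up data for illustration only.
records = [
    DemandRecord("Plant A", "Unit 1", "separator low level shutoff", 8, 1.0),
    DemandRecord("Plant A", "Unit 1", "compressor high pressure trip", 1, 2.0),
    DemandRecord("Plant A", "Unit 2", "reactor high temperature trip", 1, 2.0),
    DemandRecord("Plant B", "Unit 1", "tank high level shutoff", 2, 5.0),
]

# CEO view: one number for the whole enterprise.
enterprise = roll_up(records, key=lambda r: "enterprise")["enterprise"]

# Drill down: worst plant, then worst unit in that plant, then worst safeguard.
by_plant = roll_up(records, key=lambda r: r.plant)
worst_plant = max(by_plant, key=by_plant.get)

plant_records = [r for r in records if r.plant == worst_plant]
by_unit = roll_up(plant_records, key=lambda r: r.unit)
worst_unit = max(by_unit, key=by_unit.get)

unit_records = [r for r in plant_records if r.unit == worst_unit]
by_safeguard = roll_up(unit_records, key=lambda r: r.safeguard)
worst_safeguard = max(by_safeguard, key=by_safeguard.get)

print(f"Enterprise scaled demand rate: {enterprise:.2f}")
print(f"Drill down: {worst_plant} > {worst_unit} > {worst_safeguard}")

# A safeguard bypassed for 36 hours of a 720-hour month against a 1% target.
ratio = scaled_unavailability(hours_failed=0, hours_bypassed=36,
                              period_hours=720, target_unavailability=0.01)
print(f"Scaled safeguard unavailability: {ratio:.2f}")
```

With this made-up data the enterprise number comes out above 1.0, and the drill-down lands on the separator low level shutoff in Plant A, Unit 1, which mirrors the CEO-to-unit-engineer story above. Summing demands before dividing is only one possible roll-up rule; a real implementation would follow whatever weighting the site's LOPA and reporting practice call for.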
The only major limitation of these new KPIs is that they cannot measure accidents that are the direct result of the initiating event, where you don't have a causation chain. Some other mechanism is going to be necessary to detect and predict those events.