
Improving safety performance: compliance vs. competence—part 2

May 6, 2024
There are changes the process industry can make for better safety performance

This is part two of a series with Michael Taube. Read part one here.

Greg: We continue this series with Michael Taube, principal consultant at S&D Consulting in Houston with a subsidiary in New Zealand. Our discussion offers insights and critical details for making long-needed improvements in process safety, which can address the issue of plateaued total recordable incident rates (TRIR). 

Michael, what is the starting point for a successful outcome?

Michael: The most important issue to understand is that a successful outcome isn’t due to “perfect human behavior.” The system humans function in must be perfected to be (human) error-tolerant. A successful outcome doesn’t mean there are no errors or mistakes; it only means errors didn’t propagate and result in unwanted/undesired outcomes.  

A common situation illustrates this point. How many accidents aren’t the result of running a red light or stop sign? Alternatively, does every occurrence of running a red light or stop sign result in an accident? The answer is no, of course, because not all mistakes (violations) result in accidents. However, most process safety management (PSM) systems, if not all, are driven to eliminate mistakes.  

Greg: What path addresses this issue?

Michael: Safe or successful outcomes aren’t the result of just (physical) layers of protection, whether they’re interlocks, procedures and policies, etc. At their core, successful outcomes are a result of the training, qualification and depth of knowledge of those doing the work at the “sharp end.”  The point is that systems can never replace a thinking human being.  

Achieving this result (a thinking human being) depends on culture, as well as a solid foundation of staff-level workers, rather than managers or a safety organization, who make the organization’s success possible. Defense in depth is achieved by investing in intensive training and qualification of every staff member, supervisor and manager.  

Unless one is willing to invest in people, one must be content with poor performance and reliability for safety and financial matters. It’s only by investing in people—rather than policies, procedures and management systems—that an organization will achieve better outcomes. This is also a two-way street because these investments increase expectations of frontline staff and supervisors. The company invested in its employees, so the bar is raised for required performance.  

However, investment and long-term management commitment come first. Many workers have become jaundiced with management. Management must show it’s truly committed to a new way of doing things.

Greg: What mindset is needed?

Michael: Many organizations make the mistake of comparing themselves only to their peer group. I call this practice “navel gazing.” To truly understand safety performance progress, an organization must look outside its peer group to other high-consequence industries. For example, commercial aviation (because of recent events) and nuclear power generation, especially the U.S. Navy’s submarine force, are two high-consequence industries that can provide a reference point for comparison.

Greg: What are common misconceptions?

Michael: The biggest misconception, especially among safety professionals, is that 80% of all accidents are the result of human error. This misconception is based on nearly 100-year-old research by an insurance company investigator, who was more motivated to protect his employer’s interests than those of the insured organization’s workers. The perception persists despite research and insights to the contrary because of consistency and commitment bias, which is a commonly understood issue in psychology. So long as human error is cited as the root cause for incidents, safety performance progress will stall.

A second misconception, closely associated with the first, comes from the Swiss cheese model (SCM). As it is usually depicted, the SCM creates an illusion that accidents are the result of a linear sequence when they are not. Specifically, it leads to a perception that the most proximal error is the cause of the event. Deeper investigation, if pursued, often reveals that systemic issues have far greater influence on outcomes, and that the route through the “holes” in the layers of defense is very tortuous.

Another misconception, and where PSM really gets it wrong, involves latent hazards. PSM is mostly focused on new designs and planned changes to existing designs and operations. What it fails to address is drift or the slow degradation of processes, equipment and facilities that creates hazards. This gap is exacerbated by the more-bad-advice philosophy persuading us to do more with less, which decimates field staff/supervision until there are too few qualified people available to identify latent hazards that inevitably manifest. There’s also the collateral practice of deferred maintenance, which is widely recognized as a bad practice, but occurs more often due to too few staff.  

Greg: What’s the wrong answer?

Michael: Government regulators, industry professionals and vendors have expended a great deal of effort to improve safety performance and outcomes for the process industries. While they have made a lot of improvements, incidents, accidents, injuries and fatalities continue to occur. This should make it clear that the process industries have reached a point of diminishing (even negative) returns with their current safety practices. Doing more of the same is the wrong answer; it won’t improve safety outcomes any further.

A different strategy is needed—one that addresses the realities and deficiencies of the systems, procedures and qualifications of people in the process industries and the cultures they operate in. To be clear, strategy is not a mission statement or goal. It’s actions based on well-defined circumstances, situations or events that will overcome challenges. Strategy must recognize the constraints and realities and identify the critical ones that must be improved right away.  

Also, a piecemeal approach probably won’t work. Often, parallel activity will be required to make the organizational and cultural changes needed to get beyond the 10⁻⁷ ceiling. Furthermore, there must be a brutally honest assessment of what’s working, what isn’t and why. Management must address the elephant in the room and sacrifice any sacred cows that prevent the necessary changes. This requires true leadership, not merely management, in every organization.

Greg: What’s the right answer?

Michael: Management must recognize that safety, like an organization’s other emergent properties such as operational excellence, reliability, profit, etc., is not something that can be managed; it is pursued. The inputs can be managed, and the system can be nominally redesigned to generate desired outputs, but the outputs themselves can’t be managed. So, to quote Dr. Stephen Covey, they must “begin with the end in mind,” and then work backwards to address the system and its inputs.

The best examples are high-reliability organizations (HROs), which are characterized by at least two obsessions. First, they have a chronic sense of unease, the feeling that something is wrong. Second, they’re “learning organizations” that glean at least as much, if not more, from successes as from failures. They view accidents, incidents, etc., as learning opportunities, rather than calls for firing squads. The seeming paradox of learning more from success than failure is resolved by the debrief, which reviews what went right, what went wrong, and where the holes are in the processes and procedures. The debrief process lets HROs perfect their processes, so errors don’t cause unwanted outcomes. If the process industries performed debriefs after every maintenance operation, turnaround or capital project, there would be a significant improvement in all emergent properties, including safety performance.

An HRO culture also has these personnel characteristics: team members looking out for each other, and leaders developing, mentoring and investing in training their reports. These characteristics come from the investment made in training and qualification of staff and supervisors and the collateral responsibility and trust placed on them by their management. Such investment and trust don’t come easily for management due to the fundamental and often unrecognized assumptions they have about workers.  

Finally, another perverse practice that must be addressed is hiring only the best. If one examines a typical Gaussian curve, the best (say, the top 20%) make up only a small fraction of the total pool of potential candidates. A very large pool of potential candidates is eliminated from even being considered. As one of my professors once said, “C students do just as good a job as A students.” Also, Adm. Hyman Rickover, the “father of the Nuclear Navy,” was asked how he recruited the best, and he replied: “The best already have good jobs; I hire men with potential and train them.”

Greg: I’ve found that taking the best operator actions and implementing them in procedure automation for startups and in state-based control for equipment problems prevented shutdowns, particularly for compressor, column and reactor control. The best proactive, preemptive actions were found by studying and discussing operator actions, and the cause-and-effect relationships gained from first-principle dynamic simulations. Dynamic models and assistance from operators and from process, automation and maintenance professionals were important for their success. I also realized that, if operators said it wasn’t possible to automate the actions needed, it was a greater reason and justification for procedure automation, particularly since shutdowns and startups posed more safety issues. For example, procedure automation eliminated three trips and surges during startup of a huge axial compressor. I also witnessed how better measurements and middle-signal selection with three sensors and transmitters eliminated five recurring shutdowns per year in a complex process with extensive recycle streams pushed way beyond its original design capacity.
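As an illustration of the middle-signal selection mentioned above, here is a minimal Python sketch. It assumes three redundant transmitters measuring the same variable; the function name and the example readings are hypothetical and are not taken from any particular control system.

```python
from statistics import median


def middle_signal_select(a: float, b: float, c: float) -> float:
    """Return the middle (median) of three redundant transmitter readings.

    A single transmitter that fails high or low is ignored, because the
    middle value always comes from one of the two agreeing signals.
    """
    return median([a, b, c])


# Hypothetical readings: the third transmitter has failed high.
selected = middle_signal_select(101.2, 100.8, 250.0)
print(selected)
```

Because the selected value is always one of the actual readings, a single failed transmitter never drives the loop, which is one plausible reason this technique can reduce spurious shutdowns.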

We all make mistakes but we rarely hear about them. Individuals don’t even think about, and companies don’t permit, doing publications and presentations on mistakes. The following guidance by Hunter Vegas in “Tip #5: Admit Your Errors” of our ISA book 101 Tips for a Successful Automation Career is critical:

“Every engineer in the history of the world has made mistakes. In fact, the best engineers are the ones who have made lots of mistakes—and learned from them. Take time to understand what went wrong and make changes to ensure that it will never happen again. Better yet, go one step further and tell others about your mistake so they can avoid it themselves.”

I’ve admitted in my publications and presentations the mistakes I’ve made that were often initiated by mistakes made by experts that are still out there. Most notable is the guidance to not use positioners on fast loops, and to use boosters without positioners when an extremely fast response is needed like in compressor surge control. Not having a positioner on a diaphragm actuator can result in valves not opening up to 40% of controller output from bench settings and greater adverse effects from backlash, stiction and shaft windup. If a booster is used without a positioner on a diaphragm actuator, positive feedback is created that will result in unstable operation, where the valve slams shut due to fluid forces. I’ve documented this in most of my books since 2010. I also agreed in two applications to use the rotary valves requested by the plant because they were in the piping spec, and had a high capacity and tight shutoff. The smart positioner said the valves were responding fine, but the positioner was being lied to due to slop in linkages and shaft windup. When I put a travel gauge on the ball or butterfly disk, it showed that valves didn’t respond to signal changes of less than 8%. I also found that many plant loops had limit cycles due to these valves. These problems still exist today. Leaders in setting standards for valve specifications are not willing to include the specification of valve resolution and response time requirements or to add words of caution to the existing emphasis on capacity and leakage. I’ve tried to alert users to the consequences in my publications and in what I wrote this past year in ISA Technical Report ISA-TR75.25.02 Annex A.

The biggest nemesis to automation performance and operator actions is the deadtime prevalent in the process industries, which is negligible in machine control. Humans find it difficult to make the right correction when the effects of past and current actions are delayed and consequently not seen. A simple “future-value” block using the current deadtime can help operators deal with the deadtime issue.
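The column doesn’t define the “future-value” calculation, so the following Python sketch is only one plausible reading: it assumes the block takes the rate of change of the process variable (PV) over the last deadtime interval and projects the PV one deadtime ahead. The class name, parameters and sample data are all hypothetical.

```python
from collections import deque


class FutureValue:
    """Project a PV one deadtime into the future from its recent rate of change.

    Assumption (not from the article): future PV = current PV + rate * deadtime,
    where the rate is computed over the most recent deadtime interval.
    """

    def __init__(self, deadtime_s: float, sample_s: float):
        # Number of samples spanning one deadtime (at least one).
        self.n = max(1, round(deadtime_s / sample_s))
        # Keep the current sample plus the one from a deadtime ago.
        self.history = deque(maxlen=self.n + 1)

    def update(self, pv: float) -> float:
        self.history.append(pv)
        elapsed = len(self.history) - 1  # samples between oldest and newest
        if elapsed == 0:
            return pv  # no rate information yet
        rate_per_sample = (pv - self.history[0]) / elapsed
        return pv + rate_per_sample * self.n


# Hypothetical ramp of 1 unit per second with a 5-second deadtime.
block = FutureValue(deadtime_s=5.0, sample_s=1.0)
for value in range(11):
    projected = block.update(float(value))
print(projected)
```

Showing an operator the projected value alongside the current PV gives an early indication of where the process is heading before the full effect of a recent action becomes visible.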

I enjoy the TV series “Heartland,” which for 17 seasons has shown the personal relationships between people and horses on a ranch in a beautiful Canadian countryside setting. Most of the drama stems from people, particularly guys, not discussing their problems. It may take nearly a whole season of episodes for people to completely resolve the many related issues from a lack of communication. There is a lesson there for all of us.

About the Author

Greg McMillan | Columnist

Greg K. McMillan captures the wisdom of talented leaders in process control and adds his perspective based on more than 50 years of experience, cartoons by Ted Williams and Top 10 lists.
