Focus on resilience to cultivate Safety II mindset

Oct. 17, 2018

“In abnormal operation, there is no predeterminable path from cause to effect. This is a characteristic of complexity.” Steve Cutchen, U.S. Chemical Safety Board investigator, discussed the importance of human resilience in safely dealing with complexity.

When tasked to improve safety, most people try to minimize what goes wrong by providing procedures and having workers follow them. When workers don’t do that and something goes wrong, we tend to try to fix something, update the procedures and retrain the workers. “Some people call it, ‘Buy a tool, hold a meeting,’” said Steve Cutchen, investigator, U.S. Chemical Safety Board, at this week’s EcoStruxure Triconex User Group conference in Galveston, Texas. “What most people think about now is what we call Safety I. Some new concepts are leading us to Safety II.”

The U.S. Chemical Safety Board investigates incidents and makes specific recommendations to improve safety based on the results of those investigations. In one incident, a refinery had a puddle on the floor due to an overhead pipe dripping about once every five seconds. “It was puzzling because there was no pipe flange at the drip location,” Cutchen said. “A rather large group of refinery employees had gathered around, trying to identify the cause of the leak, when the pipe blew out, releasing an opaque cloud that engulfed the employees and drifted over the neighboring city.”

How we got to Safety I

Today, the accepted practice is to try to have as few things as possible go wrong. Management writes procedures and, in management’s vision, people follow them. “People work as we imagine they do,” Cutchen said.

The current concept of safety dates back to the late 1960s, when management targeted people as the immediate cause of unintended outcomes. Management implemented programs like Take 2 and Stop, Think, Act, Review (STAR). “We did job safety analysis and had ‘stop work’ rules if a hazard was perceived,” Cutchen said. “The implication was, it’s always the person. It’s about reducing mistakes. To reduce mistakes, make people care more.”

Then managers started looking at the systems themselves, and latent problems. They implemented process hazard analysis, management of change, safety instrumented systems, integrity levels and layers of protection.

Next, they started to target non-routine as well as routine operations. “These include startups, shutdowns, online cleaning of heat exchangers, etc. using procedures from the shelf,” Cutchen said. “Again, work-as-imagined, with operational discipline.”

But unplanned things happen. In particular, abnormal operations bring unforeseen situations. “You can’t pre-specify every task, so abnormal operations have to be worked out on the fly, then we write a new procedure,” Cutchen said. “But system fixing becomes endless, like whack-a-mole. Work-as-done doesn’t always match work-as-imagined. And at the times of highest risk, procedural operations aren’t there.

“In abnormal operation, there is no predeterminable path from cause to effect. This is a characteristic of complexity. This is why there is a necessity for resilience, and the genesis of Safety II.”

Simple, complicated, complex, chaotic

What is a complex system? “For example, you are a complex system,” said Cutchen. “You are the way you are for many reasons, including your genes, how you were nurtured and raised, your experiences, etc. To understand complexity, start with simplicity.”

A simple system works in an obvious way, with a clear and unchanging cause and effect, and a predeterminable path. “You do X, you get Y, like a light switch by the door. It’s always there and if you flip it, you get light,” Cutchen said. The operator approach is sense-categorize-respond, to apply the best practice.

A complicated system has a predeterminable path, but understanding cause and effect requires analysis, which results in an expert procedure. An example would be making a cheesecake—there are many ways to do it. You could go to the store, buy a mix and follow the directions, or learn from your grandmother how it’s done. The operator approach is sense-analyze-respond, to apply a chosen good practice.

But a complex system has no predeterminable path, only guidelines and constraints. Cause and effect are only apparent in hindsight, “like raising children,” Cutchen said. “Suppose one of them asks permission to go to a friend’s house for a party and for the first time, there will be boys and girls there. You don’t know the cause and effect, so you ask questions and get more information before you decide what to do.” The operator approach is probe-sense-respond, to apply a unique, constrained practice.

Finally, a chaotic system has no predeterminable path and no visible constraints. Causes and effects have no apparent relationship. For example, most people would experience a tire blowout on the freeway as chaotic. The approach is act-sense-respond, applying a fast, novel practice to stabilize the situation.
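The four categories and their operator approaches can be summarized as a small lookup table. The Python sketch below is only an illustration of the taxonomy as described in the talk; the names SystemKind, Approach, APPROACHES, procedure_can_be_prewritten and describe are hypothetical and do not come from Cutchen or the Chemical Safety Board.

```python
from dataclasses import dataclass
from enum import Enum


class SystemKind(Enum):
    SIMPLE = "simple"
    COMPLICATED = "complicated"
    COMPLEX = "complex"
    CHAOTIC = "chaotic"


@dataclass(frozen=True)
class Approach:
    predeterminable_path: bool  # can the path from cause to effect be known in advance?
    steps: tuple                # the operator's sequence of actions
    practice: str               # the kind of practice the operator applies


# Illustrative mapping taken from the four descriptions in the article.
APPROACHES = {
    SystemKind.SIMPLE: Approach(True, ("sense", "categorize", "respond"), "best practice"),
    SystemKind.COMPLICATED: Approach(True, ("sense", "analyze", "respond"), "chosen good practice"),
    SystemKind.COMPLEX: Approach(False, ("probe", "sense", "respond"), "unique, constrained practice"),
    SystemKind.CHAOTIC: Approach(False, ("act", "sense", "respond"), "fast, novel practice"),
}


def procedure_can_be_prewritten(kind: SystemKind) -> bool:
    """A procedure can be specified in advance only when the path is predeterminable."""
    return APPROACHES[kind].predeterminable_path


def describe(kind: SystemKind) -> str:
    """Return a one-line summary such as 'simple: sense-categorize-respond (best practice)'."""
    approach = APPROACHES[kind]
    return f"{kind.value}: {'-'.join(approach.steps)} ({approach.practice})"


if __name__ == "__main__":
    for kind in SystemKind:
        print(describe(kind), "| procedure in advance:", procedure_can_be_prewritten(kind))
```

Laid out this way, the dividing line is easy to see: simple and complicated systems can be covered by a procedure written ahead of time, while complex and chaotic ones cannot.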

From human error to human resilience

Safety systems started as ways to target human impact on unintended outcomes, using tools to prevent mistakes. “We realized the solution extended beyond the person, and it’s our systems that need to be improved, so we look for latent causes,” Cutchen said. “Then you realize the system-fixing is endless: it’s impossible to identify every error-provoking situation, to pre-specify every task. Work-as-done does not always match work-as-imagined.”

The key is learning from responses to complexity. “It’s not that people are the cause of things going wrong a fraction of the time. It’s that people are the cause of things going right almost all the time,” Cutchen said. “When people respond to complexity, they do the right thing almost all the time.”

The natural human response to complexity is resilience. People demonstrate resilience when they resolve conflicts, anticipate hazards, accommodate variation and change, cope with surprise, work around obstacles, detect and recover from miscommunications and mis-assessments, and close gaps between plans and real situations. “People are good at this stuff because people are resilient,” Cutchen said.

Realizing that people are resilient and using that strength gets us to Safety II. Adverse outcomes are not the result of unusual actions in usual conditions; they are the result of usual actions in unusual conditions.

Contrast Safety I with Safety II: In Safety I, people are error-prone, a liability or hazard; in Safety II, people take actions based on decisions that seem correct at the time.

In Safety I, the goal is that as few things as possible go wrong. In Safety II, the goal is that as many things as possible go right.

In Safety I, the operational discipline is to execute work-as-imagined and reprimand for failure. In Safety II, leadership inspires resilient action and collaboration toward common goals.

A Safety I risk assessment is a classification of known hazards to reduce frequency and consequences. In Safety II, risk assessment includes searching for boundaries where procedural controls become ineffective.

Safety I investigations use hindsight to critique technical, human, and organizational failures. Safety II investigations recognize that hazardous activities normally go right—what was different this time?

Safety I is reactive: respond when something happens or is categorized as unacceptable risk. Safety II is proactive: anticipate complexity and set guidelines. “Strive to merge work-as-imagined and work-as-done,” Cutchen said.

Cultivate resilience

Complex systems require a resilient response. The refinery with the leaking pipe was Chevron’s in Richmond, California. Eighteen Chevron employees were caught in the opaque vapor cloud. All but one escaped just before the cloud ignited. The last survived, and six suffered minor injuries. In the weeks following the incident, more than 15,000 members of the public sought treatment at nearby medical facilities.

“Instead of shutting down, the workers wanted to daylight the source of the leak so they could diagnose it. They were pulling insulation when it ruptured. It turns out the insulation bands were reinforcing the line,” Cutchen said. “Many of the employees survived by dropping to their knees and crawling along the curbing to find their way out of the cloud.”

To put your facility on the path to Safety II, “You need to implement traditional safety system improvements, error-preventing tools and strong process safety management systems,” Cutchen said. “But also pay attention to the hair on the back of your neck. Recognize and incorporate good lessons from work-as-done.

“To diagnose complexity, discover where procedures are not enough. Harness human response, and learn the reasons things go right. Collaborate to create system robustness—to create the ability to handle the unexpected. This is implementation of resilience.”

Start by finding a task where work-as-imagined can be improved by implementing lessons from work-as-done. “Maybe it’s your interlock bypass system, maybe something else where a procedure is not being followed,” Cutchen said. “Fix it, then find another. Make work easy to do right and hard to do wrong. That’s it.”
