Control Talk: Failure identification and management

Greg: We’re fortunate to talk with Diane Doise, who’s worked as a process automation engineer for 25 years and is an active member of the Society of Women Engineers. We last talked with Diane in February 2019 for a Control Talk column titled, “How to support gender diversity.” In this column, we discuss how to minimize the occurrence and consequences of failures. Diane, what are the fundamentals of what’s needed to proactively address failures?

Diane: We continue to add instrumentation and automation to our processes to automate operator action, so operators can focus on more value-added tasks. Operations rely heavily on operators to respond to failures. As we increase automation, we must also increase failure identification and automate failure response.

In modern automation, control schemes and algorithms are more complex. Increased availability of instrumentation that can improve control schemes increases the importance of properly identifying failures and coding appropriate responses.

One tool that can assist with failure identification and assessment is failure mode and effects analysis (FMEA). FMEA is a qualitative tool that provides a structured technique for failure analysis. I was introduced to this tool more than 20 years ago during my Six Sigma Black Belt training. FMEA was used to identify high-probability failure modes that could affect production rates. The tool identifies failures, their estimated probability and their operational effects. It directs teams to focus their efforts on high-risk and high-cost failures.

Greg: How is the FMEA tool used?

Diane: FMEA is designed to identify and address potential problems/failures before they occur. It can be used on an existing process, a proposed process or process modifications. It’s a systematic way of identifying potential failures (instrument, equipment, operator), the probability of occurrence, severity of failure and prevention recommendations. There are many examples and tutorials on the web.

Greg: What is an example of the need for FMEA?

Diane: I support an automated program that starts up and shuts down a series of reactors in a chemical production unit. With the high turnover rate, less-experienced process operators became dependent on the program and have limited failure-response knowledge. I’ve often used what-if analyses for failure identification, but due to the quantity of loops the program controls and multiple process states, a what-if just wasn’t going to cut it. I remembered the FMEA tool from my Black Belt training.

Greg: Once a failure is identified and assessed, can proactive actions be detailed by FMEA to maximize process performance and safety? Can these be added to state-based control and procedure automation defined in the ISA-TR106.00.01 and ISA-TR106.00.02 technical reports?

Diane: FMEA asks teams to subjectively estimate failure severity, occurrence and detection. This gives us a risk priority number, which guides us to where we need further controls. Because it’s subjective, I like to use 1, 5 or 10. This provides good distinction and identification of where we should spend our efforts. Once the team determines where actions are needed, we can brainstorm potential controls. Next, FMEA asks us to rank proposed actions, and tells us if each action provides the needed risk reduction. Not only does the tool identify potential failures, but it also develops an actionable list to improve automated responses to failures.
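The scoring Diane describes can be sketched in a few lines of Python. This is a minimal illustration, assuming the conventional FMEA definition of the risk priority number as severity times occurrence times detection, with a higher detection score meaning the failure is harder to detect. The `FailureMode` class, the `rank_by_risk` helper and the three example failure modes are hypothetical and not taken from Diane's project.

```python
from dataclasses import dataclass

# Coarse 1/5/10 scale mentioned in the interview.
ALLOWED_SCORES = (1, 5, 10)


@dataclass
class FailureMode:
    """One FMEA row (hypothetical structure for illustration)."""
    description: str
    severity: int    # 10 = most severe consequence
    occurrence: int  # 10 = most likely to occur
    detection: int   # 10 = hardest to detect before it causes harm

    def __post_init__(self):
        for score in (self.severity, self.occurrence, self.detection):
            if score not in ALLOWED_SCORES:
                raise ValueError(f"score must be one of {ALLOWED_SCORES}")

    @property
    def rpn(self) -> int:
        # Conventional risk priority number: product of the three scores.
        return self.severity * self.occurrence * self.detection


def rank_by_risk(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Made-up example rows, not data from the interview.
modes = [
    FailureMode("Feed valve sticks closed", 10, 5, 1),
    FailureMode("Temperature sensor freezes at last value", 10, 5, 10),
    FailureMode("Operator skips purge step", 5, 1, 5),
]
ranked = rank_by_risk(modes)
for m in ranked:
    print(f"{m.rpn:5d}  {m.description}")
```

With only three allowed scores, the products spread out widely (here 500, 50 and 25), which is the "good distinction" the coarse scale is meant to provide.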

Greg: How do you improve control schemes using FMEA?

Diane: It identifies the parts of a control scheme that need coded failure response. It allows us to purposefully think about potential failures and address them. This is done in the design stage, instead of waiting for the failure to occur and then updating the controls. I think all automation engineers prefer identifying a failure during working hours rather than due to a midnight call that something’s gone wrong.

Greg: I advocate intelligent signal selection to find and reduce measurement failures, procedure automation to proactively deal with abnormal operation, and an enhanced PID for protecting against instrumentation failures detailed in the following excerpt from ISA-TR5.9-2023.

The goals of signal selection are primarily to protect against failures and, in the process, improve accuracy and the 5Rs of signals, namely reliability, resolution, repeatability, rangeability and response time. With the advent of increasingly smarter digital transmitters, most of the concern is often the sensor type and installation. Thus, redundancy of sensors and independence of installation that eliminate common-mode failures is the first step. The next step is to decide whether to use Lo, Hi or Middle (commonly referred to as Median Select) signal selection and/or supplement it with signal averaging.

Where profiles aren’t uniform and lead to unpredictable nonuniformity and noise, location of sensors at different points and signal averaging may help. For example, there’s considerable unpredictable nonuniformity and noise in concentration, phase, temperature or velocity where there are piping discontinuities or plug flow (little to no back mixing). The most common example is an averaging pitot tube. For fluidized bed reactors, temperature sensors are installed in a pipe carrying the lead wires that traverses the reactor. Several of these pipes may be installed and the average computed for each pipe with Hi or Middle signal selection of averages to determine the PV used for temperature control. Hi signal selection, where each sensor signal is compared to the average, may also be used to rule out single sensor failures.

To promote independence, multiple sensors should be installed with separate process connections. Differential pressure and pressure transmitters shouldn’t share the same impulse lines. Separate nozzle connections are used to help maximize independence. Temperature sensors aren’t installed in the same thermowell, which could be coated or have a loose fit or excessive vibration or other common-mode problems. Meanwhile, pH sensors should be separate and not share the same reference electrodes due to many possible sources of errors and failures.

The most recognized failure is downscale, possibly associated with transmitter failure or loss of signal, which leads to frequent use of Hi signal selectors. With digital transmitters and signals, this may be less of a concern. For a wireless system, there may be a loss of updates, which may be addressed by a Hi signal selection. Downscale failure may have more safety and environmental concerns stemming from excessive concentrations, levels, pressures and temperatures.

Middle signal selection inherently protects against a single failure of any type, including the last value, which is extremely difficult to detect and deal with automatically. Middle signal selection is particularly important for pH measurement because it reduces the common effects of noise, drift, coatings, glass electrode premature aging (for example, caused by high temperatures or strong acid concentrations), dehydration and abrasion, and reference electrode contamination. Middle signal selection offers a distinct advantage of ignoring a slow sensor. This is particularly advantageous for pH measurement since significant aging, dehydration and coating of the glass electrodes can increase an 86% response time from 2 seconds to 2,000 seconds.
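A minimal Python sketch of Middle signal selection across three redundant sensors shows why a single failure of any type is ignored. The function names, the pH readings and the deviation tolerance are illustrative assumptions, not values from ISA-TR5.9-2023, and the deviation check is only one simple way such a diagnostic might be built.

```python
import statistics


def middle_select(a: float, b: float, c: float) -> float:
    """Return the middle (median) of three redundant sensor signals."""
    return statistics.median([a, b, c])


def suspect_sensors(readings, tolerance):
    """Return indexes of sensors deviating from the median by more than
    `tolerance` (hypothetical diagnostic for flagging maintenance)."""
    mid = statistics.median(readings)
    return [i for i, r in enumerate(readings) if abs(r - mid) > tolerance]


# Illustrative pH readings: the third sensor has failed downscale.
downscale_case = [7.02, 6.98, 0.0]
print(middle_select(*downscale_case))         # failed sensor is ignored
print(suspect_sensors(downscale_case, 0.5))   # index of the suspect sensor

# The first sensor is frozen at its last value while the true pH rises.
frozen_case = [7.02, 7.50, 7.55]
print(middle_select(*frozen_case))            # tracks a live sensor
```

In both cases the selected value comes from a healthy sensor, and neither the failure direction nor a frozen value has to be detected explicitly for the protection to work.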

At one large chemical company, Middle signal selection was used on all pH loops. Middle signal selection can also offer simple, effective diagnostics that enable prompt maintenance of a defective sensor to retain full, inherent protection. Middle signal selection was also used on all measurements in several large complex continuous plants, reducing trips due to signal errors from four to less than one per year (each trip costing $10 million or more). In general, the most hazardous operation occurs during shutdown and startup, posing safety as well as monetary concerns.

While maintenance and operations need to see each sensor signal, manual signal selection by individuals is often based on favoritism that’s not attentive or fact-based. Manual signal selection should be limited to situations where a sensor has a confirmed problem or is being serviced (such as calibration or replacement).

Procedural automation, also known as state-based control, takes advantage of process knowledge and procedures to proactively manipulate process equipment and controls. This includes changing the PID mode, output and setpoint to position the PID to handle unusual situations in continuous processes and learning from benefits in procedures applied to batch processes. Procedural automation is used to achieve faster and more efficient startups, and transitions to different product mixes and shutdowns. It’s also used to prevent disruptive responses to abnormal conditions, which are often associated with equipment or prime mover failure.

The open-loop backup needed by axial and centrifugal compressors to prevent or recover from surge is an example of critical procedure automation because the surge cycles are too fast and large for feedback control, potentially causing equipment damage and process shutdowns. A high rate of change of suction flow, or an approach to the flat maximum in the compressor characteristic curve for a given speed or suction damper position, is used to proactively jump open and hold open a vent or recycle valve. This is done to increase the suction flow enough to get back to a negative slope on the characteristic curve. A fast rate of change calculation with a good signal-to-noise ratio can be created by simply using a dead time block to create an old PV value. The new PV value is the input to the block, and the old PV value is the output of the block. The new PV value minus the old PV value, divided by the block dead time, is the rate of change. The dead time is chosen to provide a good signal-to-noise ratio with enough predictive capability. If this rate of change is multiplied by the minimum arrest time and the result is added to the new PV value, you have a useful future value. Notably, this future value can provide the fastest possible setpoint response. The CO is set at its corresponding output limit and held there until the future value indicates the PV has reached setpoint, at which time the CO is set to its final resting value, and the PID returns to the auto mode. A future pH value can also be used to momentarily increase reagent addition to prevent Resource Conservation and Recovery Act (RCRA) pH violations.
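The dead time block calculation can be sketched in Python as follows. This is a discrete-time illustration under stated assumptions: the block runs at a fixed sample period, the dead time is a whole number of samples, and the future value is taken as the current PV plus the rate of change times a prediction horizon. The class name, function name and numbers are hypothetical.

```python
from collections import deque


class DeadTimeRateOfChange:
    """Rate of change from a dead time block: (new PV - old PV) / dead time."""

    def __init__(self, dead_time: float, sample_period: float):
        n = round(dead_time / sample_period)
        if n < 1:
            raise ValueError("dead time must be at least one sample period")
        self.dead_time = n * sample_period
        # Holds the last n samples; the oldest entry is the PV one
        # dead time ago, which is the output of the dead time block.
        self._buffer = deque(maxlen=n)

    def update(self, pv: float) -> float:
        """Feed the new PV and return the current rate of change."""
        if len(self._buffer) == self._buffer.maxlen:
            old_pv = self._buffer[0]
            rate = (pv - old_pv) / self.dead_time
        else:
            rate = 0.0  # not enough history yet
        self._buffer.append(pv)
        return rate


def future_value(pv: float, rate: float, horizon: float) -> float:
    """Project the PV `horizon` time units ahead (assumed linear trend)."""
    return pv + rate * horizon


# Illustrative ramp: PV rises 2 units per second, sampled every second.
block = DeadTimeRateOfChange(dead_time=4.0, sample_period=1.0)
rates = [block.update(2.0 * t) for t in range(6)]
print(rates)
print(future_value(pv=10.0, rate=rates[-1], horizon=3.0))
```

A longer dead time averages over more of the signal and so improves the signal-to-noise ratio, but it also delays recognition of a change in slope, which is the tradeoff described in the paragraph above.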

The enhanced PID provides protection against failures of the manipulated variable (for example, stuck valve or frozen secondary loop PV) and the controlled variable (for example, analyzer or wireless measurement) since external-reset feedback is used, and the input to the one-time execution of the integrator is the actual manipulated variable.
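The excerpt doesn't give the equations, so the following Python sketch is only one plausible reading of the idea. It assumes a PI controller whose integral action is a first-order filter (time constant equal to the reset time) on the external-reset signal, computed once per new measurement using the elapsed time since the last update. The class name, argument names and tuning values are hypothetical.

```python
import math


class EnhancedPI:
    """PI sketch with external-reset feedback, run only on new measurements."""

    def __init__(self, gain: float, reset_time: float, initial_output: float = 0.0):
        self.gain = gain
        self.reset_time = reset_time
        self.filter_out = initial_output  # integral contribution
        self.output = initial_output
        self.last_update_time = None

    def update(self, now, setpoint, pv, external_reset, new_measurement=True):
        if not new_measurement:
            # No fresh PV (e.g., analyzer or wireless update pending):
            # hold the output rather than integrating on stale data.
            return self.output
        if self.last_update_time is not None:
            elapsed = now - self.last_update_time
            alpha = 1.0 - math.exp(-elapsed / self.reset_time)
            # One-time filter update driven by the actual manipulated variable.
            self.filter_out += (external_reset - self.filter_out) * alpha
        self.last_update_time = now
        self.output = self.gain * (setpoint - pv) + self.filter_out
        return self.output


# Stuck valve at 30%: the external-reset signal never moves.
stuck = EnhancedPI(gain=1.0, reset_time=10.0, initial_output=30.0)
stuck_outputs = [stuck.update(t, 60.0, 50.0, external_reset=30.0)
                 for t in range(0, 50, 5)]

# For comparison, feeding back the controller's own output lets it ramp.
ramping = EnhancedPI(gain=1.0, reset_time=10.0, initial_output=30.0)
ramp_outputs = []
for t in range(0, 50, 5):
    ramp_outputs.append(ramping.update(t, 60.0, 50.0, external_reset=ramping.output))

print(stuck_outputs[-1], ramp_outputs[-1])
```

In this sketch the stuck-valve case stays at the proportional offset above the actual valve position instead of winding up, while the comparison case keeps ramping for the same persistent error.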

Top 10 signs of a successful FMEA

10. Unknown failures become extinct.

9. ISA Standards and Practices ask you to develop a technical report on FMEA.

8. A portrait with you holding your Excel spreadsheet is hanging in the control room.

7. Operators name their favorite Cajun recipe after you.

6. You become friends on a first-name basis with the plant manager.

5. 60 Minutes asks you to do a segment on FMEA.

4. You can work remotely from a lanai in Hawaii.

3. You solve problems before they develop.

2. Operators miss you.

1. Full nights of uninterrupted sleep.