Center on Reliability

Forget preventive maintenance. Today's uptime requirements call for an entirely different approach

June 19, 2003

Maintenance has come a long way since the "fix it when it breaks" mentality of the 1940s and the preventive maintenance philosophy of the 1970s and 1980s. Today, the new world of reliability-centered maintenance (RCM) calls for computers, software, and sensors to achieve maximum plant availability and reliability at the most effective cost (Figure 1).

A big surprise is that preventive maintenance (PM) can actually be bad for certain systems! Not only is it expensive, but it doesn't work well with modern, high-tech equipment. So instead of PM, we are entering a whole new world of condition monitoring, loop analysis, and predictive maintenance.

Figure 1: Shoot for Uptime

Reliability-centered maintenance aims to find the correct balance between maintenance spending and process operations. According to Honeywell, only 5% of all plants have achieved the correct balance. (Source: Honeywell)

Although RCM often works best with fieldbus architectures and high-level asset management software (because they can more easily obtain and process data), the techniques involved are not beyond the reach of a typical process control user, even one with a legacy control system. This is because it's not how you acquire the data that's important; it's what you do with it.

Military Maintenance

Modern maintenance technology procedures began years ago in the military. The recent Gulf War II proves beyond a doubt just how effectively these techniques work.

"I used vibration monitoring and maintenance management [15 years ago] in the propulsion power plants of U.S. Navy vessels," says Robert Rosenbaum, an automation consulting engineer in American Canyon, Calif. "The use of portable handheld vibration instrumentation was so successful that the Navy purchased several permanently installed vibration monitoring systems for its fleet."

Rosenbaum also says he's familiar with RCM as once used by United Air Lines to prevent failure in commercial aircraft systems.

Vibration monitoring is just one part of an RCM program. The overall RCM process includes procedures to determine the functions and performance standards of an asset, what causes it to fail, what happens when it fails, and what can be done to prevent failures (Table I).

Table I: First Answer These

Before considering reliability centered maintenance, determine the functions and associated performance standards of the asset in its present operating context:

1. In what ways does it fail to fulfill its functions?

2. What causes each functional failure?

3. What happens when each failure occurs?

4. In what way does each failure matter?

5. What can be done to predict or prevent each failure?

6. What if a suitable proactive task cannot be found?

Source: Aladon

The commercial airline industry was the first to realize the benefits of a maintenance decision-making process. According to a white paper by Aladon, this led to the development of the MSG3 process in the aviation industry; in manufacturing, it's just called RCM. (For a complete description and multiple articles on RCM, go to www.aladon.co.uk.)

One of the most startling developments to come out of RCM studies involves preventive maintenance. "Many people still believe that the best way to optimize plant availability is to do some kind of proactive maintenance on a routine basis," says Aladon. This assumes one traditional view of failure (Figure 2), where devices fail as they enter a wear-out zone after a certain period of time.

This may have been true 30 years ago, but equipment is much more complex these days. Now, we identify six patterns of failure to deal with (Figure 3). According to Aladon, studies on commercial aircraft showed only 4% of failures conformed to pattern A, which contradicts another widely held belief that most products have a "bathtub" failure curve. Only 2% conformed to B, 5% to C, 7% to D, and 14% to E.

But here's the startling development: a whopping 68% conformed to pattern F--high infant mortality followed by random failures.
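The arithmetic behind those percentages can be sketched in a few lines of Python. The figures are the ones Aladon quotes; grouping patterns A, B, and C as the age-related ones is an assumption based on the usual reading of the six curves, not something stated in the excerpt, and the names are purely illustrative.

```python
# Share of failures by pattern, as quoted from Aladon's aircraft studies.
FAILURE_SHARE = {"A": 4, "B": 2, "C": 5, "D": 7, "E": 14, "F": 68}

# Assumption for illustration: patterns A, B, and C are the ones with a
# wear-out zone, so they are the only ones an age limit could address.
AGE_RELATED = {"A", "B", "C"}


def share(patterns):
    """Sum the percentage of failures across the given patterns."""
    return sum(FAILURE_SHARE[p] for p in patterns)


age_related = share(AGE_RELATED)
not_age_related = share(set(FAILURE_SHARE) - AGE_RELATED)

print(f"Total accounted for: {share(FAILURE_SHARE)}%")
print(f"Age-related (A, B, C): {age_related}%")
print(f"Not age-related (D, E, F): {not_age_related}%")
```

Under that grouping, the six figures add up to 100%, and fixed-interval overhauls could address only about one failure in nine.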

"These findings contradict the belief that there is always a connection between reliability and operating age," says Aladon. "Nowadays, this is seldom true. Unless there is a dominant age-related failure mode, age limits do little or nothing to improve the reliability of complex items. In fact, scheduled overhauls increase overall failure rates by introducing infant mortality back into otherwise stable systems."

Commercial aircraft and process control systems use similar components: pneumatics, electro-hydraulics, servomotors, networks, control valves, pumps, miles of wire and cables, computers, electronic controls, and flow, temperature, level, and pressure sensors.

Figure 2: Good Old-Fashioned Breakdowns

Thirty years ago, devices were understood to fail when they wore out. (Source: Aladon)

Some process industry data seems to correspond exactly with patterns E and F. "The majority of failures in valves and control loops is not predictable and the probability of failure does not increase with time," says Lane Desborough, manager of loop management services, Honeywell Industry Solutions, Thousand Oaks, Calif. "There is little evidence that valve failure can be predicted reliably based on accumulated stem travel alone."

If failures of process equipment are random, so much for preventive maintenance. What do we do now?

Three-Pronged Attack

Preventive maintenance isn't completely dead, of course. Rosenbaum, who bought into RCM 15 years ago, still believes in PM. "It heads off trouble before it starts, and the return is well worth the money invested."

Certain equipment does have a wear-out zone, and prudence dictates that it should be maintained before it breaks.

"We count valve operation cycles automatically, using our data historian, an OSI PI system," says Don Erb, manager of production planning and information, Ciba Specialty Chemical, McIntosh, Ala. "When the cycle count reaches a certain trigger value, the valve is scheduled for maintenance during the next opportunity."

Ciba's valve performance is evaluated based on historical data. "After we have one fail at a number of cycles, we check valves in similar service next time before they reach the same number, with the objective of service before failure occurs," explains Erb. "One major objective of our plant is reliability improvement. Our reliability has been improving over the past year, and although this is certainly not the only program in place, it is contributing."
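The cycle-count approach Erb describes can be sketched roughly as follows. This is a minimal illustration, not Ciba's implementation: the function names, the position thresholds, and the 80% margin are all assumptions, and the valve positions are passed in as a plain list rather than retrieved from a data historian.

```python
def count_cycles(positions, open_above=90.0, closed_below=10.0):
    """Count full open-then-close strokes in a series of valve positions (%).

    The two thresholds (assumed values) leave a gap between them so that
    small movements around mid-travel are not counted as cycles.
    """
    cycles = 0
    is_open = False
    for pos in positions:
        if not is_open and pos >= open_above:
            is_open = True
        elif is_open and pos <= closed_below:
            is_open = False
            cycles += 1  # one complete open/close stroke
    return cycles


def trigger_from_history(failure_cycle_counts, margin_pct=80):
    """Set the trigger below the earliest failure seen in similar service.

    margin_pct is a hypothetical safety margin, not a figure from the plant.
    """
    return min(failure_cycle_counts) * margin_pct // 100


def due_for_maintenance(cycles_so_far, trigger):
    """True when the valve should be scheduled for the next opportunity."""
    return cycles_so_far >= trigger


# Example with made-up data: position samples and past failure counts.
samples = [0, 50, 100, 40, 0, 95, 5, 100]
trigger = trigger_from_history([12000, 15000, 9000])
print(count_cycles(samples), trigger, due_for_maintenance(count_cycles(samples), trigger))
```

Note that a scheme like this still schedules work by usage rather than by observed condition.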

Many process plants have developed similar PM programs for valves, only to find that 30% of the valves that are taken apart for preventive maintenance have absolutely nothing wrong with them (Chemical Processing, November 2001). As Honeywell's Desborough points out, this is probably because these programs drive maintenance actions based on device usage, not on control loop performance degradation.

Therefore, what we need is a better way to determine when assets actually require maintenance. This requires a three-pronged attack:

Sensors, tracking systems, or on-board diagnostics on each asset that help identify the presence of a problem.
A data acquisition system to collect asset information.
Software to analyze the data, determine that a problem exists, and suggest maintenance procedures to correct the situation.

All of the above pieces are readily available on the open market. Plants with fieldbus-based hardware, a frameworks-based control hierarchy, and asset management software already have the infrastructure in place to do RCM. Those with legacy systems can buy the necessary hardware and software and install it on their process. As with all things in this industry, you can get the RCM capability you need by spending anywhere from a few thousand to a few million dollars.

Sensing Problems

In olden days, supervisors would dispatch technicians to the field to check on problems. But not anymore. "The days of having instrument technicians run to the field every time there is a problem are long gone," says Rami Mitri, director of asset optimization, New England Controls, Mansfield, Mass. Downsizing and reduced budgets have taken a toll on maintenance operations in many plants, he says. As staff and budgets decrease, equipment problems increase. "Many customers neglect to link downsizing to reduced maintenance on critical equipment that can either shut down or delay production."

To overcome problems caused by downsizing and budget cuts, Mitri says end users have to adopt new, enabling technologies for maintenance. In many cases, this means being able to identify problems before they occur, so maintenance dollars go further.

Several ways exist to determine if a device or system is having problems:

Manual observation (leaking, making noise, boiling over, etc.).
Condition sensing (running hot, vibrating, losing pressure, etc.).
Internal diagnostics (the device itself detects problems).
Performance analysis (valve sticking, slow control response, hunting, etc.).

PG&E, the giant utility in California, uses manual techniques to check its gas distribution operations, says Brian Steacy, general manager of DST Controls, Benicia, Calif. DST supplied PG&E with a PDA-based data acquisition system.

"PG&E opted out of fully automating its data acquisition because it would have been cost-prohibitive and, more importantly, not entirely safe," says Steacy. "Much of PG&E's compressor station instrumentation is too far flung to be hardwired, and many of the thousands of gauges that must be read daily are old, mechanical, or otherwise too costly to match up with transducers or hang on a network."

Figure 3: Space-Age Failure Patterns

Today, complex devices are understood to fail according to one of these six patterns.

Using a handheld system puts a person in front of the equipment on a regular schedule, which helps catch problems that instruments miss, such as leaking compressor lubricant, unusual conditions, and, in one case, graphic evidence that a cat had strayed into a compressor cooling fan. "The fan kept running, so the alarm wasn't triggered, but visual inspection revealed the necessity to shut the fan down for cleaning, repair, and balancing," says Steacy.

Wandering cats aside, manual observations are becoming the solution of last resort these days. Therefore, users must seek out ways to detect problems remotely, or predict them based on operating conditions.

One of the best ways is via condition monitoring, as explained in "Prevent Failure" [CONTROL, November '02]. That article explains how vibration analyzers and sophisticated data analysis can predict equipment problems in advance.

Condition monitoring, of course, often requires sensors to be installed on equipment to detect the conditions. Fortunately, this is getting much easier for end users. Many devices now come with HART or fieldbus interfaces, both of which can transmit diagnostic information.

Manufacturers also are building diagnostics into various devices, such as power supplies. The S8VS power supply from Omron Electronics, for example, can monitor percent usage and available life remaining.

For devices that do not have embedded diagnostics, users can install the necessary sensors on vital assets. It's not something you would want to do on thousands of devices in a typical plant, but condition sensors can be installed on assets of particular interest. If a certain pump, valve, compressor, or similar device is failing and causing problems, it could be fitted with vibration or voltage sensors on a permanent or temporary basis until the problem is diagnosed. For example, Allen-Bradley's MachineAlert relays can be installed in a control panel to monitor phase, current, temperature, and motor rotation in any motor control application.
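As a rough illustration of what a vibration sensor on a troublesome asset buys you, the sketch below compares the RMS amplitude of a block of samples against a baseline taken when the machine was healthy. It is a deliberately simple example, not how any of the products named here work; the function names and the warning and alarm ratios are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a block of vibration samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def vibration_status(samples, baseline_rms, warn_ratio=1.5, alarm_ratio=2.5):
    """Classify a block of samples against the machine's healthy baseline.

    warn_ratio and alarm_ratio are illustrative values; real limits depend
    on the machine and would come from the plant or an applicable standard.
    """
    ratio = rms(samples) / baseline_rms
    if ratio >= alarm_ratio:
        return "alarm"
    if ratio >= warn_ratio:
        return "warn"
    return "ok"


# Example with made-up readings in arbitrary units.
print(vibration_status([2.0, -2.0, 2.0, -2.0], baseline_rms=1.0))
```

Real vibration analysis usually looks at the frequency content as well, which is what lets it point to a specific fault rather than just a rising overall level.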

It's also possible to make manual vibration measurements on certain key machines. For example, SKF's MicroVibe portable vibration test and measurement instrument can be used with a PDA; this lets a technician run out into the plant periodically to check critical systems. The Ultraprobe 1000 from UE Systems has its own on-board recording, logging, and application software for ultrasonic condition analysis locally or later at a computer.

When buying new or replacement equipment, it's a good idea to seek out devices that have built-in sensors and embedded diagnostics.

"Investing in assets that can communicate when they require attention, such as maintenance or calibration, is critical to proactive strategies," says Mark Bitto, product manager of asset optimization products at ABB, Wickliffe, Ohio. "Intelligent field devices, control systems, workstations, and network hardware all contain a rich set of embedded diagnostic information. Unfortunately, unless the device is enabled to report these health conditions, the information will go unnoticed for long periods of time."

This means all that condition sensing and diagnostic data needs to be acquired for further analysis.

Plucking Data by PDA

At one end of the data acquisition cost spectrum, PDAs are rapidly replacing notebooks and clipboards in the maintenance arsenal. PG&E technicians, for example, use them to record daily readings and make on-the-spot checks of equipment.

Steacy says software in PG&E's PDAs can check the current reading to see if it is within limits for each device. "If the operator makes an entry the system deems out of range, DST's dBehold software will alarm and prompt for data re-entry," he explains. "This prompts the technician to make a visual inspection of the meter to determine if it was a transcription error or if the meter is having a problem. If an equipment fault is discovered, the tech can flag it for maintenance." Maintenance departments everywhere are using similar handheld PDAs and laptop computers.
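The range check Steacy describes might look something like the following. This is a hedged sketch of the workflow only, not dBehold's actual logic: the class and function names are hypothetical, and flagging the device automatically when the second entry is still out of range is a simplification of what the article says is the technician's call after a visual inspection.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Acceptable range for one gauge (hypothetical structure)."""
    low: float
    high: float


def check_reading(value, limits):
    """Return True when a manually entered reading is within limits."""
    return limits.low <= value <= limits.high


def process_entry(first_entry, limits, reenter):
    """Apply the out-of-range workflow to one gauge reading.

    reenter is a callable that asks the technician to look at the gauge
    again and returns the re-entered value.
    Returns (accepted_value, flag_for_maintenance).
    """
    if check_reading(first_entry, limits):
        return first_entry, False
    second_entry = reenter()  # alarm and prompt for data re-entry
    if check_reading(second_entry, limits):
        return second_entry, False  # first entry was a transcription error
    return second_entry, True  # reading really is out of range


# Example: the technician keys 550 instead of 155, then corrects it.
print(process_entry(550.0, Limits(100.0, 200.0), lambda: 155.0))
```

Passing the re-entry step in as a callable keeps the check independent of whatever user interface the handheld provides.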

Many maintenance departments realize the benefits of automated maintenance technology, but simply can't afford it, so they stick with their manual systems.

"We are looking at replacing any failed transmitters and new installs with fieldbus transmitters, mainly because of wiring and future advances in information provided," says Matt Smith, process control supervisor, Amalgamated Sugar Co., Twin Falls, Idaho. "We looked at Emerson's AMS, but couldn't justify the per-point costs because we have about 2,500 transmitters and 1,000 control elements. We employ 10 instrumentation technicians and are currently implementing an electronic work-order maintenance management system. I guess the bottom line is, we have the labor to do it manually."

When we asked end users for inputs on how they were acquiring data for maintenance purposes, several agreed with Smith, telling us they simply could not afford to install fieldbus instrumentation and asset management software.

Few are as lucky as James Loar, engineering group leader at Ciba Specialty Chemicals, Newport, Del. "We are in the process of installing a system to monitor reliability of process control and instrumentation," he told us. "We are installing a system with Foundation fieldbus, DeviceNet, Profibus, and AS-i. A new corporate standard for control systems forced us into the luxury of having this capability."

Fieldbus, DAQ and Asset Management

Clearly, the dream solution for an RCM system is fieldbus instrumentation connected to a distributed control system (DCS), an asset management software package, loop analyzer, performance analyzer, and a computerized maintenance management system (CMMS), all of which costs only slightly less than one of Saddam's gold-plated bathrooms.

"Predictive maintenance technologies that integrate with the process automation system offer distinct advantages to users," says Stuart Harris, vice president of Emerson Process Management's Asset Optimization Div. (www.emersonprocess.com). "We see a clear trend toward operators playing a first-line role in reliability and maintenance. Therefore, the ability to send predictive equipment advisories to operators is very valuable. Emerson accomplishes this goal in its PlantWeb digital plant architecture, which combines process automation with asset management."

All the major process control vendors have their own asset management/CMMS software, or they form alliances. Honeywell integrated Asset Manager PKS and its Loop Scout directly into its asset management system, Experion PKS. Invensys combined its Archestra framework architecture with its Wonderware HMI/SCADA software and Avantis asset management software. ABB has allied with Accenture, and Intergraph uses AM software from Meridium. Suffice to say, if you have a process control system from a major vendor, you can get everything you need to do RCM.

Much less expensive solutions are readily available. For example, at National Manufacturing Week, we saw a number of vendors--HMW, InduSoft, Applied Data Systems, Advantech, H-P, and Siemens--combine their products into a data acquisition system just by plugging them together via Ethernet and .Net, and writing a little software. It took two weeks to set it up, said the InduSoft programmer who put it all together.

MTS and National Instruments put together a joint solution that combines MTS' noise and vibration I-Deas software with NI's I/O cards and LabVIEW software. It can sample, process, and analyze up to 5,000 channels of vibration data.

Much of what a maintenance department needs is already contained within a system's real-time database or its process historian. Temperatures, pressures, control signals, and a host of other process data can be used by software to analyze loop performance and detect problems.
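To give a flavor of what such software can do with historian data, here is a minimal sketch that looks for a hunting loop by counting how often the control error changes sign. It is only an illustration under stated assumptions, not the method used by Loop Scout or any other commercial package; the function names, the deadband, and the crossing limit are hypothetical.

```python
def error_crossings(setpoint, pv, deadband=0.0):
    """Count how often the control error (SP - PV) changes sign.

    Samples whose absolute error is within the deadband are ignored, so
    measurement noise around the setpoint is not mistaken for hunting.
    """
    crossings = 0
    last_sign = 0
    for sp, value in zip(setpoint, pv):
        error = sp - value
        if abs(error) <= deadband:
            continue
        sign = 1 if error > 0 else -1
        if last_sign and sign != last_sign:
            crossings += 1
        last_sign = sign
    return crossings


def looks_like_hunting(setpoint, pv, deadband, max_crossings):
    """Flag a loop whose error swings across the setpoint too often."""
    return error_crossings(setpoint, pv, deadband) > max_crossings


# Example with made-up historian samples for one loop.
sp = [50.0] * 8
pv = [52.0, 48.0, 53.0, 47.0, 52.0, 48.0, 50.1, 53.0]
print(error_crossings(sp, pv, deadband=0.5))
```

A count like this only says a loop is oscillating; deciding whether the cause is a sticking valve, poor tuning, or an upstream disturbance takes further analysis.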

You can get the rest of the data you need by installing condition-based monitoring systems on certain assets, or dig diagnostic data out of your HART and fieldbus instrumentation. If you can find it, that is.

"Last year, every major DCS player introduced HART I/O, either their own or as a reference," says Louis Szabo, vice president of Meriam Instrument (www.meriam.com). "There are HART multiplexers to bring HART diagnostic information into existing systems." The problem is, HART device descriptions (DDs) supplied to the HART Communication Foundation can't handle all the capabilities of the devices. "So vendors are coming up with different schemes. The HART Communication Foundation, Fieldbus Foundation, and Profibus International are promoting enhanced or extended device descriptions (eDDs) while Invensys and European-based vendors are supporting FDT/DTM alternatives."

In other words, the information is out there, buried inside HART and fieldbus instrumentation, and all you have to do is extract it. That's not always easy, especially in mixed legacy systems.

"I'm using Foundation fieldbus [FF] in our process plant, and I have problems with the instruments due to the architecture," laments Jorge Cano, process control engineer at MetMex Penoles, in Torreón, Coah., Mexico. "We have a Rockwell ControlLogix PLC, but all interface to FF is with National Instruments FF Configurator software." Cano is using Rosemount instrumentation and Rockwell RSView 32 software. "The results are bad. The maintenance costs are very high, process improvements are difficult to implement, and failures in our process and equipment occur many times. I have plans to migrate to Plant Web with our platform."

With all due respect to the equipment named, such problems are not rare and are not caused by the equipment. We hear from many engineers that bringing up a fieldbus system of any kind can be a bear. But there has been progress. Perhaps in a year or so there may be better software available that will let you obtain the necessary maintenance information from HART and fieldbus more easily.

Once you obtain the necessary field data, a host of software packages is available to help you interpret data, analyze conditions, predict problems, and recommend solutions. These range from CMMS packages that help schedule maintenance procedures to performance monitoring software that analyzes plant data and looks for loops that are not performing up to snuff.

RCM is such a major change from the old, easily understandable preventive maintenance techniques that it's no wonder engineers are reporting mixed results.

"We use Emerson's AMS on control system equipment," says Joe Pittman, principal safety systems specialist at Lyondell/Equistar Chemical, Channelview, Texas. "Other than an automated documentation system, I have seen little benefit on the sensor side. It has provided benefit on the valve side, with the ability to do valve scanning and define which valves need to be pulled and repaired during turnarounds."

Such systems also require a major change in attitude. "Syncrude has an AMS server from Emerson in parallel with its Honeywell TDC 3000 system," says Ian Verhappen, instrument engineer at Syncrude in Fort McMurray, Alberta. "It has not been integrated with the remainder of the maintenance software system for two reasons: first, bureaucracy; second, buy-in from the maintenance team and their supervisors, who do not understand these systems require work to get results. As with many engineering projects, the biggest hurdle is not introducing the technology, but rather the culture change required after the fact to use it effectively."

In other words, the tools to implement an RCM system are available. You just have to conquer a few minor obstacles--such as fieldbus idiosyncrasies, bureaucracies, old equipment failure theories, politics, and maintenance department mindsets--to make it work.