The best defense

July 10, 2004
In the field and on the plant floor, redundancy protects and secures critical processes.

Redundancy is a requirement for many process control systems. For each part of the process, users need to determine whether redundancy is needed, and then decide where and how to implement redundancy schemes.

The first item to consider is the necessity of redundant control. "Redundancy should be an economically-based engineering decision," says Kevin Totherow, the president of Sylution Consulting (www.sylution.com). "Decision factors are cost of redundancy, likelihood of failure, cost associated with downtime, recovery time and cost of maintenance for redundant systems," adds Totherow.

According to Totherow, many redundancy decisions are not based on cost/benefit analysis. "Companies often make emotional decisions about redundancy. Many of the companies that insist upon redundancy would have much better project ROI as well as lower ongoing costs with non-redundant systems and good recovery plans," observes Totherow.

Others echo Totherow's comments. "The basic calculation is economic: Is the cost of device failure times its probability of failure greater than or less than the cost of redundancy?" asks Ed Bullerdiek, the control group leader with Marathon Ashland Petroleum in Detroit. "For Safety Instrumented Systems (SIS), the processes by which you determine this are well known (Fault Tree, FMEA, LOPA, Markov Models), and this basic thinking is easily extended to other systems," concludes Bullerdiek.
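Bullerdiek's basic calculation compares an expected cost with a certain one. The Python sketch below is a minimal illustration of that comparison, assuming all costs are annualized and that a redundant design avoids the downtime cost of the failures it covers; the class, the function names and the dollar figures are hypothetical and do not come from Marathon Ashland or any other firm quoted here.

```python
from dataclasses import dataclass


@dataclass
class RedundancyCase:
    """Hypothetical, annualized inputs for one redundancy decision."""
    failures_per_year: float         # expected failures of the non-redundant device
    cost_per_failure: float          # downtime, recovery and repair cost of one failure
    redundancy_cost_per_year: float  # amortized purchase price plus extra maintenance


def expected_failure_cost(case: RedundancyCase) -> float:
    """Cost of failure times its probability: the yearly risk carried without redundancy."""
    return case.failures_per_year * case.cost_per_failure


def redundancy_pays(case: RedundancyCase) -> bool:
    """Redundancy is justified when the expected failure cost exceeds its own cost."""
    return expected_failure_cost(case) > case.redundancy_cost_per_year


if __name__ == "__main__":
    # Illustrative only: one failure every five years at $200,000 per outage,
    # versus $25,000 per year for a redundant pair.
    case = RedundancyCase(failures_per_year=0.2, cost_per_failure=200_000,
                          redundancy_cost_per_year=25_000)
    print(expected_failure_cost(case))  # 40000.0
    print(redundancy_pays(case))        # True
```

Bothe's point below fits the same framework: assigning a very large cost_per_failure to any failure that endangers people drives the comparison toward redundancy regardless of the other figures.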

One difficult cost to quantify is risk of injury or death. "Costs of unsafe conditions should always be presumed exorbitantly high," according to Matt Bothe, a senior automation engineer with CRB Consulting Engineers (www.crbusa.com). "Therefore, operations that pose danger to personnel should always apply redundant systems," continues Bothe.

Cost versus benefit calculations can be complex, but many processes can be analyzed for redundancy without detailed mathematical analysis. "Reliable generation of electricity is a necessity, but some of our sub-processes can handle downtime," reports Dale Evely, PE, an I&C consulting engineer with the Southern Company in Birmingham, Ala. "Ash and coal handling as well as sootblowing and water treatment don't need redundancy because of built-in storage capacity," adds Evely.
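Evely's argument for the coal, ash and water-treatment systems is essentially a ride-through check: a sub-process can do without redundancy when its storage outlasts the time needed to restore it. A minimal Python sketch, assuming a constant draw-down rate and a hypothetical safety factor of 1.5; the function names and figures are illustrative, not Southern Company data.

```python
def ride_through_hours(storage: float, consumption_per_hour: float) -> float:
    """Hours the downstream process can keep running on stored material alone."""
    if consumption_per_hour <= 0:
        raise ValueError("consumption_per_hour must be positive")
    return storage / consumption_per_hour


def needs_redundancy(storage: float, consumption_per_hour: float,
                     repair_hours: float, safety_factor: float = 1.5) -> bool:
    """True when storage cannot cover the expected repair time with margin.

    The default safety factor of 1.5 is an assumption for illustration only.
    """
    return ride_through_hours(storage, consumption_per_hour) < repair_hours * safety_factor


if __name__ == "__main__":
    # Hypothetical coal silo: 1,200 t stored, 100 t/h burned, 6 h to repair the conveyor controls.
    print(ride_through_hours(1200, 100))   # 12.0
    print(needs_redundancy(1200, 100, 6))  # False: 12 h of storage exceeds 6 h x 1.5 = 9 h
```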

"For our primary process, the boiler and steam turbine, we design in dual redundancy of HMIs, controllers, and communication networks. For critical measurements we install redundant field devices and connect those devices to separate I/O cards in separate I/O racks," concludes Evely.

Some processes are low risk in terms of hazards, and this can be a key factor in redundancy decisions. "Use of redundancy in our consumer goods industry is perhaps lower than in more critical industries," observes James Reizner, a section head with Procter & Gamble in Cincinnati. "But on our paper machines, where downtime is very expensive and getting back on line can be a major event, we perform financial redundancy calculations," continues Reizner.

Redundancy is often needed when a process must run uninterrupted for a long period of time. This is often the case in biotech, where batches can take months to produce. Test systems are another such scenario. "Some of our tests run for thousands of hours, ideally uninterrupted, so redundancy is critical," according to Robert Shaw, PE, an electrical engineer with the QSS Group (www.theqssgroup.co.uk) at the NASA Glenn Research Center in Cleveland.
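The case for redundancy on long runs follows from the standard constant-failure-rate model, in which a unit with mean time between failures MTBF completes a run of t hours without failing with probability R(t) = exp(-t / MTBF). The Python sketch below assumes exponential failures, independent units, perfect failover and no repair during the run; the 50,000-hour MTBF is a hypothetical figure, not a NASA or QSS number.

```python
import math


def survival_probability(run_hours: float, mtbf_hours: float) -> float:
    """Probability that a single unit completes the run without failing."""
    return math.exp(-run_hours / mtbf_hours)


def redundant_pair_survival(run_hours: float, mtbf_hours: float) -> float:
    """Probability that at least one of two independent, unrepaired units survives.

    Assumes switchover itself never fails, which real systems cannot guarantee.
    """
    r = survival_probability(run_hours, mtbf_hours)
    return 1.0 - (1.0 - r) ** 2


if __name__ == "__main__":
    run, mtbf = 5_000, 50_000  # a 5,000-hour test on hardware with a 50,000-hour MTBF
    print(round(survival_probability(run, mtbf), 3))     # 0.905
    print(round(redundant_pair_survival(run, mtbf), 3))  # 0.991
```

Under these assumptions the chance of an interrupted run drops from roughly 9.5% to under 1%, and the single-unit risk keeps growing as runs get longer.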

Once a decision is made on which process needs redundancy, the next step is to determine where to apply redundancy in the control system architecture. It is rarely feasible to make an entire process redundant, and there are major differences in cost and benefits depending on where redundancy is applied.

Redundancy Simplified
For simplicity's sake, let's divide the control system into five areas: HMI/Server, controller, I/O, field devices and communications (see Table).

Readers surveyed ranked each of these five areas by benefit relative to cost.

According to our readers, communications yield the best cost/benefit ratio.

An almost perfect confirmation of the reader survey comes from one of the industry's leading vendors. "Where do customers spend their redundancy dollars?" asks Steve Lazok, the technical solutions support manager for Yokogawa Corporation of America (www.us.yokogawa.com). "In order: communications, controller, I/O and HMI. Redundant field devices come into play only for SIS solutions."

Do It Yourself?
In the final analysis, the responsibility for redundancy implementation lies with the end user, but selection of the proper control system can make implementation less difficult.

Field device redundancy is specifically designed by the user for each process, but redundancy at the other four levels of the control system can either be an integral part of the control system or a custom add-on.

When redundancy is required, most users suggest buying a control system designed for redundancy from the ground up. "Buy a system with redundancy built-in and it doesn't cost much in the long run, nor does it take much in the way of resources to support. Buy or inherit a cheap system and try to add redundancy and you will rue the day you were born," says Bullerdiek.

"Examine the vendor's redundancy scheme at all levels. The two most important questions to ask are: 'If it breaks, how do I fix it without taking my process down?' and 'Do I have to program anything in to get the redundancy or associated diagnostics to work?'" suggests Bullerdiek.

Without diagnostics, redundancy can disappear unbeknownst to the user. "Support and diagnostics on the HMI or communication side are needed, even with rigorous initial testing," observes Kyle Austin, a technical specialist, critical control systems for process information & control, with UOP LLC in Des Plaines, Ill. "If a secondary communications path or hardware option fails, it can often be neglected if critical control is not directly affected, and then the benefit of redundancy is lost," adds Austin.
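Austin's warning suggests that the idle half of a redundant pair needs its own health check, because nothing in normal operation will expose its failure. A minimal Python sketch of such a watchdog, assuming each communication path records the time of its last successful heartbeat; the class names, path names and 5-second timeout are hypothetical and not any vendor's diagnostics API.

```python
from dataclasses import dataclass, field


@dataclass
class CommPath:
    name: str
    last_ok: float  # time (s) of the last successful heartbeat on this path


@dataclass
class RedundancyMonitor:
    """Alarms on any stale path, including a backup that carries no traffic."""
    timeout_s: float
    alarms: list[str] = field(default_factory=list)

    def check(self, paths: list[CommPath], now: float) -> list[str]:
        stale = [p.name for p in paths if now - p.last_ok > self.timeout_s]
        for name in stale:
            # Raise the alarm even though control is unaffected: a dead backup
            # means the plant is silently running without redundancy.
            self.alarms.append(f"{name}: no heartbeat for more than {self.timeout_s} s")
        return stale


if __name__ == "__main__":
    monitor = RedundancyMonitor(timeout_s=5.0)
    paths = [CommPath("network A (primary)", last_ok=100.0),
             CommPath("network B (backup)", last_ok=80.0)]
    print(monitor.check(paths, now=102.0))  # ['network B (backup)']
```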

Others echo Bullerdiek's opinions concerning single-source responsibility. "If someone is concerned about redundancy, they should look at a DCS or a Triple Modular Redundant (TMR) system because these systems have hardware-based options that make support much easier. Single-vendor solutions eliminate finger-pointing and minimize oft-overlooked life cycle costs," says Robert Burgman, a senior automation engineer with the Pigments Division of Sun Chemical in Muskegon, Mich.

Burgman has implemented redundancy with both a DCS and with a PC-based HMI and a PLC controller, and he says the differences are significant. "By far the weakest link in an HMI/PLC system is the PC's hard drive. Unfortunately, our HMI vendor's solution to redundancy is simply duplication, which means that both HMIs poll the PLCs, in effect doubling communications traffic to the PLCs," reports Burgman.

According to Burgman, this scheme requires ongoing management and can result in poor performance because duplicate polling can eat up the bandwidth and cripple the highway. His firm has had to resort to "warm standby" for some HMI/PLC systems, with both HMIs running, but with the backup HMI's polling shut off.

By contrast, HMI redundancy on Sun Chemical's DCS is seamless. "Our DCS doesn't have this issue because the HMIs talk to the DCS controllers directly via unsolicited communications over a redundant highway. This has proven to require much less support than our PLC/HMI systems. In addition, redundancy is a snap with our Yokogawa DCS because it was designed that way from the ground up. We don't even think about it; it just works," concludes Burgman.
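The traffic penalty Burgman describes can be seen by counting poll requests. The Python sketch below is a deliberately simplified model, assuming each HMI that is polling reads every tag once per scan; the HMI names and the 500-tag figure are hypothetical. It contrasts simple duplication with the warm-standby arrangement, in which the backup HMI runs but keeps its polling switched off until a failover.

```python
from dataclasses import dataclass


@dataclass
class Hmi:
    name: str
    polling: bool = True  # whether this HMI is actively polling the PLCs


def requests_per_scan(hmis: list[Hmi], tags: int) -> int:
    """Poll requests the PLCs must answer per scan: every polling HMI reads every tag."""
    return sum(tags for h in hmis if h.polling)


def fail_over(failed: Hmi, standby: Hmi) -> None:
    """Warm standby: the backup is already running, so only its polling is switched on."""
    failed.polling = False
    standby.polling = True


if __name__ == "__main__":
    duplicated = [Hmi("HMI-1"), Hmi("HMI-2")]
    warm = [Hmi("HMI-1"), Hmi("HMI-2", polling=False)]
    print(requests_per_scan(duplicated, tags=500))  # 1000
    print(requests_per_scan(warm, tags=500))        # 500
    fail_over(warm[0], warm[1])
    print(requests_per_scan(warm, tags=500))        # 500, now served by HMI-2
```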

In a redundant HMI/PLC system, the PLC can be as much of a problem as the HMI. "We have never found a PLC redundancy system that works. It has been our experience that the promise of the technology is beyond its performance," says Andrew Rowe, the technical manager of process controls & MIS with the United States Gypsum Company in Chicago.

DCS-type control systems are almost always more expensive than HMI/PLC systems, but where redundancy is required the cost differential may be an illusion, especially when life cycle operations and maintenance costs are included.

This can be especially true in terms of software and hardware upgrades. "Like many HMI/PLC systems, we also use Windows at the HMI level," reports Bob Hausler, the vice president of system marketing for ABB (www.us.abb.com). "A key difference is that we test all upgrades with the entire control system including the redundancy features prior to releasing these upgrades to our clients."

Hausler's point is well taken. If a plant has an HMI/PLC system controlling a critical process, it would not be wise to simply accept the latest Microsoft patch. All such changes to the operating system or to any other areas should be thoroughly tested prior to installation on the control system. With a DCS, this testing is done for the user. With an HMI/PLC system, the user has to do the testing.

It is clear that DCS and TMR vendors have spent a lot of time and money designing and implementing redundant systems (see Sidebar). It may be wise to take advantage of this expertise if redundancy is needed.

No matter what control system is selected, it is important to examine all failure points including power supplies, fuses, and other ancillary components. "No control system is truly redundant if there is a single point of failure, and there usually is such a point somewhere in the system," observes Hausler of ABB. A cooling fan may be a minor component, but failure could bring down an entire plant if care is not taken when designing redundancy.
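Hausler's point can be quantified with a reliability block diagram: redundant components combine in parallel, but any component that every path depends on sits in series and caps the availability of the whole. The Python sketch below assumes independent failures and steady-state availability figures; the 0.999 controller and 0.995 fan values are hypothetical.

```python
from functools import reduce


def series(*availabilities: float) -> float:
    """Every block must work, so availabilities multiply."""
    return reduce(lambda acc, a: acc * a, availabilities, 1.0)


def parallel(*availabilities: float) -> float:
    """At least one block must work: one minus the product of the unavailabilities."""
    return 1.0 - reduce(lambda acc, a: acc * (1.0 - a), availabilities, 1.0)


if __name__ == "__main__":
    controller_pair = parallel(0.999, 0.999)   # redundant controllers
    with_fan = series(controller_pair, 0.995)  # both share one unprotected cooling fan
    print(round(controller_pair, 6))           # 0.999999
    print(round(with_fan, 6))                  # 0.994999
```

Under these assumed figures, the shared fan pulls the redundant pair below the 0.999 of a single controller with no fan dependency, which is exactly the hidden single point of failure Hausler warns about.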

I/O Network Redundancy
Whether the control system is a DCS, a triple modular redundant (TMR) system or an HMI/PLC system, redundancy can be implemented at various system levels. As discussed previously, readers rated redundant communications networks as offering the best cost/benefit ratio.

The first decision is which networks need redundancy. Networks between HMI/servers and controllers are duplicated in most redundant systems, but controller-to-I/O networks should be considered individually.

"I/O networks may be redundant depending on whether they include active or passive devices and on their length. Short passive networks will not be redundant; long networks, such as remote networks with active devices like fiber optics, will often be redundant," according to Bullerdiek.

Installation details for networks are critical. "Our communications networks are always redundant for reliability and safety, and we use two very different paths whenever feasible to minimize the risk of losing a path from localized incidents," says McCormick.



Redundancy and Fault Tolerance In DCS Controllers
In the Foxboro (www.invensys.com) I/A Series system, the term "fault-tolerant" is used to indicate controllers that are not only redundant, but are also designed to exercise all components in both controllers in lock step. "This ensures that no incorrect messages are sent to the I/O modules or to other controllers, workstations, or computers," says Alex Johnson, the director of systems products technology at the Foxboro Automation Systems unit of Invensys.

The fault-tolerant version of the I/A Series controller consists of two modules operating in parallel, each with two interfaces to the control network and to the I/O network buses. The two control processor modules, married together as a fault-tolerant pair, provide continuous operation of the unit in the event of a hardware failure within one module of the pair.

Both modules receive and process information simultaneously. Faults are detected by the modules themselves through the use of synchronization points and memory comparisons.

"One of the significant features of Foxboro's fault detection approach is the comparison of communication messages at the module external interfaces. Messages only leave the controller, whether going to an operator console or an I/O module, if both controllers are attempting to transfer exactly the same message. Message mismatches (and other internal checks) generate faults," observes Johnson.

Upon fault detection, self-diagnostics are run by both modules to determine which module is defective. The non-defective module then assumes control without affecting system operations.
This fault-tolerant solution has three advantages over controllers that are merely redundant. First, no bad messages are sent to the field or to applications using controller data, because messages are not allowed out of the controller unless both modules match bit for bit on the message being sent. Second, the secondary controller will not have latent flaws detectable only upon switchover, because it is performing exactly the same operations as the primary controller. Third, the secondary controller is synchronized with the primary one, which ensures up-to-the-moment data in the event of a primary controller failure.
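The message-comparison principle Johnson describes can be sketched in a few lines: both modules compute the same output, and it is released only if the two copies match bit for bit. The Python sketch below is a simplified illustration, not Foxboro's implementation; the LockStepPair class, the stand-in control computations and the self_test diagnostic hook are all hypothetical.

```python
from typing import Callable, Optional


class LockStepPair:
    """Two modules run the same computation; a message leaves only on an exact match."""

    def __init__(self, compute_a: Callable[[bytes], bytes],
                 compute_b: Callable[[bytes], bytes],
                 self_test: Callable[[str], bool]) -> None:
        self.modules = {"A": compute_a, "B": compute_b}
        self.self_test = self_test  # hypothetical diagnostic: True if the module passes
        self.faulted: set[str] = set()

    def step(self, inputs: bytes) -> Optional[bytes]:
        healthy = [m for m in self.modules if m not in self.faulted]
        outputs = {m: self.modules[m](inputs) for m in healthy}
        if len(outputs) == 2 and outputs["A"] != outputs["B"]:
            # Mismatch: withhold the message and let each module diagnose itself.
            for m in healthy:
                if not self.self_test(m):
                    self.faulted.add(m)
            outputs = {m: out for m, out in outputs.items() if m not in self.faulted}
            if len(outputs) != 1:
                return None  # defective module not identified: send nothing
        if not outputs:
            return None  # both modules have failed
        # Either both copies matched or a single survivor remains: release the message.
        return next(iter(outputs.values()))


if __name__ == "__main__":
    def healthy_module(frame: bytes) -> bytes:
        return bytes(b + 1 for b in frame)  # stand-in control computation

    def faulty_module(frame: bytes) -> bytes:
        return bytes(b + 2 for b in frame)  # same computation with a simulated fault

    pair = LockStepPair(healthy_module, faulty_module, self_test=lambda m: m == "A")
    print(pair.step(b"\x01\x02"))  # b'\x02\x03' (B found defective, A's message released)
    print(pair.faulted)            # {'B'}
```

Withholding the message on a mismatch, rather than picking one copy, is what keeps a bad output from ever reaching the field in this model.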

Homeland Security Drives Redundancy Requirements
According to Connie Chick, controllers business manager for GE Fanuc Automation (www.gefanuc.com), homeland security initiatives are creating new redundancy requirements. "Redundant control processors will need to be physically separated for applications such as water, utility, chemical, refinery, defense and others," says Chick.

"Depending on the process, separation can entail separate control cabinets in one building or even separation into two buildings. This separation is meant to prevent a wide range of service interruptions, and it presents a challenge in terms of communication speed," continues Chick.

Redundant control processors constantly exchange data at high speed, typically via a common backplane. When processors are physically separated, these high-speed communication requirements can become problematic, so GE Fanuc has created a custom solution for communication between physically separated redundant control processors.

"Our control memory Xchange connects physically separated processors via fiber. This technology allows multiple devices to share large amounts of control data over a fiber-optic deterministic network at speeds up to 200 times faster than standard industrial Ethernet LANs," explains Chick.
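Why separation strains the link can be estimated with back-of-the-envelope arithmetic: copying the controller's state to its partner every scan needs a throughput of state size times scan rate, and distance adds a propagation delay of roughly 5 µs per kilometer of fiber. The Python sketch below uses a hypothetical state size, scan rate and building separation; it does not reproduce any GE Fanuc product figures.

```python
FIBER_DELAY_US_PER_KM = 5.0  # light covers roughly 200,000 km/s in glass fiber


def sync_throughput_mbit_s(state_bytes: int, scans_per_s: float) -> float:
    """Throughput needed to copy the full control state to the partner every scan."""
    return state_bytes * 8 * scans_per_s / 1e6


def round_trip_us(distance_km: float) -> float:
    """Fiber propagation delay for a state update plus its acknowledgement."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM


if __name__ == "__main__":
    # Hypothetical: 256 KB of shared control data, 100 scans/s, buildings 2 km apart.
    print(round(sync_throughput_mbit_s(262_144, 100), 1))  # 209.7
    print(round_trip_us(2.0))                              # 20.0
```

At these assumed figures the synchronization traffic alone would exceed a 100 Mbit/s Ethernet link before any protocol overhead, while the propagation delay stays small, so in this example bandwidth rather than distance is the harder constraint.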