I was in a joking mood when, in my November column on SIS, I referred to the camel as a horse designed by a committee. I apologize for that; I did not intend to offend the committee. I know that the members of the IEC 61508 and other committees have worked very hard for decades (since 1998) and deserve our thanks. My goal with the November column was to move our safety standards away from equations filled with acronyms and toward quantifiable and practical improvements, including protection that cannot be turned off by anything or anybody.
Let me start this second column on safety standards by restating that they must be made simpler, more practical and more quantifiable, and that they must focus on the whole loop, not only on its equipment components. Because we are interested in the SIL of the complete loop, that SIL must also reflect the reliability of the communication links and the power supplies and, most importantly, the automatic protection against human error. I will show that what we need is not more onion diagrams (Figure 1), but a clear understanding that a loop is only as safe as all of its components. We must understand that it does little good if some electronic components are "suited for" a certain SIL level when the loop itself is not protected from operator errors, cyber attacks, or communication and power supply failures.
The Cultural Divide
We live at a time when cultural attitudes toward automation are changing, and there is a debate about who should have the "last word" on safety: the machine or the operator? Standards such as SIS reflect the old culture, which trusts the operator more. Let us consider whether this "manual mentality" is still valid in the 21st century and, if not, why not.
Safety statistics tell us that the number one cause of all industrial accidents is human error. One could point to Three Mile Island, where the operators poured water into the instrument air supply; to BP, where there was no automation to keep the drilling pipe straight; to the ferry accident in Korea, where safety overrides were not provided to prevent the captain from turning too sharply into a fast ocean current; or to airplane accidents where the pilots were not prevented from landing at the wrong speeds. The list goes on...
This is occurring in an age when we land robots on Mars, target enemy tanks with drones and are getting ready for the driverless car, in which we will be able to play bridge on our mobile telephones while the smart vehicle parks itself. Why does this contradiction exist? Is it just tradition? Is it that the older generations still do not realize that the choice is not between humans and machines, but between trusting the judgment of panicked rookie operators running around in the dark at 2 a.m. and trusting the judgment of professional control engineers, who spent months analyzing all potential "what if" combinations before deciding which emergency actions should be triggered under a particular set of emergency conditions?
So What Do We Need?
What we need is a change in safety philosophy, a change in our attitude toward the role of the operator and a change in our fragmented approach to safety. I would, for example, combine all the layers of the traditional onion diagrams (Figure 1) into a simple three-layer one:
The 1st Layer would be the core, containing the basic process control system (BPCS). This innermost core and its operation would be identical to those of the old "onion diagrams". In this region, the operator is in charge. Here he or she is free to change set-points, modify control/logic algorithms, retune controllers, and add or change sensors, final control elements, etc. While the plant conditions are within this region, the goal is optimized plant operation and maximized production.
The 2nd Layer in this onion diagram of mine is the safety instrumented system (SIS) layer. When the plant conditions enter this layer, the safety actions are triggered and these instruments completely overrule the 1st (BPCS) layer. This 2nd layer uses its own dedicated (when necessary, redundant or voting) sensors and/or final control elements and has no interconnection whatsoever with the 1st layer. In this 2nd layer, the operator can still make changes, but only with the formal (written) approval of plant engineering.
The 3rd Layer is the Override Safety Control (OSC) layer, which cannot be turned off or overruled by anything or anybody. When the plant conditions enter this highly accident-prone region, safe shutdown of the plant is triggered automatically, no matter what. This layer depends on its own sensors, overrules any and all actions of the inner layers and has absolutely no connection to the Internet. In other words, within this layer the operator is out of the picture (operator actions are blocked) and the plant shuts down under preplanned, totally automatic control.
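The priority ordering of the three layers can be expressed as a few lines of logic. The sketch below is purely illustrative: the variable names and trip thresholds are invented for this example, and a real SIS or OSC would of course run on its own dedicated, certified hardware, not in application code.

```python
# Illustrative sketch of the three-layer priority scheme (OSC > SIS > BPCS).
# The temperature thresholds and names are hypothetical assumptions.

def select_action(reactor_temp_c, operator_request):
    """Return the action that wins, by layer priority."""
    OSC_TRIP_C = 520.0   # hypothetical: beyond this, shutdown is unconditional
    SIS_TRIP_C = 480.0   # hypothetical: SIS takes over, operator overruled

    if reactor_temp_c >= OSC_TRIP_C:
        # 3rd layer: automatic shutdown; operator actions are blocked
        return "OSC: automatic shutdown"
    if reactor_temp_c >= SIS_TRIP_C:
        # 2nd layer: SIS overrules the BPCS
        return "SIS: safety action"
    # 1st layer: normal operation, the operator is in charge of the BPCS
    return f"BPCS: {operator_request}"

print(select_action(300.0, "raise setpoint"))  # -> BPCS: raise setpoint
print(select_action(530.0, "raise setpoint"))  # -> OSC: automatic shutdown
```

The point of the ordering is that the outer layers never consult the inner ones: once the OSC region is entered, the operator request is simply ignored.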
In a control loop, the least reliable component is the valve, because of its moving parts; over two-thirds of loop failures are due to valve and power supply failures. Sensors come in second and logic a distant third. Therefore, one of the first steps in increasing the fault tolerance (FT) of the loop is to provide backup for the block valves that serve safety. Having two block valves in series increases the fault tolerance of the pair to FT=1 (1oo2), because as long as one of the two valves closes, the flow in that line will shut down when needed.
One must realize that the fault tolerance of a pair of backup valves increases further if each of the two valves is provided with a different actuator type and a different power supply (hydraulic, electric, etc.). I call these backup configurations FT=1+ (1oo2+), because I have found over the decades that using valves with different actuators and power supplies substantially increases their fault tolerance.
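The benefit of the 1oo2 pair, and of diversifying its actuators and power supplies, can be illustrated with the standard simplified reliability arithmetic, in which a beta factor represents the fraction of failures common to both valves. All the numbers below are assumptions chosen only for illustration, not data from any standard or plant:

```python
# Illustrative average probability of failure on demand (PFDavg) arithmetic
# for a 1oo2 block-valve pair, using the common simplified equations.
# lambda_d: dangerous failure rate (per hour); T: proof-test interval (hours);
# beta: common-cause fraction. Diverse actuators and power supplies (the
# "1oo2+" idea) justify assuming a smaller beta. All values are assumptions.

def pfd_1oo1(lambda_d, T):
    """Single valve: PFDavg ~ lambda_d * T / 2."""
    return lambda_d * T / 2

def pfd_1oo2(lambda_d, T, beta):
    """1oo2 pair: independent-failure term plus common-cause term."""
    lam_ind = (1 - beta) * lambda_d   # independent dangerous failures
    lam_cc = beta * lambda_d          # common-cause dangerous failures
    return (lam_ind * T) ** 2 / 3 + lam_cc * T / 2

lam, T = 1e-6, 8760.0              # assumed: 1e-6/h, yearly proof test
print(pfd_1oo1(lam, T))            # single valve
print(pfd_1oo2(lam, T, 0.10))      # identical pair, 10% common cause
print(pfd_1oo2(lam, T, 0.02))      # diverse pair ("1oo2+"), 2% common cause
```

With these assumed numbers the identical 1oo2 pair improves the PFDavg by roughly a factor of ten over a single valve, and cutting the common-cause fraction through diversity improves it several times further, which is exactly why the "1oo2+" configuration matters.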
The role of power supplies is particularly important and is often neglected. For example, there is no nuclear power plant in the world today that would not melt down due to loss of cooling if every power supply failed, including the backup ones (electricity, diesel, battery, steam and so on), as was the case at Fukushima. For details, see my book.
The same logic that applies to final control elements also holds true for sensor and logic backup, except that in these cases no positive feedback exists: while a position detector can signal whether the instructions sent to a valve have been carried out, when the backup sensor of a safety variable disagrees with the main one, there is nothing to indicate which measurement is correct. In such cases, either multiple voting sensors or intelligent ones with self-diagnostic software are needed to increase the fault tolerance of the system.
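The voting idea can be shown in miniature: with three redundant transmitters, taking the median reading tolerates one failed sensor without needing to know which one is wrong. This is only a sketch of the 2oo3 principle; the sensor values are invented for illustration.

```python
# Minimal sketch of 2oo3 sensor voting on a safety variable.
# The median of three redundant measurements survives one bad transmitter.

def vote_2oo3(a, b, c):
    """Return the median of three redundant measurements."""
    return sorted([a, b, c])[1]

# One transmitter fails high; the voted value still tracks the true reading.
print(vote_2oo3(150.2, 150.4, 999.0))  # -> 150.4
```

Note that voting only resolves *which* reading to trust; detecting that a transmitter has drifted still requires comparison alarms or self-diagnostics.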
While in this article I have focused on the shortcomings of the present SIS standards, I would also like to emphasize that SIS has already made major contributions in some areas. Probably the most important is that it has brought attention to the chaotic state of safety standards. Another is its recognition that the SIS and BPCS should be separate and should not share sensors or final control elements.
I hope that over the years the SIS standards will deemphasize complicated equations and acronyms and focus on the simple quantification of the safety levels of complete loops; after all, a chain is only as strong as its weakest link! I also hope that SIS will grow out of the "manual control culture" by incorporating protection against both operator errors and cyber terrorism. Finally, I hope that for critical processes SIS will recognize the need for Override Safety Control (OSC), which cannot be turned off by anything or anybody.