
Modifying SIL-Certified Equipment Failure Rates on the Basis of Deployment

Nov. 8, 2012
To Calculate the Probability of Failure on Demand, You Must Identify Estimates of Equipment Failure Rates
Author

Harvey T. Dearden is with Time Domain Solutions Ltd.

In order to calculate the probability of failure on demand (PFD), it is necessary to identify estimates of equipment failure rates. One approach is to rely on failure rate data declared in a Safety Integrity Level (SIL) certificate from a vendor or independent test house. These figures may, however, be optimistic, in that they typically apply to reference conditions that cannot be guaranteed to hold on an uninterrupted basis in service.

Deployment effects

In the real world, equipment may fail due to:

  • Being stood upon
  • Being dripped upon
  • Scaffold strikes
  • Lightning strikes/voltage spikes
  • Process excursions
  • Process connection failures
  • Compromised ingress protection
  • Environmental extremes
  • Inappropriate modification/adjustment.

We might categorize these influences as being due to deployment. They are likely to be difficult to quantify, but may well dominate the actual failure rate of a function. It might be argued that these influences lead to systematic failures rather than random hardware failures; however, although they may be systematic (in that they are '…related in a deterministic way to a certain cause' per BS EN 61508), they may still arise on a random basis (unlike other systematic causes such as errors in the safety requirements specification) and would be revealed by a proof test. Unless they are specifically designed out, any PFD calculation that does not make allowance for these concerns is likely to be unrealistic. You might well see equipment with a certified mean time between failures (MTBF) figure of, say, 200 years, with a safe failure fraction of 80%, implying that the MTBF for dangerous failures will be 1,000 years. This might well be an optimistic claim for real-world circumstances where deployment influences might undermine the certified failure rate.
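
(As a check on that arithmetic: with a safe failure fraction of 80%, only 20% of failures are dangerous, so the dangerous failure rate is 0.2 x 1/200 = 0.001/yr, i.e., an MTBFd of 1,000 years.)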

The SINTEF reliability data handbook [1] makes the point that vendor estimates of the dangerous undetected failure rate are often an order of magnitude (or more) lower than those reported in generic data handbooks. In their failure rate estimates, SINTEF identify a parameter r as the fraction of the dangerous undetected failure rate arising from random hardware failures. Values of r for field equipment (sensors and final elements) typically vary between 30% and 50%. It can be seen that systematic failures may be very significant and indeed will typically dominate. Although r is expressed in the SINTEF handbook as a proportion of the overall dangerous undetected failure rate, that is not to say that random hardware failures and systematic failures will be in a fixed ratio. I suggest that the contribution from systematic influences may be characterised as fixed for a given equipment type, installation and environment, and essentially independent of the inherent random hardware failure rate of the equipment.
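
By way of illustration (the figures here are assumed purely for the example): if r = 0.4 for a transmitter with an overall dangerous undetected failure rate of 0.005/yr, then only 0.002/yr is attributable to random hardware failures; the remaining 0.003/yr is due to systematic influences, which therefore dominate.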

Failure modes and effects analysis (FMEA) is often deployed as an evaluation tool as part of a certification process; it examines the impact of component failures, but does NOT account for any of the real-world influences listed above. The equipment is assumed to be operating at reference conditions. It is from these considerations that we might well come to have greater faith in reliability figures derived directly from plant history and associated judgments than in declarations on certificates. What such judgments lack in rigor they make up for in being grounded in real experience. It could be argued that the real value of SIL certificates lies in the assessment of systematic capability (SC) rather than of failure rate.

Vulnerability categories

An initial figure might be adopted where the manufacturer's certified figure for an equipment item is combined with default figures for deployment on the basis of vulnerability; i.e., susceptibility to additional factors. For these purposes, the level of vulnerability may be assessed on the basis of three categories: environment, duty and exposure, as detailed in Table 1.

Table 1: Vulnerability Level and Vulnerability Categories

Reduced vulnerability (by category):
  • Environment: benign - well within capability, IP rating not critical to suitability, not subject to excursions beyond capability
  • Duty: benign - clean, non-aggressive (or not susceptible to fouling/attack) or not process wetted, little vibration, not subject to excursions beyond capability
  • Exposure: limited - no exposed process connections/isolation points (e.g., impulse lines, press tx, valve manifold, capillary tubing) or protected from

Standard vulnerability: not having a 'Reduced' vulnerability in any ONE category (environment, duty or exposure).

Increased vulnerability: not having a 'Reduced' vulnerability in MORE than one category.

Logic solvers and non-field equipment will typically be located in equipment/auxiliary rooms and not subject to the same range of influences as the plant sensors and final elements. They will, however, remain susceptible to unauthorised interference (probably well-intentioned, but perhaps ill-advised). Some equipment will be less susceptible to such interference, and it is here suggested that vulnerability levels may be assigned on the basis of security, together with an assessment of whether the assumption of benign duty and environment is valid.

Table 2: Vulnerability Level for Logic Solvers and Non-Field Equipment

Reduced vulnerability:
  • Inherently secure design (i.e., not dependent on added security control measures), e.g., solid state logic, safety PLC
  • Benign environment - well within capability, IP rating not critical to suitability, not subject to environmental excursions
  • Benign duty - well within capability or derated

Standard vulnerability:
  • Secure access, e.g., relay system/trip amp in a secure environment; locked cabinet/room, authorized personnel access only
  • Benign environment - well within capability, IP rating not critical to suitability, not subject to environmental excursions
  • Benign duty - well within capability or derated

Increased vulnerability: standard PLC, OR relay system/trip amp with unsecured access, OR non-benign environment, OR non-benign duty.

All sorts of qualifications might be introduced into the analysis of vulnerability, but given the levels of uncertainty in so many aspects of functional safety, there is limited value in refining the analysis. The important thing here is to recognise that the "raw" figure supplied by the vendor is not likely to be representative of the actual performance of the equipment in the field, and it is prudent to make some allowance in recognition of the influence of deployment.

Comparison of Field and Certificate Data

As a means of establishing a suitable allowance, we may compare nominal field values with vendor figures and identify the difference as being due to deployment. Table 3 compares some generic database values for field equipment with nominal values representative of those typically found on vendor certificates.
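
The deployment column in Table 3 may be read as the difference in failure rates: 1/MTBFdeployment ≈ 1/MTBFfield - 1/MTBFvendor. For the pressure transmitter, for example, 1/150 - 1/1000 ≈ 0.0057/yr, i.e., roughly 175 years.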

Table 3: Comparison of Field and Certified Values for Mean Time Between Dangerous Failures

Nominal MTBFd values (years):

Equipment                  Field Database   Vendor Certificate   Deployment
Press Tx                   150              1000                 175
Level Tx                   115              140                  600
Flow Tx                    225              900                  300
Temp Tx                    150              2900                 160
Remotely Operated Valve    115              200                  265
Solenoid Valve             125              200                  340

From consideration of these "ranging shots" and other database values, it is here suggested that a nominal figure of 300 years MTBF for dangerous failures be adopted in respect of "standard" vulnerability deployment.
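
(The deployment values in Table 3 span roughly 160 to 600 years, so a nominal 300 years sits comfortably in the middle of that range.)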

From inspection of a range of database values, and as a matter of judgement, a factor of 3.0 shift in values for field equipment is here suggested between the different levels of vulnerability. It is acknowledged that there is little enough science here, but having identified that equipment will be more or less vulnerable depending on the circumstances of the deployment, it is appropriate to make some corresponding allowance, and the above is suggested as a starting point. Ultimately, the figures assumed should be validated by monitoring of the ongoing performance of the equipment.
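
Relative to the 300-year standard figure, this gives 300 x 3 = 900 years MTBFd for reduced vulnerability and 300 / 3 = 100 years MTBFd for increased vulnerability (deployment failure rates of roughly 0.0011, 0.0033 and 0.01 per year respectively), as reflected in Table 4 below.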

Logic solvers and non-field equipment typically operate on benign duties in benign environments, but there will be some remaining vulnerability to the likes of voltage spikes, environmental control failures, electromagnetic and electrostatic effects, as well as unauthorized or misjudged interference. Given this residual vulnerability, we may look for the deployment element for standard-vulnerability logic solvers to correspond to that for reduced-vulnerability field equipment. In nominating the same value as for reduced-vulnerability field equipment, the implication is that these residual influences represent 33% of the deployment contribution for standard-vulnerability field equipment, which does not appear unreasonable.
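
In other words, the logic solver deployment allowance of 0.0011/yr is one third of the 0.0033/yr assumed for standard-vulnerability field equipment.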

Suggested figures for dangerous failure rates due to deployment are given in the table below.

Table 4: Dangerous Failure Rates Due to Deployment

Equipment Type                        Vulnerability   Deployment MTBFd (years)   Deployment Failure Rate (/yr)
Sensors                               Reduced         900                        0.0011
Sensors                               Standard        300                        0.0033
Sensors                               Increased       100                        0.01
Logic Solvers & Non-Field Equipment   Reduced         2700                       0.00037
Logic Solvers & Non-Field Equipment   Standard        900                        0.0011
Logic Solvers & Non-Field Equipment   Increased       300                        0.0033
Final Elements                        Reduced         900                        0.0011
Final Elements                        Standard        300                        0.0033
Final Elements                        Increased       100                        0.01

The values in this table should be regarded as indicative rather than definitive. Users may wish to compile their own values, based on an evaluation of their specific applications' susceptibilities to real-world influences, particularly if site-specific field data is available. A more comprehensive table might be compiled for individual equipment types, considering a range of generic database values and a range of vendor figures, but given the uncertainty in the data, there may be limited value to be gained. (Functional safety is NOT an exact science; we seek to establish the right order of risk reduction commensurate with the circumstances.)

Use of deployment figures

It is suggested that the raw safe failure fraction (as declared on the certificate) be assumed to apply to the deployed equipment unless there is specific evidence to the contrary.

So if we have a pressure transmitter with an MTBF (dangerous) of 750 years (failure rate 0.0013 yr-1) on a "standard" deployment, we would combine this figure with the default deployment failure rate of 0.0033 to give an overall figure of 0.0013 + 0.0033 = 0.0046 (217 years MTBFd). With reduced vulnerability deployment the corresponding figure would be 0.0013 + 0.0011 = 0.0024 (417 years MTBFd).

Note that a "perfect" item of equipment would never have a "real" (deployed) MTBF of better than the default deployment figure. A perfect pressure transmitter on standard deployment could not be claimed to offer an in-service figure better than 300 years MTBFd. Where redundancy is claimed, the usual approaches to common mode failures may be adopted. The intention here is to simply qualify the individual device failure rate figure.

This approach is proposed as a default; specific values might be identified for individual items on an exceptional basis, the implication being that the user should identify why any exception is considered appropriate.

As a sanity check on the values and the approach, we may examine a notional final element sub-system consisting of a solenoid valve driver barrier, a solenoid valve, an actuator and a remotely operated valve, with respective certified MTBFd figures of 10,000, 200, 200 and 200 years.

With standard deployment (treating the driver barrier as non-field equipment and the solenoid valve, actuator and valve as field equipment), we have a sub-system MTBFd of 38 years:

(0.0001 + 0.0011) + (0.005 + 0.0033) x 3 = 0.0261/year.

If we allocate 50% of the function PFD to this final element sub-system, we would need to test every 1.2 years to meet a mid-SIL1 target for the function of 0.0316. If the vulnerability was "reduced," the corresponding MTBFd would be 53 years:

(0.0001 + 0.00037) + (0.005 + 0.0011) x 3 = 0.0188/year, 

with a corresponding test interval of 1.7 years.
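
These test intervals follow from the usual single-channel, low-demand approximation PFDavg ≈ λD x T/2, rearranged as T = 2 x (allocated PFD target)/λD. For the standard deployment case, T = 2 x (0.5 x 0.0316)/0.0261 ≈ 1.2 years; for reduced vulnerability, T = 2 x (0.5 x 0.0316)/0.0188 ≈ 1.7 years.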

For a pressure transmitter, repeater barrier and trip amplifier, with certified MTBFd figures of 1000, 2000, and 1250 years and standard deployment, the sensor sub-system MTBFd would be 128 years:

(0.001 + 0.0033) + (0.0005 + 0.0011) + (0.0008 + 0.0011) = 0.0078/year.

If we allocate 35% of the function PFD to this sub-system, we would need to test every 2.8 years to meet the same mid-SIL1 target. With increased vulnerability, the sub-system MTBFd would be 53 years, with a corresponding test interval of 1.17 years:

(0.001 + 0.01) + (0.0005 + 0.0033) + (0.0008 + 0.0033) = 0.0189/year

These results appear sensible; in essence they show that there would typically be no difficulty in meeting SIL1 with a single channel with standard deployment and that SIL2 would be a realistic prospect, provided vulnerability was reduced or testing frequency was increased.
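
For readers who want to reproduce or extend these sanity checks, the arithmetic is sketched below in Python. This is a minimal illustration only: the dictionary of deployment rates simply restates Table 4, the function names are arbitrary, and the test-interval calculation uses the single-channel PFDavg ≈ λD x T/2 approximation described above.

```python
# Deployment dangerous failure rates (per year), restating Table 4.
DEPLOYMENT = {
    "field":     {"reduced": 0.0011, "standard": 0.0033, "increased": 0.01},
    "non_field": {"reduced": 0.00037, "standard": 0.0011, "increased": 0.0033},
}

def deployed_rate(certified_mtbfd, kind, vulnerability):
    """Certified dangerous failure rate plus the deployment allowance (per year)."""
    return 1.0 / certified_mtbfd + DEPLOYMENT[kind][vulnerability]

def test_interval(subsystem_rate, function_pfd_target, allocation):
    """Required proof-test interval (years), from PFDavg ~ rate * T / 2."""
    return 2.0 * allocation * function_pfd_target / subsystem_rate

# Final element sub-system: driver barrier (non-field) plus solenoid valve,
# actuator and remotely operated valve (field), all on standard deployment.
final = deployed_rate(10000, "non_field", "standard") + 3 * deployed_rate(200, "field", "standard")
print(f"{final:.4f}/yr, MTBFd {1/final:.0f} yr, test every {test_interval(final, 0.0316, 0.5):.1f} yr")
# -> 0.0261/yr, MTBFd 38 yr, test every 1.2 yr

# Sensor sub-system: pressure transmitter (field) plus repeater barrier and
# trip amplifier (non-field), all on standard deployment.
sensor = (deployed_rate(1000, "field", "standard")
          + deployed_rate(2000, "non_field", "standard")
          + deployed_rate(1250, "non_field", "standard"))
print(f"{sensor:.4f}/yr, MTBFd {1/sensor:.0f} yr, test every {test_interval(sensor, 0.0316, 0.35):.1f} yr")
# -> 0.0078/yr, MTBFd 128 yr, test every 2.8 yr
```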

References

[1] Reliability Data for Safety Instrumented Systems, PDS Data Handbook, 2010 Edition, SINTEF.