The safety manual should clearly describe the analysis boundary and the assumptions with regard to installation, commissioning, configuration, diagnostics, maintenance and testing that support the SIL claim. Without such information, it is very difficult for users to comply with IEC 61511 Clause 5.2.5.3 (“Procedures shall be implemented to evaluate the performance of the safety instrumented system against its safety requirements”), which requires a comparison of equipment reliability assumptions with field operating performance. Identified failure modes and effects should be included in the maintenance troubleshooting guide so that in-service failures can be tracked using the same modes, thereby allowing users to periodically compare their actual operating results against the manufacturer’s claims.
2. Manipulation of safe failure fraction
The IEC 61508 committee included the safe failure fraction (SFF) and associated hardware fault tolerance requirements as a way of preventing manufacturers from claiming high SILs for non-redundant devices simply based on the PFD calculation. The SFF tables were intended to ensure fault tolerance (through required redundancy) in an environment of optimistic theoretical data. However, because the SFF is calculated from the same potentially “bad” data, the SFF is susceptible to the same error.
In practice, there is no correlation between SFF and product safety. The inverse is being demonstrated in the product approval process, where it has become easier to certify to SIL 3 a device with a high total failure rate and a high diagnostic coverage claim than one with a low total failure rate but no diagnostics.
While reviewing the various safety manuals, it became obvious that manipulation of the SFF is quite common. Many analysis reports, in direct disregard of the original intent of SFF, have included failure classifications that are not even acknowledged in IEC 61508 or IEC 61511. Contrary to what these reports frequently state, “no effect,” “residual,” “don’t care,” and “annunciation undetected” are not discussed in IEC 61508 and are not included in any failure definition.
Within some analysis reports, failure classes such as “no effect,” “don’t care” and “residual” are being loosely defined as failures that are neither safe nor dangerous. IEC 61508 defines a failure as the termination of the ability of a functional unit to perform a required function. Similar definitions can be found in IEC 61511 and the CCPS book, Guidelines for Safe and Reliable Instrumented Protective Systems. If the device has not failed to a deterministic state, safe or dangerous, it is still functional: it has not terminated its ability to function as specified. However, the analysts have counted these non-failures as safe in the SFF calculation, thereby artificially inflating the calculated SFF value.
IEC 61508 only acknowledges two types of failure, safe and dangerous, so it must be that analysts believe that any degraded, not safe, or not dangerous failure can be assumed to be a safe failure. Ironically, though these non-failures are generally included in the SFF calculation, the analysis reports actually recommend not including them in any spurious trip rate calculation.
Some reports define “annunciation undetected” as the failure of a diagnostic circuit such that it will not annunciate a future fault occurrence. The simple truth is that, if the user is not notified of the diagnostic failure, the user cannot be in compliance with IEC 61511 Clause 11.3.1, which addresses the requirement of using diagnostic tests, proof tests or other means to detect dangerous faults. Dangerous diagnostic failures should not be classified as safe, but again, analysts are consistently reporting “annunciation undetected” failures as safe and, astonishingly, cite IEC 61508 as a basis for their claim.
When analysts count these “new” failure classes as safe, the product achieves a higher SFF without any measurable safety benefit. Products with mechanical components are often assumed to have a substantial percentage of “no effect” failures thus achieving SFF values greater than the 60% or 90% values required to reduce the hardware fault tolerance requirements in accordance with IEC 61508 Tables 5 and 6. A higher SFF frequently leads to SIL 2 or 3 Claim Limits without any redundancy requirement.
For example, one manufacturer of a diaphragm-actuated globe valve has claimed a “no effect” failure rate of 9.356e-03/yr on a product with only 8.226e-03/yr of real safe and dangerous failures. More non-failures are included than real failures. The SFF calculated without the “no effect” failures is 59.2%, which falls just short of the 60% needed for a SIL 2 claim for a type A component. With the “no effect” failures included, the SFF rises to 80.9%, sufficient for a SIL 2 claim.
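The arithmetic behind the valve example can be reproduced with a short sketch. The two failure rates and the 59.2% figure come from the text. The dangerous undetected rate is not stated there, so it is backed out from the 59.2% figure as an assumption. The claim limit function is also an assumption for illustration: it applies the 60% and 90% bands mentioned above to a type A component with no hardware fault tolerance, and the function and variable names are hypothetical.

```python
# Rates in failures per year, taken from the valve example in the text.
REAL_FAILURES = 8.226e-03      # real safe + dangerous failures combined
NO_EFFECT = 9.356e-03          # "no effect" rate claimed by the manufacturer
SFF_WITHOUT_NO_EFFECT = 0.592  # SFF reported when "no effect" is excluded

# Assumption: the dangerous undetected rate is not given in the text,
# so it is inferred from the reported 59.2% SFF.
DANGEROUS_UNDETECTED = REAL_FAILURES * (1.0 - SFF_WITHOUT_NO_EFFECT)


def safe_failure_fraction(total_rate, dangerous_undetected_rate):
    """SFF = everything that is not dangerous undetected, over the total."""
    return (total_rate - dangerous_undetected_rate) / total_rate


def type_a_claim_limit_hft0(sff):
    """Assumed SIL claim limit for a type A device with zero fault tolerance.

    Uses the 60% and 90% SFF bands referred to in the text.
    """
    if sff < 0.60:
        return 1
    if sff < 0.90:
        return 2
    return 3


# Counting only real failures in the total.
sff_real = safe_failure_fraction(REAL_FAILURES, DANGEROUS_UNDETECTED)

# Counting the "no effect" non-failures as safe failures: the denominator
# grows, while the dangerous undetected rate stays exactly the same.
sff_inflated = safe_failure_fraction(REAL_FAILURES + NO_EFFECT,
                                     DANGEROUS_UNDETECTED)

for label, sff in (("real failures only", sff_real),
                   ("with 'no effect' added", sff_inflated)):
    print(f"{label}: SFF = {sff:.1%}, "
          f"claim limit SIL {type_a_claim_limit_hft0(sff)}")
```

The dangerous undetected rate is identical in both cases. Only the denominator changes, which is why the higher SFF carries no measurable safety benefit.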
In short, these “new” failure classes appear to have been created since IEC 61508 was approved, solely for the purpose of inflating the SFF. They are unsubstantiated theoretical constructs, the very phlogiston of safety engineering. Users should reject hardware fault tolerance claims based on such failure classes and should demand that manufacturers substantiate their claims following accepted reliability engineering principles.
3. A tendency to shift responsibility for safe operation to the operator
Many reports are issued with SIL claims that assume detected failures are configured to alarm rather than to force the failed product to its specified safe state. This assumption allows manufacturers to report a low spurious trip rate and a low dangerous undetected failure rate, even when the product is inherently unreliable. Under IEC 61508 requirements, a product with a high total failure rate can achieve a high SIL Claim Limit as long as its failure is detected and annunciated. The SFF is not penalized by the choice to alarm rather than to achieve the safe state. Therefore, the more failures that are detected, the higher the SFF becomes, regardless of the number of times or the total amount of time that the device is in the failed state. The effect is to dump responsibility for process protection back on the operator.
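A minimal sketch of this effect follows, using two entirely hypothetical devices. None of the rates, the diagnostic coverage, or the 72-hour repair time come from the text, and the function names are illustrative. The time spent in a detected but unrepaired state is approximated as rate times repair time, which is only reasonable while that product is small.

```python
HOURS_PER_YEAR = 8760.0


def sff_alarm_only(safe_rate, dangerous_rate, diagnostic_coverage):
    """SFF when detected dangerous failures only raise an alarm.

    Detected dangerous failures count toward the numerator even though
    the device stays in its failed state until someone responds.
    """
    dangerous_detected = dangerous_rate * diagnostic_coverage
    return (safe_rate + dangerous_detected) / (safe_rate + dangerous_rate)


def dangerous_undetected_rate(dangerous_rate, diagnostic_coverage):
    """Dangerous failures per year that the diagnostics miss."""
    return dangerous_rate * (1.0 - diagnostic_coverage)


def detected_failed_fraction(dangerous_rate, diagnostic_coverage, repair_hours):
    """Approximate fraction of time spent failed with the alarm active."""
    detected_per_year = dangerous_rate * diagnostic_coverage
    return detected_per_year * repair_hours / HOURS_PER_YEAR


# Hypothetical device A: unreliable, but with high claimed diagnostic coverage.
a = dict(safe_rate=0.05, dangerous_rate=0.5, diagnostic_coverage=0.95)
# Hypothetical device B: far more reliable, with no diagnostics at all.
b = dict(safe_rate=0.005, dangerous_rate=0.01, diagnostic_coverage=0.0)

REPAIR_HOURS = 72.0  # assumed time for the operator to respond and repair

sff_a, sff_b = sff_alarm_only(**a), sff_alarm_only(**b)
du_a = dangerous_undetected_rate(a["dangerous_rate"], a["diagnostic_coverage"])
du_b = dangerous_undetected_rate(b["dangerous_rate"], b["diagnostic_coverage"])
down_a = detected_failed_fraction(a["dangerous_rate"],
                                  a["diagnostic_coverage"], REPAIR_HOURS)
down_b = detected_failed_fraction(b["dangerous_rate"],
                                  b["diagnostic_coverage"], REPAIR_HOURS)

print(f"A: SFF {sff_a:.1%}, undetected {du_a:.3f}/yr, alarmed-failed {down_a:.2%}")
print(f"B: SFF {sff_b:.1%}, undetected {du_b:.3f}/yr, alarmed-failed {down_b:.2%}")
```

Under these assumed numbers, device A earns the higher SFF even though it fails dangerously far more often, has the higher dangerous undetected rate, and spends time in an alarmed, failed state during which protection depends on the operator.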