Cybersecurity in the SIS world

Nov. 11, 2016
Find and slay the dragons lurking in the typical safety instrumented system.

Cybersecurity is a growing concern in the process industries, and a number of good articles have been written about it for industrial control systems (ICS)—many full of doom and gloom. Here, we will divide the ICS into two parts: safety instrumented systems (SIS) and all other ICS components, which we lump into the basic process control system (BPCS). There are distinct differences between the SIS and BPCS in function, design and cybersecurity.

The two systems differ in how cybersecurity looks from a process safety perspective. Traditional SIS design practices can help provide cybersecurity, and cybersecurity concerns can in turn affect the design of the SIS.

This article examines some of the differences between the BPCS and the SIS, SIS vulnerabilities to cyberattack and other security concerns unique to the SIS. It also covers how traditional SIS design can help with cybersecurity, and how traditional design practices of the SIS are affected by cybersecurity. Due to its size limits, one article can’t cover all aspects of designing or securing a SIS in the presence of cybersecurity threats; it’s intended instead to provide food for thought on this topic.

When a cyberattack gets physical

It’s important to note that operating a chemical plant or refinery is complex, with many checks and balances as well as human beings to provide 24/7 oversight and some level of resilience. A cyberattack is really a cyber-physical attack because it involves a system with direct connections to the real world, as opposed to attacking a computer and data. A process plant is also a system designed to work in the presence of failures (even multiple ones) and uncertainty, even if the failure mode is unknown, whether it be a cyberattack, control valve failure, pump failure, etc.

For example, if a tower is over-pressurized, chances are you’ll have an independent, high-pressure alarm, possibly a high pressure override of the tower reboiler, an SIS and a relief valve protecting it, plus operator observations. This illustrates how defense-in-depth achieves process safety, which also provides protection against a cyberattack as an initiating cause. This is not to say that cybersecurity is not important for process safety, but rather that it must be considered in the mix of potential failures and safeguarding against those failures.

Figure 1 illustrates the overall cyber-domain including the SIS. Generally speaking, only digital systems are a concern for a direct cyberattack. However, even analog or mechanical systems aren’t as completely immune as one might think. For example, the safe operating limit database (alarm and trip setpoints), asset management (changes in device parameters), SIS field instrument calibration databases (incorrect calibrations), and even the relief valve database (incorrect trip setpoints and test intervals) can potentially be corrupted by a cyberattack, leading to failure in the SIS or other process safety systems under the right circumstances.

The role of the SIS in safety

It’s important to understand how process safety is achieved through functional safety, and how the SIS fits into the overall picture. Achieving process safety using functional safety typically involves a defense-in-depth protective scheme consisting of independent protection layers (IPLs).

In Figure 2, we can see the SIS is not the only IPL in the layer of protection scheme. Some IPLs are subject to direct cyberattack and some are not. Modern design of functional safety protection systems (FSPS) for hazardous processes is all about preventing a hazardous condition, even in the presence of failures of some of the IPLs. The cyberattack threat does not change that paradigm, but rather adds additional potential failure modes of the BPCS and process equipment that may lead to potential safety demands of unknown frequency (an important risk consideration).

A fundamental SIS design principle is that failure of the BPCS to control the process for any reason should not cause a simultaneous failure of the SIS protecting the process. This does not change with the introduction of the cyberattack threat; if a cyberattack has compromised the BPCS, it should be substantially more difficult for the same attack to compromise the SIS either synchronously or asynchronously.

Defense-in-depth and the related principle of requiring multiple failures or difficulties (a “tortuous path”) before a cyberattack can succeed are important protective concepts. This also applies to the BPCS, where safety controls, alarms and interlocks (SCAI) and other protective safeguards should present a difficult path for an attacker, who must defeat them all to cause a loss of process safety protection and of the operator’s situational awareness.

How a SIS differs from a BPCS

Here are some of the primary differences between the SIS and BPCS. The BPCS is an active, continuous system that controls level, pressure, temperature and other process variables. It is designed to keep the hazardous materials in the process under control within the safe operating envelope, while efficiently and cost-effectively making on-spec product. The vast majority of SISs, on the other hand, operate as passive systems that sit there doing nothing until a safety demand occurs. When the process exceeds its safe operating limits, the SIS acts to maintain or bring the process to a safe state. This passiveness also makes it difficult for an intruder to analyze the system and its relationship to the BPCS by observation alone.

Failure of the BPCS can be an initiating cause for a hazardous scenario, whereas a properly designed, low-demand SIS can’t typically be the initiating cause of a hazard—even during a cyberattack.

The BPCS will have tens of thousands of data points (reads and writes) and other parameters transferred digitally between BPCS boxes via multiple paths, whereas the SIS may have a few hundred data points, mostly reads with a limited number of writes. The BPCS will typically talk to the SIS through only one communication path per SIS. The SIS will also have its own internal communication structure.

In most cases, the SIS is implemented on different hardware, in some cases by a different manufacturer than the BPCS equipment.

The SIS is periodically proof-tested, while the BPCS is often operated to failure. Periodic proof testing provides a mechanism for detecting unauthorized changes.

Cybersecurity standards for SIS

There are several standards pertinent to cybersecurity and the SIS. The second edition of IEC 61511-1 will require that a security risk assessment be carried out to identify the security vulnerabilities of the SIS, including both physical and cybersecurity vulnerabilities. The standard also will require that the design of the SIS provide the necessary resilience against the identified security threats. This is a new, substantial requirement.

The ISA 99 committee has generated a series of pertinent standards, one of which is ANSI/ISA/IEC-62443-1-1, “Security for Industrial Automation and Control Systems Part 1-1: Terminology, Concepts and Models.” The ISA 84 committee also has a subcommittee looking at cybersecurity for technical reports (TR). They’re in the draft stages of dTR84.00.09, “Cyber Security Related to the Functional Safety Lifecycle,” which is attempting to bring the principles of ANSI/ISA/IEC-62443-1-1 to functional safety and the safety lifecycle. Hopefully, they will do this in a practical manner without too much computer-speak.

Protecting SIS assets

Protecting the SIS against cyberattacks is a simple matter of preventing unauthorized changes that can compromise its safety functionality. Easy as pie, right?

To get a high-level view of your SIS and its potential vulnerabilities, draw a boundary around all the SIS assets (typically your SIS zone). Then, identify all of the communications paths and any other data, remote or physical access paths that cross that boundary. This is illustrated in Figure 3 for a generic SIS, but your system may have more or different vulnerabilities. This conceptual boundary can help you visualize your potential cyberattack vulnerabilities and systematically address them.

To evaluate your cybersecurity vulnerabilities and current protections, one of the first things to do is take an inventory of all SIS equipment, software (with version numbers) and critical operating parameters. This inventory will provide a baseline for monitoring changes in your system. It should be followed by a security assessment of the SIS, as required by the second edition of IEC 61511-1.
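As a rough illustration of using an inventory as a baseline, the Python sketch below compares a later survey against the recorded baseline and lists what changed. The asset names, attributes and values are all hypothetical, and a real inventory would live in a controlled database rather than in a script.

```python
# Hypothetical inventory records: asset name -> recorded attributes.
baseline = {
    "logic_solver_1": {"firmware": "4.2.1", "program_rev": "C"},
    "PT-101": {"firmware": "1.0.3", "trip_setpoint": 250.0},
}
current = {
    "logic_solver_1": {"firmware": "4.2.1", "program_rev": "D"},
    "PT-101": {"firmware": "1.0.3", "trip_setpoint": 250.0},
    "laptop_7": {"firmware": "n/a"},
}


def diff_inventory(baseline, current):
    """Return a sorted list of readable differences from the baseline."""
    findings = []
    for asset in sorted(set(baseline) | set(current)):
        if asset not in baseline:
            findings.append(f"{asset}: not in baseline")
        elif asset not in current:
            findings.append(f"{asset}: missing from current survey")
        else:
            old, new = baseline[asset], current[asset]
            for key in sorted(set(old) | set(new)):
                if old.get(key) != new.get(key):
                    findings.append(
                        f"{asset}.{key}: {old.get(key)!r} -> {new.get(key)!r}"
                    )
    return findings


changes = diff_inventory(baseline, current)
for line in changes:
    print(line)
```

Each finding is only a prompt for investigation: it may be an authorized change that was never recorded, or it may be something worse.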

Contact your equipment vendors and ask them to provide an analysis of their equipment’s known cyber vulnerabilities, and include those in this assessment. This should be coordinated with the cybersecurity efforts on the BPCS. The identified vulnerabilities should be eliminated or their risk minimized.

Potential vulnerabilities include remote access, uncontrolled writes, ability to program remotely, configuration database indirect attacks and cyberattacks via manufacturer or third-party software.

Red flags include any computer equipment that is Windows-based, commercial off-the-shelf (COTS) technology, “open” systems, connections to the enterprise or Internet, the ability to write to the SIS, SIS equipment that is not under lock and key, and portable media (USB ports, memory sticks, CDs, etc.). Ethernet and Ethernet switches (too vulnerable) are a no-no inside a SIS zone or crossing its boundary. Wireless may be an open invitation and should be avoided in a SIS.

Don’t connect to what you don’t have to connect to. Risk vs. benefit has to be factored in, particularly when convenience is considered.

Implement intrusion detection, including monitoring for changes in software and safety-critical parameters. Fortunately, ICSs typically have extensive logging, and the SIS should log to them every parameter change and every SIS access for programming, maintenance, etc. A cyberattack response plan should be put into place, including operator procedures and a recovery plan. Failure to plan is planning on failure.
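One simple way to monitor for changes in software or safety-critical parameters is to keep a cryptographic digest of an approved configuration export and compare later exports against it. The Python sketch below assumes the logic solver configuration can be exported as bytes; the export contents and tag names are invented for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a configuration export as hex text."""
    return hashlib.sha256(data).hexdigest()


def unchanged(export: bytes, expected_digest: str) -> bool:
    """True if the export still matches the approved digest."""
    return fingerprint(export) == expected_digest


# Hypothetical approved export, recorded when the SIS was last validated.
approved_export = b"SIF-101 trip=250.0 psig\nSIF-102 trip=180.0 degF\n"
approved_digest = fingerprint(approved_export)

# A later export in which one trip setpoint has been altered.
later_export = b"SIF-101 trip=350.0 psig\nSIF-102 trip=180.0 degF\n"

print(unchanged(approved_export, approved_digest))
print(unchanged(later_export, approved_digest))
```

The approved digest is only useful if it is stored somewhere an intruder on the control network cannot rewrite it.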

SIS design for cybersecurity

Most articles on cybersecurity for ICS revolve around the conceptual approach of dividing the control systems up into zones and conduits (essentially, protection by controlled isolation), doing a cybersecurity assessment, and placing a bunch of firewalls or security appliances in your networks. These are important aspects of cybersecurity, but they are not the only things to do in designing the SIS system for protection against cyber attacks.

Many of the traditional design principles for SIS provide some level of cybersecurity protection (e.g. independence, separation, diversity, limited digital connectivity, controlled writes, distributed architecture, 4-20 mA signals, etc.). So legacy SIS have some cyber vulnerabilities, but they are not as exposed to cyber attacks as many people seem to assume. Unfortunately, in recent times, the use of some of these principles has declined due to cost considerations, competitive differentiation and changing demographics.

Independence, separation and diversity are philosophies that have been cornerstones of SIS design. Independence keeps the safety functionality separate from the control functionality. Keeping the SIS hardware physically separate eases physical security and isolating the SIS into zones. Having diversity in hardware can mean different hardware from the BPCS but the same manufacturer, or it can mean different manufacturers for the SIS hardware. Both are good practices, but having different manufacturers reduces common cause failures, and increases the knowledge required to hack both systems.

The safety PLC is commonly used as a logic solver for a SIS, and is typically the focal point of most cyberattacks on the SIS because it communicates with the outside world and is extensively software-based (e.g. requires programming software, software updates and patches, commonly networked, etc.). Safety PLCs are different from general-purpose, industrial PLCs and DCS controllers, and are much less open to unauthorized changes. SIS logic solvers that are more tightly integrated with the BPCS or use the same hardware as the control system can be more exposed to a cyberattack. Most safety PLCs have a hardware watchdog timer that monitors their logic cycle. A separate hardware watchdog timer may also be a good idea. Communication watchdogs can also be created in logic to detect communication problems between the SIS and the BPCS independent of the communication channel, and can detect problems with the communication channel.
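A communication watchdog of this kind is often built around a heartbeat: one side changes a counter on every scan, and the other side raises a fault if the counter stops changing. The Python sketch below shows only the logic, with hypothetical names and timing values; in practice it would be implemented in the logic solver’s own programming language.

```python
class CommWatchdog:
    """Flags a communication fault when a heartbeat counter stops changing."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_value = None      # last heartbeat value seen
        self.last_change_s = None   # time at which the value last changed

    def update(self, heartbeat: int, now_s: float) -> bool:
        """Record one scan; return True while communication looks healthy."""
        if heartbeat != self.last_value:
            self.last_value = heartbeat
            self.last_change_s = now_s
        return (now_s - self.last_change_s) <= self.timeout_s


watchdog = CommWatchdog(timeout_s=5.0)
print(watchdog.update(heartbeat=1, now_s=0.0))  # counter is moving
print(watchdog.update(heartbeat=2, now_s=2.0))  # counter is moving
print(watchdog.update(heartbeat=2, now_s=8.0))  # stale for 6 s, past the timeout
```

Because the check looks only at whether the data keeps changing, it does not depend on any diagnostics reported by the communication channel itself.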

Run/stop/program/remote switch: Almost all PLCs and certainly safety PLCs have some form of this hardware key-lock switch that can control programming access and, in some cases, control “writes” to the PLC. No programming of SIS equipment should be allowed across a network connected to the BPCS, enterprise network or outside world, even through firewalls. It should be verified with the SIS logic solver manufacturer that their key-lock can’t be overridden externally through a communication link of any sort.

“Read” requests from the BPCS are common to transfer the SIS status to the BPCS. These should be limited in scope. It should be verified that problems with the PLC communication processor/port (e.g. denial of service attacks, incorrect or garbled “read” requests, etc.) can’t affect the PLC’s safety logic cycle or its safety functionality.

Writes: the safest approach is to not allow any writes to the SIS logic solver from the BPCS. Most SIS logic solvers can limit writes to specific memory locations (e.g. will not accept writes to other locations). DCS, PLC or foreign device gateways may also allow only certain tags be read or written to the SIS. These features should be implemented. If you must write to a SIS logic solver, you might consider an analog or digital input to transfer the data.

Deep-packet inspection (DPI) security appliances and data diodes can stop all writes, and in some cases can whitelist read and write tags or memory locations. These security appliances must be able to get down to the write command and the write tag or memory location to be effective. The safety PLC should also ensure that write data values are within an acceptable range.
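The combination of a write whitelist and a range check can be pictured with a short Python sketch. The tag names and limits below are hypothetical, and real logic solvers, gateways and DPI appliances each have their own configuration methods; this only shows the decision being made.

```python
# Hypothetical whitelist: writable tag -> (low, high) acceptable range.
WRITE_WHITELIST = {
    "SIF101_BYPASS_REQ": (0, 1),
    "SIF101_TEST_MODE": (0, 1),
}


def accept_write(tag: str, value: float) -> bool:
    """Accept a write only for a whitelisted tag with an in-range value."""
    limits = WRITE_WHITELIST.get(tag)
    if limits is None:
        return False  # tag is not on the whitelist
    low, high = limits
    return low <= value <= high


print(accept_write("SIF101_BYPASS_REQ", 1))   # whitelisted and in range
print(accept_write("SIF101_TRIP_SP", 300.0))  # not whitelisted
print(accept_write("SIF101_BYPASS_REQ", 5))   # whitelisted but out of range
```

Note that trip setpoints are deliberately absent from the example whitelist, so no write to them is ever accepted.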

Non-digital SIS logic solvers such as relay logic and trip-amps are directly immune to a cyber attack. Indirectly, they may have a small cyber vulnerability if the database for their trip and alarm points is corrupted by a cyber attack. These systems can be used as a backup to the safety instrumented functions (SIFs) in a safety PLC. In small applications or localized systems, they can provide a cyber-immune solution.

SIS field devices are less prone to cyberattack because the vast majority of their outputs are 4-20 mA or on/off 24 VDC/120 VAC, which are notoriously hard to hack. Safety protocols have been developed for digital fieldbus communications between field devices, but are not very common for SIS. When these are used, they may be more exposed than a 4-20 mA loop to a cyberattack. When a fieldbus safety protocol is used, the transmitters should be connected point-to-point, and high-speed Ethernet (HSE) should be avoided for SIS service.

A hardware security jumper blocks changing any of the parameters of a field transmitter, including changes via HART or fieldbus. SIS field sensors and other applicable SIS field devices should always have their security hardware jumper engaged in normal operating service. Software lockouts should not be used unless they’re the only security feature available. If an asset management system (AMS) is connected via HART, it should have read-only access to SIS field devices, even if the security jumper is not engaged. All SIS transmitters should have a deviation alarm where feasible.
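A deviation alarm can take several forms. One common form, assumed here, compares the SIS transmitter against a second measurement of the same process variable (a redundant SIS transmitter or the BPCS transmitter) and alarms when they disagree by more than a set percentage of span. The 5% limit and the values in this Python sketch are illustrative only.

```python
def deviation_alarm(sis_value: float, reference_value: float,
                    span: float, limit_pct: float = 5.0) -> bool:
    """True if two readings of the same variable differ by more than
    limit_pct percent of the transmitter span."""
    deviation_pct = abs(sis_value - reference_value) / span * 100.0
    return deviation_pct > limit_pct


# Hypothetical 0-500 psig pressure transmitters on the same tower.
print(deviation_alarm(250.0, 252.0, span=500.0))  # 0.4% of span
print(deviation_alarm(250.0, 300.0, span=500.0))  # 10% of span
```

A transmitter whose range or calibration has been altered tends to show up as a persistent disagreement with its partner, whatever the cause of the alteration.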

4-20 mA smart transmitters typically communicate via a HART communicator during calibration and maintenance. There is a cybersecurity exposure due to the software in the communicator, but it must come indirectly through corruption of the software from the communicator’s manufacturer.

Calibration tools are another potential cybersecurity vulnerability for SIS field devices because modern ones are digitally based, and may communicate with a database or AMS system that would typically be on a Windows machine connected to the site enterprise network. Corruption of the calibration database could lead to miscalibration of safety transmitters.

Keeping a computer backup of the calibration data, trip and alarm points, transmitter parameters and programs is a good practice, and historical copies should be kept in case the current one gets corrupted. This will help you recover from a cyber or internal security attack. Remember, plan ahead.

Final elements, such as solenoids, valves, motor starters, etc. are typically immune to cyber attack. If your valve has a digital valve controller or a smart positioner, it may be possible for a cyberattack to spuriously trip or cycle the SIS valve. These devices may be communicated with by a portable Windows-based computer, and may be subject to a cyber attack.

Bypasses are points in the SIS logic solver that are commonly written from the BPCS to the SIS to bypass a particular sensor for maintenance. Erroneous activation by the BPCS or SIS would defeat a SIF or part of a SIF. Good practices include a manual bypass-enable switch that must be activated, a short timeframe in which a bypass can then be enabled, and bypasses that time out. A bypass alarm generated from the SIS that restrikes periodically while in bypass, and remote monitoring of the bypass state, are also good practices that make it more difficult for a cyber intruder to enable bypasses undetected.
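The bypass practices above can be sketched as a small state machine. The Python below is an illustration only: the class name, method names and all timing values are assumptions, and a real implementation would reside in the SIS logic solver with the enable switch wired as a hardware input.

```python
class SensorBypass:
    """Sensor bypass that needs a recent manual enable and then times out."""

    def __init__(self, enable_window_s: float, max_bypass_s: float,
                 restrike_s: float):
        self.enable_window_s = enable_window_s
        self.max_bypass_s = max_bypass_s
        self.restrike_s = restrike_s
        self.enabled_at_s = None  # when the manual enable switch was activated
        self.started_at_s = None  # when the current bypass began

    def activate_enable_switch(self, now_s: float) -> None:
        self.enabled_at_s = now_s

    def request_bypass(self, now_s: float) -> bool:
        """Accept a bypass write only inside the enable window."""
        armed = (
            self.enabled_at_s is not None
            and 0 <= now_s - self.enabled_at_s <= self.enable_window_s
        )
        if armed:
            self.started_at_s = now_s
            self.enabled_at_s = None  # each enable permits one request
        return armed

    def is_bypassed(self, now_s: float) -> bool:
        if self.started_at_s is None:
            return False
        if now_s - self.started_at_s >= self.max_bypass_s:
            self.started_at_s = None  # bypass has timed out
            return False
        return True

    def alarm_count(self, now_s: float) -> int:
        """Times the bypass alarm has sounded so far in this bypass."""
        if not self.is_bypassed(now_s):
            return 0
        return 1 + int((now_s - self.started_at_s) // self.restrike_s)


bypass = SensorBypass(enable_window_s=30.0, max_bypass_s=3600.0,
                      restrike_s=600.0)
print(bypass.request_bypass(0.0))  # rejected: enable switch never activated
bypass.activate_enable_switch(200.0)
print(bypass.request_bypass(210.0))  # accepted: inside the 30 s window
```

A bypass write arriving over the network without the physical enable action is simply rejected, which is the property that matters against a remote intruder.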

Manual shutdown: IEC 61511-1 states that a “manual means (for example, emergency stop pushbutton), independent of the logic solver, shall be provided to actuate the SIS final elements unless otherwise directed by the safety requirement specifications.” This was put in place because there was a fear that the PLC logic solver would not operate when required, the PLC might go into a loop, or it might begin operating erratically (sounds applicable to a cyberattack). However, primarily for convenience and cost reduction, it’s been the practice of many people to use the “…otherwise directed by the safety requirement specifications” to route the manual shutdown to the SIS logic solver because they rationalize that the logic solver is highly reliable. From a cybersecurity perspective, this is a bad practice because if the SIS logic solver is compromised, so may be the manual shutdowns in this logic solver. This takes away the operator’s ability to quickly implement a manual shutdown to bring the plant to a safe state. This is particularly worrisome for SIF where there are no other IPLs associated with the hazardous scenario.

Also, companies often have procedures that are “gun-drilled” (exact and ingrained) for shutting down the plant due to power loss, cooling water loss, etc. It makes sense that you should have gun-drilled procedures to shut down your plant if you suffer a cyber attack that compromises process safety.

Reset function: Software resets may provide some protection against a cyberattack that cycles the safety PLC outputs, but may be compromised by a knowledgeable attacker. Field manual resets on the solenoids physically prevent the shutdown valve from cycling, and are immune to cyber attacks.

Cybersecurity is never done

Cybersecurity is a complex, important topic that is ever evolving. Many practical things can be done based on engineering analysis of the SIS’s vulnerabilities and data flows. The zones and conduits concept, defense in depth and the tortuous path concept are steps in the right direction. The standards in this area have a steep learning curve and, with the ever-changing cyber threat environment, may require a specialist to keep up. I will leave you with an interesting question: are identified hardware (and related software) vulnerabilities in your ICS and SIS covered by your hardware warranties and afterward?
