OT networking personnel need to work with engineering to address safety impacts – it isn’t happening

Sept. 16, 2019
It is unacceptable to take almost 4 years to recognize there are engineering issues associated with a cyber attack intended to damage equipment. It is even more unacceptable that after almost 4 years, OT still doesn’t get it right. Stuxnet, Triton, and CrashOverride/Industroyer point out the need for control system and safety engineers to be trained in cyber security and to be an integral part of the cyber security process. This is also why there is a crying need for an ICS conference whose focus is ICS not networks.  

Control systems consist of field devices such as process sensors, actuators, relays, and drives that have minimal to no cyber security or authentication. Protective relays often have cyber security features but depend on the unauthenticated sensors for actuation. Control systems directly affect facility reliability and safety. Since you go “boom in the night” from the field devices, control and safety systems are designed and maintained by control and safety/protection engineers, not networking (OT) specialists. Before 9/11, securing control systems was very much an engineering problem. This is why, when I started the ICS Control Systems Cyber Security Conference in 2002, it was for the control system engineers. Unfortunately, following 9/11, the engineering issues were superseded by the network community’s laser focus on networks, leading to a gaping hole in the understanding of control system cyber security and safety. This engineering gap is reinforced in almost every OT security organization and ICS cyber security conference.
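To make the lack of authentication concrete, consider Modbus/TCP, one of the most common field-device protocols: there is no field anywhere in its frame for credentials, keys, or message signatures. The sketch below is only a hypothetical illustration in Python (the transaction, unit, and coil values are made up, not taken from any real device); it builds a standard “Write Single Coil” request to show that nothing in the protocol identifies or authenticates the sender.

```python
import struct

def modbus_write_single_coil(transaction_id: int, unit_id: int,
                             coil_address: int, turn_on: bool) -> bytes:
    """Build a Modbus/TCP 'Write Single Coil' (function code 0x05) request.

    Note what is absent: there is no username, password, key, or signature
    anywhere in the frame -- the protocol itself has no authentication.
    """
    function_code = 0x05
    value = 0xFF00 if turn_on else 0x0000          # per the Modbus spec
    pdu = struct.pack(">BHH", function_code, coil_address, value)
    # MBAP header: transaction id, protocol id (always 0), length, unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

# Example: the request any host on the network could send to unit 1
frame = modbus_write_single_coil(transaction_id=1, unit_id=1,
                                 coil_address=10, turn_on=True)
print(frame.hex())
```

Any host that can reach the device on TCP port 502 can send such a frame, which is why the engineering consequences of a write matter far more than the network path it took.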

Whenever process safety systems (Triton) or protective relays (Aurora) are targeted, the objective is the physical destruction of equipment. These types of attacks may not be detectable from network monitoring, nor would network vulnerabilities and malware directly translate into impacts on actual control system equipment and processes. Without understanding the impacts on the physical process, network threat hunting and anomaly detection can leave a significant gap in understanding of the event. IT, and even OT networking personnel, often fail to understand that control systems do not operate according to the programming languages and databases through which they are configured; control systems operate at a level equivalent to machine code. The gap between OT networking experts and control system experts was exposed by Stuxnet. Symantec discovered the zero days but had no idea of the actual goal of the malware. Several of us in the engineering community couldn’t understand why a PLC database would be targeted rather than the archival database. It took Ralph Langner, who understood the control systems, to realize the real intent – damage to the centrifuges.

A safety system such as Triconex cannot change plant conditions to cause the unsafe condition that would call for the safety system to operate. That comes from the plant distributed control system (DCS). Yet I have seen little discussion of the DCS installed at the plant. Specifically for the Triton attack, the OT engineer in the plant affected by Triton didn’t detect the malware that shut the plant down in June 2017 until it was “identified” in the August 2017 shutdown (https://www.controlglobal.com/blogs/unfettered/dangerous-cyber-attacks-may-not-be-detected-by-network-monitoring-engineers-are-also-needed/). The OT engineer did not mention the DCS. He also did not communicate potential cyber concerns to Schneider when Schneider did the root cause analysis on the safety module that tripped the plant in June. Because there were no obvious cyber impacts (though there should have been) and the safety module that tripped the plant was “clean”, the plant was restarted with the malware still resident in the engineering workstation – a major failure on the part of OT. The lack of consideration of the DCS comes from a lack of understanding, outside of control system and protection engineers, of the unique design of these systems and their operation.

System considerations are understood by engineering but generally not by the OT and network threat hunting communities. This concern led to the following blogs:

- Network anomaly detection can provide a false sense of security - https://www.controlglobal.com/blogs/unfettered/network-anomaly-detection-can-provide-a-false-sense-of-security/

- Hacking the grid may not be as difficult as the October 13, 2017 Wired article suggests - https://www.controlglobal.com/blogs/unfettered/hacking-the-grid-may-not-be-as-difficult-as-the-october-13-2017-wired-article-suggests/

The first Ukrainian cyber attack was in December 2015. On June 12, 2017, Dragos issued their first report on CrashOverride/Industroyer. The Dragos report addressed remotely opening breakers, which led to the short-term outages, but did not address reclosing the breakers out-of-phase with the grid, which can lead to equipment damage and long-term outages (months). Additionally, Aurora was not mentioned. Ironically, the next day, June 13, 2017, I gave a presentation at the American Nuclear Society Conference in San Francisco – “The Implications of the Ukrainian Cyber Attacks to Nuclear Plants”. The focus of the presentation was issues such as Aurora, because Aurora can damage nuclear plant safety equipment from outside the plant. Aurora is insidious as it can come from legitimate system commands and there is no malware associated with it. These types of issues were not addressed in the 2017 Dragos report.
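For readers outside the protection community, a back-of-the-envelope calculation shows why out-of-phase reclosing is so damaging. If islanded generation drifts out of synchronism before a breaker recloses, the voltage across the breaker at the instant of closing, and with it the inrush current and the transient torque on connected rotating equipment, grows as the sine of half the phase-angle difference; at 180 degrees it is roughly twice normal system voltage. The short Python sketch below is only an illustration of that textbook relationship, not a model of the Ukrainian event or of any specific relay.

```python
import math

def reclose_stress_pu(angle_deg: float) -> float:
    """Per-unit voltage across a breaker closed with a phase-angle error.

    With equal 1.0 pu voltages on each side, the phasor difference is
    |V1 - V2| = 2 * sin(delta / 2), which drives the inrush current and
    the transient torque on connected rotating equipment.
    """
    delta = math.radians(angle_deg)
    return 2.0 * math.sin(delta / 2.0)

for angle in (0, 30, 60, 90, 120, 180):
    print(f"{angle:3d} deg out of phase -> "
          f"{reclose_stress_pu(angle):.2f} pu across the breaker")
```

This is also why Aurora needs no malware: simply opening and reclosing a breaker at the wrong time, using legitimate commands, is enough to impose these stresses.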

On September 12, 2019, Wired ran the story “New Clues Show How Russia’s Grid Hackers Aimed for Physical Destruction” (https://www.wired.com/story/russia-ukraine-cyberattack-power-grid-blackout-destruction/), based on the Dragos report “CRASHOVERRIDE: Reassessing the 2016 Ukraine Electric Power Event as a Protection-Focused Attack”. The Dragos report mentioned Stuxnet often but never mentioned Aurora. How can this be when this is a report on protective relays – Aurora’s domain, not Stuxnet’s? Moreover, how can it take almost 4 years to have a “complete” assessment of the event?

The second Dragos report mentioned the Siprotec relays and a specific cyber vulnerability. The intent was to tie the vulnerability to potential equipment damage. In the 2015-2016 time frame, I reviewed detailed cyber security analyses of the Siprotec relays used in nuclear safety applications. Consequently, it is not clear to me that the Wired and Dragos descriptions are correct, nor am I the only one with doubts. Nathan Wallace, an accomplished PhD electrical engineer, had the following preliminary observations: "There doesn’t appear to be any new findings in the article or in the vendor report. The fact that a DoS was performed on the Siemens Siprotec protection relays was already known when the story initially broke back in 2017. However, the attackers wouldn’t have been able to destroy the electrical equipment (transformers, breakers, etc) directly in this case. A fault (short circuit) has to be present and it’s the job of the targeted protection relays to see the fault and deenergize the line or bus. Outside of weather events (down poles, trees brushing lines, etc), faults are rare and 99% of time the relays are just listening. Additionally, even if the relays at one station are temporarily rendered inoperable there are backup relays (typically from a different vendor) at that station and even the relays at neighboring stations are programmed to respond accordingly."

As I mentioned in my blog, “Control system cyber incident hunting – input for a playbook on control system cyber incident investigations” (https://www.controlglobal.com/blogs/unfettered/control-system-cyber-incident-hunting-input-for-a-playbook-on-control-system-cyber-incident-investigations-2/), there is a need to correlate the actual events (engineers) with what is occurring at the network layer (OT).
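As a trivial illustration of what that correlation could look like, the sketch below pairs timestamped process upsets (as an engineer might pull them from a plant historian or sequence-of-events log) with network alerts that fall inside a configurable time window. The records, the wording of the alerts, and the two-hour window are all hypothetical; the point is only that the engineering record and the network record have to be put on a common timeline before either community can draw conclusions.

```python
from datetime import datetime, timedelta

# Hypothetical records with illustrative timestamps only: process upsets
# from the plant historian / SOE log and alerts from network monitoring,
# both reduced to (timestamp, description).
process_events = [
    (datetime(2017, 6, 1, 2, 14), "Safety system trip, unit 1"),
    (datetime(2017, 8, 1, 11, 2), "Safety system trip, unit 1"),
]
network_events = [
    (datetime(2017, 6, 1, 1, 50), "Unexpected write to engineering workstation"),
    (datetime(2017, 8, 1, 10, 40), "New executable observed on SIS network"),
]

def correlate(process, network, window=timedelta(hours=2)):
    """Pair each process upset with network alerts inside +/- window."""
    for p_time, p_text in process:
        matches = [(n_time, n_text) for n_time, n_text in network
                   if abs(n_time - p_time) <= window]
        yield p_time, p_text, matches

for p_time, p_text, matches in correlate(process_events, network_events):
    print(p_time, p_text)
    for n_time, n_text in matches:
        print("   candidate network correlation:", n_time, n_text)
```

The hard part is not the code; it is getting the engineers who understand what the process upset means and the OT staff who understand the network telemetry to sit at the same table.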

It is unacceptable to take almost 4 years to recognize there are engineering issues associated with a cyber attack intended to damage equipment. It is unacceptable that after almost 4 years, OT still doesn’t get it right. Stuxnet, Triton, and CrashOverride/Industroyer point out the need for control system and safety/protection engineers to be trained in cyber security and to be an integral part of the cyber security process. The current short-sighted approach assumes that OT does not need to understand control systems or work with Operations in order to improve cyber security. It is very important that OT work closely with Operations to coordinate an overall cyber security approach and response for equipment operation. It is my hope that the credit rating agencies and insurance industry ask the right questions before they blindly accept the recommendations of OT cyber security experts when it comes to "enterprise" risk. This is also why there is a crying need for an ICS conference whose focus is ICS not just OT networks.

Joe Weiss