Dangerous cyber attacks may not be detected by network monitoring – engineers are also needed

Operational Technology (OT) network monitoring and threat detection are necessary for control system cyber security. What the 2017 Triconex cyber attack made clear is that network monitoring and threat detection were not sufficient. Luck and some attacker mistakes kept the Saudi Arabian petrochemical plant from a dangerous explosion. Those mistakes included the attackers inadvertently tripping the plant twice - in June and then again in August. (A plant is said to “trip” when it ceases production for a reason related to safety.) Without the plant trips, it is questionable whether the malware would have been found.

All of the presentations I had heard on the Triconex cyber attack, including those by national laboratories, focused on the malware found in the safety systems during the August 2017 outage. Julian Gutmanis was in the plant at the time and gave a presentation on the Triconex cyber attack at the January 2019 S4 Conference (https://www.youtube.com/watch?v=XwSJ8hloGvY&feature=youtu.be). One aspect of his presentation was new and led to my blog (https://www.controlglobal.com/blogs/unfettered/the-need-to-train-control-system-engineers-and-monitor-process-sensors-for-possible-cyber-attacks). That point was that the plant initially tripped in June 2017 – 2 months before the August 2017 outage when the malware was discovered.

According to Julian, the initial outage occurred in June 2017 when one Emergency Shutdown (ESD) controller caused the plant to trip. The plant Distributed Control System (DCS) did not reflect unsafe conditions (the DCS doesn’t monitor cyber threats). The vendor (Schneider) was called onsite to investigate and removed the affected ESD controller for analysis. The ESD controller logs and diagnostics (physical not cyber) were checked and no anomalous conditions were found. There were safety alarms indicating the ESD controllers were in the “Program” mode. That is not in itself a safety concern, so the alarms were essentially ignored by the operators. As a controller being in the “Program” mode is not a communication issue, it would not have been identified in the security/communication logs either. Additionally, mechanical testing found the controller to be fully functional. Consequently, the engineers considered the trip to be an unintentional malfunction and operations were restored. There was no mention of any possible cyber security involvement in the June incident (cyber monitoring did not detect anomalous conditions). However, the controller tripped because of the Triconex system malware (even though the attackers didn’t want the plant to trip). The malware was missed even though the plant tripped!

Julian said there were many red flags about the June 2017 incident. Specifically, not identifying the June trip as possibly being cyber-related was a missed opportunity that gave the attackers two additional months of unimpeded time to tune the attack tools. If the plant hadn’t tripped in August, it is possible the cyber compromise of the safety systems would not have been identified until it was too late.

Julian’s presentation had an OT and not a plant engineering focus. This culture gap between the networking organizations (whether IT or OT) and plant engineering is common and being reinforced by the discussion of IT/OT convergence. That is because the plant engineers and vendor staff who analyzed the controller and responded to the HMI alarms are NOT OT but engineering/Operations – and there is a BIG difference!

Consider the similarities between the Triconex cyber attack and Stuxnet. Both Stuxnet and the Triconex attacks compromised the Windows HMIs and engineering workstations. With Stuxnet, the centrifuges were being mechanically damaged for months with no apparent indication of anything but mechanical design problems. That is, the culture gap between the engineers and the cyber security organizations enabled the damage to continue for months until Stuxnet was “discovered”. In Ralph Langner’s treatise, “To Kill a Centrifuge”, Ralph asked if Stuxnet could be used as a blueprint for copycat attacks. I think you are seeing that blueprint followed in the Triconex attack. The Triconex attack demonstrated that hacking control/safety systems and controllers was not a Siemens-unique problem but an attack mechanism against control/safety systems regardless of vendor. Both Stuxnet and Triconex demonstrated the need for an out-of-band monitoring solution that would not itself be compromised when Windows and the IP networks are compromised.

What are some of the implications of the June 2017 plant trip?

  • Depending on Windows for safety critical applications is questionable at best.
  • The ability to identify a cyber attack is critical to the cyber security regulations in NERC CIPs and NEI 08-09/Regulatory Guide 5.71 for nuclear plants. Both assume that cyber attacks can be detected, which turned out to be a wrong assumption. This issue isn’t confined to nuclear plants, either. Considering that Triconex safety systems are used for safety applications in nuclear plants and burner management in fossil power plants, how can you meet nuclear and fossil plant security and safety requirements if you can’t recognize a cyber attack? The same question can be asked of Safety Instrumented System (SIS) standards such as ISA84/IEC 61511 in the process industry. Sophisticated cyber attacks can target control and safety systems from any vendor, and they might not be identified. The dependence on OT network monitoring to detect malware proved to be inadequate. As sophisticated malware may be able to circumvent malware detection capabilities, real-time sensor health monitoring can be used as a systems-integrity check to understand if upset conditions are a malfunction or a possible cyber attack. A safety system can be, and has been, compromised to cause damage and death. For a hack to be successful, the attack might well suppress alarms that would indicate that an unsafe condition is approaching. Such alarms are part of the compromised HMI. An out-of-band sensor monitoring program (monitoring the raw electrical signals BEFORE they become Ethernet packets) would provide confirmation that dangerous conditions are approaching so the operator can take manual actions if the safety system hasn’t already done so. An out-of-band monitoring system could also provide confirmation that the dangerous conditions have been mitigated.
  • The lack of cyber security training of control system/plant engineers contributed to the failure to identify upset conditions as possibly being cyber-related. In 2015, I supported the International Atomic Energy Agency (IAEA) on scenario-based training for engineers to be able to recognize non-IP network-related upset conditions as possibly being cyber-related.
  • Unfortunately, the lack of coordination/cooperation between engineering/Operations and cyber security/networking is alive and well. While at the Cyber War Games at the Naval War College in 2017, I met the senior director of physical security from a major utility. He assured me cyber security was not an issue because he met with the senior director of cyber security every day. However, when I asked how often he talked to the VP Power Production or VP Power Delivery, his response was “why?” There was simply no thought that a trip of a power plant or the lights going out because of relays in the substation could be cyber-related. The culture gap between security/networking and Engineering must be addressed.
  • Alarm management currently tends to reside in separate organizations and often even in separate buildings – Security Operation Centers (SOCs) and plant control rooms, for example. There is a need to coordinate SOC and network/security logs with equipment monitoring. Currently, there are alarms that are important to both Operations and security but are not shared, as shown by the alarms from the ESD controller being in the “Program” mode. Alarm management becomes a bigger issue as process sensors become smarter and more configurable. In this case, security and operations alarms can either be of value to both security and Engineering or, if not shared, keep one side from understanding the true conditions.
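
To make the alarm-sharing point concrete, here is a minimal sketch of how a controller-mode alarm and a plant trip could be correlated if both ended up in one event list. Everything in it is a hypothetical illustration: the Event record, the "PROGRAM_MODE" and "TRIP" labels, the controller tags, and the 24-hour window are assumptions of mine, not the plant's actual alarm format or any vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One merged alarm/event record (hypothetical format)."""
    timestamp: datetime
    source: str  # controller tag, e.g. "ESD-1" (made-up name)
    kind: str    # "PROGRAM_MODE" or "TRIP" (assumed labels)


def flag_suspicious_trips(events, window=timedelta(hours=24)):
    """Return trips preceded, within `window`, by a PROGRAM_MODE
    alarm on the same controller. The window length is an assumption."""
    program_alarms = [e for e in events if e.kind == "PROGRAM_MODE"]
    flagged = []
    for trip in (e for e in events if e.kind == "TRIP"):
        for alarm in program_alarms:
            elapsed = trip.timestamp - alarm.timestamp
            if alarm.source == trip.source and timedelta(0) <= elapsed <= window:
                flagged.append(trip)
                break  # one matching alarm is enough to flag this trip
    return flagged


if __name__ == "__main__":
    t0 = datetime(2017, 6, 1, 8, 0)
    events = [
        Event(t0, "ESD-1", "PROGRAM_MODE"),
        Event(t0 + timedelta(hours=2), "ESD-1", "TRIP"),
        Event(t0 + timedelta(hours=3), "ESD-2", "TRIP"),
    ]
    for trip in flag_suspicious_trips(events):
        print(f"Review for possible cyber cause: {trip.source} at {trip.timestamp}")
```

A flagged trip is not proof of an attack. It is only a prompt for Engineering and security to look at the same event together rather than separately.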

Sophisticated cyber attacks can be misidentified as malfunctions. This brings up the need for out-of-band sensor monitoring as an independent view of process conditions from the potentially compromised IP networks. The current focus on IT/OT convergence rather than reaching out to engineering will continue to lead to “blind spots” when it comes to detecting sophisticated cyber attacks such as Stuxnet and the Triconex cyber attacks.
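
As a rough sketch of what an out-of-band comparison could look like, the example below converts a raw loop current to engineering units and compares it with the value reported over the IP network. It assumes a linear 4-20 mA transmitter, a known calibrated range, and a 2% of span tolerance; the function names and numbers are hypothetical and are not taken from any specific monitoring product.

```python
def ma_to_units(current_ma, range_lo, range_hi):
    """Linearly scale an assumed 4-20 mA loop current to engineering units."""
    return range_lo + (current_ma - 4.0) / 16.0 * (range_hi - range_lo)


def check_sensor(current_ma, hmi_value, range_lo, range_hi, tolerance_pct=2.0):
    """Compare the out-of-band reading with the HMI-reported value.

    Deviation is expressed as a percentage of the calibrated span.
    The 2% default tolerance is an illustrative assumption.
    """
    oob_value = ma_to_units(current_ma, range_lo, range_hi)
    span = range_hi - range_lo
    deviation_pct = abs(oob_value - hmi_value) / span * 100.0
    return {
        "oob_value": oob_value,
        "deviation_pct": deviation_pct,
        "mismatch": deviation_pct > tolerance_pct,
    }


if __name__ == "__main__":
    # Raw signal and HMI agree: nothing to flag.
    print(check_sensor(12.0, 50.0, 0.0, 100.0))
    # Raw signal shows a much higher value than the HMI displays.
    print(check_sensor(18.4, 50.0, 0.0, 100.0))
```

A mismatch could be a failing transmitter or a calibration problem as easily as a compromised HMI, so the result is a trigger for engineering review, not a verdict.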

Joe Weiss