Aurora is being ignored despite attacks and incidents

Dec. 6, 2020
The INL generator test, the Iranian Iranshahr power plant incident, a US data center incident, and other cases demonstrate that Aurora can cause catastrophic physical damage despite available relay protection, yet not be detectable as a cyber attack. Additionally, Aurora doesn’t need either physical or cyber access to the target itself. The INL generator test and Stuxnet employed engineering and network security experts to ensure the attacks would cause the desired damage. However, the same coordination between networking and engineering has not occurred with Aurora mitigation. Substation protective relays, which can lead to Aurora events, will continue to be used in the grids and switchyards of power generation, industrial, and manufacturing facilities. Consequently, there is a clarion call to recognize Aurora as a real threat and bridge the culture gap between engineering and network organizations before it is too late.

Introduction

Given the sensitivity of this blog, it is reasonable to ask why write the blog and why now. The answer to why write the blog is that there are still those who question whether Aurora is real as well as the validity of the 2007 Idaho National Laboratory (INL) Aurora generator test (https://www.controlglobal.com/blogs/unfettered/not-all-cyberattacks-are-malware-incidents-it-didnt-take-any-lines-of-code-to-blow-up-a-27-ton-generator). The answer to why now is that I recently received a second independent confirmation of the Iranshahr incident being an Aurora attack. Even though both individuals pointed to it as an attack, an accidental grid mishap can also cause the same impact. As the Iranshahr event occurred in Iran, there are no direct details and forensics available to determine attribution, so none is implied or given.

Background

The Iranshahr turbine failure was identified as being a coupling failure. I have experience dealing with vibration issues, including coupling issues (see below). Coupling failures can cause mechanical damage but not widespread electrical equipment damage. Aurora causes both mechanical (very large torques) and electrical (very large current spikes) equipment damage.

In the early 1990s, I was managing the Electric Power Research Institute’s (EPRI) Nuclear Plant Instrumentation and Diagnostics Program. A primary focus of the diagnostics program was to identify potential cracking of the main coolant pump shafts in nuclear power plants, as there were numerous plants with cracked shafts. Unfortunately, the vibration monitoring systems weren’t always detecting the cracking conditions. I had some of the top vibration monitoring experts in the world working on the project. In fact, we did testing using a full-size nuclear plant main coolant pump at Toronto Hydro’s Mississauga test facility (for a picture of the test participants in front of the actual main coolant pump, see Protecting Industrial Control Systems from Electronic Threats). We were able to characterize which vibration signatures were indicative of specific issues. One of the issues pertained to an imbalance of the coupling. A coupling is a device used to connect two shafts together at their ends for the purpose of transmitting power while permitting some degree of misalignment or end movement. Consequently, there was a need to monitor for vibration conditions that would detect potential imbalance issues. For the main coolant pumps, coupling issues were considered to be reliability issues, not safety issues. There have been numerous reports documenting coupling issues, including coupling failures. I found several papers on coupling failure issues from the annual Texas A&M Turbomachinery Conference. Example papers on coupling failures include https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/162947/LectureT08.pdf?sequence=1&isAllowed=y and https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/163514/T2217-24.pdf?sequence=1&isAllowed=y. As outlined in these documents, the analyses were performed by mechanical and electrical engineers. However, none of these cases experienced widespread electrical equipment damage as seen in Iranshahr.

The analyses of the vibration issues resulting from imbalance were not considered to be cyber issues. However, the March 2007 INL Aurora generator test used laptops to open and then reclose the breakers out-of-phase, destroying the coupling and the generator. Considering Aurora is not network malware, network tactics, techniques, and evaluations are not likely to apply to this investigation. Yet, because the INL Aurora generator test was considered to be a cyber security test, the NERC CIP cyber security process attempted to address it, but without the mechanical and electrical engineers necessary to fully understand and resolve the Aurora vulnerability. This culture gap continues to exist and must be closed.
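To make "out-of-phase" concrete, here is a minimal sketch of how a phase angle difference builds up while a breaker is open. It assumes a constant frequency difference (slip) between the isolated machine and the grid, which is a simplification, since a real machine speeds up or slows down while disconnected. The function name and the numbers in the usage note are illustrative assumptions, not values from the INL test.

```python
def phase_drift_degrees(slip_hz: float, open_seconds: float) -> float:
    """Phase angle accumulated between two AC sources that differ by slip_hz.

    Assumes the slip stays constant while the breaker is open (a
    simplification). The result is wrapped into the range (-180, 180].
    """
    drift = 360.0 * slip_hz * open_seconds      # one full cycle of slip = 360 degrees
    wrapped = (drift + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    return 180.0 if wrapped == -180.0 else wrapped
```

Under these assumptions, a hypothetical slip of 1 Hz and a breaker held open for a quarter of a second gives `phase_drift_degrees(1.0, 0.25)`, a 90-degree difference at the moment of reclosing.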

Real Aurora Attack

Several years ago, I was urged by a knowledgeable expert on Aurora to check out an overseas power plant that had suffered a coupling failure similar to that of the generator in the INL generator test. This was the 2009 Iranshahr Power Plant coupling failure (https://www.steamforum.com/images/IranshahrThermalPowerPlant4x64MW-Couplingfatigue_2009.pdf). Iranshahr was a 4-unit fossil power plant (four 64 MW turbine-generators) located in Iran. One of the potential causes of coupling failures is manufacturing/assembly issues. However, this turbine was installed in 2001 and the incident occurred in 2009. From a mechanical engineering perspective, a coupling issue is assumed to be unique to each turbine-generator as the mechanical failure is unique for each device. There is no indication of the operating status of the other three units at the time of the Unit 3 failure, and none appeared to be in the pictures. Not only were the turbine, generator, and coupling destroyed, but the Alternating Current (AC) motor pump casings also overheated, and the generator windings and power cables were fried, which would be expected from an Aurora event. As mentioned above, this would not be the case from a “simple” mechanical imbalance condition. It is not clear who caused the Aurora event (as mentioned, no attribution is possible) or how extensive the damage was to the other units, as some of the turbine-generator units were not in the pictures. However, Aurora is one of the few explanations for this kind of extensive damage to multiple electrical and mechanical systems.

So as not to confuse people, Stuxnet and Aurora were occurring during the same time frame, but they are different attack vectors. Even though Aurora can be caused by cyber remote access, Aurora doesn’t need physical or cyber access to the intended target. The common threads between Stuxnet and Aurora are that both can cause catastrophic damage and neither is readily identifiable as being a cyber attack.

Other Aurora Issues

To explain how Aurora impacted the Iranshahr plant, compare this incident with the 2015-16 hacking of the Ukrainian power grid by Russian GRU cyber operators. The Russians remotely opened the breakers to turn the lights out. Had the attackers so chosen, they could have reclosed the breakers out-of-phase with the grid, which could have caused an Aurora event damaging critical infrastructure equipment and leading to very long outages. In fact, reclosing the breakers without causing Aurora out-of-phase conditions would have required significant due diligence. For whatever reasons, the Russians did not reclose the breakers. It’s sobering to recall that the Russian BlackEnergy malware has been detected in US grids since 2014. We can’t always count on the Russians showing the same restraint they did in Ukraine.
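As a rough illustration of the due diligence involved in reclosing, here is a minimal sketch of a synchronism check that compares the two sides of an open breaker before permitting a close. The class, the function names, and every threshold are hypothetical assumptions chosen for illustration. They are not settings from any real relay, and this sketch is not a claim that such a check by itself prevents an Aurora event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncLimits:
    # Illustrative thresholds only (assumptions, not real relay settings).
    max_angle_deg: float = 10.0
    max_slip_hz: float = 0.1
    max_voltage_diff_pu: float = 0.05


def angle_difference_deg(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two phasor angles, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def close_permitted(side_a_angle_deg: float, side_b_angle_deg: float,
                    side_a_hz: float, side_b_hz: float,
                    side_a_v_pu: float, side_b_v_pu: float,
                    limits: SyncLimits = SyncLimits()) -> bool:
    """Return True only if angle, slip, and voltage are all within limits."""
    return (
        angle_difference_deg(side_a_angle_deg, side_b_angle_deg) <= limits.max_angle_deg
        and abs(side_a_hz - side_b_hz) <= limits.max_slip_hz
        and abs(side_a_v_pu - side_b_v_pu) <= limits.max_voltage_diff_pu
    )
```

With these assumed limits, a 5-degree difference at matched frequency and voltage would be allowed to close, while a 120-degree difference would not. An out-of-phase reclose is a close that happens when conditions like these are not met.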

Another example of how Iranshahr could have been attacked is an Aurora event that occurred at a US data center. The data center was shut down by an Aurora event that originated at the neighboring electric utility’s substation providing power to the data center (“outside the data center’s fence”) and damaged the data center’s chiller motors. It was unclear if the data center case was malicious, unintentional, or from mechanical failures. The micro-controller logs showed none of the breaker operations (opening and reclosing) that caused the Aurora event, whereas the mechanical counters did show breaker operation. The micro-controller logs generally would be stored on a networked historian or logging server whereas the mechanical counter logs would not. Like the INL generator test, the existing substation relay protection failed to prevent the Aurora event and resulting data center damage.
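The discrepancy described above, between what the networked logs recorded and what the mechanical counter recorded, is the kind of thing a simple cross-check can surface. The following is a minimal sketch under assumed inputs: two manual readings of a breaker's mechanical operations counter and the number of operations found in the networked logs over the same period. The function name and the sample numbers are hypothetical and do not come from the data center case.

```python
def unlogged_operations(counter_before: int, counter_after: int,
                        logged_operations: int) -> int:
    """Breaker operations seen by the mechanical counter but absent from logs.

    counter_before / counter_after: manual readings of the mechanical
    operations counter at the start and end of the period (assumed inputs).
    logged_operations: operations recorded by the networked logs in that period.
    """
    observed = counter_after - counter_before
    if observed < 0:
        raise ValueError("counter_after must not be less than counter_before")
    # Any positive remainder is an operation the networked logs never recorded.
    return max(observed - logged_operations, 0)
```

For example, if the counter advanced from 100 to 102 while the logs recorded nothing, `unlogged_operations(100, 102, 0)` returns 2, which would be a reason to send an engineer to look rather than rely on network forensics alone.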

There have been other Aurora incidents in the US that have caused physical damage.

Conclusions

As the March 2007 INL industry test participants stated, the Aurora generator test demonstrated the ability to exploit the capability of modern protective equipment and cause them to serve as destructive weapons. The same results could be achieved by any competent power system protection engineer if provided access and the desire to do so. These types of results could be expected if similar operations occurred against a utility or industrial plant. As can be seen from the data center event, the damage can be to more than just electric utility equipment.

The INL generator test, Iranshahr, and US data center cases demonstrate that Aurora can cause catastrophic damage despite available relay protection. All three cases also demonstrate that Aurora may not be readily detectable as a cyber attack. The Iranshahr case demonstrates that Aurora can simultaneously impact multiple electrical and mechanical systems. The data center case demonstrates that networked forensics/logging may not be available.

The INL generator test employed mechanical, electrical, and network security experts to ensure the test was successful and representative of actual installations. This was similar to Stuxnet in that physical security, cyber security, and domain engineers were involved to ensure the attack would cause the desired damage. However, because Aurora has been identified as being a cyber event, the protective relay and mechanical engineers necessary to assure correct understanding of the event and the necessary mitigation process have not been involved.

Substation protective relays, which can lead to Aurora events, will continue to be used in the “old grid”, new grids, microgrids, and switchyards of existing and new industrial and manufacturing facilities (https://www.controlglobal.com/blogs/unfettered/the-chinese-hardware-backdoors-can-cause-transformer-failures-through-the-load-tap-changers/). Consequently, there is a clarion call to recognize Aurora as a real threat and bridge the culture gap between engineering and network organizations before it is too late.

Joe Weiss