What would a cyber attack mean to control system recovery – is extended manual operation possible?

The prevailing view of SCADA/control system recovery following a cyber attack is that having a valid stored image of the HMI will assure system integrity and result in a fairly quick turnaround (at most a few days). However, that notion does not match reality. At a recent DHS briefing, DHS stated it could take 6 months to fully recover from a cyber attack, assuming no major equipment damage. If major equipment with long-lead delivery times is damaged, it could take 9-18 months just to replace that equipment. It is now more than 4 months after the Ukrainian cyber attack, and the grid is still being operated in manual mode even though there was no major equipment damage. After a cyber attack disabled a US SCADA system, the utility took approximately 4 months to recover even though there was no equipment damage. During that period, the utility was forced into manual operation, with substations having to be manned. An equipment manufacturer took 3 months just to “clean the system” following a virus that affected its manufacturing systems, even though there was no equipment damage.

Control systems are systems of systems, not just Windows-based HMIs. The control system loop consists of data acquisition, field controllers, field sensors, field actuators, etc., not to mention all of the switches, routers, and firewalls. The field sensors, actuators, etc. are generally not Windows-based. Many control system components are custom-built and cannot be replaced with off-the-shelf components, which could extend outages or manual operation even longer.

US electric utilities and nuclear plants have no requirement to remove malware. DHS has acknowledged that BlackEnergy malware is in the US electric grids and possibly other critical infrastructures. Consequently, how will system integrity be assured once malware has been identified? Manual operation may be the only prudent approach until system integrity can be assured. However, that may not be as simple as it sounds. As Ray Parks, who worked for Sandia National Laboratories for many years, noted in an April 29, 2016 SACASEC posting (https://us-mg6.mail.yahoo.com/neo/launch?.rand=502ma5rapkdk4#9781096986): “Every one of the power generation and refineries I've visited for assessment has claimed to have a backup plan using manual methods in case the I&C system goes down. In reality, when you ask a few penetrating questions like "Who will be at each of these locations in the plant?" or "When did you last test the manual backup plan?" the truth comes out - their backup plan is a sham. This architecture adds an additional layer - the primary is the cloud solution, the secondary is the local control system solution, and the tertiary is the manual solution. When did they last test their local control system solution for backup? Who knows how to operate the local control system?” My experience mirrors Ray’s comments.

With the “graying” workforce retiring and the Internet of Things (IoT) becoming more prevalent, extended manual operation becomes even more problematic. Compounding the issue, the move to more automation makes it less likely that facilities can be operated in manual mode. This also affects the move toward more big data analytics, because it is unclear whether the data can be trusted following a cyber attack.

Are the US critical infrastructures ready to operate in manual mode for an extended period of time? There may be no other choice.

Joe Weiss

Comments

  • My email quoted above was in response to a thread about an Australian LNG processing plant using big data analytics in the cloud to detect and predict foaming in a unit that had been opaque to them. This is the first instance of what may become many. It adds another layer to what is already a big problem - how does a utility or plant recover from a cyber attack? In the course of many assessments, I have heard operators' stories of cyber incidents and found out how broken their manual operation plans can be. A plant that was in operation when the big worm infections of the early 2000s occurred suffered an infection of the HMI and Engineering Control systems in its control center. They isolated and shut down all of their HMI and EC computers in succession, trying to disinfect them while keeping at least one HMI operational so they didn't lose view for more than five minutes (the threshold for a shutdown that would cost millions). It was a close thing: despite repeated re-infections and rebuilds, they barely scraped by, minutes from disaster. In another case, the manual plan called for an operator to deploy to a field unit to observe gauges and sensors and radio in the results. However, when we looked at that location, the gauges and sensors had been removed - some or all of them had needed repair and, since they were no longer "used", they had been removed rather than repaired. At another site, there was one critical location that only an experienced operator could monitor. We discovered that the computer user files included somebody who was not actively at the company. We were told the person was on extended disability after a motorcycle accident. Then we realized that the experienced operator critical to the manual operations plan was the same person whose user login was out-of-date. At that point, they had been operating for six months with an ineffective manual operations recovery plan.