NERC's cyber security approach is preventing the electric grid from being secured

Feb. 11, 2013
Background: In computing, a denial-of-service attack (DoS attack) is an attempt to make a machine or network resource unavailable to its intended users. One common method of attack involves saturating the target machine with communications requests, so much so that it cannot respond to legitimate traffic, or responds so slowly as to be rendered essentially unavailable. Such attacks usually lead to a server overload. Moreover, NIST defines a cyber incident as electronic communications between systems (or systems and people) that affect confidentiality, integrity, or availability.

NERC, FERC, DHS, Congress, and the electric industry have said it is important to secure the electric grid. On September 27, the presidents of the American Public Power Association (APPA), Edison Electric Institute (EEI), Electric Power Supply Association (EPSA), Nuclear Energy Institute (NEI), and the National Rural Electric Cooperative Association (NRECA) sent a letter to Senator Rockefeller proclaiming the importance of cyber security and stating they were working to secure the electric industry. Control system cyber incidents are real and numerous. My database contains more than 75 electric industry control system cyber incidents (not counting power plants), and the number is growing. However, the electric industry and NERC have generally been silent about control system cyber incidents, even within the industry.

There have been numerous discussions about the differences between compliance and security. The spirit of the NERC CIPs is to maintain the reliability of the electric grid in the face of cyber threats. However, the reality is that the NERC CIPs fall far short of meeting that spirit. Specifically, the February 8 NERC Lessons Learned document provided four case histories that in IT would be considered denial-of-service events. Each of the four incidents has occurred elsewhere in the electric and other industries. In most cases they were unintentional, but that was not immediately obvious. In addition, there were cases where similar incidents were caused maliciously. Three of the incident descriptions did not mention the word "cyber". The fourth stated it was "not a cyber security incident". Below is a summary, in quotes, of the four cases in the February 8 NERC report:

- "Engineers identified the hard disk on the SCADA server was fully utilized which prevented the supervisory control from functioning properly. Operators had visibility of the system, but did not have control. The post event investigation identified that an automatic file purge process was not functioning correctly which caused the hard disk to exceed its maximum capacity. The problem was found to be a historian test server issuing unidentified packets to the other historian servers. The network, not able to interpret the packets, sent them back creating a loop and ultimately resulted in network traffic congestion. This had been a latent code bug which had not previously been found by the vendor or others using the software."
o A similar situation occurred several years ago at a major control center and was also not identified as cyber.
o The SCADA alarm failure during the 2003 Northeast Outage was from a latent code bug.

- "A large utility's Energy Management System (EMS) began to lose data necessary for visibility of portions of its transmission network causing functionality and/or solution interruptions. No loss of load occurred during this event and it was quickly determined to not be a cyber security event. Excessive data packets being sent on the data network resulted in heavy loading. The extreme loading created a performance degradation of the data flows between the SCADA system, EMS Supervisory Control and various supporting systems. At times during the event, the degraded data flows limited the visibility of the EMS SCADA data to several control centers and the generation operations group. To compound the problem, as the event unfolded over an eleven hour period, EMS personnel were not able to determine the root cause of the excessive data network traffic, could not accurately predict when the problem(s) would be solved and when data would be restored to operations."
o Similar to many other events that took 24 hours or more to identify.
o Sophisticated attacks like Stuxnet took over a year to identify.

- "A utility's control center experienced a SCADA failure which resulted in a loss of monitoring functionality for more than thirty minutes. During the event, the utility's Inter-Control Center Communications Protocol (ICCP) data links remained in-service. All data sent and observed were frozen at the values transmitted at the time of the failure and remained at these values for the duration of the event. The utility's EMS did not alarm or indicate any abnormalities with the data for an extended period of time."
o Similar to what happened with the Bellingham, WA gasoline pipeline rupture in 1999.

- "A control center experienced a loss of control and monitoring functionality of the EMS due to the loss of the operator's user interface application between its primary EMS computer/host server and the system operator consoles. The EMS servers run a software application that enables the system operators to view, monitor and control the transmission system via system operator consoles. Following a time of higher-than-normal system utilization of the EMS, in particular the heavy use of the study network software application, the user interface application failed while running on the primary EMS server. Contributing to the user interface application failure was a limit to the amount of memory (RAM and Virtual) available to run the ongoing and background software application processes. As a result, the failed state of the user interface application did not trigger a system failure that would have automatically switched functionality to the redundant EMS server due to a software application configuration setting."
o Similar to what happened with the Bellingham, WA gasoline pipeline rupture in 1999.

These incidents are not trivial:
- In some of the incidents, it took a significant amount of time to identify the problem.
- Some of the events did not trigger alarms.
- None were identified by intrusion detection systems.
- Redundancy considerations were not always effective.
- Operator training was not always effective and did not address the cyber issues.
- The only mention of any cyber consideration was coordination of firewall modifications.
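
As an illustration of the point that some events did not trigger alarms, here is a minimal Python sketch of a frozen-value check of the kind that could flag telemetry stuck at its last transmitted value. It is not taken from the NERC report or from any vendor's EMS; the class name, the point name, and the 60-second threshold are all hypothetical, and a real system would need per-point thresholds because some values (breaker status, for example) legitimately stay constant for long periods.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FrozenValueDetector:
    """Hypothetical helper: flags points whose value has not changed for too long."""

    max_unchanged_seconds: float
    # point name -> (last distinct value, time that value was first seen)
    _last_change: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def update(self, point: str, value: float, now: float) -> bool:
        """Record a sample; return True if the point now looks frozen."""
        previous = self._last_change.get(point)
        if previous is None or previous[0] != value:
            # First sample, or the value moved: restart the clock.
            self._last_change[point] = (value, now)
            return False
        return now - previous[1] >= self.max_unchanged_seconds

    def frozen_points(self, now: float) -> List[str]:
        """List every known point whose value is unchanged past the limit.

        A point that has stopped reporting altogether is also listed,
        because its last change keeps getting older.
        """
        return sorted(
            name
            for name, (_, since) in self._last_change.items()
            if now - since >= self.max_unchanged_seconds
        )


# Assumed example: an analog reading that stops changing after t=0.
detector = FrozenValueDetector(max_unchanged_seconds=60)
first = detector.update("line_flow_mw", 412.7, now=0)    # first sample
early = detector.update("line_flow_mw", 412.7, now=30)   # unchanged, under limit
late = detector.update("line_flow_mw", 412.7, now=90)    # unchanged, over limit
moved = detector.update("line_flow_mw", 413.1, now=95)   # value changed again
```

A check like this only raises the question; deciding whether frozen data comes from a failed server, a network problem, or something malicious still requires investigation.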

Each of these cases is a loss of view/loss of control cyber incident that was a direct threat to the reliability of the electric grid. Yet the NERC CIPs don't address these situations. What does that mean for the cyber security of the electric grid when most utilities' cyber security programs consist of following the NERC CIPs verbatim? The NERC CIPs have done an admirable job of making cyber security of the grid more mainstream. However, if the intent is to keep the lights on, the NERC CIP approach (regardless of version) needs to be changed and an approach such as NIST SP800-53 Appendix I (control systems) implemented.

Joe Weiss
