The electric utility industry recently received a lessons-learned notification about a utility incident that resulted in a loss of SCADA/EMS (Energy Management System) functionality for almost an hour. It was obviously a cyber incident, though the notification never mentioned the word “cyber”. According to the notification, an EMS loss-of-functionality event occurs every time a full system restart is performed, with a typical event lasting five to eight minutes. The extended duration of this event was due to the communication vendor’s maintenance operation.
This event and associated notification (assuming it was written correctly) lead to many questions:
- Why are critical systems designed such that full system restarts causing a multi-minute loss of functionality are considered acceptable? This is especially curious when the industry is moving to wide-area situational awareness, where millisecond inputs are considered essential.
- Why isn’t this loss-of-functionality scenario addressed by the NERC CIPs, given that SCADA/EMS should be considered a critical asset and this is apparently a known problem?
- Why didn’t NERC consider this specific event to be a cyber incident?
- What other critical systems can have an unexpected loss of functionality? Are they covered by the NERC CIPs, or should they be?
- What training and/or forensics are provided to determine whether a loss of functionality was accidental or malicious? What does this mean for a utility’s response under CIP-008 – Incident Response?