Patch Management and Risk – Are We Really Moving Forward?

Caveat: Because of the sensitivity of this subject, I asked NERC to provide comments on this blog.

Mike Assante at NERC, among others, is appropriately concerned about the electric utility industry’s implementation of effective patch management programs for industrial control systems. Patch management is not just a Microsoft problem; however, in this blog I will focus on Windows. Patching control systems is very difficult because of the following, often mutually exclusive, constraints:
- These systems often cannot be shut down while the process is running, which means patching intervals can be months or even years.
- They often run a Windows operating system that has been modified by the control system supplier, so the plain vanilla Microsoft patch pushed by the enterprise patch management program won’t work (plain vanilla Microsoft patches have actually crashed control systems).
- Some critical systems use older versions of Windows (e.g., Windows 95/98, NT4) that are no longer supported by Microsoft and yet cannot be replaced.
- The patch could have unintended consequences for the overall system, even though the patched workstation passes off-line testing and validation.
Realistically, I believe it may be very difficult to patch these critical systems promptly. So, to use the language of NIST SP 800-53, what are your compensating measures?
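To make the constraints above concrete, here is a minimal Python sketch of a per-host, case-by-case triage. It is an illustration under stated assumptions, not a NERC, NIST, or vendor procedure: the `ControlHost` record, its field names, the `patch_plan` function, and the one-month `max_wait_months` threshold are all hypothetical. It only shows how each constraint pushes a given host toward either a supplier-validated patch during a planned outage or compensating measures in the meantime.

```python
from dataclasses import dataclass


@dataclass
class ControlHost:
    """Hypothetical inventory record for one control system workstation."""
    name: str
    os_supported: bool              # False for OS versions Microsoft no longer supports (e.g., NT4)
    os_modified_by_supplier: bool   # True if the supplier customized the Windows image
    supplier_validated_patch: bool  # True if the supplier has qualified this specific patch
    months_to_next_outage: int      # time until the process can safely be shut down


def patch_plan(host: ControlHost, max_wait_months: int = 1) -> str:
    """Return a coarse, case-by-case disposition for one host (illustrative only)."""
    if not host.os_supported:
        # No patch will be issued, so compensating measures are the only option.
        return "compensate"
    if host.os_modified_by_supplier and not host.supplier_validated_patch:
        # Plain vanilla patches have crashed modified systems: wait for the supplier.
        return "compensate until supplier validates"
    if host.months_to_next_outage > max_wait_months:
        # The process cannot be shut down soon, so protect the host in the meantime.
        return "compensate until outage, then patch"
    # A patch that passes off-line testing can still affect the overall system,
    # so the whole system is re-verified after deployment.
    return "patch in outage, then verify whole system"


if __name__ == "__main__":
    hosts = [
        ControlHost("legacy-hmi", False, True, False, 12),
        ControlHost("eng-ws", True, True, False, 0),
        ControlHost("historian", True, False, True, 6),
        ControlHost("ops-console", True, True, True, 0),
    ]
    for h in hosts:
        print(f"{h.name}: {patch_plan(h)}")
```

Note that three of the four sample hosts end up relying on compensating measures for some period, which is the point of the question above.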

On June 30, NERC issued an Advisory on the Conficker worm (see www.nerc.com). Two issues in it need to be addressed on a generic basis. The first is that the Advisory specifies that all machines (even those used for 24x7 operations and operational systems) should be included in the enterprise patch management program, with the obligatory warning that control systems require special testing and custom patch deployment strategies. Control systems should NOT simply be included in an enterprise patch management program! They must be addressed on a case-by-case basis. IT can certainly help, but this can’t be treated as a routine IT issue.
 
The second issue deals with estimates of risk. NERC is not the only organization to have problems defining “risk”. Shortly after Tom Donahue of the CIA announced that two international utilities had been held up for extortion, DHS issued an advisory stating there was “no imminent threat” to the US. The term “imminent threat” has a specific meaning in the intelligence community that is different from its usage in private industry. Because DHS stated there was no imminent threat (in the intelligence community sense), many in the electric industry discounted Tom’s disclosure as FUD (fear, uncertainty, and doubt).

The Conficker Advisory states that the ES-ISAC estimates the risk to bulk power system reliability from Conficker is LOW due to the limited exploitation of this vulnerability and generally widespread awareness of the issue. As with any estimate of risk involving an intelligent adversary, it can be just plain wrong! If the possible consequences are low, why publish the Advisory? If the possible consequences are medium-to-high, which I (and others at NERC and FERC) believe, how can you call it a low risk? I agree there is generally widespread awareness of Conficker in the IT community. However, I would venture that awareness in the control system community is near zero. Any takers?
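The tension in a LOW rating can be illustrated with a generic qualitative risk matrix, in which risk combines likelihood and consequence. This is a minimal Python sketch under loudly stated assumptions: the three-level `Level` scale, the likelihood-times-consequence scoring, and the thresholds in the hypothetical `rate_risk` function are illustrative only and are not the ES-ISAC’s actual methodology, which is not given here.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-point ordinal scale (not an ES-ISAC or NERC scale)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def rate_risk(likelihood: Level, consequence: Level) -> Level:
    """Combine likelihood and consequence into a qualitative rating.

    Assumed thresholds for this sketch only:
    score <= 2 -> LOW, 3..4 -> MEDIUM, >= 6 -> HIGH (possible scores: 1, 2, 3, 4, 6, 9).
    """
    score = int(likelihood) * int(consequence)
    if score <= 2:
        return Level.LOW
    if score <= 4:
        return Level.MEDIUM
    return Level.HIGH


if __name__ == "__main__":
    # "Limited exploitation" argues for LOW likelihood...
    print(rate_risk(Level.LOW, Level.LOW).name)
    # ...but a HIGH consequence keeps the rating above LOW.
    print(rate_risk(Level.LOW, Level.HIGH).name)
    # If likelihood was underestimated (e.g., little awareness among
    # control system staff), the same consequence rates HIGH.
    print(rate_risk(Level.MEDIUM, Level.HIGH).name)
```

Under these assumed thresholds, low likelihood alone does not make a high-consequence event low risk, and an underestimated likelihood moves the same event to high.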

Joe Weiss

Comments

MS08-067 was taken seriously by many in the control system community, though certainly not all. From our vantage point as an ISV and support provider, the activity was evident: inquiries about compatibility, scheduling of service work orders, etc.

As a control engineer, I couldn't help observing a first-order response characteristic in the time required to patch managed systems (hundreds of systems observed). The overall response could not be considered expeditious by IT norms.

Some of the key lessons learned:

Task scheduling where there is potential for a service interruption is the primary bottleneck. System architecture for critical components must provide degrees of freedom for management, e.g., bumpless transfer through the use of redundant systems.

Deployment rate is highest for systems that regularly exercise good operational management procedures, including backups, failover, and cold start.

Accept that compensating measures are appropriate for a significant number of systems. For MS08-067, the SMB and Server services were safely disabled in many cases.

While more must be done across the supply chain to avoid defects in the first place, there is clearly progress with respect to patch management as part of control system operational excellence.
