10 principles for securing control systems

By Jay Abshier, CBCP, CISSP

Most utilities know that cyber security of their control, diagnostic, and SCADA systems is crucial, but they still have many questions about how to secure them. Several cyber security standards and best practices are being developed for securing control and SCADA systems (subsequent references to control systems will include SCADA). However, while these should be followed, there is no silver bullet tool guaranteed to secure any system. This article summarizes the 10 most important design and process principles for ensuring that due diligence has been followed to make these systems as secure as practical. Following this approach also should meet the intent of the North American Electric Reliability Council's (NERC) 1200 and 1300 Cyber Security Standards. These principles are:

Governance
Security Awareness and Training
Policies and Procedures
Change Management
Security Architecture
Adding Devices and Remote Access
Vulnerability, Risk Assessments and Penetration Tests
Incident Response
Configuration and Patch Management
Monitoring

1) Governance
A structured, formal governance policy ensures that input and/or concurrence from appropriate stakeholders are obtained before decisions are made. Stakeholders will differ from firm to firm, but there are typical roles and responsibilities involved.

For the IT function, there is usually a governing IT council, including the CIO, chief IT architect, and leaders responsible for IT in their units. Technical teams for architecture, telecom, application development, and information security typically report to an IT council.

Also, business units should have governance teams for business functions. For control systems, an operations unit might have a governance team responsible for its control systems. Similar to an IT team, this team could be called the Control System Governance Team.

Ultimately, the business unit that relies on IT systems should be in charge of changes made to those systems and how they're managed. Input should be solicited from appropriate technical governance teams before important changes are made to equipment, software or procedures. A formal governance structure will help ensure that the appropriate individuals and roles provide that input, and allow executives to document that appropriate vetting occurred before funding those projects.

2) Security Awareness and Training
Most employees, contractors and vendors do what's necessary to meet business objectives, while also making quality a priority. However, it often doesn't occur to some employees that they should also pay attention to security issues. An effective security awareness training program not only tells the audience what is expected of them, but also tells them why.

3) Policies and Procedures
There are accepted standards for how to structure policies, which are usually divided into operational, procedural, and technical categories.

Operational Policies are high-level objective statements, followed by standards and guidelines associated with each policy statement. For example, Policy Statement 1.1 might be "Scheduled Reviews: The Cyber Security Policies will be reviewed according to the following standards and guidelines."

Standards are actions for achieving the Policy Statement that must be followed. Guidelines are actions for achieving the Policy Statement that should be followed. Usually, one of the Operational Policies will also define how exceptions to policy are granted. If it's impossible to adhere to a standard, an exception-to-policy request should be required.
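
The standard/guideline distinction above maps naturally onto a small data model. The Python sketch below is illustrative only: the class and function names and the sample requirements are hypothetical, not taken from any standard. It treats an unmet standard as a violation unless an exception-to-policy request has been approved, and an unmet guideline as advisory.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    text: str
    mandatory: bool  # True = standard ("must"), False = guideline ("should")


@dataclass
class PolicyStatement:
    number: str
    title: str
    requirements: list = field(default_factory=list)


def evaluate(policy, unmet, approved_exceptions):
    """Split a system's unmet requirements into violations and advisories.

    unmet: set of requirement texts the system does not satisfy.
    approved_exceptions: set of standard texts with an approved exception request.
    """
    violations, advisories = [], []
    for req in policy.requirements:
        if req.text not in unmet:
            continue
        if not req.mandatory:
            advisories.append(req.text)  # guideline: should be followed
        elif req.text not in approved_exceptions:
            violations.append(req.text)  # standard missed, no approved exception
    return violations, advisories


# Hypothetical example based on Policy Statement 1.1, "Scheduled Reviews"
reviews = PolicyStatement("1.1", "Scheduled Reviews", [
    Requirement("Review policies at least annually", mandatory=True),
    Requirement("Review policies after a major incident", mandatory=True),
    Requirement("Include control system staff in reviews", mandatory=False),
])
unmet = {r.text for r in reviews.requirements}  # suppose nothing has been done yet
print(evaluate(reviews, unmet, {"Review policies after a major incident"}))
# (['Review policies at least annually'], ['Include control system staff in reviews'])
```

The exception check only applies to standards; a guideline that is not followed never needs an exception request, which mirrors the "must" versus "should" wording.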

Procedural Policies are repetitive actions required to accomplish a process, usually associated with an operational policy. For example, required cyber security procedures would likely include log monitoring (which reports and log files will be reviewed, how often, etc.) and the change management process. Procedural Policies often are documented using flowcharts or other process-capture methods.

Technical Policies are similar to operational policies, but focus on more technical aspects. For example, an operational policy may require passwords and specify general standards regarding their construction, aging, expiration, etc. The associated technical policy would specify how these requirements are implemented.

4) Change Management
A fundamental principle of effective IT governance is that a business unit must have absolute control over the systems, applications and infrastructure on which its processes rely. This means that all changes to a system must be reviewed and approved by the business unit that owns or relies on the system. For shared systems, such as e-mail, domain name services, etc., all units that rely on those systems should be able to review and provide input to proposed changes. This requires a robust change management software system and a rigidly followed change management process.

A robust change management process/software system typically maintains a list of individuals who must be notified if a change is proposed for particular categories of systems (SCADA, network, applications, firewall, etc.). It should enforce a time limit within which changes must be reviewed prior to approval and implementation. It should require and maintain documentation of the tests performed to make sure the change functions properly. The process also should require back-out procedures in case something goes wrong, and document the steps for each change.

Finally, the change management system should streamline and shorten the process for emergency situations (usually called a HotFix). This path usually requires a single approval, with notifications and documentation completed after the emergency change is implemented.

5) Security Architecture
To operate securely, the infrastructure devices that a control system depends on must be isolated from negative outside influences, such as an engineer requesting a massive amount of data or a high volume of traffic generated by a worm or virus. To accomplish this isolation, machines associated with the control system's primary function must be grouped on a common network and protected from other networks. The control system's perimeter must be clearly defined; all outside connections have to be documented, and the appropriate method of securing each connection must be identified and implemented. For Internet Protocol (IP) connections, this requires a firewall.

Firewalls regulate connections between the machines on either side of them by allowing or restricting traffic to specific devices and applications. To help secure a control system, a firewall should be configured to reject all connection requests, inbound or outbound, that would pass directly between the control network and outside networks. This forces connections to terminate in a DMZ (see below), where database and application servers can bridge the two networks. The systems that need to exchange data with the control system vary by application. To remain secure, applications in the control system should push the data needed by external applications out to them, and when the control system needs external data, applications in the secured environment should pull that data in.

For wireless connections, this discussion will be limited to 802.11 (WiFi) implementations, and it will be assumed that users want to encrypt and secure these connections. WEP offers very little protection and should not be used in a business environment. WPA is a good alternative for installed 802.11 bases that only have WEP as an option, because it is available with only a firmware upgrade.
However, even though WPA offers good encryption and device authentication, it is susceptible to denial-of-service attacks, which can be damaging in a control environment. WPA2 (the Wi-Fi Alliance's name for 802.11i) is the best option for companies just putting in WiFi technology. Device authentication is available, the AES encryption algorithm is used, and these devices reportedly aren't susceptible to denial-of-service attacks. For more information, visit www.wi-fiplanet.com/tutorials.

6) Adding Devices, Remote Access
To prevent the introduction of malicious code from infected devices, allow only authorized devices to connect to the control system's telecom environment, and communicate that policy to visitors, vendors and consultants upon arrival. Likewise, remote access to this telecom environment should be severely restricted and only allowed under controlled circumstances. There are three common ways to provide it while maintaining a secure environment: DMZ application servers, virtual private networks (VPNs), and modem access.
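
One way to picture the "authorized devices only" rule is as a default-deny registry. The Python sketch below is a hypothetical illustration: identifying devices by MAC address and giving visitor and vendor devices an expiry time are both assumptions, not something the article prescribes. MAC addresses can be spoofed, so in practice a list like this would back up, not replace, stronger device authentication.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AuthorizedDevice:
    mac: str                            # hypothetical identifier choice
    owner: str                          # employee, vendor or consultant responsible
    expires: Optional[datetime] = None  # assumed: time-limited approval for visitors


class DeviceRegistry:
    """Default-deny list of devices allowed on the control system network."""

    def __init__(self):
        self._devices = {}

    def authorize(self, device):
        self._devices[device.mac.lower()] = device

    def may_connect(self, mac, now):
        device = self._devices.get(mac.lower())
        if device is None:
            return False  # unknown device: reject
        return device.expires is None or now < device.expires


registry = DeviceRegistry()
registry.authorize(AuthorizedDevice("00:1A:2B:3C:4D:5E", "operations HMI"))
registry.authorize(AuthorizedDevice("66:77:88:99:AA:BB", "vendor laptop",
                                    expires=datetime(2024, 1, 2)))
print(registry.may_connect("66:77:88:99:aa:bb", datetime(2024, 1, 1)))  # True
print(registry.may_connect("66:77:88:99:aa:bb", datetime(2024, 1, 3)))  # False
```

The key design choice is that an unknown device is rejected outright; approval has to be granted explicitly, and for outside parties it lapses on its own.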

DMZ application servers reside in the DMZ that the firewall creates between the corporate and control environments. Remote users, either on the corporate network or outside via a dial-up or VPN connection, authenticate to this application server and work from that environment.

Standard IPSec VPNs, on the other hand, don't allow any control over the remote device that's connecting. However, solutions from Nortel and Cisco offer extensions to the VPN standard that do allow this control. Control can include requiring up-to-date anti-virus software and personal firewalls on the remote device, requiring specific patch levels for operating systems, and prohibiting split tunneling, which is highly recommended. Split tunneling occurs when a remote device connects to the control system's telecom environment while simultaneously connected to another network.

7) Vulnerability, Risk Assessments and Penetration Tests
Though they're complementary and the terms are often used interchangeably, vulnerability and risk assessments are different. Vulnerability assessments identify weaknesses that can be exploited to do harm, and usually propose alternative actions, or potential mitigating controls, that can be taken to reduce the vulnerability or the chance of its exploitation. Risk assessments estimate the probability that a vulnerability will be exploited, allowing a cost-benefit trade-off.

Vulnerability assessments can be sub-categorized into technical reviews, device scans, and penetration tests. Technical reviews use staff interviews, technical document reviews and visual device inspections to identify weaknesses in the security architecture and/or the system's policies and procedures. Device scans load scripts onto devices or use remote vulnerability scanners to identify the devices' technical vulnerabilities, such as improper configurations or missing patches. Penetration tests try to exploit a vulnerability to gain unauthorized access to systems, data or the SCADA environment.

Risk assessments first determine whether a threat will likely try to exploit the vulnerability, and then determine its probability of success, which is reduced by each mitigating control implemented. These assessments also estimate impact, sometimes in financial terms. The resulting probabilities and possible impacts can be used to prioritize vulnerabilities.

Penetration tests are similar to invasive vulnerability assessments in that they can unintentionally disrupt the SCADA environment, so they should be performed by people knowledgeable about SCADA systems. The passive vulnerability assessment is usually the first step; the client can stop there, or choose any combination of the other assessments or tests. Figure 1 illustrates the methodology that KEMA uses for a passive vulnerability assessment, followed by a risk assessment.
KEMA is a Netherlands-based energy consulting, testing and certification firm.

FIGURE 1: THE KEMA CYBER SECURITY ASSESSMENT METHODOLOGY

Typically, the passive vulnerability assessment is the first step in the process. The client can then stop or choose any combination of the other options: invasive vulnerability assessment, risk assessment and/or penetration test.
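
The risk assessment logic described in this section (likelihood of an attempt, probability of success reduced by each mitigating control, and an impact estimate) can be expressed as a simple scoring sketch. This is not KEMA's methodology: the multiplicative formula, the assumption that controls act independently, and all names and numbers below are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Vulnerability:
    name: str
    attempt_likelihood: float  # chance a threat will try to exploit it (0-1)
    base_success: float        # chance an attempt succeeds with no controls (0-1)
    impact: float              # estimated impact, e.g. in dollars
    control_effectiveness: list = field(default_factory=list)  # fraction each control blocks

    def success_probability(self):
        # Assumption: controls are independent, so each one multiplies the
        # remaining chance of success by (1 - its effectiveness).
        return self.base_success * prod(1 - e for e in self.control_effectiveness)

    def risk(self):
        # Expected impact = P(attempt) * P(success | attempt) * impact
        return self.attempt_likelihood * self.success_probability() * self.impact


def prioritize(vulns):
    """Highest expected impact first."""
    return sorted(vulns, key=lambda v: v.risk(), reverse=True)


# Hypothetical findings from an assessment
findings = [
    Vulnerability("Unpatched HMI workstation", 0.5, 0.8, 100_000, [0.5]),
    Vulnerability("Open vendor modem", 0.9, 0.6, 50_000),
    Vulnerability("Shared engineering password", 0.3, 0.9, 200_000, [0.5, 0.5]),
]
print([v.name for v in prioritize(findings)])
# ['Open vendor modem', 'Unpatched HMI workstation', 'Shared engineering password']
```

Note how the finding with the largest raw impact ranks last here because two mitigating controls already cut its probability of success; that is the cost-benefit trade-off a risk assessment is meant to expose.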

8) Incident Response
Though incident response plans are tailored to each firm's organization and management, there are some common tasks to include in any response plan. While most plans put skilled technical staff in charge during an incident, experience shows that the manager with the most skin to lose is the one who takes charge. A solution might be to have a technical response coordinator and an overall incident manager from the business unit with the compromised systems.

It's also important to identify what staff can and can't do when an incident is discovered. While it may be prudent to give system administrators emergency authority to shut down or cut off e-mail servers or file servers, it may not be prudent to let them shut off control system devices. These issues need to be explored and clearly communicated to front-line responders, so they'll know what they can do on their own authority and what must be approved regardless of the emergency.

Team members required for incident response must be identified for each type of incident. It's also a good idea to have one person, usually someone both technical and familiar with management, serve as management liaison. Table-top drills that explore what people would do to respond to different types of incidents are priceless. Make a list of all the threats identified in vulnerability and risk assessments, and discuss with the owners of each targeted system what they would need to do to recover. For instance, if you discovered that a malicious hacker had been in your control system telecom environment for 10 days, what would you do to ensure that all your systems were configured and working properly?

In addition, a plan that no one knows about is worthless. So, tell everyone who could possibly be involved in an incident response team about the plan and how it works.
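
Writing the emergency-authority decisions down as an explicit table makes them easy to communicate and to rehearse in table-top drills. The Python sketch below is hypothetical: the system categories and actions are examples drawn from the discussion of e-mail, file servers and control devices in this section, and anything not listed defaults to requiring the incident manager's approval.

```python
from enum import Enum


class Authority(Enum):
    OWN_AUTHORITY = "front-line responder may act immediately"
    NEEDS_APPROVAL = "incident manager must approve first"


# Hypothetical matrix: (system category, action) -> who may decide.
AUTHORITY_MATRIX = {
    ("email_server", "disconnect"): Authority.OWN_AUTHORITY,
    ("file_server", "disconnect"): Authority.OWN_AUTHORITY,
    ("control_device", "shut_down"): Authority.NEEDS_APPROVAL,
}


def responder_authority(system, action):
    """Look up an action; unlisted combinations default to needing approval."""
    return AUTHORITY_MATRIX.get((system, action), Authority.NEEDS_APPROVAL)


print(responder_authority("email_server", "disconnect").value)
# front-line responder may act immediately
print(responder_authority("control_device", "disconnect").value)
# incident manager must approve first
```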

Finally, because most incident responses are studies in controlled chaos, don't expect the people who have the most to lose to review the Incident Response Plan before taking action. Actual incidents usually differ from what was anticipated, but planning and practice give participants the knowledge and experience they need to adapt.

9) Configuration and Patch Management
Configuration management means knowing what's supposed to be installed and running on a device, and being able to determine whether its configuration is approved. Manual and automatic processes and tools can check configurations to see if unauthorized changes have been made to a system. Configuration management can also help reconstruct a system or machine if a disaster occurs.

Patch management is the process of testing released patches and applying them to the appropriate systems. This is difficult in all environments, and especially so in control systems. Patches must be tested to ensure they don't introduce unintended problems, and to check their interactions with other patches, systems, and application software.

10) Monitoring

Malicious Code
While regular or continuous system monitoring for malicious code is needed, it's counterproductive to implement a security tool or procedure that prevents a system from fulfilling its business objective. Unfortunately, systems that must complete transactions within a small, discrete window of time are often slowed by anti-virus software. Usually, excluding files used in real time from anti-virus scanning is a sufficient precaution. With control systems, however, tests must be made to ensure that the system resources consumed by the anti-virus software don't interfere with the process control software.

Another problem with isolating a control system behind a firewall is that it prevents the control devices from automatically receiving virus definition update files. One solution is to place a server containing tested updates in the control system DMZ, and configure anti-virus software in the control environment to retrieve updates from this DMZ server.

In addition, many control devices don't support log file creation, but there isn't much to log, since most of these systems don't require authentication or authorization.
However, control systems that use Windows or UNIX variants do support logging of security events, and this logging should be used. Each company's security analysts should decide what to log. Most agree that, at a minimum, failed logon attempts and failed attempts to access files should be logged.

Regular review of the logs can also reveal information that other tools, such as network intrusion detection/prevention systems (NIDS), miss. However, since log files are usually large and there are so many systems, users should consider buying a log file collection and analysis tool.

Likewise, host intrusion detection/prevention (HIDS) technology can detect unauthorized activity at the server. However, HIDS agents usually consume approximately 5% of a server's resources, so testing is recommended to make sure the agent doesn't interfere with control functions. Also, many of the events of most interest, such as unauthorized access attempts and failed logons, can be detected using NIDS. This is a passive control in the sense that its sensors run on dedicated devices, which analyze the data packets flowing through the network without introducing measurable latency.

Many new intrusion prevention devices also have anomaly detection capabilities. Since the network traffic in a control environment is basically static (there may be more or less of it, but the types of traffic and the network's behavior rarely change), NIDS with anomaly detection capabilities may be ideal. Initial tests by several NIDS suppliers and control system vendors are confirming this assumption.

No Silver Bullet!
There is no silver bullet tool or technique for securing any computing system. No matter what steps you take, vulnerabilities will still exist. However, pursuing a comprehensive security program that is constantly monitored, updated, and reviewed by third parties does constitute due diligence and will keep your system as secure as possible.

About the Author

Jay Abshier is a principal at KEMA Inc. Consulting Group. He can be reached at [email protected].