Ten steps to secure control systems

This "special to the web" article summarizes the 10 most important design and process principles that, while not guaranteed to secure a system, ensure your control systems are as secure as practical.

May 12, 2005

By Jay Abshier, CBCP CISSP

While most utilities acknowledge that the cyber security of their control, diagnostic, and SCADA systems is important, there are many questions regarding what should be done to secure them. There are several efforts under way to define cyber security standards and industry (“best”) practices for the security of control and SCADA systems (subsequently, references to control systems will include SCADA). While these need to be adhered to, there is no “silver bullet” tool that, once implemented, will secure any system.

Control system security is a process of due diligence to security principles. The purpose of this paper is to try to summarize the ten most important design and process principles that, while not guaranteed to secure a system, will ensure that due diligence has been followed to make it as secure as practical. These are:

Governance
Security Awareness and Training
Policies and Procedures
Change Management
Security Architecture
Remote Access
Vulnerability and Risk Assessments
Incident Response
Configuration & Patch Management
Monitoring

Following these principles will provide a due diligence approach that should meet the intent of the NERC Cyber Security Standards (1200 and 1300).

Governance
The purpose of a structured, formal governance policy is to ensure that input and/or concurrence from appropriate stakeholders are obtained before decisions are made. The appropriate stakeholders will differ from company to company, but an educated guess can be made as to typical roles and levels of responsibilities involved.

Of course, at the very top of any governance hierarchy is the executive management team made up of the CEO, CFO and business unit executives. Typically, for the IT function (not the IT Department, but systems throughout the company that rely upon IT) there is a governing council composed of the CIO, the Chief IT Architect and business unit leaders responsible for IT in their business units. Quite often the business unit does not actually have IT staff, and this role is filled by the business unit leader responsible for the business systems that rely upon computers and the IT infrastructure. Typical names for this team are IT Council or IT Executive Team. Very large companies sometimes have CIOs for major business units, and this team is sometimes called the Council of CIOs. This paper will refer to this group as the IT Council.

Technical teams typically report to the IT Council and cover functions such as architecture, telecoms, application development and information security; where possible, these teams also include technical business unit representatives. Each team will develop, and upper level teams will approve, its procedures for considering, vetting and approving new initiatives before they are submitted to higher level teams for approval.

Additionally, business units should have governance teams for their significant business functions. For the case of control systems, Operations might have a governance team structure similar to IT, with a team responsible for the control systems. For the sake of reference, this team will be called the Control System Governance Team.

Ultimately, the business unit whose critical business functions rely upon IT systems should be in charge of changes made to those IT systems and how they are managed. But, it is critical that the Control System Governance Team solicit input from the appropriate IT technical governance teams before important changes to equipment, software or procedures are made. Having a formal governance structure will help ensure that the appropriate individuals and roles provide that input, and will allow executives who are required to approve new projects to document that appropriate vetting occurred before those projects are funded.

Security Awareness and Training
The vast majority of people – employees, contractors and vendors – are diligent about doing what is necessary to meet their business objectives while making quality a priority. Many times, though, it does not occur to some employees that security issues are something to which they should also pay attention.

In a previous position as Director of Information Security at a large oil and gas company, I found that the person in charge of IT at a very large global business unit was not paying attention to information security. In a private conversation I mentioned this to a senior executive of the company, who replied “he has quite a few fires burning him, and yours isn’t very hot.” If you are faced with this problem, one option is to wait until the information security fire gets so hot the business unit leaders notice – but getting attention this way usually involves nasty consequences such as hacks or infestations of viruses and worms. A better alternative is Security Awareness Training.

An effective security awareness training program not only tells the audience what is expected of them – it also tells them why. In fact, at least half of the education should focus on the why. I have found that if the actions required of employees do not make sense to them, or they do not understand why those actions are important, they are much more likely to ignore the rules. A wise security guru once asked me “What is the best security program?” His reply when I said I didn’t know was “The one people use.” The purpose of a security awareness program isn’t so much to tell people what they should do, but to convince them they should want to follow the rules.

The importance of training is especially evident in thwarting hackers who try to gain access to system resources through social engineering. Social engineering is the practice of convincing people to willingly divulge confidential company information, such as passwords. The result of a social engineering exercise I once commissioned was abysmal – 50% of the people contacted in four sites located in three countries divulged their user IDs and passwords to the ethical hacker. We immediately embarked on an extensive security awareness program to try to improve the results.

We discovered two things. First, once people understood how hackers worked and how easily they could do damage to the company, they became enthusiastic supporters of security initiatives and policies. Second, any given communication method only reached about 20% of the user population.

At first we tried mandatory classes. While those who attended rated the classes very highly, only about 30% of the people for whom the class was mandatory actually attended. We then branched out: we published a monthly newsletter (changed to quarterly after 6 months), placed posters (tied to company advertising themes and programs) at the common entrances to major offices, held global “security awareness days” (booths were set up explaining security policies and tools, vendors were invited in, trinkets with security themes were handed out, etc.), set up an “asksecurity” email address that anyone could use to ask a question or report a problem, and scheduled one-on-one visits with business executives to explain why information security was so important.

After twelve months we repeated the social engineering exercise. We were hoping for a 50% reduction in the number of employees divulging critical information. What was achieved was that no one gave up any critical information such as User IDs or passwords, and 20% of the employees contacted by the ethical hacker reported the incident.

Security awareness training can be very effective. But, to be effective, the training must communicate WHY information security is important and it must use several different methods of communicating the message.

Policies and Procedures
There are accepted standards for how to structure policies. Typically they are divided into categories called Operational Policies, Procedural Policies and Technical Policies.

Operational Policies
Operational Policies are the high level statements of objectives, followed by the standards and guidelines associated with each policy statement. For example, Policy Statement 1.1 might be “Scheduled Reviews: The Cyber Security Policies for COMPANY will be reviewed according to the following standards and guidelines”.

Standards are actions related to achieving the objective of the Policy Statement that MUST be followed. Guidelines are actions related to achieving the objective of the Policy Statement that SHOULD be followed. Usually, one of the Operational Policies will be related to granting exceptions to policy. If it is impossible (either technically or due to impaired business functionality) to adhere to a Standard, an Exception to Policy request is required.

Procedural Policies
Procedural Policies are repetitive actions required to accomplish a process, usually associated with an Operational Policy. For example, required procedures from a cyber security perspective would be the process for monitoring logs (the reports/log files that will be reviewed, how often, etc.) and the change management process. Frequently, Procedural Policies are documented using flowcharts or other similar process-capture method.

Technical Policies
Technical Policies are structurally very similar to Operational Policies, but are focused on the more technical aspects of achieving an Operational Policy. For example, an Operational Policy may require passwords and specify some general standards regarding their construction, aging, expiration, etc. The associated Technical Policy would specify in detail how this would be implemented (types of characters allowed, specific length, etc.). Since different systems have different technical restrictions on what can be specified, they will each require different technical policies – for example, what can be specified in a UNIX system will be different from what can be specified in a Windows system. Consequently, technical policies are quite often associated with specific devices or classes of devices.
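As a rough illustration of how a Technical Policy pins down the general password standards of an Operational Policy, the sketch below expresses one such policy as data plus a compliance check. Everything in it is hypothetical: the class name, the minimum length, the character classes and the maximum age are placeholder values chosen for the example, not requirements taken from any standard.

```python
from dataclasses import dataclass
import string


@dataclass(frozen=True)
class PasswordTechnicalPolicy:
    """Hypothetical technical policy for one class of device."""
    min_length: int = 8            # placeholder value
    require_upper: bool = True
    require_lower: bool = True
    require_digit: bool = True
    require_symbol: bool = False   # some systems cannot accept symbols
    max_age_days: int = 90         # placeholder value


def violations(password, age_days, policy):
    """Return the standards the password fails; an empty list means compliant."""
    problems = []
    if len(password) < policy.min_length:
        problems.append("too short")
    if policy.require_upper and not any(c.isupper() for c in password):
        problems.append("missing upper-case letter")
    if policy.require_lower and not any(c.islower() for c in password):
        problems.append("missing lower-case letter")
    if policy.require_digit and not any(c.isdigit() for c in password):
        problems.append("missing digit")
    if policy.require_symbol and not any(c in string.punctuation for c in password):
        problems.append("missing symbol")
    if age_days > policy.max_age_days:
        problems.append("expired")
    return problems


# Different classes of device get different instances of the same structure.
DEFAULT_POLICY = PasswordTechnicalPolicy()
STRICTER_POLICY = PasswordTechnicalPolicy(min_length=12, require_symbol=True)
```

Keeping the values in one small structure per device class mirrors the point above: the Operational Policy stays the same, while each class of device carries its own technical settings.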

Policy Formatting
I have found it useful, primarily for reference purposes but also for clarity, to number each statement in a policy document in an outline format. For instance, a policy statement might be numbered 1.1, while the associated standards are numbered 1.1.1, 1.1.2, and so on. Giving each policy statement and standard its own numbered paragraph also helps during the long meetings in which new policies are edited and finalized before submission for approval.

Adherence to Standards
If it is important for your company to adhere to a standard, such as ISO 17799 or NERC 1300, you may want to number the policy statements to match the corresponding section in the standard document. This will help you to ensure that all parts of a standard have been covered and provide for easy reference during an audit or compliance review. If numbering the policy statements is not feasible, then you may want to consider creating a reference table that correlates policy statements with the appropriate sections of the standard.
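The reference table mentioned above can be as simple as a mapping from policy statement numbers to sections of the standard. The sketch below is illustrative only: the section identifiers ("S1" to "S4") and the policy numbers are made up, and a real table would use the section numbers of whichever standard the company follows.

```python
# Hypothetical section identifiers for the chosen standard.
STANDARD_SECTIONS = ["S1", "S2", "S3", "S4"]

# Policy statement number -> sections of the standard it addresses.
POLICY_TO_STANDARD = {
    "1.1": ["S1"],
    "1.2": ["S2", "S3"],
    "2.1": ["S3"],
}


def uncovered_sections(sections, mapping):
    """Sections of the standard that no policy statement references."""
    covered = {section for refs in mapping.values() for section in refs}
    return [section for section in sections if section not in covered]


def policies_for_section(section, mapping):
    """Policy statements an auditor should read for a given section."""
    return sorted(policy for policy, refs in mapping.items() if section in refs)
```

The first function answers "have all parts of the standard been covered?" and the second gives the quick lookup that is useful during an audit or compliance review.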

Change Management
A fundamental principle of effective IT governance is that a business unit must have absolute control over the systems, applications and infrastructure upon which its critical business processes rely. This does not imply that Business Units should be able to do absolutely anything they want. Rather, it means that ALL changes to a system must be reviewed and approved by the Business Unit that owns or relies upon the system. For shared systems, such as Email, Domain Name Services, etc., while the IT Department may “own” or be in charge of the system, all the Business Units that rely upon those systems should be able to review and provide input to proposed changes. Essential to accomplishing this is a robust Change Management software system and a rigidly followed Change Management process.

Additionally, a method of auditing for compliance needs to exist. The methodology should not just rely on examining records in the change management system, but also should detect changes in the environment and ensure that the change management process was used to effect the change.
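One way to detect changes in the environment independently of the change management records is to fingerprint device configurations and compare them against a baseline. The following is a minimal sketch under stated assumptions: the device names are invented, configurations are assumed to be retrievable as text, and the set of devices with an approved change record is assumed to come from the change management system.

```python
import hashlib


def fingerprint(config_text):
    """Stable fingerprint of a device configuration."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def unrecorded_changes(baseline, current, recorded_devices):
    """Devices whose configuration differs from the baseline, or that are new,
    without an approved change record since the baseline was taken.

    baseline, current: dict mapping device name -> fingerprint
    recorded_devices:  set of device names with an approved change record
    Note: devices that disappeared from `current` are not reported here.
    """
    flagged = []
    for device, fp in sorted(current.items()):
        if baseline.get(device) != fp and device not in recorded_devices:
            flagged.append(device)
    return flagged
```

Any device returned by `unrecorded_changes` represents a change that reached the environment without going through the process, which is exactly what the audit is meant to surface.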

Typically, a robust change management process/software system will:

Maintain a list of individuals who must be notified if a change is proposed for particular categories of systems (SCADA, Network, Applications, Firewall, etc.). The process should ensure that the appropriate individuals are notified of the proposed change.
Enforce a defined time limit in which changes must be reviewed prior to approval and implementation.
Maintain a list of people who must approve a change for each category of change.
Require and maintain documentation of what tests (reference to test procedure if a standard one exists) were performed to make sure the change functions properly.
Require backout procedures in case something goes wrong.
Keep a record of the above steps for each change, recording information such as change number, dates, who submitted, who approved, and who implemented.
Incorporate a streamlined, shortened process for emergency situations (usually called HotFix). This usually provides for a single approval, with notifications and documentation after the emergency change has been implemented.
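The items above can be sketched as a simple record with a readiness check. This is not a description of any particular change management product. The approver roles, the five-day review window and the field names are hypothetical, and treating the review time limit as a minimum period that must elapse before implementation is one possible reading of that requirement.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical review window and approver lists per category of system.
REVIEW_DAYS = 5
REQUIRED_APPROVERS = {
    "SCADA": {"control_system_owner", "it_security"},
    "Firewall": {"network_lead", "it_security"},
}


@dataclass
class ChangeRequest:
    number: str
    category: str
    submitted_by: str
    submitted_on: date
    test_reference: str = ""       # reference to the test procedure used
    backout_procedure: str = ""
    emergency: bool = False
    approvals: set = field(default_factory=set)

    def blockers(self, today):
        """Reasons the change may not be implemented yet; empty means ready."""
        required = REQUIRED_APPROVERS[self.category]
        if self.emergency:
            # Emergency path: a single approval is enough; notifications
            # and documentation follow after implementation.
            if not (self.approvals & required):
                return ["emergency change needs one approval"]
            return []
        problems = []
        if not self.test_reference:
            problems.append("no test documentation")
        if not self.backout_procedure:
            problems.append("no backout procedure")
        missing = required - self.approvals
        if missing:
            problems.append("missing approvals: " + ", ".join(sorted(missing)))
        if today < self.submitted_on + timedelta(days=REVIEW_DAYS):
            problems.append("review period still open")
        return problems
```

Because every request carries its number, dates, submitter and approvers, the same records double as the audit trail described above.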

Security Architecture
In order to function securely, the infrastructure devices used to accomplish the functions of the control system must be isolated from outside negative influences. A negative influence can be anything from an engineer requesting a massive amount of data to the high volume of traffic generated by a hacker’s worm or virus.

To accomplish this isolation, all of the machines associated with the primary function of the control system must be grouped together on a common network and protected from other networks. Before this can be done, the perimeter of the control system environment must be clearly defined and all connections to the outside documented. The appropriate method of securing these connections must be identified and implemented. For Internet Protocol (IP) connections, this requires a firewall.

Firewalls
Firewalls are built to regulate connections between machines inside the firewall and machines outside the firewall. Firewall rules can be written to allow any traffic or to restrict traffic to only specific devices and applications. In order to help secure the control system environment the firewall should be configured to reject all connection requests either inbound or outbound. Then, as functionality is added to the control system environment, new rules can specifically allow the connections required by that functionality. In general, connection requests from the outside should never be allowed.
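The "reject everything, then add specific rules" approach can be illustrated with a small rule evaluator. This is a conceptual sketch, not the configuration syntax of any firewall product: the addresses, port and rule are hypothetical, and the model only considers which side initiates a connection, ignoring the connection-state tracking that real firewalls perform.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AllowRule:
    source: str        # CIDR block allowed to initiate the connection
    destination: str   # CIDR block that may be connected to
    port: int
    reason: str        # the functionality that justified the rule


# Hypothetical rule set: one outbound flow from the control network to a DMZ server.
RULES = [
    AllowRule("10.10.0.5/32", "192.168.50.10/32", 1433,
              "data server pushes readings to DMZ database"),
]


def is_allowed(src, dst, port, rules=RULES):
    """Permit a connection request only if an explicit rule matches it."""
    for rule in rules:
        if (ip_address(src) in ip_network(rule.source)
                and ip_address(dst) in ip_network(rule.destination)
                and port == rule.port):
            return True
    return False  # default: reject every request not explicitly allowed
```

Note that the only rule is outbound. A request initiated from the DMZ toward the control network matches nothing and falls through to the default rejection.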

Implementation of a firewall between the corporate and control systems telecom network will also allow a Demilitarized Zone (DMZ) to be established between the two. This DMZ can then be used for placement of database and application servers that can bridge the two networks in a secure architecture. This will be explored further when we discuss remote access.

Getting data in and out
The types of systems that need to get data to and from the control system environment will vary depending on the application. To remain secure, applications within the control system environment should push data needed by applications external to the control system environment out to those external applications. Also, when external data is needed within the control system environment, applications within the secured environment should pull the data in.

Again, when data residing on the control systems environment is needed by employees or applications on the outside, the data should be pushed to a data repository on the outside. The employees and applications that need the data then should query the outside data repository, not the control system environment.

Wireless
This discussion will be limited to 802.11-type Wi-Fi implementations. It will be assumed that you want to encrypt and secure these types of connections. Specific technologies discussed will be WEP, WPA and WPA2/802.11i encryption.

WEP offers very little protection and should not be used in a business environment. It takes only about an hour or two for a high school hacker to collect enough information to break the encryption key and connect to your network.

WPA is a good alternative for installed bases of 802.11 equipment that offer only WEP, because it is available with only a firmware upgrade. Besides good encryption, device authentication is available, but the technology is susceptible to Denial of Service attacks. Just a few packets sent with the wrong encryption key can cause the device to reboot. This was intended as a precaution against hack attempts, but in a control environment the results are damaging.

WPA2 (the Wi-Fi Alliance’s name for its implementation of 802.11i) is the best option for companies just putting in Wi-Fi technology. Device authentication is available, the AES encryption algorithm is used, and (reportedly) these devices are not susceptible to a Denial of Service attack.

For additional information, you can visit www.wi-fiplanet.com/tutorials

Adding Devices To The Environment
One of the most common methods by which malicious code, such as viruses and worms, is introduced into a telecom network is when an infected device, such as a laptop, is plugged into an unused but active Ethernet port. There are a couple of ways this risk can be minimized.

First, deactivate unused ports, especially in unoccupied offices and conference rooms. The hassle of activating ports is not nearly as great as the hassle of dealing with a rampant worm.

Second, make it a policy that only authorized devices are allowed to connect to the control system telecom environment – and communicate that policy to visitors, vendors and consultants upon arrival. In fact, a summary of all information security policies that apply to visitors should be conveniently available to hand out on every visit.

By taking these simple steps it is often possible to keep rogue or improperly configured devices from becoming part of the telecom environment.
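A periodic check of the switch port inventory can support both steps. The sketch below assumes a hypothetical inventory format (a list of dictionaries with a port name, an enabled flag and the attached device's MAC address, or None when nothing is attached) and a hypothetical list of authorized devices; how that inventory is collected depends on the switches in use.

```python
def audit_ports(ports, authorized_macs):
    """Split enabled ports into two problem lists.

    Returns (unused_active, unauthorized):
      unused_active - enabled ports with nothing attached (candidates to deactivate)
      unauthorized  - enabled ports with a device that is not on the authorized list
    """
    unused_active = []
    unauthorized = []
    for port in ports:
        if not port["enabled"]:
            continue  # already deactivated, nothing to do
        if port["mac"] is None:
            unused_active.append(port["name"])
        elif port["mac"] not in authorized_macs:
            unauthorized.append(port["name"])
    return unused_active, unauthorized
```

The first list feeds the deactivation step, and the second points to devices that were connected despite the policy.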

Remote Access
Remote access to the control system telecom environment should be severely restricted and only allowed under very controlled circumstances. The reason for this is that much care and diligence has been expended defining the physical and electronic security perimeter and securing it against unauthorized access and introduction of malicious code. Remote access of devices allows the remote device to become part of the secured environment. Since the device is remote and by definition not completely controlled, this dramatically increases the probability of security incidents in the control system environment.

However, the business need for remote access is a reality that must be dealt with. Let’s examine three common ways to accomplish this while maintaining as secure an environment as we can – DMZ application servers, Virtual Private Networks (VPNs), and modem access.

DMZ Application Servers
DMZ application servers reside in the DMZ that was created between the corporate and control system telecom environments by the firewall. Remote users (whether resident on the corporate network or outside the corporate network via a dial-up or VPN connection) would authenticate to this application server and do all their work from that environment. The important point here is that the remote user’s computing device never becomes part of the control system telecom environment.

Virtual Private Networks
If true remote access, where the device becomes part of the control system telecom environment, is unavoidable, then a VPN connection is the next best solution – but with extensions. Standard IPSec VPN solutions do not allow any control over the remote device that is connecting. However, solutions offered by companies such as Nortel and Cisco offer extensions to the VPN standard that do allow this control. Control can extend from requiring up-to-date anti-virus software and personal firewalls on the remote device, to requiring specific patch levels for specific operating systems, to not allowing split tunneling. Split tunneling occurs when a remote device connects to the control system telecom environment while simultaneously connected to another telecom network (a vendor’s or partner’s corporate network, for example). It is highly recommended that split tunneling not be allowed.
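The kind of control these extensions provide can be thought of as an admission check run against the remote device before the tunnel is allowed. The sketch below is a conceptual model only and does not represent Nortel's or Cisco's actual mechanisms; the field names, the seven-day signature age and the patch levels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DevicePosture:
    antivirus_age_days: int      # days since anti-virus signatures were updated
    personal_firewall_on: bool
    os_name: str
    patch_level: int
    split_tunneling: bool


# Hypothetical thresholds set by the owner of the control system environment.
MAX_AV_AGE_DAYS = 7
MIN_PATCH_LEVEL = {"windows_xp": 2, "windows_2000": 4}


def admission_failures(posture):
    """Reasons to refuse the VPN connection; an empty list means admit."""
    failures = []
    if posture.antivirus_age_days > MAX_AV_AGE_DAYS:
        failures.append("anti-virus signatures out of date")
    if not posture.personal_firewall_on:
        failures.append("personal firewall disabled")
    required = MIN_PATCH_LEVEL.get(posture.os_name)
    if required is None:
        failures.append("operating system not on approved list")
    elif posture.patch_level < required:
        failures.append("patch level too low")
    if posture.split_tunneling:
        failures.append("split tunneling enabled")
    return failures
```

Refusing devices that fail any check keeps the remote device as close as practical to the standard expected of machines inside the perimeter.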

Modems
Modems are much maligned as being insecure connections, and in fact, a dial-in modem connection that is always listening isn’t very secure even if a User ID and password are required. Better alternatives include unplugging the telephone connections when not in use, dial-back modems and encrypting modems. But, all of these do allow the remote device to become part of the control system telecom network.

It must be acknowledged that modem connections that allow vendors to perform maintenance without becoming part of the control system telecom network pose less of a security risk than those that do become part of the network. But, without implementing any of the control measures mentioned in the previous paragraph the security risk is still high.

Vulnerability Assessments, Risk Assessments and Penetration Tests
The terms vulnerability assessment and risk assessment are often used to mean the same thing. While complementary, vulnerability assessments and risk assessments are very different. The purpose of vulnerability assessments is to identify weaknesses that can be accidentally or maliciously exploited to do harm. Usually, alternative actions that can be taken to reduce the chance of the vulnerability being exploited are also proposed. These actions are sometimes called potential mitigating controls. The purpose of a risk assessment is to identify the probability that a vulnerability will be exploited, which allows a cost-benefit trade-off to be made when choosing mitigating controls.

Vulnerability assessments can be further sub-categorized into technical reviews, device scans and penetration tests. Technical reviews use interviews with key staff, review of technical documents and drawings, and visual inspection of devices to identify weaknesses in the security architecture, policies and procedures of the overall system or telecom environment. Device scans either load scripts onto devices or use remote vulnerability scanners to identify technical vulnerabilities – such as improper configurations or missing patches – on specific devices. Penetration tests go one step further and attempt to exploit a vulnerability to gain unauthorized access to systems or data.

While vulnerability scans and penetration tests are commonly used in the corporate environment, it is often not prudent to use these in the control systems environment, at least not on production devices. There are many documented cases where port and vulnerability scans of control devices have caused control system devices to malfunction, re-boot or shut down entirely. Because of the deterministic and highly critical nature of control systems devices and software, penetration tests also are likely to interfere with control functions. In the absence of test systems that can be used for vulnerability scans and penetration tests, technical reviews are the most recommended method to identify vulnerabilities in the control system environment.

Typical vulnerabilities that are found in SCADA architectures include:

Remote Access
The remote access methods usually introduce security vulnerabilities ranging anywhere from unauthorized access to introduction of malicious code from infected clients. In general, remote access into a SCADA environment should be tightly controlled and secured.

IP Connections
Most IP connections, while necessary, are usually constructed for ease of use and not for security.

Modems
Often dial-in modems are installed to provide vendors with maintenance capability.

Applications and Data Exchange
Quite often applications are written with file shares that bridge between the corporate and the SCADA environment and permissions that allow too much access to the SCADA environment. Data flowing into or out of the SCADA environment must be validated, and the mechanisms for exchanging data must not expose vulnerabilities to hackers or malicious code.

Change Management
Change Management must be robust and strictly enforced in order to protect the SCADA environment.

Incident Response
Incident Response plans should be in place and tested in the event a cyber security incident occurs.

User Accounts
User accounts must be administered and constructed in such a way to discourage unauthorized use.

Monitoring
Even if no known vulnerabilities exist, the environment must be monitored both for unauthorized activity and the presence of malicious code.

Risk assessments identify the probability that a vulnerability can be exploited by first determining the probability that a threat (hacker, error, etc.) will attempt to exploit the vulnerability and then determining the probability that the attempt will be successful. For each mitigating control in place, the probability of success is reduced. In addition to the probability of occurrence, an estimate of impact – sometimes financial – is also made. The resulting overall probability and impact can then be used to rank the vulnerabilities by priority.

In situations where the probabilities are well defined from statistical evidence, they can be used to compute a ballpark number for the financial value of implementing the mitigating controls by multiplying the probability by the financial impact. Unfortunately, exact probabilities for security incidents are difficult to determine, and a documented sample of incidents involving control systems does not exist.

However, relative probabilities can be determined – the probability of an external hacker trying to gain access is lower than that of a malicious insider doing so, which in turn is lower than that of an accidental incident. Using relative probabilities, it is possible to prioritize the risks.
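The ranking described above can be sketched as a small calculation. The formula is an assumption made for illustration: each mitigating control is modeled as independently reducing the probability of success by its effectiveness, and the score is the product of attempt probability, remaining success probability and impact. All names and numbers are hypothetical relative values, not measured statistics.

```python
from math import prod


def risk_score(p_attempt, p_success_unmitigated, control_effectiveness, impact):
    """Relative risk: P(attempt) x P(success after controls) x impact.

    control_effectiveness: fractions in [0, 1], one per mitigating control;
    each control is assumed to cut the success probability independently.
    """
    p_success = p_success_unmitigated * prod(1 - e for e in control_effectiveness)
    return p_attempt * p_success * impact


def rank(risks):
    """Return vulnerability names ordered from highest to lowest score."""
    return sorted(risks, key=lambda name: risk_score(*risks[name]), reverse=True)


# Hypothetical workshop output: (P(attempt), P(success), controls, impact).
# Attempt probabilities follow the relative ordering in the text:
# external hacker < malicious insider < accidental incident.
WORKSHOP_RISKS = {
    "external hacker via dial-in modem": (0.2, 0.5, [0.5], 100),
    "malicious insider": (0.4, 0.5, [], 100),
    "accidental misconfiguration": (0.8, 0.5, [0.5, 0.5], 40),
}
```

With these sample values the insider risk ranks first because it has no mitigating controls in place, even though accidental incidents are the most likely to be attempted. The absolute scores mean little; only the ordering is used.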

An additional benefit of risk assessments is buy-in from the stakeholders of the subject systems. By participating in the risk assessment workshops, they gain a better understanding of the vulnerabilities and threats, and assignment of the probabilities and impact is a group process usually resulting in group ownership of the results.

Penetration Tests attempt to exploit discovered vulnerabilities to establish unauthorized access to the SCADA environment and to accomplish unauthorized manipulation of the environment. Penetration tests are much like invasive vulnerability assessments in that they can cause unintended disruption to the SCADA environment. For this reason, if they are pursued, they should be performed by persons knowledgeable about SCADA systems or supervised by someone who is.

Typically, the passive Vulnerability Assessment is the first step in the process. From that point, the client can stop there or choose any combination of the other options: invasive vulnerability assessment, risk assessment and/or penetration test. "The Cyber Security Assessment Methodology" illustrates the methodology that KEMA uses for a passive vulnerability assessment, followed by a risk assessment.

Jay Abshier, CBCP CISSP, is semi-retired from ChevronTexaco and can be reached at [email protected].