RISK ASSESSMENTS identify the probability that a vulnerability can be exploited by first determining the probability that a threat (hacker, error, etc.) will attempt to exploit the vulnerability and then determining the probability that the attempt will be successful. For each mitigating control in place, the probability of success is reduced. In addition to the probability of occurrence, an estimate of impact – sometimes financial – is also made. The resulting overall probability and impact can then be used to rank the vulnerabilities by priority.
In situations where the probabilities are well defined from statistical evidence, they can be used to compute a ballpark number for the financial value of implementing the mitigating controls by multiplying the probability by the financial impact. Unfortunately, exact probabilities for security incidents are difficult to establish, and a documented sample of incidents involving control systems does not exist.
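Where such statistics are available, the arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard method: the function names, the assumption that each mitigating control scales the success probability by a fixed factor, and every figure in the example are hypothetical.

```python
from math import prod


def exploit_probability(p_attempt, p_success, control_factors=()):
    """Probability that a vulnerability is exploited.

    p_attempt: probability that a threat attempts the exploit.
    p_success: probability that an attempt succeeds with no controls.
    control_factors: one multiplier in [0, 1] per mitigating control
        (assumed model: each control scales down the success probability).
    """
    for value in (p_attempt, p_success, *control_factors):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities and factors must be in [0, 1]")
    return p_attempt * p_success * prod(control_factors)


def expected_loss(probability, financial_impact):
    """Ballpark financial exposure: probability times financial impact."""
    return probability * financial_impact


# Hypothetical figures, for illustration only.
impact = 2_000_000
p_without = exploit_probability(0.10, 0.50)
p_with = exploit_probability(0.10, 0.50, control_factors=[0.5, 0.4])

# One reading of the "value" of the controls: the drop in expected loss.
control_value = expected_loss(p_without, impact) - expected_loss(p_with, impact)
print(f"Estimated value of the controls: {control_value:,.0f}")
```

The result is only as trustworthy as the probabilities fed into it, which is why the figure should be treated as a ballpark number.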
However, relative probabilities can be determined: the probability of an external hacker trying to gain access is lower than that of a malicious insider doing so, which in turn is lower than that of an accidental incident. Using relative probabilities, it is possible to prioritize the risks.
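One way to work with relative probabilities is to replace them with ordinal ranks and combine each rank with an impact score. The Python sketch below assumes the threat ordering given above, a hypothetical 1 to 5 impact scale, and a simple rank-times-impact score; the risk names are invented for illustration.

```python
# Relative likelihood ranks (higher means more likely), following the
# ordering in the text: external hacker < malicious insider < accident.
THREAT_LIKELIHOOD = {
    "external hacker": 1,
    "malicious insider": 2,
    "accident": 3,
}


def prioritize(risks):
    """Rank risks from highest to lowest priority.

    risks: list of (name, threat, impact) tuples, where impact is a
    hypothetical 1-5 score agreed on in the assessment workshop.
    The score (likelihood rank times impact) is an assumed convention.
    """
    scored = [
        (THREAT_LIKELIHOOD[threat] * impact, name)
        for name, threat, impact in risks
    ]
    # sorted() is stable, so equal scores keep their original order.
    ranked = sorted(scored, key=lambda item: -item[0])
    return [name for _score, name in ranked]


# Invented example risks.
RISKS = [
    ("Unpatched operator workstation", "external hacker", 4),
    ("Shared operator password", "malicious insider", 3),
    ("Wrong setpoint file uploaded", "accident", 3),
    ("Modem left on maintenance port", "external hacker", 5),
]

for position, name in enumerate(prioritize(RISKS), start=1):
    print(position, name)
```

The absolute scores mean little on their own; only the resulting order is used.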
An additional benefit of risk assessments is buy-in from the stakeholders of the subject systems. By participating in the risk assessment workshops, they gain a better understanding of the vulnerabilities and threats, and assignment of the probabilities and impact is a group process usually resulting in group ownership of the results.
Penetration Tests attempt to exploit discovered vulnerabilities to establish unauthorized access to the SCADA environment and to accomplish unauthorized manipulation of the environment. Penetration tests are much like invasive vulnerability assessments in that they can cause unintended disruption to the SCADA environment. For this reason, if they are pursued, they should be performed by persons knowledgeable of SCADA systems or supervised by someone who is knowledgeable.
Typically, the passive vulnerability assessment is the first step in the cyber security risk assessment methodology. From that point, the client can stop or choose any combination of the other options, including an invasive vulnerability assessment, a risk assessment and/or a penetration test. The following illustrates the methodology that KEMA uses for a passive vulnerability assessment, followed by a risk assessment.
Incident response plans have to be tailored to the unique organizational structures and management philosophies of each company. Even so, there are some common tasks that need to be examined and included when drafting your plan.
Who Is In Charge?
Most plans tend to identify technical staff with specific skills to be in charge during an incident. However, my experience has indicated that the manager with the most to lose is the one who takes charge. One solution might be to have a technical response coordinator and an overall incident manager from the business unit that is responsible for the systems that are compromised.
It is important to identify what can and cannot be done by system administrators when an incident is discovered. Keep in mind that the incident may be discovered at 1 a.m. and that, by the time management is available to make a decision, it may be out of control. While it may be prudent to give system administrators emergency authority to shut down or cut off email servers or file servers, it may not be prudent to give the same authority for shutting down or cutting off control system devices. These issues need to be explored, and front line responders must be told clearly what they can do on their own authority and what must be approved regardless of the emergency situation.
Incident Response Team Membership
The team members required for incident response must be identified for each type of incident. For a virus or worm infection, membership from information security, system administration and IT management may be sufficient. But if it is suspected that a security breach has compromised customer credit cards, then Legal, HR and Public Relations might be needed. Also, it is always a good idea to have one person, usually someone who is both technical and familiar with management, serve as the management liaison.
The last thing the technical staff that is responding to a virus outbreak needs is to be interrupted every 10 minutes by different managers asking about the status. The role of the management liaison is to regularly solicit updates from the technical response team, and report these to management.
A good practice is to develop forms that address all possible questions management may have and file these on a pre-approved schedule from the time the incident is declared until the incident is resolved. These forms will also provide good documentation on what was done, when it was done, what worked, and what could have been done differently.
Table Top Drills
Table top drills that explore what people would do to respond to different types of incidents are invaluable. Make a list of all the different threats (identified in vulnerability and risk assessments) and discuss with the owners of each target what they would need to do to recover. For instance, if you discovered that a malicious hacker had been in your control system telecom environment for 10 days, what would you do to ensure that all your systems were configured and working properly?
Formalize Your Incident Response Plan
A plan that no one knows about is worthless. Communicate the existence of the plan and how it works to everyone who could possibly be involved in an incident response team. Make sure that management knows about the plan, how incidents will be handled and how they will be kept up to date in the event of an incident.
Expect The Plan To Be Ignored
Responses to most incidents are studies in controlled chaos. Don’t expect the people who have the most to lose to quietly sit down and review what the Incident Response Plan says before they take action. General Eisenhower, when commenting on the planning for the Normandy Invasion in World War II, said “Plans are useless, but planning is essential”.
What he meant was that when the troops hit the beaches the situations they faced were totally different from what they had planned and practiced. But, the fact that they had planned and practiced gave them the knowledge and experience they needed to adapt. The same is true for Incident Response.
Configuration and Patch Management
Configuration management deals with knowing what is supposed to be installed and running on a device and being able, at any point in time, to determine whether the device has the approved configuration. Manual and automatic processes and tools can be used to check the configurations, which is useful to determine if unauthorized changes have been made to a system. Configuration management is also integral to reconstructing a system or machine in the event of a disaster.
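The core check described above, comparing what is on a device with what was approved, can be sketched as a dictionary comparison in Python. This is an illustrative sketch only: the function name, the item names and the version strings are hypothetical, and a real tool would gather the actual configuration from the device itself.

```python
def compare_to_baseline(approved, actual):
    """Compare a device's actual configuration with its approved baseline.

    Both arguments map an item name (software package, service, file)
    to a version or checksum string.
    """
    approved_items = set(approved)
    actual_items = set(actual)
    return {
        # Approved items that are no longer present on the device.
        "missing": sorted(approved_items - actual_items),
        # Items on the device that were never approved.
        "unexpected": sorted(actual_items - approved_items),
        # Items present in both, but with a different version or checksum.
        "changed": sorted(
            item
            for item in approved_items & actual_items
            if approved[item] != actual[item]
        ),
    }


# Hypothetical baseline and observed state for one server.
APPROVED = {"scada_server": "5.2.1", "historian": "3.0", "antivirus": "8.1"}
ACTUAL = {"scada_server": "5.2.0", "antivirus": "8.1", "remote_admin_tool": "1.1"}

report = compare_to_baseline(APPROVED, ACTUAL)
for category, items in report.items():
    print(category, items)
```

An empty report means the device matches its approved configuration; the same baseline record is what would be used to rebuild the machine after a disaster.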
Patch management is the process for testing released patches and applying them to appropriate systems. This is difficult in all environments, but particularly so in control system environments. Patches must be tested to ensure that they do not introduce unintended problems and that they interact correctly with other patches and with system and application software. Quite often, updates to system or application software must be tested and installed before patches will work properly. Patch management software systems are closely related to configuration management systems in that they can track the installed base of system and application software with a record of patches that are installed, incompatible, required, etc.
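The kind of record keeping described above can be sketched in Python as follows. The `Patch` class, the `deployable` function and all identifiers in the example are hypothetical; the sketch simply assumes that a patch is ready for a system once it has been tested, is not marked incompatible, and has all of its prerequisite updates installed.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    patch_id: str
    tested: bool = False          # has it passed testing on a test system?
    prerequisites: tuple = ()     # updates that must already be installed


def deployable(patches, installed, incompatible=frozenset()):
    """Return the IDs of patches that are ready to install on one system.

    installed: IDs of patches and updates already on the system.
    incompatible: IDs known not to work with this system's software.
    """
    ready = []
    for patch in patches:
        if patch.patch_id in installed or patch.patch_id in incompatible:
            continue
        prerequisites_met = all(req in installed for req in patch.prerequisites)
        if patch.tested and prerequisites_met:
            ready.append(patch.patch_id)
    return ready


# Hypothetical patch list for one control system server.
PATCHES = [
    Patch("P-1", tested=True),
    Patch("P-2", tested=True, prerequisites=("OS-SP2",)),
    Patch("P-3"),                 # released but not yet tested
    Patch("P-4", tested=True),    # tested, but incompatible with this system
]

print(deployable(PATCHES, installed={"OS-SP1"}, incompatible={"P-4"}))
```

Once the prerequisite update is itself tested and installed, the dependent patch appears in the list on the next run.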
Isolation of the control system telecom environment can reduce the risk of unpatched vulnerabilities in operating systems and applications being exploited, but the danger cannot be eliminated. Effective patch management requires extensive test systems that can replicate every configuration in the control system environment, and it requires implementing patches as quickly as is prudent.
While continuous or at least regular monitoring of systems for infections with malicious code is a needed precaution, it is counterproductive to implement a security tool or procedure that prevents a system from fulfilling its business objective. Unfortunately, systems that must complete transactions in a small, discrete portion of time are quite often impacted by anti-virus software.
Usually, restricting the anti-virus software from running on files that are used in real time is a sufficient precaution. With control systems, however, tests must be made that ensure that the system resources consumed by the anti-virus software do not interfere with the process control software.
One of the problems encountered when isolating the control system telecom environment behind a firewall is automatically providing virus definition update files to devices in the control environment. One solution is to place a server containing tested updates in the control system DMZ and to configure anti-virus software in the control environment to retrieve updates from this DMZ server.
Log File Analysis
Many of the devices in the control system environment do not support log file creation. But, since most of these systems do not require any type of authentication or authorization, there really isn’t much to log. However, the control systems that use Windows or different variations of UNIX do support logging of security events and these should be utilized.
What to log really should be determined by the security analysts at each company. But most agree that, at a minimum, failed logon attempts and failed attempts to access files should be logged. The major problems common to most companies are not keeping the logs long enough and not analyzing them.
One of the benefits that logs can provide is reconstructing what happened in the event of an incident. Thought should be given to the types of information that might be needed and the possible time lag between the time an incident occurs and when it is discovered. Log files that are maintained for only a week and that contain the bare minimum information may not be worth maintaining.
Regular review of the logs is essential. Information may be gleaned that is not detected by other tools such as NIDS. However, since log files are usually very large and there are so many systems, a company should consider the value of purchasing a log file collection and analysis tool to assist the system administrators and security analysts.
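As a small illustration of what automated review can look like, the Python sketch below counts failed logon events per account and reports the accounts that reach a threshold. The log line format, the event name `LOGON_FAILED`, the host and account names, and the threshold are all assumptions made for this example; real Windows and UNIX logs use different formats and would need their own parsing.

```python
from collections import Counter


def failed_logons(lines, threshold=3):
    """Return {account: count} for accounts with repeated failed logons.

    Assumed line format (hypothetical):
        <date> <time> <host> <event> user=<account>
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and fields[3] == "LOGON_FAILED":
            account = fields[4].partition("=")[2]
            counts[account] += 1
    return {account: n for account, n in counts.items() if n >= threshold}


# Invented sample entries.
SAMPLE_LOG = [
    "2004-03-01 01:02:10 hmi01 LOGON_FAILED user=operator",
    "2004-03-01 01:02:15 hmi01 LOGON_FAILED user=operator",
    "2004-03-01 01:02:21 hmi01 LOGON_FAILED user=operator",
    "2004-03-01 01:03:00 hmi01 LOGON_OK user=operator",
    "2004-03-01 02:10:44 hist01 LOGON_FAILED user=admin",
    "2004-03-01 02:11:02 hist01 FILE_ACCESS_DENIED user=guest",
]

print(failed_logons(SAMPLE_LOG, threshold=3))
```

A script like this is no substitute for a collection and analysis tool across many systems, but it shows why the logs must contain enough detail, and be kept long enough, for the question to be answerable.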
Host Intrusion Detection/Prevention (HIDS)
HIDS technology is excellent for detecting unauthorized activity at the server. However, HIDS agents usually consume approximately 5% of a server’s resources. Before implementing HIDS, it is recommended that extensive testing be performed to ensure that the HIDS agent does not interfere with control functionality. Also, many of the events of most interest – attempts at unauthorised access and failed logon attempts – can be detected using NIDS.
Network Intrusion Detection/Prevention (NIDS)
NIDS can be considered a ‘passive’ control in the sense that the agents run on dedicated devices which analyse data packets that flow through the network without introducing measurable latency. Many new intrusion prevention devices have anomaly detection capabilities. Since the network traffic in a control environment is basically static (there may be more or less of it, but the types of traffic and network behaviour rarely change), NIDS with anomaly detection capabilities may be ideal. Initial tests by several NIDS suppliers and control system vendors are confirming this assumption.
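The reason static traffic suits anomaly detection can be shown with a deliberately simplified Python sketch: record the flows seen during normal operation, then flag any flow that was never seen before. This is a toy model under stated assumptions, not how any particular NIDS product works; the `Flow` fields, function names and addresses are hypothetical, and commercial tools use far richer models.

```python
from collections import namedtuple

# A flow is reduced to source, destination, protocol and destination port.
Flow = namedtuple("Flow", "src dst proto port")


def learn_baseline(observed_flows):
    """Build the set of flow types seen during normal operation."""
    return set(observed_flows)


def find_anomalies(baseline, new_flows):
    """Return flows whose type never appeared in the baseline."""
    return [flow for flow in new_flows if flow not in baseline]


# Hypothetical traffic captured during a normal period.
NORMAL = [
    Flow("10.0.1.10", "10.0.2.20", "tcp", 502),   # HMI polling a PLC
    Flow("10.0.1.10", "10.0.2.21", "tcp", 502),
    Flow("10.0.1.11", "10.0.1.5", "udp", 123),    # time synchronisation
]
baseline = learn_baseline(NORMAL)

# Later traffic: more of the usual polling, plus one new kind of connection.
LATER = [
    Flow("10.0.1.10", "10.0.2.20", "tcp", 502),
    Flow("10.0.1.10", "10.0.2.20", "tcp", 502),
    Flow("10.0.1.99", "10.0.2.20", "tcp", 23),    # unknown host, new service
]

for flow in find_anomalies(baseline, LATER):
    print("Unexpected traffic:", flow)
```

More or less of the usual traffic raises no alert; only a new type of traffic does, which is why a small baseline can be effective in a control environment and would be noisy on an office network.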
There is no “silver bullet” tool or technique for securing any computing system. No matter what steps you take, vulnerabilities will still exist. However, pursuing a comprehensive security program that is constantly monitored, updated and reviewed by third parties does constitute due diligence and will keep you as secure as possible.
Jay Abshier, CBCP CISSP, is semi-retired from ChevronTexaco and can be reached at firstname.lastname@example.org.