CHAPTER 2
Risk Management
This chapter presents the following:
• Risk management (assessing risks, responding to risks, monitoring risks)
• Supply chain risk management
• Business continuity
A ship in harbor is safe, but that is not what ships are built for.
—William G.T. Shedd
We next turn our attention to the concept that should underlie every decision made
when defending our information systems: risk. Risk is so important to understand as a
cybersecurity professional that we not only cover it in detail in this chapter (one of the
longest in the book) but also return to it time and again in the rest of the book. We start
off narrowly by focusing on the vulnerabilities in our organizations and the threats that
would exploit them to cause us harm. That sets the stage for an in-depth discussion of
the main components of risk management: framing, assessing, responding to, and moni-
toring risks. We pay particular attention to supply chain risks, since these represent a big
problem to which many organizations pay little or no attention. Finally, we’ll talk about
business continuity because it is so closely linked to risk management. We’ll talk about
disaster recovery, a closely related concept, in later chapters.
• Physical damage Fire, water, vandalism, power loss, and natural disasters
• Human interaction Accidental or intentional action or inaction that can
disrupt productivity
• Equipment malfunction Failure of systems and peripheral devices
• Inside and outside attacks Hacking, cracking, and attacking
• Misuse of data Sharing trade secrets, fraud, espionage, and theft
• Loss of data Intentional or unintentional loss of information to unauthorized
parties
• Application error Computation errors, input errors, and software defects
Threats must be identified, classified by category, and evaluated to calculate their
damage potential to the organization. Real risk is hard to measure, but prioritizing
potential risks in the order in which they must be addressed is attainable.
The National Institute of Standards and Technology (NIST) Special Publication (SP)
800-39, Managing Information Security Risk, defines three tiers to risk management:

• Organization tier, concerned with risk to the business as a whole
• Mission/business process tier, which deals with the risk to the major functions of the organization
• Information systems tier, which addresses risk from an information systems perspective
These tiers are dependent on each other, as shown in Figure 2-1. Risk management
starts with decisions made at the organization tier, which flow down to the other two
tiers. Feedback on the effects of these decisions flows back up the hierarchy to inform
the next set of decisions to be made. Carrying out risk management properly means
that you have a holistic understanding of your organization, the threats it faces, the
countermeasures that can be put into place to deal with those threats, and continuous
monitoring to ensure the acceptable risk level is being met on an ongoing basis.
Figure 2-1 The three tiers of risk management (Source: NIST SP 800-39)
• The development of metrics and performance indicators so as to measure and
manage various types of risks
• The ability to identify and assess new risks as the environment and organization
change
• The integration of ISRM and the organization’s change control process to ensure
that changes do not introduce new vulnerabilities
Obviously, this list involves a lot more than just buying a shiny new firewall and calling
the organization safe.
The ISRM team, in most cases, is not made up of employees with the dedicated
task of risk management. It consists of people who already have a full-time job in the
organization and are now tasked with something else. Thus, senior management support
is necessary so proper resource allocation can take place.
Of course, all teams need a leader, and ISRM is no different. One individual should be
singled out to run this rodeo and, in larger organizations, this person should be spending
50 to 70 percent of their time in this role. Management must dedicate funds to making
sure this person receives the necessary training and risk analysis tools to ensure it is a
successful endeavor.
• Frame risk Risk framing defines the context within which all other risk
activities take place. What are our assumptions and constraints? What are the
organizational priorities? What is the risk tolerance of senior management?
• Assess risk Before we can take any action to mitigate risk, we have to assess
it. This is perhaps the most critical aspect of the process, and one that we will
discuss at length. If your risk assessment is spot-on, then the rest of the process
becomes pretty straightforward.
• Respond to risk By now, we’ve done our homework. We know what we should,
must, and can’t do (from the framing component), and we know what we’re up
against in terms of threats, vulnerabilities, and attacks (from the assess component).
Responding to the risk becomes a matter of matching our limited resources with
our prioritized set of controls. Not only are we mitigating significant risk, but,
more importantly, we can tell our bosses what risk we can’t do anything about
because we’re out of resources.
(Figure: the risk management cycle of Frame, Assess, Respond, and Monitor)
• Monitor risk No matter how diligent we’ve been so far, we probably missed
something. If not, then the environment likely changed (perhaps a new
threat source emerged or a new system brought new vulnerabilities). In order
to stay one step ahead of the bad guys, we need to continuously monitor the
effectiveness of our controls against the risks for which we designed them.
You will notice that our discussion of risk so far has dealt heavily with the whole
framing process. In the preceding sections, we’ve talked about the organization (top to
bottom), the policies, and the team. The next step is to assess the risk, and what better
way to start than by understanding threats and the vulnerabilities they might exploit.
Information systems, in particular, are riddled with vulnerabilities even in the best-defended cases. One need only
read news accounts of the compromise of the highly protected and classified systems of
defense contractors and even governments to see that this universal principle is true. To
properly analyze vulnerabilities, it is useful to recall that information systems consist of
information, processes, and people that are typically, but not always, interacting with
computer systems. Since we discuss computer system vulnerabilities in detail in Chapter 6,
we will briefly discuss the other three components here.
Information In almost every case, the information at the core of our information
systems is the most valuable asset to a potential adversary. Information within a computer
information system (CIS) is represented as data. This information may be stored (data
at rest), transported between parts of our system (data in transit), or actively being used
by the system (data in use). In each of its three states, the information exhibits different
vulnerabilities, as listed in the following examples:
• Data at rest Data is copied to a thumb drive and given to unauthorized parties
by an insider, thus compromising its confidentiality.
• Data in transit Data is modified by an external actor intercepting it on the
network and then relaying the altered version (known as a man-in-the-middle or
MitM attack), thus compromising its integrity.
• Data in use Data is deleted by a malicious process exploiting a “time-of-check
to time-of-use” (TOC/TOU) or “race condition” vulnerability, thus compromising
its availability.
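The data-in-use bullet describes a race the attacker wins between a program's safety check and its actual use of the resource. The sketch below (hypothetical function names; Python used purely for illustration) shows that check-then-use gap and one common way to close it:

```python
import os
import stat

def read_report(path):
    # TIME OF CHECK: decide the file looks safe to read.
    if not os.path.exists(path) or os.path.islink(path):
        raise PermissionError("refusing to read: missing file or symlink")
    # ...window of vulnerability: an attacker who can write to this
    # directory may swap `path` for a symlink right here...
    # TIME OF USE: open and read whatever `path` points at *now*.
    with open(path) as f:
        return f.read()

def read_report_safely(path):
    # Open first, then validate the already-open descriptor. The checked
    # object and the used object are now the same, closing the race window.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0))
    if not stat.S_ISREG(os.fstat(fd).st_mode):
        os.close(fd)
        raise PermissionError("not a regular file")
    with os.fdopen(fd) as f:
        return f.read()
```

The safe variant illustrates the general mitigation: check the handle you will actually use (here via `os.fstat` on the open descriptor) rather than the name that an attacker can re-point between steps.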
Processes Most organizations implement standardized processes to ensure the
consistency and efficiency of their services and products. It turns out, however, that
efficiency is pretty easy to hack. Consider the case of shipping containers. Someone
wants to ship something from point A to point B, say a container of bananas from
Brazil to Belgium. Once the shipping order is placed and the destination entered, that
information flows from the farm to a truck carrier, to the seaport of origin to the ocean
carrier, to the destination seaport, to another truck carrier, and finally to its destination
at some distribution center in Antwerp. In most cases, nobody pays a lot of attention
to the address once it is entered. But what if an attacker knew this and changed the
address while the shipment was at sea? The attacker could have the shipment show up
at a different destination and even control the arrival time. This technique has actually
been used by drug and weapons smuggling gangs to get their “bananas” to where they
need them.
This sort of attack is known as business process compromise (BPC) and is commonly
targeted at the financial sector, where transaction amounts, deposit accounts, or other
parameters are changed to funnel money to the attackers’ pockets. Since business processes
are almost always instantiated in software as part of a CIS, process vulnerabilities can be
thought of as a specific kind of software vulnerability. As security professionals, however,
we should pay attention to the business processes themselves, not just the software that
implements them.
Threats
As you identify the vulnerabilities that are inherent to your organization and its systems,
it is important to also identify the sources that could attack them. The International
Organization for Standardization and the International Electrotechnical Commission in
their joint ISO/IEC standard 27000 define a threat as a “potential cause of an unwanted
incident, which can result in harm to a system or organization.” While this may sound
somewhat vague, it is important to include the full breadth of possibilities. When a threat
is one or more humans, we typically use the term threat actor or threat agent. Let’s start
with the most obvious: malicious humans.
Cybercriminals Cybercriminals are the most common threat actors encountered by
individuals and organizations. Most cybercriminals are motivated by greed, but some
just enjoy breaking things. Their skills run the gamut, from so-called script kiddies with
just a basic grasp of hacking (but access to someone else’s scripts or tools) to sophisticated
cybercrime gangs who develop and sometimes sell or rent their services and tools to
others. Cybercrime is the fastest-growing sector of criminal activity in many countries.
One of the factors that makes cybercrime so pervasive is that every connected device
is a target. Some devices are immediately monetizable, such as your personal smartphone
or home computer containing credentials, payment card information, and access to your
financial institutions. Other targets provide bigger payouts, such as the finance systems
in your place of work. Even devices that are not, by themselves, easily monetizable can
be hijacked and joined into a botnet to spread malware, conduct distributed denial-of-
service (DDoS) attacks, or serve as staging bases from which to attack other targets.
Nation-State Actors Whereas cybercriminals tend to cast a wide net in an effort to
maximize their profits, nation-state actors (or simply state actors) are very selective in
choosing their targets, seeking to remain undetected inside victim networks (stealing
data, intellectual property, etc.) for extended periods. After their presence is established, state actors may
use prepositioned assets to trigger devastating effects in response to world events. Though
their main motivations tend to be espionage and gaining persistent access to critical
infrastructure, some state actors maintain good relations with cybercrime groups in their
own country, mostly for the purposes of plausible deniability. By collaborating with these
criminals, state actors can make it look as if an attack against another nation was a crime
and not an act of war. At least one country is known to use its national offensive cyber
capabilities for financial profit, stealing millions of dollars all over the world.
Many security professionals consider state actors a threat mostly to government
organizations, critical infrastructure like power plants, and anyone with sophisticated
research and development capabilities. In reality, however, these actors can and do target
other organizations, typically to use them as a springboard into their ultimate targets. So,
even if you work for a small company that seems uninteresting to a foreign nation, you
could find your company in a state actor’s crosshairs.
Hacktivists Hacktivists use cyberattacks to effect political or social change. The term
covers a diverse ecosystem, encompassing individuals and groups of various skillsets
and capabilities. Hacktivists’ preferred objectives are highly visible to the public or
yield information that, when made public, aims to embarrass government entities or
undermine public trust in them.
Internal Actors Internal actors are people within the organization, such as employees,
former employees, contractors, or business associates, who have inside information
concerning the organization’s security practices, data, and computer systems. Broadly
speaking, there are two types of insider threats: negligent and malicious. A negligent insider
is one who fails to exercise due care, which puts their organization at risk. Sometimes,
these individuals knowingly violate policies or disregard procedures, but they are not doing
so out of malicious intent. For example, an employee could disregard a policy requiring
visitors to be escorted at all times because someone shows up wearing the uniform of a
telecommunications company and claiming to be on site to fix an outage. This insider trusts
the visitor, which puts the organization at risk, particularly if that person is an impostor.
The second type of insider threat is characterized by malicious intent. Malicious
insiders use the knowledge they have about their organization either for their own
advantage (e.g., to commit fraud) or to directly cause harm (e.g., by deleting sensitive
files). While some malicious insiders plan their criminal activity while they are employees
in good standing, others are triggered by impending termination actions. Knowing (or
suspecting) that they’re about to be fired, they may attempt to steal sensitive data (such as
customer contacts or design documents) before their access is revoked. Other malicious
insiders may be angry and plant malware or destroy assets in an act of revenge. This
insider threat highlights the need for the “zero trust” secure design principle (discussed in
Chapter 9). It is also a really good reason to practice the termination processes discussed
in Chapter 1.
Ordinary users can also be threat sources when they are inputting values incorrectly into
programs, misusing technology, or modifying data in an inappropriate manner.
After the ISRM team has identified the vulnerabilities and associated threats, it must
investigate the ramifications of any of those vulnerabilities being exploited. Risks have
loss potential, meaning that the organization could lose assets or revenues if a threat agent
actually exploited a vulnerability. The loss may be corrupted data, destruction of systems
and/or the facility, unauthorized disclosure of confidential information, a reduction in
employee productivity, and so on. When performing a risk assessment, the team also
must look at delayed loss when assessing the damages that can occur. Delayed loss is
secondary in nature and takes place well after a vulnerability is exploited. Delayed loss
may include damage to the organization’s reputation, loss of market share, accrued late
penalties, civil suits, the delayed collection of funds from customers, resources required
to reimage other compromised systems, and so forth.
For example, if a company’s web servers are attacked and taken offline, the immediate
damage (loss potential) could be data corruption, the man-hours necessary to place
the servers back online, and the replacement of any code or components required. The
company could lose revenue if it usually accepts orders and payments via its website. If
getting the web servers fixed and back online takes a full day, the company could lose
a lot more sales and profits. If getting the web servers fixed and back online takes a full
week, the company could lose enough sales and profits to not be able to pay other bills
and expenses. This would be a delayed loss. If the company’s customers lose confidence
in it because of this activity, the company could lose business for months or years. This
is a more extreme case of delayed loss.
These types of issues make the process of properly quantifying losses that specific
threats could cause more complex, but they must be taken into consideration to ensure
reality is represented in this type of analysis.
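The outage example above can be tallied as a worked calculation. All dollar figures below are invented; the loss categories come straight from the text (restoration labor, replacement components, lost sales, reputation damage, late penalties, civil suits):

```python
# Invented figures for the web-server outage example in the text.
immediate_losses = {            # loss potential: felt as soon as the attack hits
    "data restoration labor": 12_000,
    "replacement code and components": 3_500,
    "lost online sales (first day)": 20_000,
}
delayed_losses = {              # secondary losses that surface well afterward
    "reputation damage and lost customers": 60_000,
    "accrued late penalties": 8_000,
    "civil suit exposure": 25_000,
}

total_immediate = sum(immediate_losses.values())
total_delayed = sum(delayed_losses.values())
total_exposure = total_immediate + total_delayed
print(f"Immediate loss potential: ${total_immediate:,}")
print(f"Delayed loss:             ${total_delayed:,}")
print(f"Total exposure:           ${total_exposure:,}")
```

Note how the delayed categories dominate the total, which is exactly why an analysis that stops at immediate loss potential understates reality.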
Assessing Risks
A risk assessment, which is really a tool for risk management, is a method of identify-
ing vulnerabilities and threats and assessing the possible impacts to determine where to
implement security controls. After parts of a risk assessment are carried out, the results
are analyzed. Risk analysis is a detailed examination of the components of risk that is used
to ensure that security is cost-effective, relevant, timely, and responsive to threats. It is
easy to apply too much security, not enough security, or the wrong security controls and
to spend too much money in the process without attaining the necessary objectives. Risk
analysis helps organizations prioritize their risks and shows management the amount of
resources that should be applied to protecting against those risks in a sensible manner.
EXAM TIP The terms risk assessment and risk analysis, depending on who
you ask, can mean the same thing, or one must follow the other, or one is
a subpart of the other. Here, we treat risk assessment as the broader effort,
which is reinforced by specific risk analysis tasks as needed. This is how you
should think of it for the CISSP exam.
Risk analysis provides a cost/benefit comparison, which compares the annualized cost of
controls to the potential cost of loss. A control, in most cases, should not be implemented
unless the annualized cost of loss exceeds the annualized cost of the control itself. This
means that if a facility is worth $100,000, it does not make sense to spend $150,000
trying to protect it.
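The cost/benefit comparison can be sketched numerically using the annualized loss expectancy (ALE) mentioned later in this chapter. The figures below are invented; the rule is the one just stated: a control is justified only when the annualized loss it prevents exceeds its own annualized cost.

```python
def annualized_loss_expectancy(asset_value, exposure_factor, annual_rate):
    """ALE = SLE x ARO: single loss expectancy (asset value times the
    fraction lost per incident) multiplied by the expected number of
    incidents per year (annualized rate of occurrence)."""
    sle = asset_value * exposure_factor
    return sle * annual_rate

def control_is_justified(ale_before, ale_after, annual_control_cost):
    # Value of the control = reduction in ALE minus what it costs to run.
    return (ale_before - ale_after) > annual_control_cost

# Facility worth $100,000; an incident destroys 25% of it, twice a year.
ale_before = annualized_loss_expectancy(100_000, 0.25, 2.0)   # $50,000/yr
# A proposed control cuts incidents to once every five years.
ale_after = annualized_loss_expectancy(100_000, 0.25, 0.2)    # $5,000/yr
print(control_is_justified(ale_before, ale_after, 30_000))    # justified
print(control_is_justified(ale_before, ale_after, 150_000))   # not justified
```

This is the arithmetic behind the $100,000 facility example: a $150,000 control can never pay for itself against $50,000 of annual exposure.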
It is important to figure out what you are supposed to be doing before you dig right in
and start working. Anyone who has worked on a project without a properly defined scope
can attest to the truth of this statement. Before an assessment is started, the team must
carry out project sizing to understand what assets and threats should be evaluated. Most
assessments are focused on physical security, technology security, or personnel security.
Trying to assess all of them at the same time can be quite an undertaking.
One of the risk assessment team’s tasks is to create a report that details the asset
valuations. Senior management should review and accept the list and use these values
to determine the scope of the risk management project. If management determines
at this early stage that some assets are not important, the risk assessment team should
not spend additional time or resources evaluating those assets. During discussions with
management, everyone involved must have a firm understanding of the value of the
security CIA triad—confidentiality, integrity, and availability—and how it directly
relates to business needs.
Management should outline the scope of the assessment, which most likely will be
dictated by organizational compliance requirements as well as budgetary constraints.
Many projects have run out of funds, and consequently stopped, because proper project
sizing was not conducted at the onset of the project. Don’t let this happen to you.
A risk assessment helps integrate the security program objectives with the organization’s
business objectives and requirements. The more the business and security objectives are
in alignment, the more successful both will be. The assessment also helps the organization
draft a proper budget for a security program and its constituent security components.
Once an organization knows how much its assets are worth and the possible threats those
assets are exposed to, it can make intelligent decisions about how much money to spend
protecting those assets.
A risk assessment must be supported and directed by senior management if it is to be
successful. Management must define the purpose and scope of the effort, appoint a team
to carry out the assessment, and allocate the necessary time and funds to conduct it. It is
essential for senior management to review the outcome of the risk assessment and to act
on its findings. After all, what good is it to go through all the trouble of a risk assessment
and not react to its findings? Unfortunately, this does happen all too often.
To manage risk effectively, we must understand the value of an asset that could be impacted by a threat. The value
placed on information is relative to the parties involved, what work was required to
develop it, how much it costs to maintain, what damage would result if it were lost or
destroyed, how much money enemies would pay for it, and what liability penalties could
be endured. If an organization does not know the value of the information and the other
assets it is trying to protect, it does not know how much money and time it should spend
on protecting them. If the calculated value of your company’s secret formula is x, then the
total cost of protecting it should be some value less than x. Knowing the value of our infor-
mation allows us to make quantitative cost/benefit comparisons as we manage our risks.
The preceding logic applies not only to assessing the value of information and protecting
it but also to assessing the value of the organization’s other assets, such as facilities, systems,
and even intangibles like the value of the brand, and protecting them. The value of the
organization’s facilities must be assessed, along with all printers, workstations, servers,
peripheral devices, supplies, and employees. You do not know how much is in danger of
being lost if you don’t know what you have and what it is worth in the first place.
The actual value of an asset is determined by the importance it has to the organization
as a whole. The value of an asset should reflect all identifiable costs that would arise if
the asset were actually impaired. If a server cost $4,000 to purchase, this value should
not be input as the value of the asset in a risk assessment. Rather, the cost of replacing or
repairing it, the loss of productivity, and the value of any data that may be corrupted or
lost must be accounted for to properly capture the amount the organization would lose
if the server were to fail for one reason or another.
Several issues should be considered when assigning values to assets, including the cost to
acquire or develop the asset, the cost to maintain and protect it, its value to owners, users,
and adversaries, and the cost to replace it if it is lost.
Understanding the value of an asset is the first step to understanding what security
mechanisms should be put in place and what funds should go toward protecting it. A very
important question is how much it could cost the organization to not protect the asset.
The risk assessment team should include people from different departments who understand
how a given loss would affect their own productivity and how it would affect the
organization overall. If the risk assessment team
is unable to include members from various departments, it should, at the very least, make
sure to interview people in each department so it fully understands and can quantify
all threats.
The risk assessment team must also include people who understand the processes that
are part of their individual departments, meaning individuals who are at the right levels
of each department. This is a difficult task, since managers sometimes delegate any sort
of risk assessment task to lower levels within the department. However, the people who
work at these lower levels may not have adequate knowledge and understanding of the
processes that the risk assessment team may need to deal with.
NIST SP 800-30
NIST SP 800-30, Revision 1, Guide for Conducting Risk Assessments, is specific to infor-
mation systems threats and how they relate to information security risks. It lays out the
following steps:
1. Prepare for the assessment.
2. Conduct the assessment:
a. Identify threat sources and events.
b. Identify vulnerabilities and predisposing conditions.
c. Determine likelihood of occurrence.
d. Determine magnitude of impact.
e. Determine risk.
3. Communicate results.
4. Maintain assessment.
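Steps 2c through 2e above reduce to a scoring pass over identified threat events. The 1-to-5 scales and the events below are invented for illustration; SP 800-30 itself uses semi-quantitative lookup tables rather than this exact arithmetic, but the shape of the computation is the same:

```python
# Invented 1-5 scales and threat events; risk score = likelihood x impact.
threat_events = {                      # (likelihood, impact)
    "phishing leads to credential theft": (5, 3),
    "ransomware on file server": (4, 5),
    "insider exfiltrates customer data": (2, 5),
    "web server defacement": (3, 2),
}

# Steps 2c-2e: determine likelihood, impact, and their product as risk.
risks = {event: likelihood * impact
         for event, (likelihood, impact) in threat_events.items()}

# Step 3, communicate results: report the highest risks first.
for event, score in sorted(risks.items(), key=lambda kv: -kv[1]):
    print(f"{score:>2}  {event}")
```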
FRAP
Facilitated Risk Analysis Process (FRAP) is a second type of risk assessment methodology.
The crux of this qualitative methodology is to focus only on the systems that really need
assessing, to reduce costs and time obligations. FRAP stresses prescreening activities so
that the risk assessment steps are only carried out on the item(s) that needs it the most.
FRAP is intended to be used to analyze one system, application, or business process at a
time. Data is gathered and threats to business operations are prioritized based upon their
criticality. The risk assessment team documents the controls that need to be put into place
to reduce the identified risks along with action plans for control implementation efforts.
This methodology does not support the idea of calculating exploitation probability
numbers or annualized loss expectancy values. The criticalities of the risks are
determined by the team members’ experience. The author of this methodology (Thomas
Peltier) believes that trying to use mathematical formulas for the calculation of risk is too
confusing and time consuming. The goal is to keep the scope of the assessment small and
the assessment processes simple to allow for efficiency and cost-effectiveness.
OCTAVE
The Operationally Critical Threat, Asset, and Vulnerability Evaluation (OCTAVE) meth-
odology was created by Carnegie Mellon University’s Software Engineering Institute
(SEI). OCTAVE is intended to be used in situations where people manage and direct
the risk evaluation for information security within their organization. This places the
people who work inside the organization in the power positions of being able to make
the decisions regarding what is the best approach for evaluating the security of their
organization. OCTAVE relies on the idea that the people working in these environments
best understand what is needed and what kind of risks they are facing. The individuals
who make up the risk assessment team go through rounds of facilitated workshops. The
facilitator helps the team members understand the risk methodology and how to apply it
to the vulnerabilities and threats identified within their specific business units. OCTAVE
stresses a self-directed team approach.
The scope of an OCTAVE assessment is usually very wide compared to the more
focused approach of FRAP. Where FRAP would be used to assess a system or application,
OCTAVE would be used to assess all systems, applications, and business processes within
the organization.
The OCTAVE methodology consists of the eight processes (or steps) listed here:
1. Identify enterprise knowledge.
2. Identify operational area knowledge.
3. Identify staff knowledge.
4. Establish security requirements.
5. Map high-priority information assets to information infrastructure.
6. Perform infrastructure vulnerability evaluation.
7. Conduct multidimensional risk analysis.
8. Develop protection strategy.
FMEA
Failure Modes and Effect Analysis (FMEA) is a method for determining functions, iden-
tifying functional failures, and assessing the causes of failure and their failure effects
through a structured process. FMEA is commonly used in product development and
operational environments. The goal is to identify where something is most likely going to
break and either fix the flaws that could cause this issue or implement controls to reduce
the impact of the break. For example, you might choose to carry out an FMEA on your
organization’s network to identify single points of failure. These single points of failure
represent vulnerabilities that could directly affect the productivity of the network as a
whole. You would use this structured approach to identify these issues (vulnerabilities),
assess their criticality (risk), and identify the necessary controls that should be put into
place (reduce risk).
The FMEA methodology uses failure modes (how something can break or fail) and
effects analysis (impact of that break or failure). The application of this process to a
chronic failure enables the determination of where exactly the failure is most likely
to occur. Think of it as being able to look into the future and locate areas that have
the potential for failure and then applying corrective measures to them before they do
become actual liabilities.
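A common way to score the failure modes an FMEA uncovers (standard FMEA practice, though not spelled out in the text above) is the risk priority number: severity times occurrence times detectability, each on a 1-to-10 scale, where a high detection score means the failure is hard to detect. The ratings below are invented; the items echo the rows of Table 2-2:

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    item: str
    mode: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (constant)
    detection: int   # 1 (always caught) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        # Risk priority number: higher means address it sooner.
        return self.severity * self.occurrence * self.detection

modes = [
    FailureMode("IPS content filter", "fails to close under traffic overload", 7, 4, 2),
    FailureMode("Central antivirus engine", "central server down, stale signatures", 8, 3, 3),
    FailureMode("Fire suppression pipes", "water freezes and pipes break", 9, 2, 3),
]

for fm in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"RPN {fm.rpn:>3}  {fm.item}: {fm.mode}")
```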
Following a specific order of steps yields the best results from an FMEA:
Table 2-2 is an example of how an FMEA can be carried out and documented.
Although most organizations will not have the resources to do this level of detailed work
for every system and control, an organization can carry it out on critical functions and
systems that can drastically affect the organization.
FMEA was first developed for systems engineering. Its purpose is to examine the
potential failures in products and the processes involved with them. This approach
proved to be successful and has been more recently adapted for use in evaluating risk
management priorities and mitigating known threat vulnerabilities.
All-In-One / CISSP® All-in-One Exam Guide, Ninth Edition / Maymí / 737-6 / Chapter 2

Table 2-2 How an FMEA Can Be Carried Out and Documented
Prepared by: ____  Approved by: ____  Date: ____  Revision: ____

Item 1: IPS application content filter
• Function: Inline perimeter protection
• Failure mode: Fails to close
• Failure cause: Traffic overload
• Failure effect on component or functional assembly: Single point of failure; denial of service
• Failure effect on next higher assembly: IPS blocks ingress traffic stream
• Failure effect on system: IPS is brought down
• Failure detection method: Health check status sent to console and e-mail to security administrator

Item 2: Central antivirus signature update engine
• Function: Push updated signatures to all servers and workstations
• Failure mode: Fails to provide adequate, timely protection against malware
• Failure cause: Central server goes down
• Failure effect on component or functional assembly: Individual node’s antivirus software is not updated
• Failure effect on next higher assembly: Network is infected with malware
• Failure effect on system: Central server can be infected and/or infect other systems
• Failure detection method: Heartbeat status check sent to central console, and e-mail to network administrator

Item 3: Fire suppression water pipes
• Function: Suppress fire in building 1 in 5 zones
• Failure mode: Fails to close
• Failure cause: Water in pipes freezes
• Failure effect on component or functional assembly: None
• Failure effect on next higher assembly: Building 1 has no suppression agent available
• Failure effect on system: Fire suppression system pipes break
• Failure detection method: Suppression sensors tied directly into fire system central console

Etc.
This methodical way of identifying potential pitfalls is coming into play more as the need
for risk awareness—down to the tactical and operational levels—continues to expand.
• False alarms
• Insufficient error handling
• Sequencing or order
• Incorrect timing outputs
• Valid but unexpected outputs
EXAM TIP A risk assessment is used to gather data. A risk analysis examines
the gathered data to produce results that can be acted upon.
The next thing we need to figure out is whether our risk analysis approach should be
quantitative or qualitative in nature. A quantitative risk analysis is used to assign monetary
and numeric values to all elements of the risk analysis process. Each element within the
analysis (asset value, threat frequency, severity of vulnerability, impact damage, safeguard
costs, safeguard effectiveness, uncertainty, and probability items) is quantified and
entered into equations to determine total and residual risks. It is more of a scientific or
mathematical approach (objective) to risk analysis compared to qualitative. A qualitative
risk analysis uses a “softer” approach to the data elements of a risk analysis. It does not
quantify that data, which means that it does not assign numeric values to the data so
that it can be used in equations. As an example, the results of a quantitative risk analysis
could be that the organization is at risk of losing $100,000 if a buffer overflow were
exploited on a web server, $25,000 if a database were compromised, and $10,000 if a
file server were compromised. A qualitative risk analysis would not present these findings
in monetary values, but would assign ratings to the risks, as in Red, Yellow, and Green.
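The relationship between the two styles of result can be shown directly: take the quantitative loss figures from the example above and bucket them into the Red/Yellow/Green ratings a qualitative analysis would report. The thresholds below are invented; an organization would tune them to its own risk tolerance:

```python
def qualitative_rating(annual_loss, red_at=75_000, yellow_at=20_000):
    # Invented thresholds mapping dollar exposure to a color rating.
    if annual_loss >= red_at:
        return "Red"
    if annual_loss >= yellow_at:
        return "Yellow"
    return "Green"

# The quantitative findings from the example in the text.
quantitative_findings = {
    "buffer overflow exploited on web server": 100_000,
    "database compromised": 25_000,
    "file server compromised": 10_000,
}

for risk, loss in quantitative_findings.items():
    print(f"{qualitative_rating(loss):<6} ${loss:>7,}  {risk}")
```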
A quantitative analysis uses risk calculations that attempt to predict the level of
monetary losses and the probability for each type of threat. Qualitative analysis does not
use calculations; instead, it relies on ratings, opinions, and scenario walk-throughs.
Quantitative and qualitative approaches have their own pros and cons, and each applies
more appropriately to some situations than others. An organization’s management and
risk analysis team, and the tools they decide to use, will determine which approach is best.
In the following sections we will dig into the depths of quantitative analysis and then
revisit the qualitative approach. We will then compare and contrast their attributes.
EXAM TIP Remember that vulnerability assessments are different from risk
assessments. A vulnerability assessment just finds the vulnerabilities (the
holes). A risk assessment calculates the probability of the vulnerabilities
being exploited and the associated business impact.
The objective of automated risk analysis tools is to reduce the manual effort of risk
analysis, perform
calculations quickly, estimate future expected losses, and determine the effectiveness and
benefits of the security countermeasures chosen. Most automatic risk analysis products
port information into a database and run several types of scenarios with different
parameters to give a panoramic view of what the outcome will be if different threats
come to bear. For example, after such a tool has all the necessary information inputted,
it can be rerun several times with different parameters to compute the potential outcome
if a large fire were to take place; the potential losses if a virus were to damage 40 percent
of the data on the main file server; how much the organization would lose if an attacker
were to steal all the customer credit card information held in three databases; and so on.
Running through the different risk possibilities gives an organization a more detailed
understanding of which risks are more critical than others, and thus which ones to
address first.
Uncertainty
In risk analysis, uncertainty refers to the degree to which you lack confidence
in an estimate. This is expressed as a percentage, from 0 to 100 percent. If you
have a 30 percent confidence level in something, then it could be said you have a
70 percent uncertainty level. Capturing the degree of uncertainty when carrying out
a risk analysis is important, because it indicates the level of confidence the team and
management should have in the resulting figures.
Asset                       Threat       SLE        ARO    ALE
Facility                    Fire         $230,000   0.1    $23,000
Trade secret                Stolen       $40,000    0.01   $400
File server                 Failed       $11,500    0.1    $1,150
Business data               Ransomware   $283,000   0.1    $28,300
Customer credit card info   Stolen       $300,000   3.0    $900,000
Table 2-3 Breaking Down How SLE and ALE Values Are Used
Now that we have all these numbers, what do we do with them? Let’s look at the
example in Table 2-3, which shows the outcome of a quantitative risk analysis. With this
data, the organization can make intelligent decisions on what threats must be addressed
first because of the severity of the threat, the likelihood of it happening, and how much
could be lost if the threat were realized. The organization now also knows how much
money it should spend to protect against each threat. This will result in good business
decisions, instead of just buying protection here and there without a clear understanding
of the big picture. Because the organization’s risk from a ransomware incident is $28,300,
it would be justified in spending up to this amount each year on ransomware preventive
measures such as offline file backups, phishing awareness training, malware detection
and prevention, or insurance.
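The ALE column in Table 2-3 is simply the single loss expectancy multiplied by the annualized rate of occurrence, which a short sketch can verify:

```python
# ALE = SLE (single loss expectancy) x ARO (annualized rate of occurrence).
# Values taken from Table 2-3.
assets = [
    # (asset, threat, SLE in dollars, ARO per year)
    ("Facility", "Fire", 230_000, 0.1),
    ("Trade secret", "Stolen", 40_000, 0.01),
    ("File server", "Failed", 11_500, 0.1),
    ("Business data", "Ransomware", 283_000, 0.1),
    ("Customer credit card info", "Stolen", 300_000, 3.0),
]

ale = {name: round(sle * aro) for name, threat, sle, aro in assets}

# Ranking by ALE shows which threats to address first.
for name, loss in sorted(ale.items(), key=lambda item: -item[1]):
    print(f"{name}: ${loss:,}")
```

Sorting by ALE immediately surfaces the customer credit card data as the dominant annualized exposure, matching the conclusion drawn from the table.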
When carrying out a quantitative analysis, some people mistakenly think that the
process is purely objective and scientific because data is being presented in numeric
values. But a purely quantitative analysis is hard to achieve because there is still some
subjectivity when it comes to the data. How do we know that a fire will only take place
once every 10 years? How do we know that the damage from a fire will be 25 percent
of the value of the asset? We don’t know these values exactly, but instead of just pulling
them out of thin air, they should be based upon historical data and industry experience.
In quantitative risk analysis, we can do our best to provide all the correct information,
and by doing so we will come close to the risk values, but we cannot predict the future
and how much future incidents will cost us or the organization.
                  Consequences
Likelihood        Insignificant   Minor   Moderate   Major   Severe
Almost certain    M               H       H          E       E
Likely            M               M       H          H       E
Possible          L               M       M          H       E
Unlikely          L               M       M          M       H
Rare              L               L       M          M       H
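A matrix like this is easy to encode so that ratings are looked up consistently. A minimal sketch follows, assuming the conventional legend of L = Low, M = Medium, H = High, and E = Extreme:

```python
# Qualitative risk matrix: likelihood x consequence -> rating.
# Legend (assumed, conventional): L = Low, M = Medium, H = High, E = Extreme.
CONSEQUENCES = ["Insignificant", "Minor", "Moderate", "Major", "Severe"]
MATRIX = {
    "Almost certain": ["M", "H", "H", "E", "E"],
    "Likely":         ["M", "M", "H", "H", "E"],
    "Possible":       ["L", "M", "M", "H", "E"],
    "Unlikely":       ["L", "M", "M", "M", "H"],
    "Rare":           ["L", "L", "M", "M", "H"],
}

def rate(likelihood: str, consequence: str) -> str:
    """Look up the qualitative rating for a likelihood/consequence pair."""
    return MATRIX[likelihood][CONSEQUENCES.index(consequence)]

print(rate("Possible", "Major"))        # H
print(rate("Rare", "Insignificant"))    # L
```

Encoding the matrix once keeps every analyst's ratings consistent with the agreed-upon scale.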
The Delphi technique is a group decision method used to ensure that each member
gives an honest opinion of what he or she thinks the result of a particular threat
will be. This avoids a group of individuals feeling pressured to go along with others’
thought processes and enables them to participate in an independent and anony-
mous way. Each member of the group provides his or her opinion of a certain threat
and turns it in to the team that is performing the analysis. The results are compiled
and distributed to the group members, who then write down their comments anon-
ymously and return them to the analysis group. The comments are compiled and
redistributed for more comments until a consensus is formed. This method is used
to obtain an agreement on cost, loss values, and probabilities of occurrence without
individuals having to agree verbally.
IT manager               4     2     4     4     3     2
Database administrator   4     4     4     3     4     1
Application programmer   2     3     3     4     2     1
System operator          3     4     3     4     2     1
Operational manager      5     4     4     4     4     2
Results                  3.6   3.4   3.6   3.8   3     1.4
Table 2-4 Example of a Qualitative Analysis
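The Results row in Table 2-4 is just the arithmetic mean of the five participants' ratings in each column, as a quick sketch shows:

```python
# Each participant rates six factors on a 1-5 scale (data from Table 2-4).
ratings = {
    "IT manager":             [4, 2, 4, 4, 3, 2],
    "Database administrator": [4, 4, 4, 3, 4, 1],
    "Application programmer": [2, 3, 3, 4, 2, 1],
    "System operator":        [3, 4, 3, 4, 2, 1],
    "Operational manager":    [5, 4, 4, 4, 4, 2],
}

# The Results row is the mean of each column across all participants.
columns = zip(*ratings.values())
results = [round(sum(col) / len(ratings), 1) for col in columns]
print(results)  # [3.6, 3.4, 3.6, 3.8, 3.0, 1.4]
```

Averaging independently collected ratings like this is also how Delphi-style inputs are typically consolidated without revealing who said what.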
Quantitative Cons:
• Calculations can be complex. Can management understand how these values
were derived?
• Without automated tools, this process is extremely laborious.
• More preliminary work is needed to gather detailed information about the
environment.
• Standards are not available. Each vendor has its own way of interpreting the
processes and their results.
Qualitative Cons:
• Eliminates the opportunity to create a dollar value for cost/benefit discussions.
• Developing a security budget from the results is difficult because monetary values
are not used.
• Standards are not available. Each vendor has its own way of interpreting the
processes and their results.
Responding to Risks
Once an organization knows the amount of total and residual risk it is faced with, it must
decide how to handle it. Risk can be dealt with in four basic ways: transfer it, avoid it,
reduce it, or accept it.
Many types of insurance are available to organizations to protect their assets. If an
organization decides the total risk is too high to gamble with, it can purchase insurance,
which would transfer the risk to the insurance company.
If an organization decides to terminate the activity that is introducing the risk, this is
known as risk avoidance. For example, if a company allows employees to use instant messaging
(IM), there are many risks surrounding this technology. The company could decide not to
allow any IM activity by employees because there is not a strong enough business need for
its continued use. Discontinuing this service is an example of risk avoidance.
Another approach is risk mitigation, where the risk is reduced to a level considered
acceptable enough to continue conducting business. The implementation of firewalls,
training, and intrusion detection/prevention systems or other control types represents
risk mitigation efforts.
The last approach is to accept the risk, which means the organization understands the
level of risk it is faced with, as well as the potential cost of damage, and decides to just
live with it and not implement the countermeasure. Many organizations will accept risk
when the cost/benefit ratio indicates that the cost of the countermeasure outweighs the
potential loss value.
A crucial issue with risk acceptance is understanding why this is the best approach
for a specific situation. Unfortunately, today many people in organizations are accepting
risk and not understanding fully what they are accepting. This usually has to do with
the relative newness of risk management in the security field and the lack of education
and experience in those personnel who make risk decisions. When business managers are
charged with the responsibility of dealing with risk in their department, most of the time
acceptable level. As stated earlier, no system or environment is 100 percent secure, which
means there is always some risk left over to deal with. This is called residual risk.
Residual risk is different from total risk, which is the risk an organization faces if
it chooses not to implement any type of safeguard. An organization may choose to
take on total risk if the cost/benefit analysis results indicate this is the best course of
action. For example, if there is a small likelihood that an organization’s web servers can
be compromised and the necessary safeguards to provide a higher level of protection
cost more than the potential loss in the first place, the organization will choose not to
implement the safeguard, choosing to deal with the total risk.
There is an important difference between total risk and residual risk and which type of
risk an organization is willing to accept. The following are conceptual formulas:
threats × vulnerability × asset value = total risk
(threats × vulnerability × asset value) × controls gap = residual risk
You may also see these concepts illustrated as the following:
total risk – countermeasures = residual risk
NOTE The previous formulas are not constructs you can actually plug
numbers into. They are instead used to illustrate the relation of the
different items that make up risk in a conceptual manner. This means no
multiplication or mathematical functions actually take place. It is a means
of understanding what items are involved when defining either total or
residual risk.
During a risk assessment, the threats and vulnerabilities are identified. The possibility of
a vulnerability being exploited is multiplied by the value of the assets being assessed, which
results in the total risk. Once the controls gap (protection the control cannot provide)
is factored in, the result is the residual risk. Implementing countermeasures is a way of
mitigating risks. Because no organization can remove all threats, there will always be some
residual risk. The question is what level of risk the organization is willing to accept.
Control Selection
A security control must make good business sense, meaning it is cost-effective (its benefit
outweighs its cost). This requires another type of analysis: a cost/benefit analysis. A com-
monly used cost/benefit calculation for a given safeguard (control) is
(ALE before implementing safeguard) – (ALE after implementing safeguard) –
(annual cost of safeguard) = value of safeguard to the organization
For example, if the ALE of the threat of a hacker bringing down a web server is
$12,000 prior to implementing the suggested safeguard, and the ALE is $3,000 after
implementing the safeguard, while the annual cost of maintenance and operation of the
safeguard is $650, then the value of this safeguard to the organization is $8,350 each year.
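The calculation is simple enough to sketch directly, using the figures from the web server example:

```python
# Value of a safeguard = (ALE before) - (ALE after) - (annual cost of safeguard).
def safeguard_value(ale_before: float, ale_after: float, annual_cost: float) -> float:
    """Annual value of a safeguard to the organization."""
    return ale_before - ale_after - annual_cost

# Web server example from the text.
value = safeguard_value(ale_before=12_000, ale_after=3_000, annual_cost=650)
print(f"${value:,.0f} per year")  # $8,350 per year
```

A positive result means the safeguard is cost-effective; a negative result means the cure costs more than the disease.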
Recall that the ALE has two factors, the single loss expectancy and the annual rate of
occurrence, so safeguards can decrease either or both. The countermeasure referenced
in the previous example could aim to reduce the costs associated with restoring the web
server, or make it less likely that it is brought down, or both. All too often, we focus our
attention on making the threat less likely, while, in some cases, it might be less expensive
to make it easier to recover.
The cost of a countermeasure is more than just the amount filled out on the purchase
order. The following items should be considered and evaluated when deriving the full
cost of a countermeasure:
• Product costs
• Design/planning costs
• Implementation costs
• Environment modifications (both physical and logical)
• Compatibility with other countermeasures
• Maintenance requirements
• Testing requirements
• Repair, replacement, or update costs
• Operating and support costs
• Effects on productivity
• Subscription costs
• Extra staff-hours for monitoring and responding to alerts
Although these tools automate tasks, many organizations were not even carrying out these tasks before,
so they do not save on staff-hours, but many times require more hours. For example,
Company A decides that to protect many of its resources, purchasing an intrusion
detection system is warranted. So, the company pays $5,500 for an IDS. Is that the
total cost? Nope. This software should be tested in an environment that is segmented
from the production environment to uncover any unexpected activity. After this testing
is complete and the security group feels it is safe to insert the IDS into its production
environment, the security group must install the monitoring management software,
install the sensors, and properly direct the communication paths from the sensors to
the management console. The security group may also need to reconfigure the routers
to redirect traffic flow, and it definitely needs to ensure that users cannot access the IDS
management console. Finally, the security group should configure a database to hold all
attack signatures and then run simulations.
Costs associated with an IDS alert response should most definitely be considered.
Now that Company A has an IDS in place, security administrators may need additional
alerting equipment such as smartphones. And then there are the time costs associated
with a response to an IDS event.
Anyone who has worked in an IT group knows that some adverse reaction almost
always takes place in this type of scenario. Network performance can take an unacceptable
hit after installing a product if it is an inline or proactive product. Users may no longer
be able to access a server for some mysterious reason. The IDS vendor may not have
explained that two more service patches are necessary for the whole thing to work
correctly. Staff time will need to be allocated for training and to respond to all of the
alerts (true or false) the new IDS sends out.
So, for example, the cost of this countermeasure could be $23,500 for the product
and licenses; $2,500 for training; $3,400 for testing; $2,600 for the loss in user
productivity once the product is introduced into production; and $4,000 in labor for
router reconfiguration, product installation, troubleshooting, and installation of the two
service patches. The real cost of this countermeasure is $36,000. If our total potential
loss was calculated at $9,000, we went over budget by 300 percent when applying this
countermeasure for the identified risk. Some of these costs may be hard or impossible to
identify before they are incurred, but an experienced risk analyst would account for many
of these possibilities.
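A quick sketch confirms the arithmetic of this example:

```python
# Cost components for the countermeasure example in the text.
costs = {
    "product and licenses": 23_500,
    "training": 2_500,
    "testing": 3_400,
    "lost user productivity": 2_600,
    "labor (reconfiguration, install, patching)": 4_000,
}
total_cost = sum(costs.values())
potential_loss = 9_000  # the calculated total potential loss

overrun_pct = (total_cost - potential_loss) / potential_loss * 100
print(total_cost, f"{overrun_pct:.0f}% over")  # prints: 36000 300% over
```

Spending $36,000 to mitigate a $9,000 exposure is exactly the kind of outcome a full cost accounting is meant to catch before the purchase order is signed.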
Types of Controls
In our examples so far, we’ve focused on countermeasures like firewalls and IDSs, but
there are many more options. Controls come in three main categories: administrative,
technical, and physical. Administrative controls are commonly referred to as “soft con-
trols” because they are more management oriented. Examples of administrative controls
are security documentation, risk management, personnel security, and training. Technical
controls (also called logical controls) are software or hardware components, as in firewalls,
IDS, encryption, and identification and authentication mechanisms. And physical controls
are items put into place to protect facilities, personnel, and resources, such as the following:
• Fence
• Locked external doors
• Closed-circuit TV (CCTV)
• Security guard
• Locked internal doors
• Locked server room
• Physically secured computers (cable locks)
(Figure: layered controls protecting an asset, including physical security, firewalls, demilitarized zones (DMZs), secure architecture, account management, and patch management.)
• Firewalls
• Intrusion detection system
• Intrusion prevention system
• Antimalware
• Access control
• Encryption
The types of controls that are actually implemented must map to the threats the
organization faces, and the number of layers that are put into place must map to the
sensitivity of the asset. The rule of thumb is the more sensitive the asset, the more layers
of protection that must be put into place.
So the different categories of controls that can be used are administrative, technical,
and physical. But what do these controls actually do for us? We need to understand what
the different control types can provide us in our quest to secure our environments.
The different types of security controls are preventive, detective, corrective, deterrent,
recovery, and compensating. By having a better understanding of the different control
types, you will be able to make more informed decisions about what controls will be best
used in specific situations. The six different control types are as follows:
Preventive: Administrative
• Policies and procedures
• Effective hiring practices
• Pre-employment background checks
• Controlled termination processes
• Data classification and labeling
• Security awareness
Preventive: Physical
• Badges, swipe cards
• Guards, dogs
• Fences, locks, mantraps
Preventive: Technical
• Passwords, biometrics, smart cards
• Encryption, secure protocols, call-back systems, database views, constrained user
interfaces
• Antimalware software, access control lists, firewalls, IPS
Table 2-6 shows how these types of control mechanisms perform different security
functions. Many students get themselves wrapped around the axle when trying to get
their mind around which control provides which functionality. This is how this train
of thought usually takes place: “A security camera system is a detective control, but if
an attacker sees its cameras, it could be a deterrent.” Let’s stop right here. Do not make
this any harder than it has to be. When trying to map the functionality requirement to
a control, think of the main reason that control would be put into place. A firewall tries
to prevent something bad from taking place, so it is a preventive control. Auditing logs
is done after an event took place, so it is detective. A data backup system is developed so
that data can be recovered; thus, this is a recovery control. Computer images are created
so that if software gets corrupted, they can be reloaded; thus, this is a corrective control.
Note that some controls can serve different functions. Security guards can deter
would-be attackers, but if they don't deter all of them, they can also stop (prevent)
an intruder from getting in.
(Table 2-6, Control Categories and Types, is a matrix mapping each control to the functional types it provides: preventive, detective, corrective, deterrent, recovery, and compensating. The physical controls listed are fences, locks, badge system, security guard, mantrap doors, lighting, motion detectors, closed-circuit TVs, and offsite facility; the administrative controls are security policy, monitoring and supervising, separation of duties, job rotation, information classification, investigations, and security awareness training; the technical controls are ACLs, encryption, audit logs, IDS, antimalware software, workstation images, smart cards, and data backup.)
Control Assessments
Once you select the administrative, technical, and physical controls that you think will
reduce your risks to acceptable levels, you have to ensure that this is actually the case.
without adversely affecting other mechanisms.
Provides uniform protection: A security level is applied in a standardized method to all mechanisms the control is designed to protect.
Provides override functionality: An administrator can override the restriction if necessary.
Defaults to least privilege: When installed, the control defaults to a lack of permissions and rights instead of installing with everyone having full control.
Independence of control and the asset it is protecting: The given control can protect multiple assets, and a given asset can be protected by multiple controls.
Flexibility and security: The more security the control provides, the better. This functionality should come with flexibility, which enables you to choose different functions instead of all or none.
Usability: The control does not needlessly interfere with users' work.
Asset protection: The asset is still protected even if the countermeasure needs to be reset.
Easily upgraded: Software continues to evolve, and updates should be able to happen painlessly.
Auditing functionality: The control includes a mechanism that provides auditing at various levels of verbosity.
Minimizes dependence on other components: The control should be flexible and not have strict requirements about the environment into which it will be installed.
Must produce output in usable and understandable format: The control should present important information in a format easy for humans to understand and use for trend analysis.
Testable: The control should be able to be tested in different environments under different situations.
Does not introduce other compromises: The control should not provide any covert channels or back doors.
System and user performance: System and user performance should not be greatly affected by the control.
Proper alerting: The control should have the capability for thresholds to be set as to when to alert personnel of a security breach, and this type of alert should be acceptable.
Does not affect assets: The assets in the environment should not be adversely affected by the control.
Table 2-7 Characteristics to Consider When Assessing Security Controls
Consider an organization that allows its employees to use the Internet for personal
purposes while they are on breaks. The same organization has implemented
Transport Layer Security (TLS) proxies that decrypt all network traffic in order to conduct
deep packet analysis and mitigate the risk that a threat actor is using encryption to hide
her malicious deeds. Normally, the process is fully automated and no other staff members
look at the decrypted communications. Periodically, however, security staff manually
check the system to ensure everything is working properly. Now, suppose an employee
reveals some very private health information to a friend over her personal webmail and
that traffic is monitored and observed by a security staffer. That breach of privacy could
cause a multitude of ethical, regulatory, and even legal problems for the organization.
When implementing security controls, it is critical to consider their privacy implications.
If your organization has a chief privacy officer (or other privacy professional), that person
should be part of the process of selecting and implementing security controls to ensure
they don’t unduly (or even illegally) violate employee privacy.
Monitoring Risks
We really can’t just build a risk management program (or any program, for that matter),
call it good, and go home. We need a way to assess the effectiveness of our work, identify
deficiencies, and prioritize the things that still need work. We need a way to facilitate
decision making, performance improvement, and accountability through collection,
analysis, and reporting of the necessary information. More importantly, we need to be
able to identify changes in the environment and be able to understand their impacts on
our risk posture. All this needs to be based on facts and metrics. As the saying goes, “You
can’t manage something you can’t measure.”
Risk monitoring is the ongoing process of adding new risks, reevaluating existing
ones, removing moot ones, and continuously assessing the effectiveness of our controls
at mitigating all risks to tolerable levels. Risk monitoring activities should be focused
on three key areas: effectiveness, change, and compliance. The risk management team
should continually look for improvement opportunities, periodically analyze the data
gathered from each key area, and report its findings to senior management. Let’s take a
closer look at how we might go about monitoring and measuring each area.
Effectiveness Monitoring
There are many reasons why the effectiveness of our security controls decreases. Techni-
cal controls may not adapt quickly to changing threat actor behaviors. Employees may
lose awareness of (or interest in) administrative controls. Physical controls may not keep
up with changing behaviors as people move in and through our facilities. How do we
measure this decline in the effectiveness of our controls and, more importantly, the rising
risks to our organizations? This is the crux of effectiveness monitoring.
One approach is to keep track of the number of security incidents by severity.
Let’s say that we implemented controls to reduce the risk of ransomware attacks. We
redesigned our security awareness training and deployed a new endpoint detection and
response (EDR) solution; tracking the number and severity of ransomware incidents
before and after these changes tells us whether the controls are working.
NOTE The Center for Internet Security (CIS) publishes a helpful (and free)
document titled “CIS Controls Measures and Metrics,” currently in its seventh
version. It provides specific measures for each control as well as goals for
their values in your organization.
Change Monitoring
Even if you keep track of known threats and the risks they pose, it is likely that changes in
your organization’s environment will introduce new risks. There are two major sources of
change that impact your overall risk: information systems and business. The first is per-
haps the most obvious to cybersecurity professionals. A new system is introduced, an old
one retired, or an existing one updated or reconfigured. Any of these changes can produce
new risks or change those you are already tracking. Another source of changes that intro-
duce risks is the business itself. Over time, your organization will embark on new ven-
tures, change internal processes, or perhaps merge with or acquire another organization.
PART I
Monitoring changes to your environment and dealing with the risks they could
introduce is part of a good change management process. Typically, organizations will
have a change advisory board (CAB) or a similarly named standing group that reviews
and approves any changes such as the development of new policies, systems, and business
processes. The CAB measures changes through a variety of metrics that are also used to
monitor risks.
Compliance Monitoring
Something else that could change in your organization and affect your risk are legal,
regulatory, and policy requirements. Compliance monitoring is a bit easier than effec-
tiveness monitoring and change monitoring, because compliance tends to change fairly
infrequently. Laws and external regulations usually take years to change, while internal
regulations and policies should be part of the change management process we discussed
previously. Though the frequency of compliance changes is fairly low, these changes can
have significant impacts in the organization. A great example of this is the General Data
Protection Regulation (GDPR) that came into effect in May 2018. It was years in the
making, but it has had huge effects on any organization that stores or processes data
belonging to a person from the European Union (EU).
Another aspect of compliance monitoring is responding to audit findings. Whether it is
an external or internal audit, any findings dealing with compliance need to be addressed.
If the audit reveals risks that are improperly mitigated, the risk team needs to respond to
them. Failure to do so could result in significant fines or even criminal charges.
So, what can we measure to monitor our compliance? The specific metrics vary from one
organization to another.
Risk Reporting
Risk reporting is an essential component of risk management in general and risk moni-
toring in particular. (Recall that risk management encompasses framing, assessing,
responding to, and monitoring the risks.) Reporting enables organizational decision-
making, security governance, and day-to-day operations. It is also important for compli-
ance purposes.
So, how should we report risks? There is no set formula for reporting, but there are a
couple of guiding principles. The first one is to understand the audience. There are at
least three groups at which you may target risk reports: executives (and board members),
managers, and risk owners. Each requires a different approach.
(Figure: a risk heat map plotting numbered risks by likelihood and impact, with impact ranging from Very Low to High.)
Managers
Managers across the organization will need much more detailed reports because they are
responsible for, well, managing the risks. They will want to know current risks and how
they’ve been trending over time. Are risks decreasing or increasing? Either way, why?
Where does progress seem to be stuck? These are some of the questions managers will
want the report to answer. They will also want to be able to drill into specific items of
interest to get into the details, such as who owns the risk, how we are responding to the
risk, and why the current approach may not be working.
Many organizations rely on risk management dashboards for this level of reporting.
These dashboards may be part of a risk management tool, in which case they’d be
interactive and allow drilling into specific items in the report. Organizations without
these automated tools typically use spreadsheets to generate graphs (showing trends over
time) or even manually developed slides. Whatever the approach, the idea is to present
actionable information allowing business unit managers to track their progress over time
with respect to risks.
Risk Owners
This is the internal audience that needs the most detailed reporting, because the risk
owners are the staff members responsible for managing individual risks. They take direc-
tion from management as they respond to specific risks. For example, if the organization
decides to transfer a given risk, the risk owner will be responsible for ensuring the insur-
ance policy is developed and acquired effectively. This will include performance indica-
tors, such as cost, coverage, and responsiveness. Cybersecurity insurance companies often
require that certain controls be in place in order to provide coverage, so the risk owner
must also ensure that these conditions are met so that the premiums are not being paid
in vain.
Continuous Improvement
Only by reassessing the risks on a periodic basis can the risk management team’s state-
ments on security control performance be trusted. If the risk has not changed and the
safeguards implemented are functioning in good order, then it can be said that the risk is
being properly mitigated. Regular risk management monitoring will support the infor-
mation security risk ratings.
Vulnerability analysis and continued asset identification and valuation are also
important tasks of risk management monitoring and performance. The cycle of
continued risk analysis is a very important part of determining whether the safeguard
controls that have been put in place are appropriate and necessary to safeguard the assets
and environment.
Continuous improvement is the practice of identifying opportunities, mitigating
threats, improving quality, and reducing waste as an ongoing effort. It is the hallmark of
mature and effective organizations.
The basic processes you’ll need to implement to manage risk in your supply chain
are the same ones you use in the rest of your risk management program. The differences
are mainly in what you look at (that is, the scope of your assessments) and what you
can do about it (legally and contractually). A good resource to help integrate supply
chain risk into your risk management program is NIST SP 800-161, Supply Chain Risk
Management Practices for Federal Information Systems and Organizations.
One of the first things you’ll need to do is to create a supply chain map for your
organization. This is essentially a network diagram of who supplies what to whom, down
to your ultimate customers. Figure 2-8 depicts a simplified systems integrator company
(“Your Company”). It has a hardware components manufacturer that supplies it hardware
and is, in turn, supplied by a materials producer. Your Company receives software from a
developer and receives managed security from an external service provider. The hardware
and software components are integrated and configured into Your Company’s product,
which is then shipped to its distributor and on to its customers. In this example, the
company has four suppliers on which to base its supply chain risk assessment. It is also
considered a supplier to its distributor.
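A supply chain map like the one in Figure 2-8 can be captured as a simple directed graph, which makes it easy to enumerate every upstream party that belongs in the scope of a supply chain risk assessment. The sketch below is a minimal illustration; the node names mirror the figure and are placeholders, not real entities.

```python
# Minimal sketch of a supply chain map as a directed graph.
# Each edge points from a supplier to the party it supplies.
supply_chain = {
    "Materials Supplier": ["Components Manufacturer"],
    "Components Manufacturer": ["Your Company"],
    "Software Developer": ["Your Company"],
    "Security Provider": ["Your Company"],
    "Your Company": ["Distributor"],
    "Distributor": ["Customers"],
}

def upstream_suppliers(graph, target):
    """Return every node with a path to `target` (direct and indirect suppliers)."""
    suppliers = set()
    frontier = [target]
    while frontier:
        node = frontier.pop()
        for supplier, customers in graph.items():
            if node in customers and supplier not in suppliers:
                suppliers.add(supplier)
                frontier.append(supplier)
    return suppliers

print(sorted(upstream_suppliers(supply_chain, "Your Company")))
```

Running the traversal for "Your Company" yields the four suppliers the text identifies as the scope of its supply chain risk assessment, including the materials producer it never deals with directly.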
Now, suppose the software developer in Figure 2-8 is attacked and the threat actors
insert malicious code into the developer’s software product. Anyone who receives that
application from Your Company, or perhaps through an otherwise legitimate software
update, also gets a very stealthy piece of malware that “phones home” to these actors,
telling them where the malware is and what its host network looks like. These are
sophisticated, nation-state spies intent on remaining undetected while they penetrate
some very specific targets. If an infected organization is of interest to them, they’ll deliver
the next stage of malware with which to quietly explore and steal files. Otherwise,
they'll leave the implant dormant and quietly move on to more interesting victims.
Figure 2-8  Simplified supply chain (a materials supplier feeds a components
manufacturer, which, along with a software developer and a security provider,
supplies Your Company; products then flow to the distributor and on to customers)
Hardware
One of the major supply chain risks is the addition of hardware Trojans to electronic
components. A hardware Trojan is an electronic circuit that is added to an existing device
in order to compromise its security or provide unauthorized functionality. Depending
on the attacker’s access, these mechanisms can be inserted at any stage of the hardware
development process (specification, design, fabrication, testing, assembly, or packaging).
It is also possible to add them after the hardware is packaged by intercepting shipments
in the supply chain. In this case, the Trojan may be noticeable if the device is opened and
visually inspected. The earlier in the supply chain that hardware Trojans are inserted, the
more difficult they are to detect.
Another supply chain risk to hardware is the substitution of counterfeit components.
The problems with these clones are many, but from a security perspective one of the most
important is that they don’t go through the same quality controls that the real ones do.
This leads to lower reliability and abnormal behavior. It could also lead to undetected
hardware Trojans (perhaps inserted by the illicit manufacturers themselves). Obviously,
using counterfeits could have legal implications and will definitely be a problem when
you need customer support from the manufacturer.
Software
Malicious code can also be inserted into software anywhere along your supply
chain, particularly if it is custom-made for your organization. This could happen if your
supplier reuses components (like libraries) developed elsewhere and to which the attacker
has access. It can also be done by a malicious insider working for the supplier or by a
remote attacker who has gained access to the supplier’s software repositories. Failing all
that, the software could be intercepted in transit to you, modified, and then sent on its
way. This last approach could be made more difficult for the adversary by using code
signing or hashes, but it is still possible.
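One common mitigation for in-transit tampering is for the supplier to publish a cryptographic hash of each release over a separate channel, so the recipient can verify the file before installing it. A minimal sketch of that verification (note that a hash only helps if it is obtained over a channel the attacker cannot also modify; code signing additionally binds the hash to the supplier's identity):

```python
import hashlib
import hmac

def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, published_hash):
    """Compare the file's digest to the supplier-published value."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sha256_of(path), published_hash.lower())
```

A recipient would call `verify_download("installer.bin", published_hash)` with the hash taken from the supplier's website or advisory, and refuse to install on a mismatch.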
Services
More organizations are outsourcing services to allow them to focus on their core busi-
ness functions. Organizations use hosting companies to maintain websites and e-mail
servers, service providers for various telecommunication connections, disaster recovery
companies for co-location capabilities, cloud computing providers for infrastructure or
application services, developers for software creation, and security companies to carry out
vulnerability management. It is important to realize that while you can outsource func-
tionality, you cannot outsource risk. When your organization is using these third-party
service providers, it can still be ultimately responsible if something like a data breach
takes place. The following are some things an organization should do to reduce its risk
when outsourcing:
If any requirements are missing, ambiguously stated, or otherwise vitiated, the supplier
agreement can become void, voidable, or unenforceable. So, how do you verify that your
supplier is complying with all contractual requirements dealing with risk? Third-party
assessments are considered best practice and may be required for compliance (e.g., with
PCI DSS). The following are some examples of external evaluations that would indicate
a supplier’s ability to comply with its contractual obligations:
A service level agreement (SLA) is a contractual agreement in which a service pro-
vider guarantees a certain level of service. If the service is not delivered at the agreed-upon
level (or better), then there are consequences (typically financial) for the service provider.
SLAs provide a mechanism to mitigate some of the risk from service providers in the
supply chain. For example, an Internet service provider (ISP) may sign an SLA of 99.999
percent (commonly called “five nines”) uptime to the Internet backbone. That means
that the ISP guarantees less than 26 seconds of downtime per month.
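The "five nines" arithmetic is easy to check: an availability percentage translates into a downtime budget by multiplying the unavailable fraction by the length of the period. A quick sketch:

```python
def downtime_budget_seconds(availability_pct, period_seconds):
    """Maximum allowed downtime for a given availability over a given period."""
    return (1 - availability_pct / 100) * period_seconds

MONTH = 30 * 24 * 3600    # 2,592,000 seconds in a 30-day month
YEAR = 365 * 24 * 3600

print(round(downtime_budget_seconds(99.999, MONTH), 1))       # ~25.9 seconds/month
print(round(downtime_budget_seconds(99.999, YEAR) / 60, 1))   # ~5.3 minutes/year
```

The same function shows why each extra "nine" matters: 99.99 percent allows roughly 4.3 minutes of downtime per month, while 99.999 percent allows only about 26 seconds.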
Business Continuity
Though we strive to drive down the risks of negative effects in our organizations, we can
be sure that sooner or later an event will slip through and cause negative impacts. Ideally,
the losses are contained and won’t affect the major business efforts. However, as security
professionals we need to have plans in place for when the unthinkable happens. Under
those extreme (and sometimes unpredictable) conditions, we need to ensure that our
organizations continue to operate at some minimum acceptable threshold capacity and
quickly bounce back to full productivity.
Business continuity (BC) is an organization’s ability to maintain business functions
or quickly resume them in the event that risks are realized and result in disruptions.
The events can be pretty mundane, such as a temporary power outage, loss of network
connectivity, or a critical employee (such as a systems administrator) suddenly becoming
ill. These events could also be major disasters, such as an earthquake, explosion, or energy
grid failure. Disaster recovery (DR), by contrast to BC, is the process of minimizing the
effects of a disaster or major disruption. It means taking the necessary steps to ensure that
the resources, personnel, and business processes are safe and able to resume operation in a
timely manner. So, DR is part of BC and the disaster recovery plan (DRP) covers a subset
of events compared to the broader business continuity plan (BCP).
EXAM TIP A business continuity plan (BCP) and a disaster recovery plan
(DRP) are related but different. The DRP is a subset of the BCP and is focused
on the immediate aftermath of a disaster. The BCP is much broader and
covers any disruption including (but not limited to) disasters.
A BCP can include getting critical systems to another environment while repair of
the original facilities is underway, getting the right people to the right places during this
time, and performing business in a different mode until regular conditions are back in
place. A BCP also involves dealing with customers, partners, and shareholders through
different channels until everything returns to normal. So, disaster recovery deals with
restoring critical systems in the immediate aftermath of a disaster, while business
continuity planning keeps the broader business functioning throughout the disruption.
(Figure: business continuity planning involves senior management, the business lines,
application availability, and property management, and encompasses IT disaster
recovery planning.)
While disaster recovery and business continuity planning are directed at the
development of plans, business continuity management (BCM) is the holistic management
process that should cover both of them. BCM provides a framework for integrating
resilience with the capability for effective responses in a manner that protects the
interests of an organization’s key stakeholders. The main objective of BCM is to allow
the organization to continue to perform business operations under various conditions.
Security should be considered not only in everyday procedures but also in those procedures undertaken
immediately after a disaster or disruption. For instance, it may not be appropriate to leave
a server that holds confidential information in one building while everyone else moves to
another building. Equipment that provides secure VPN connections may be destroyed
and the team might respond by restoring remote access functionality while
forgetting the need for encryption. In most situations the organization is purely
focused on getting back up and running, thus focusing on functionality. If security is not
integrated and implemented properly, the effects of the physical disaster can be amplified
as threat actors come in and steal sensitive information. Many times an organization is
much more vulnerable after a disaster hits, because the security services used to protect it
may be unavailable or operating at a reduced capacity. Therefore, it is important that if
the organization has secret stuff, it stays secret.
Availability is one of the main themes behind business continuity planning, in that
it ensures that the resources required to keep the business going will continue to be
available to the people and systems that rely upon them. This may mean backups need
to be done religiously and that redundancy needs to be factored into the architecture
of the systems, networks, and operations. If communication lines are disabled or if a
service is rendered unusable for any significant period of time, there must be a quick and
tested way of establishing alternative communications and services. We will be diving
into the many ways organizations can implement availability solutions for continuity and
recovery purposes throughout this section.
When looking at business continuity planning, some organizations focus mainly on
backing up data and providing redundant hardware. Although these items are extremely
important, they are just small pieces of the organization’s overall operations pie. Hardware
and computers need people to configure and operate them, and data is usually not useful
unless it is accessible by other systems and possibly outside entities. Thus, a larger
picture of how the organization's processes interconnect must be understood. The
following steps outline the development of a comprehensive continuity plan:
1. Develop the continuity planning policy statement. Write a policy that provides the
guidance necessary to develop a BCP and that assigns authority to the necessary
roles to carry out these tasks.
2. Conduct the business impact analysis (BIA). Identify critical functions and systems
and allow the organization to prioritize them based on necessity. Identify
vulnerabilities and threats, and calculate risks.
3. Identify preventive controls. Once threats are recognized, identify and implement
controls and countermeasures to reduce the organization’s risk level in an
economical manner.
4. Create contingency strategies. Formulate methods to ensure systems and critical
functions can be brought online quickly.
5. Develop an information system contingency plan. Write procedures and guidelines
for how the organization can still stay functional in a crippled state.
6. Ensure plan testing, training, and exercises. Test the plan to identify deficiencies
in the BCP, and conduct training to properly prepare individuals on their
expected tasks.
7. Ensure plan maintenance. Put in place steps to ensure the BCP is a living
document that is updated regularly.
Why are there so many sets of best practices and which is the best for your organization?
If your organization is part of the U.S. government or a government contracting
organization, then you need to comply with the NIST standards. If your organization
is in Europe or your organization does business with other organizations in Europe,
then you might need to follow the European Union Agency for Cybersecurity (ENISA)
requirements. While we are not listing all of them here, there are other country-based
BCM standards that your organization might need to comply with if it is residing in or
does business in one of those specific countries. If your organization needs to get ISO
certified, then ISO/IEC 27031 and ISO 22301 could be the standards to follow. While
the first of these is focused on IT, the second is broader in scope and addresses the needs
of the entire organization.
An organization has no real hope of rebuilding itself and its processes after a disaster
if it does not have a good understanding of how its organization works in the first
place. This notion might seem absurd at first. You might think, “Well, of course an
organization knows how it works.” But you would be surprised at how difficult it is
to fully understand an organization down to the level of detail required to rebuild
it. Each individual may know and understand his or her little world within the
organization, but hardly anyone at any organization can fully explain how each and
every business process takes place.
Business continuity planning should be fully integrated into the organization and its
processes. Instead of being considered an outsider, BCP should be “part of the team.”
Further, final responsibility for BCP should belong not to the BCP team or its leader,
but to a high-level executive manager, preferably a member of the executive board. This
reinforces both the image and the reality of continuity planning as a function that
senior leadership regards as vital.
By analyzing and planning for potential disruptions to the organization, the BCP
team can assist other business disciplines in planning for, and responding resiliently
to, emergencies. Given that the ability to respond
depends on operations and management personnel throughout the organization, such
capability should be developed organization-wide. It should extend throughout every
location of the organization and up the employee ranks to top-tier management.
As such, the BCP program needs to be a living entity. As an organization goes through
changes, so should the program, thereby ensuring it stays current, usable, and effective.
When properly integrated with change management processes, the program stands a much
better chance of being continually updated and improved upon. Business continuity is a
foundational piece of an effective security program, and keeping it current is critical
to ensuring it remains relevant in the organization's time of need.
A very important question to ask when first developing a BCP is why it is being
developed. This may seem silly and the answer may at first appear obvious, but that is
not always the case. You might think that the reason to have these plans is to deal with
an unexpected disaster and to get people back to their tasks as quickly and as safely as
possible, but the full story is often a bit different. Why are most companies in business?
To make money and be profitable. If these are usually the main goals of businesses, then
any BCP needs to be developed to help achieve and, more importantly, maintain these
goals. The main reason to develop these plans in the first place is to reduce the risk of
financial loss by improving the company’s ability to recover and restore operations. This
encompasses the goals of mitigating the effects of the disaster.
Not all organizations are businesses that exist to make profits. Government agencies,
military units, nonprofit organizations, and the like exist to provide some type of
protection or service to a nation or society. Whereas a company must create its BCP
to ensure that revenue continues to come in so that the company can stay in business,
a government agency or nonprofit must create its BCP to ensure that it can continue to
deliver the services and resources required to successfully survive when an earthquake
(or a similar disaster) does hit. The
point of making these plans is to try to think of all the possible disasters that could take
place, estimate the potential damage and loss, categorize and prioritize the potential
disasters, and develop viable alternatives in case those events do actually happen.
A business impact analysis (BIA) is considered a functional analysis, in which a team
collects data through interviews and documentary sources; documents business functions,
activities, and transactions; develops a hierarchy of business functions; and finally applies
a classification scheme to indicate each individual function’s criticality level. But how do
we determine a classification scheme based on criticality levels?
The BCP committee must identify the threats to the organization and map them to
characteristics such as maximum tolerable downtime, operational disruption, financial
loss, regulatory and legal responsibilities, and damage to reputation.
The committee will not truly understand all business processes, the steps that must
take place, or the resources and supplies these processes require. So the committee must
gather this information from the people who do know—department managers and
specific employees throughout the organization. The committee starts by identifying
the people who will be part of the BIA data-gathering sessions. The committee needs to
identify how it will collect the data from the selected employees, be it through surveys,
interviews, or workshops. Next, the team needs to collect the information by actually
conducting surveys, interviews, and workshops. Data points obtained as part of the
information gathering will be used later during analysis. It is important that the team
members ask about how different tasks—whether processes, transactions, or services,
along with any relevant dependencies—get accomplished within the organization. The
team should build process flow diagrams, which will be used throughout the BIA and
plan development stages.
Upon completion of the data collection phase, the BCP committee needs to conduct a
BIA to establish which processes, devices, or operational activities are critical. If a system
stands on its own, doesn’t affect other systems, and is of low criticality, then it can be
classified as a tier-two or tier-three recovery step. This means these resources will not be
dealt with during the recovery stages until the most critical (tier one) resources are up and
running. This analysis can be completed using a standard risk assessment as illustrated
in Figure 2-9.
Risk Assessment
To achieve success, the organization should systematically plan and execute a formal
BCP-related risk assessment. The assessment fully takes into account the organization's
tolerance for continuity risks. The risk assessment also makes use of the data in the BIA
to supply a consistent estimate of exposure.
Figure 2-9  Standard risk assessment process (risk evaluation and risk treatment)
As indicators of success, the risk assessment should identify, evaluate, and record all
relevant items, which may include
• Identifying and documenting single points of failure
• Making a prioritized list of threats to the particular business processes of the
organization
• Putting together information for developing a management strategy for risk
control and for developing action plans for addressing risks
• Documenting acceptance of identified risks, or documenting acknowledgment of
risks that will not be addressed
The risk assessment typically takes the form of the equation Risk = Threat ×
Impact × Probability. However, the BIA adds the dimension of time to this equation. In
other words, risk mitigation measures should be geared toward those things that might
most rapidly disrupt critical business processes and commercial activities.
The main parts of a risk assessment are the identification of threats and the estimation
of their probability and potential impact on the organization. The specific scenarios
and damage types can vary from organization to organization.
BIA Steps
The more detailed and granular steps of a BIA are outlined here:
Among the potential loss criteria the team must consider are the following:
• Violations of legal and regulatory requirements
• Delayed-income costs
• Loss in revenue
• Loss in productivity
These costs can be direct or indirect and must be properly accounted for.
For instance, if the BCP team is looking at the threat of a terrorist bombing, it is
important to identify which business function most likely would be targeted, how all
business functions could be affected, and how each bulleted item in the loss criteria
would be directly or indirectly involved. The timeliness of the recovery can be critical for
business processes and the company’s survival. For example, it may be acceptable to have
the customer-support functionality out of commission for two days, whereas five days
may leave the company in financial ruin.
After identifying the critical functions, it is necessary to find out exactly what is
required for these individual business processes to take place. The resources that are
required for the identified business processes are not necessarily just computer systems,
but may include personnel, procedures, tasks, supplies, and vendor support. It must be
understood that if one or more of these support mechanisms is not available, the critical
function may be doomed. The team must determine what type of effect unavailable
resources and systems will have on these critical functions.
The BIA identifies which of the organization’s critical systems are needed for survival
and estimates the outage time that can be tolerated by the organization as a result of
various unfortunate events. The outage time that can be endured by an organization is
referred to as the maximum tolerable downtime (MTD) or maximum tolerable period of
disruption (MTPD), which is illustrated in Figure 2-10.
Figure 2-10  Maximum tolerable downtime (losses grow over time from no loss,
through serious but survivable losses, to irreparable losses; the MTD is the point at
which the impact becomes unacceptable)
The following are sample MTD estimates that an organization may use:
• Nonessential  30 days
• Normal  7 days
• Important  72 hours
• Urgent  24 hours
• Critical  Minutes to hours
Each business function and asset should be placed in one of these categories, depending
upon how long the organization can survive without it. These estimates will help the
organization determine what backup solutions are necessary to ensure the availability of
these resources. The shorter the MTD, the higher priority of recovery for the function in
question. Thus, the items classified as Urgent should be addressed before those classified
as Normal.
For example, if being without a T1 communication line for three hours would cost
the company $130,000, the T1 line could be considered Critical, and thus the company
should put in a backup T1 line from a different carrier. If a server going down and being
unavailable for ten days will only cost the company $250 in revenue, this would fall into
the Normal category, and thus the company may not need to have a fully redundant
server waiting to be swapped out. Instead, the company may choose to count on its
vendor’s SLA, which may promise to have it back online in eight days.
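The MTD categories above lend themselves to a simple lookup: estimate the tolerable outage for a function, then map it to the tightest category bucket it fits. The sketch below uses the sample thresholds from the text; the two-hour cutoff for "Critical" is an assumption standing in for "minutes to hours," not a fixed standard.

```python
# MTD thresholds in hours, from shortest (highest recovery priority) to longest.
# The 2-hour "Critical" boundary is an assumed stand-in for "minutes to hours."
MTD_CATEGORIES = [
    ("Critical", 2),
    ("Urgent", 24),
    ("Important", 72),
    ("Normal", 7 * 24),
    ("Nonessential", 30 * 24),
]

def classify_mtd(tolerable_outage_hours):
    """Map a tolerable outage time to the tightest MTD category it fits."""
    for name, limit in MTD_CATEGORIES:
        if tolerable_outage_hours <= limit:
            return name
    return "Nonessential"

print(classify_mtd(0.5))  # Critical
print(classify_mtd(3))    # Urgent
```

Classifying every business function this way produces the recovery priority ordering the text describes: items classified as Urgent are addressed before those classified as Normal.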
Sometimes the MTD will depend in large measure on the type of organization in
question. For instance, a call center—a vital link to current and prospective clients—
will have a short MTD, perhaps measured in minutes instead of weeks. A common
solution is to split up the calls through multiple call centers placed in differing locales.
If one call center is knocked out of service, the other one can temporarily pick up the
load. Manufacturing can be handled in various ways. Examples include subcontracting
the making of products to an outside vendor, manufacturing at multiple sites, and
warehousing an extra supply of products to fill gaps in supply in case of disruptions to
normal manufacturing.
The BCP team must try to think of all possible events that might occur that could
turn out to be detrimental to an organization. The BCP team also must understand it
cannot possibly contemplate all events, and thus protection may not be available for
every scenario introduced. Being properly prepared specifically for a flood, earthquake,
terrorist attack, or lightning strike is not as important as being properly prepared to
respond to anything that damages or disrupts critical business functions.
All of the previously mentioned disasters could cause these results, but so could a
meteor strike, a tornado, or a wing falling off a plane passing overhead. So the moral of
the story is to be prepared for the loss of any or all business resources, instead of focusing
on the events that could cause the loss.
The BIA helps the organization understand what would constitute an
operational loss in the event of a disaster or disruption. It identifies the
organization’s critical systems needed for survival and estimates the outage
time that can be tolerated by the organization as a result of a disaster or
disruption.
Quick Review
• Risk management is the process of identifying and assessing risk, reducing it to
an acceptable level, and ensuring it remains at that level.
• An information systems risk management (ISRM) policy provides the foundation
and direction for the organization’s security risk management processes and
procedures and should address all issues of information security.
• A threat is a potential cause of an unwanted incident, which may result in harm
to a system or organization.
• Four risk assessment methodologies with which you should be familiar are NIST
SP 800-30; Facilitated Risk Analysis Process (FRAP); Operationally Critical
Threat, Asset, and Vulnerability Evaluation (OCTAVE); and Failure Modes and
Effect Analysis (FMEA).
• Failure Modes and Effect Analysis (FMEA) is a method for determining functions,
identifying functional failures, and assessing the causes of failure and their effects
through a structured process.
• A fault tree analysis is a useful approach to detect failures that can take place
within complex environments and systems.
• A quantitative risk analysis attempts to assign monetary values to components
within the analysis.
• A purely quantitative risk analysis is not possible because qualitative items cannot
be quantified with precision.
• Qualitative risk analysis uses judgment and intuition instead of numbers.
• Qualitative risk analysis involves people with the requisite experience and
education evaluating threat scenarios and rating the probability, potential loss,
and severity of each threat based on their personal experience.
• Single loss expectancy × annualized rate of occurrence = annualized loss
expectancy (SLE × ARO = ALE)
• The main goals of risk analysis are to identify assets and assign values to them,
identify vulnerabilities and threats, quantify the impact of potential threats, and
provide an economic balance between the impact of the risk and the cost of the
safeguards.
• Capturing the degree of uncertainty when carrying out a risk analysis is
important, because it indicates the level of confidence the team and management
should have in the resulting figures.
• Automated risk analysis tools reduce the amount of manual work involved in the
analysis. They can be used to estimate future expected losses and calculate the
benefits of different security measures.
• The risk management team should include individuals from different departments
within the organization, not just technical personnel.
• Risk can be transferred, avoided, reduced, or accepted.
• Threats × vulnerability × asset value = total risk.
• (Threats × vulnerability × asset value) × controls gap = residual risk.
• When choosing the right safeguard to reduce a specific risk, the cost, functionality,
and effectiveness must be evaluated and a cost/benefit analysis performed.
• There are three main categories of controls: administrative, technical, and
physical.
• Controls can also be grouped by types, depending on their intended purpose, as
preventive, detective, corrective, deterrent, recovery, and compensating.
• A control assessment is an evaluation of one or more controls to determine the
extent to which they are implemented correctly, operating as intended, and
producing the desired outcome.
• Security control verification answers the question “did we implement the control
right?” while validation answers the question “did we implement the right control?”
• Risk monitoring is the ongoing process of adding new risks, reevaluating existing
ones, removing moot ones, and continuously assessing the effectiveness of your
controls at mitigating all risks to tolerable levels.
• Change management processes deal with monitoring changes to your
environment and dealing with the risks they could introduce.
• Continuous improvement is the practice of identifying opportunities, mitigating
threats, improving quality, and reducing waste as an ongoing effort. It is the
hallmark of mature and effective organizations.
• A supply chain is a sequence of suppliers involved in delivering some product.
• Business continuity management (BCM) is the overarching approach to managing
all aspects of BCP and DRP.
• A business continuity plan (BCP) contains strategy documents that provide
detailed procedures that ensure critical business functions are maintained and
that help minimize losses of life, operations, and systems.
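The quantitative relationships summarized above (SLE × ARO = ALE, and residual risk as total risk scaled by the controls gap) can be checked with a few lines of arithmetic. Every figure below is invented purely to illustrate the calculation:

```python
# Annualized loss expectancy: SLE (single loss expectancy) times
# ARO (annualized rate of occurrence). All values are hypothetical.
asset_value = 100_000      # dollar value of the asset
exposure_factor = 0.25     # fraction of the asset lost in a single incident
sle = asset_value * exposure_factor   # single loss expectancy: $25,000
aro = 0.5                  # one incident expected every two years
ale = sle * aro            # annualized loss expectancy: $12,500/year

# Residual risk: the portion of total risk the controls fail to mitigate,
# expressed here as total risk times the controls gap.
controls_gap = 0.2
residual_ale = ale * controls_gap     # $2,500/year remains after controls

print(ale, residual_ale)
```

A safeguard whose annual cost exceeds the reduction in ALE it delivers fails the cost/benefit test described earlier in the chapter.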
Questions
Please remember that these questions are formatted and asked in a certain way for a reason.
Keep in mind that the CISSP exam is asking questions at a conceptual level. Questions may
not always have the perfect answer, and the candidate is advised against always looking for
the perfect answer. Instead, the candidate should look for the best answer in the list.