Chapter 09: The Art of Ensuring Integrity
Integrity ensures that data remains trustworthy and is not changed by anyone or anything over its entire life
cycle. Data integrity is a critical component in the design, implementation, and usage of any system that
stores, processes, or transmits data. This chapter begins by discussing the types of data integrity controls
used such as hashing algorithms, salting, and keyed-hash message authentication code (HMAC). The use
of digital signatures and certificates incorporates the data integrity controls to provide users a way of
verifying the authenticity of messages and documents. The chapter concludes with a discussion of
database integrity enforcement. Having a well-controlled and well-defined data integrity system increases
the stability, performance, and maintainability of a database system.
Types of Data Integrity Controls
Hashing Algorithm
Users need to know that their data remains unchanged while at rest or in transit. Hashing is a tool that
ensures data integrity by taking binary data (the message) and producing a fixed-length representation
called the hash value or message digest, as shown in the figure.
The hash tool uses a cryptographic hashing function to verify and ensure data integrity. It can also verify
authentication. Systems can store hash values in place of clear text passwords or encryption keys because
hash functions are one-way functions. A given input hashed with a specific hashing algorithm always results
in the same hash digest, but it is computationally infeasible to recover the input from the digest. It is also
computationally infeasible for two different sets of data to come up with the same hash digest or output.
Every time the data is changed or altered, the hash value also changes. Because of this, cryptographic
hash values are often called digital fingerprints. They can detect duplicate data files, file version changes,
and similar applications. These values guard against an accidental or intentional change to the data and
accidental data corruption. Hashing is also very efficient. A large file or the content of an entire disk drive
results in a hash value with the same size.
Hashing Properties
Hashing is a one-way mathematical function that is relatively easy to compute, but significantly harder to
reverse. Grinding coffee is a good analogy of a one-way function. It is easy to grind coffee beans, but it is
almost impossible to put all of the tiny pieces back together to rebuild the original beans.
A cryptographic hash function has the following properties:
• The input can be any length.
• The output has a fixed length.
• The hash function is one way and is not reversible.
• Two different input values will almost never result in the same hash values.
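These properties can be observed with a short sketch. It assumes SHA-256 from Python's standard hashlib module as the example hash function; the helper name sha256_hex is only illustrative.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"abc")
long_digest = sha256_hex(b"a" * 1_000_000)   # a much longer input
changed_digest = sha256_hex(b"abd")          # one character differs from b"abc"

# Any input length, fixed output length (256 bits = 64 hex characters).
print(len(short_digest), len(long_digest))
# The same input always produces the same digest.
print(short_digest == sha256_hex(b"abc"))
# A small change to the input produces a different digest.
print(short_digest != changed_digest)
```

The sketch does not try to reverse a digest, because no practical method for doing so is known for SHA-256.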
Hashing Algorithms
Hash functions are helpful to ensure that a user or communication error does not change the data
accidentally. For instance, a sender may want to make sure that no one alters a message on its way to the
recipient. The sending device inputs the message into a hashing algorithm and computes its fixed-length
digest or fingerprint.
Simple Hash Algorithm (8-bit Checksum)
The 8-bit checksum is one of the first hashing algorithms, and it is the simplest form of a hash function.
An 8-bit checksum calculates the hash by converting the message into binary numbers and then organizing
the string of binary numbers into 8-bit chunks. The algorithm adds up the 8-bit values and keeps only the
lowest 8 bits of the sum. The final step is to convert the result using a process called 2’s complement. The
2’s complement inverts every bit of the binary value, so that each zero converts to a one and each one
converts to a zero, and then adds 1. The result is an 8-bit hash value.
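A minimal sketch of the 8-bit checksum described above follows. The function names checksum8 and is_intact are illustrative, and the sketch assumes the sum is truncated to 8 bits before the 2's complement step.

```python
def checksum8(message: bytes) -> int:
    """Return the 8-bit two's-complement checksum of message."""
    total = sum(message) & 0xFF      # add the 8-bit values, keep the low 8 bits
    inverted = total ^ 0xFF          # flip every bit (0 -> 1, 1 -> 0)
    return (inverted + 1) & 0xFF     # add 1 and keep the result in 8 bits

def is_intact(message: bytes, checksum: int) -> bool:
    """The bytes plus the checksum must sum to 0 modulo 256."""
    return ((sum(message) + checksum) & 0xFF) == 0

value = checksum8(b"Hello")
print(value)                          # 12
print(is_intact(b"Hello", value))     # True
print(is_intact(b"Hellp", value))     # False: one byte changed

# Reordering bytes leaves the sum unchanged, so the checksum cannot detect it.
print(checksum8(b"ab") == checksum8(b"ba"))   # True
```

The last line shows why such a checksum is suitable only for catching accidental errors: many different messages share the same 8-bit value.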
Modern Hashing Algorithms
There are many modern hashing algorithms widely used today. Two of the most popular are MD5 and
SHA.
Message Digest 5 (MD5) Algorithm
Ron Rivest developed the MD5 hashing algorithm, and several Internet applications use it today. MD5 is
a one-way function that makes it easy to compute a hash from the given input data but makes it very
difficult to compute input data given only a hash value.
MD5 produces a 128-bit hash value. Researchers demonstrated practical MD5 collisions in 2004, and in 2012
the authors of the Flame malware used an MD5 collision to forge a Windows code-signing certificate.
Secure Hash Algorithm (SHA)
The U.S. National Institute of Standards and Technology (NIST) developed SHA, the algorithm specified in
the Secure Hash Standard (SHS). NIST published SHA-1 in 1995. SHA-2 replaced SHA-1 with four additional
hash functions to make up the SHA family:
• SHA-224 (224 bit)
• SHA-256 (256 bit)
• SHA-384 (384 bit)
• SHA-512 (512 bit)
SHA-2 is a stronger algorithm, and it is replacing MD5. SHA-256, SHA-384, and SHA-512 are the next-
generation algorithms.
Hashing Files and Digital Media
Integrity ensures that data and information are complete and unaltered at the time of acquisition. This
is important to know when a user downloads a file from the Internet or a forensic examiner is looking for
evidence on digital media.
To verify the integrity of all IOS images, Cisco provides MD5 and SHA checksums at Cisco’s Download
Software website. The user can make a comparison of this MD5 digest against the MD5 digest of an IOS
image installed on a device, as shown in the figure. If the two digests match, the user can feel confident
that no one has tampered with or modified the IOS image file.
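The comparison of a computed digest against a published one can be sketched as follows. This is an illustration only: it uses SHA-256 rather than MD5, a temporary file stands in for a downloaded image, and the names file_sha256 and matches_published are hypothetical.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the local digest with the value published by the vendor."""
    return file_sha256(path) == published_hex.strip().lower()

# Stand-in for a downloaded image and its published checksum.
contents = b"example image contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)
    image_path = tmp.name
published = hashlib.sha256(contents).hexdigest()

untouched_ok = matches_published(image_path, published)

with open(image_path, "ab") as handle:   # simulate tampering: append one byte
    handle.write(b"!")
tampered_ok = matches_published(image_path, published)

os.remove(image_path)
print(untouched_ok, tampered_ok)
```

A check like this is only as trustworthy as the channel that delivered the published digest.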
The field of digital forensics uses hashing to verify all digital media that contain files. For example, the
examiner creates a hash and a bit-for-bit copy of the media containing the files to produce a digital clone.
The examiner compares the hash of the original media with the copy. If the two values match, the copies
are identical. The fact that one set of bits is identical to the original set of bits establishes fixity. Fixity helps
to answer several questions:
• Does the examiner have the files he expects?
• Is the data corrupted or changed?
• Can the examiner prove that the files are not corrupt?
Now the forensics expert can examine the copy for any digital evidence while leaving the original intact
and untouched.
Hashing Passwords
Hashing algorithms turn any amount of data into a fixed-length fingerprint or digital hash. A criminal
cannot reverse a digital hash to discover the original input. If the input changes at all, it results in a
different hash. This works for protecting passwords. A system needs to store a password in a form that
protects it and can still verify that a user’s password is correct.
The figure shows the workflow for user account registration and authentication using a hash-based
system. The system never writes the password to the hard drive; it stores only the digital hash.
Applications
Use cryptographic hash functions in the following situations:
• To provide proof of authenticity when it is used with a symmetric secret authentication key, such
as IP Security (IPsec) or routing protocol authentication
• To provide authentication by generating one-time and one-way responses to challenges in
authentication protocols
• To provide message integrity check proof, such as those used in digitally signed contracts, and
public key infrastructure (PKI) certificates, like those accepted when accessing a secure site using
a browser
When choosing a hashing algorithm, use SHA-256 or higher as they are currently the most secure. Avoid
SHA-1 and MD5 due to the discovery of security flaws. In production networks, implement SHA-256 or
higher.
While hashing can detect accidental changes, it cannot guard against deliberate changes. There is no
unique identifying information from the sender in the hashing procedure. This means that anyone can
compute a hash for any data, as long as they have the correct hash function. For example, when a message
traverses the network, a potential attacker could intercept the message, change it, recalculate the hash,
and append the hash to the message. The receiving device will only validate against whatever hash is
appended. Therefore, hashing is vulnerable to man-in-the-middle attacks and does not provide security
to transmitted data.
Cracking Hashes
To crack a hash, an attacker must guess the password. The top two attacks used to guess passwords are
dictionary and brute-force attacks.
A dictionary attack uses a file containing common words, phrases, and passwords, along with the
pre-computed hash of each entry. The attack compares the hashes in the file with the stolen password
hashes. If a hash matches, the attacker has found a likely password for that account.
A brute-force attack attempts every possible combination of characters up to a given length. A brute-force
attack takes a lot of processor time, but it is just a matter of time before this method discovers the
password. Passwords need to be long enough to make the time it takes to execute a brute-force attack
too long to be worthwhile. Hashing passwords makes it more difficult for the criminal to retrieve those
passwords.
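The arithmetic behind this advice can be sketched as follows. The alphabet sizes and the rate of one billion guesses per second are illustrative assumptions, not measurements.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def brute_force_candidates(alphabet_size: int, max_length: int) -> int:
    """Count every string of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** length for length in range(1, max_length + 1))

def worst_case_years(alphabet_size: int, max_length: int,
                     guesses_per_second: float) -> float:
    """Time needed to try every candidate at the assumed guessing rate."""
    seconds = brute_force_candidates(alphabet_size, max_length) / guesses_per_second
    return seconds / SECONDS_PER_YEAR

ASSUMED_RATE = 1e9   # hypothetical: one billion hash guesses per second

# 26 = lowercase letters only, 94 = printable ASCII symbols (assumed alphabets).
for alphabet_size, max_length in [(26, 8), (94, 8), (94, 12)]:
    years = worst_case_years(alphabet_size, max_length, ASSUMED_RATE)
    print(f"{alphabet_size} symbols, up to {max_length} characters: {years:.3g} years")
```

Each extra character multiplies the search space by the alphabet size, which is why length matters more than any single substitution rule.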
Salting
Salting makes password hashing more secure. If two users have the same password, they will also have
the same password hashes. A salt, which is a random string of characters, is an additional input to the
password before hashing. This creates a different hash result for the two passwords as shown in the figure.
A database stores both the hash and the salt.
In the figure, the same password generates a different hash because the salt in each instance is different.
The salt does not have to be secret; its purpose is to make each stored hash unique, not to act as a key.
Preventing Attacks
Salting prevents an attacker from using a pre-computed dictionary of hashes to guess passwords. Salting also
makes lookup tables and rainbow tables ineffective, because the attacker would need a separate table for
every salt value.
Lookup Tables
A lookup table stores the pre-computed hashes of passwords in a password dictionary along with the
corresponding password. A lookup table is a data structure that processes hundreds of hash lookups per
second.
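The effect of a salt on a lookup table can be illustrated with a small sketch. The four-word list is a made-up stand-in for a real password dictionary, and plain SHA-256 is used only to keep the example short.

```python
import hashlib
import secrets

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Pre-computed table: hash -> password, built once from a word list.
WORD_LIST = ["password", "letmein", "qwerty", "123456"]   # illustrative only
lookup_table = {sha256_hex(word.encode()): word for word in WORD_LIST}

# Unsalted: the stored hash is found directly in the table.
stored_unsalted = sha256_hex(b"letmein")
cracked = lookup_table.get(stored_unsalted)

# Salted: a random salt is hashed together with the password,
# so the stored hash no longer appears in the pre-computed table.
salt = secrets.token_bytes(16)
stored_salted = sha256_hex(salt + b"letmein")
cracked_salted = lookup_table.get(stored_salted)

# Two users with the same password but different salts get different hashes.
other_salt = secrets.token_bytes(16)
same_password_other_user = sha256_hex(other_salt + b"letmein")

print(cracked, cracked_salted, stored_salted != same_password_other_user)
```

An attacker who knows the salt can still guess passwords one account at a time, but the table built in advance is of no use.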
Reverse Lookup Tables
This attack allows the cybercriminal to launch a dictionary or brute-force attack on many hashes without
the pre-computed lookup table. The cybercriminal creates a lookup table that maps each password hash
from the breached account database to a list of users. The cybercriminal hashes each password guess and
uses the lookup table to get a list of users whose password matched the cybercriminal’s guess, as shown
in the figure. Since many users have the same password, the attack works well.
Rainbow Tables
Rainbow tables sacrifice hash-cracking speed to make the lookup tables smaller. A smaller table means
that the table can store the solutions to more hashes in the same amount of space.
Implementing Salting
A Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) is the best choice to generate
salt. CSPRNGs generate a random number that has a high level of randomness and is completely
unpredictable, so it is cryptographically secure.
To implement salting successfully, use the following recommendations:
• The salt needs to be unique for every user password.
• Never reuse a salt.
• The length of the salt should match the length of the hash function’s output.
• Always hash on the server in a web application.
Using a technique called key stretching will also help to protect against attack. Key stretching makes the
hash function run very slowly. This makes high-end hardware that can compute billions of hashes per
second less effective.
The steps a database application uses to store and validate a salted password are shown in the figure.
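One plausible version of such store-and-validate steps is sketched below. It assumes PBKDF2 with HMAC-SHA-256 from Python's standard library as the key-stretching function, a 32-byte salt from the secrets module (a CSPRNG), and an in-memory dictionary named user_table standing in for the database. The iteration count is an arbitrary value chosen for the example.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000   # assumed work factor for key stretching
SALT_BYTES = 32        # matches the 32-byte output of SHA-256

user_table = {}        # stand-in for a database: username -> (salt, hash)

def stretch(password: str, salt: bytes) -> bytes:
    """Slow, salted hash of the password (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(username: str, password: str) -> None:
    """Generate a fresh salt, then store the salt and hash, never the password."""
    salt = secrets.token_bytes(SALT_BYTES)
    user_table[username] = (salt, stretch(password, salt))

def authenticate(username: str, password: str) -> bool:
    """Re-hash the supplied password with the stored salt and compare."""
    if username not in user_table:
        return False
    salt, stored_hash = user_table[username]
    return hmac.compare_digest(stretch(password, salt), stored_hash)

register("alice", "correct horse battery staple")
register("bob", "correct horse battery staple")   # same password, new salt

print(authenticate("alice", "correct horse battery staple"))
print(authenticate("alice", "wrong guess"))
print(user_table["alice"][1] != user_table["bob"][1])
```

The comparison uses hmac.compare_digest so that the time taken does not reveal how many leading bytes matched.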
HMAC
The next step in preventing a cybercriminal from launching a dictionary or brute-force attack on a hash is
to add a secret key to the hash. Only a person who knows the secret key can validate a password. One way
to do this is to include the secret key in the hash using a hash algorithm called keyed-hash message
authentication code (HMAC or KHMAC). HMACs use an additional secret key as input to the hash function.
The use of HMAC goes a step further than just integrity assurance by adding authentication. An HMAC
uses a specific algorithm that combines a cryptographic hash function with a secret key, as shown in the
figure.
Only the sender and the receiver know the secret key, and the output of the hash function now depends
on the input data and the secret key. Only parties who have access to that secret key can compute the
digest of an HMAC function. This characteristic defeats man-in-the-middle attacks and provides
authentication of the data origin.
HMAC Operation
Consider an example where a sender wants to ensure that a message remains unchanged in transit and
wants to provide a way for the receiver to authenticate the origin of the message.
As shown in the figure, the sending device inputs data (such as Terry Smith’s pay of $100 and the secret key)
into the hashing algorithm and calculates the fixed-length HMAC digest or fingerprint. The receiver gets
the authenticated fingerprint attached to the message.
In the figure, the receiving device removes the fingerprint from the message and uses the plaintext message
with its secret key as input to the same hashing function. If the receiving device calculates a fingerprint
equal to the fingerprint sent, the message is still in its original form. Additionally, the receiver knows the
origin of the message because only the sender and the receiver possess a copy of the shared secret key.
The HMAC function proved the authenticity of the message.
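The exchange can be sketched with Python's standard hmac module. The key value, the message text, and the function names make_tag and is_authentic are assumptions made for illustration; HMAC-SHA-256 is one possible choice of underlying hash.

```python
import hashlib
import hmac

SHARED_KEY = b"hypothetical-shared-secret"   # known only to sender and receiver

def make_tag(message: bytes, key: bytes) -> bytes:
    """Sender side: compute the HMAC fingerprint of the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def is_authentic(message: bytes, tag: bytes, key: bytes) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    return hmac.compare_digest(make_tag(message, key), tag)

message = b"Pay Terry Smith $100.00"
tag = make_tag(message, SHARED_KEY)

# The unmodified message verifies.
received_ok = is_authentic(message, tag, SHARED_KEY)

# An attacker changes the message but cannot compute a valid HMAC without
# the key; recomputing a plain hash does not help.
forged_message = b"Pay Terry Smith $900.00"
forged_tag = hashlib.sha256(forged_message).digest()
forged_ok = is_authentic(forged_message, forged_tag, SHARED_KEY)

# Reusing the original tag with the altered message also fails.
replayed_ok = is_authentic(forged_message, tag, SHARED_KEY)

print(received_ok, forged_ok, replayed_ok)
```

Compare this with a plain hash, where the attacker could simply recalculate the digest after changing the message.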
Application of HMAC
HMACs can also authenticate a web user. Many web services use basic authentication, which does not
encrypt the username and password during transmission. Using HMAC, the user sends a secret key
identifier and an HMAC. The server looks up the user’s secret key and creates an HMAC. The user’s HMAC
must match the one calculated by the server.
VPNs using IPsec rely on HMAC functions to authenticate the origin of every packet and provide data
integrity checking.
Digital Signatures
Signature and the Law
Handwritten signatures and stamped seals prove authorship of the contents of a document. Digital
signatures can provide the same functionality as handwritten signatures.
Unprotected digital documents are very easy for anyone to change. A digital signature can determine if
someone edits a document after the user signs it. A digital signature is a mathematical method used to
check the authenticity and integrity of a message, digital document, or software.
In many countries, digital signatures have the same legal importance as a manually signed document.
Electronic signatures are binding for contracts, negotiations, or any other document requiring a
handwritten signature. An audit trail tracks the electronic document’s history for regulatory and legal
defense purposes.
A digital signature helps to establish authenticity, integrity, and non-repudiation. Digital signatures have
specific properties that enable entity authentication and data integrity as shown in the figure.
Digital signatures are an alternative to HMAC.
Non-Repudiation
To repudiate means to deny. Non-repudiation is a way to ensure that the sender of a message or
document cannot deny having sent the message or document and that the recipient cannot deny having
received the message or document.
A digital signature ensures that the sender electronically signed the message or document. Since a digital
signature is unique to the individual creating it, that person cannot later deny that he or she provided the
signature.
Processes of Creating a Digital Signature
Asymmetric cryptography is the basis for digital signatures. A public key algorithm like RSA generates two
keys: one private and the other public. The keys are mathematically related.
Alice wants to send Bob an email that contains important information for the rollout of a new product.
Alice wants to make sure that Bob knows that the message came from her, and that the message did not
change after she sent it.
Alice creates the message along with a digest of the message. She then encrypts this digest with her
private key as shown in Figure 1. Alice bundles the message, the encrypted message digest, and her public
key together to create the signed document. Alice sends this to Bob.
Bob receives the message and reads it. To make sure that the message came from Alice, he creates a
message digest of the message. He takes the encrypted message digest received from Alice and decrypts
it using Alice’s public key. Bob compares the message digest received from Alice with the one he
generated. If they match, Bob knows that he can trust that no one tampered with the message.
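Alice's and Bob's steps can be sketched with a toy version of RSA. This is a teaching sketch under loud assumptions: the two primes are far too small for real use, no padding scheme is applied, and the digest is reduced modulo N so that it fits the tiny key. Real systems should rely on a vetted cryptographic library instead. It requires Python 3.8 or later for the modular inverse.

```python
import hashlib

# Toy key generation (illustrative primes, NOT secure).
P = 2**61 - 1                        # a known Mersenne prime
Q = 2**89 - 1                        # another known Mersenne prime
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent

def digest_as_int(message: bytes) -> int:
    """SHA-256 digest of the message, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Alice: transform the digest with her private key (D, N)."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Bob: recover the digest with Alice's public key (E, N) and compare."""
    return pow(signature, E, N) == digest_as_int(message)

message = b"Rollout plan for the new product"
signature = sign(message)            # sent along with the message and public key

print(verify(message, signature))                       # genuine message
print(verify(b"Rollout plan (edited)", signature))      # altered message
print(verify(message, (signature + 1) % N))             # altered signature
```

Only the short digest goes through the expensive private-key operation, which is the efficiency benefit of signing a hash rather than the whole document.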
Using Digital Signatures
Signing a hash instead of the whole document provides efficiency, compatibility, and integrity.
Organizations may want to replace paper documents and ink signatures with a solution that assures the
electronic document meets all legal requirements.
The following two situations provide examples of using digital signatures:
• Code signing - Used to verify the integrity of executable files downloaded from a vendor website.
Code signing also uses signed digital certificates to authenticate and verify the identity of the site.
• Digital certificates - Used to verify the identity of an organization or individual to authenticate a
vendor website and establish an encrypted connection to exchange confidential data.
Comparing Digital Signature Algorithms
The three common digital signature algorithms are Digital Signature Algorithm (DSA), Rivest-Shamir-
Adleman (RSA), and Elliptic Curve Digital Signature Algorithm (ECDSA). All three generate and verify digital
signatures. These algorithms depend upon asymmetrical encryption and public key techniques. Digital
signatures require two operations:
1. Signature generation
2. Signature verification
Signature generation uses the signer’s private key, and signature verification uses the matching public key.
DSA is based on the mathematical difficulty of computing discrete logarithms. Governments use DSA to
create digital signatures. DSA covers signing only and does not encrypt the message itself.
RSA is the most common public key cryptography algorithm in use today. RSA is named after the
individuals who created it in 1977: Ron Rivest, Adi Shamir, and Leonard Adleman. RSA depends on
asymmetrical encryption. RSA covers signing and also encrypts the content of the message.
DSA is faster than RSA as a signing service for a digital document. RSA is best suited for applications
requiring the signing and verification of electronic documents and message encryption.
Like most areas of cryptography, the RSA algorithm is based on mathematical principles, in this case
modular arithmetic and prime number factorization.
ECDSA is the newest digital signature algorithm that is gradually replacing RSA. The advantage of this new
algorithm is that it can use much smaller key sizes for the same security and requires less computation
than RSA.
Certificates
A digital certificate is equivalent to an electronic passport. Digital certificates enable users, hosts, and
organizations to exchange information securely over the Internet. Specifically, a digital certificate
authenticates and verifies that users sending a message are who they claim to be. Digital certificates can
also provide confidentiality by giving the receiver the means to encrypt a reply.
Digital certificates are similar to physical certificates. For example, the paper-based Cisco Certified
Network Associate Security (CCNA-S) certificate in Figure 1 identifies the individual, the Certificate
Authority (who authorized the certificate), and for how long the certificate is valid. Notice how the digital
certificate in Figure 2 also identifies similar elements.
Using Digital Certificates
Refer to the figure to help understand how digital certificates are used to authenticate a person or entity.
In this scenario, Bob is purchasing an online item from Alice's website. Bob's browser will use Alice's digital
certificate to ensure he is actually shopping on Alice's website and to secure traffic between them.
Step 1: Bob decides to buy something on Alice's website and clicks on "Proceed to Checkout". His browser
initiates a secure connection with Alice's web server and displays a lock icon in the security status bar.
Step 2: Alice's web server receives the request and replies by sending its digital certificate containing the
web server public key and other information to Bob's browser.
Step 3: Bob's browser checks the digital certificate against stored certificates and confirms that he is
indeed connected to Alice's web server. Only trusted certificates permit the transaction to go forward. If
the certificate is not valid, then communication fails.
Step 4: Bob's web browser creates a unique session key that will be used for secure communication with
Alice's web server.
Step 5: Bob's browser encrypts the session key using Alice's public key and sends it to Alice's web server.
Step 6: Alice's web server receives the encrypted message from Bob's browser. It uses its private key to
decrypt the message and discover the session key. Future exchange between the web server and browser
will now use the session key to encrypt data.
What is a Certificate Authority?
On the Internet, continually exchanging identification between all parties would be impractical. Therefore,
individuals agree to accept the word of a neutral third party. Presumably, the third party does an in-depth
investigation prior to the issuance of credentials. After this in-depth investigation, the third party issues
credentials that are difficult to forge. From that point forward, all individuals who trust the third party
simply accept the credentials that the third party issues.
For example, in the figure Alice applies for a driver’s license. In this process, she submits evidence of her
identity, such as a birth certificate, picture ID, and more to a government-licensing bureau. The bureau
validates Alice’s identity and permits Alice to complete a driver’s examination. Upon successful
completion, the licensing bureau issues Alice a driver’s license. Later, Alice needs to cash a check at the
bank. Upon presenting the check to the bank teller, the bank teller asks her for ID. The bank, because it
trusts the government-licensing bureau, verifies her identity and cashes the check.
A certificate authority (CA) functions the same as the licensing bureau. The CA issues digital certificates
that authenticate the identity of organizations and users. These certificates also sign messages to ensure
that no one tampered with the messages.
What is Inside a Digital Certificate?
As long as a digital certificate follows a standard structure, any entity can read and understand it
regardless of the issuer. X.509 is a standard for a public key infrastructure (PKI) to manage digital
certificates. PKI is the policies, roles, and procedures required to create, manage, distribute, use, store,
and revoke digital certificates. The X.509 standard specifies that digital certificates contain the standard
information listed below.
• Version Number
• Serial Number
• Certificate Algorithm Identifier
• Issuer Name
• Validity Period
• Subject Name
• Subject Public Key Information
• Issuer Unique Identifier
• Subject Unique Identifier
• Extensions
• CA’s Digital Signature
The Validation Process
Browsers and applications perform a validation check before they trust a certificate to ensure that it is
valid. The three processes include the following:
• Certificate Discovery selects a certificate of the issuing CA for each certificate in the chain
• Path Validation validates the certification path by checking each certificate starting at the
beginning with the root CA’s certificate
• Revocation determines whether the certificate was revoked and why
The Certificate Path
An individual gets a certificate for a public key from a commercial CA. The certificate belongs to a chain of
certificates called the chain of trust. The number of certificates in the chain depends on the hierarchical
structure of the CA.
The figure shows a certificate chain for a two-tier CA. There is an offline root CA and an online subordinate
CA. The reason for the two-tier structure is that X.509 signing allows for easier recovery in the event of a
compromise. If there is an offline CA, it can sign the new online CA certificate. If there is not an offline CA,
a user has to install a new root CA certificate on every client machine, phone, or tablet.
Database Integrity Enforcement
Databases provide an efficient way to store, retrieve, and analyze data. As data collection increases and
data becomes more sensitive, it is important for cybersecurity professionals to protect the growing
number of databases. Think of a database as an electronic filing system. Data integrity refers to the
accuracy, consistency, and reliability of data stored in a database. The responsibility of data integrity falls
on database designers, developers, and the organization’s management.
The four data integrity rules or constraints are as follows:
• Entity Integrity: All rows must have a unique identifier called a Primary Key.
• Domain Integrity: All data stored in a column must follow the same format and definition.
• Referential Integrity: Table relationships must remain consistent. Therefore, a user cannot delete
a record which is related to another one.
• User-defined Integrity: A set of rules defined by a user which does not belong to one of the other
categories. For example, a customer places a new order, as shown in Figure 4. The user first checks
to see if this is a new customer. If it is, the user adds the new customer to the customers table.
Data Entry Controls
Data entry involves inputting data to a system. A set of controls ensures that users enter the correct data.
Drop Down Master Data Controls
Have a drop down option for master tables instead of asking individuals to enter the data. An example of
drop down master data controls is using the locations list from the U.S. postal address system to
standardize addresses.
Data Field Validation Controls
Set up rules for basic checks including:
• Mandatory input ensures that a required field contains data
• Input masks prevent users from entering invalid data or help ensure that they enter data
consistently (like a phone number, for example)
• Positive dollar amounts
• Data ranges ensure that a user enters data within a given range (a date of birth entered as
01-18-1820, for example, falls outside the valid range)
• Mandatory second person approval (a deposit or withdrawal request greater than a specified
value triggers a second or third approval before the bank teller can complete it)
• Maximum record modification trigger (when the number of records modified exceeds a
predetermined number within a specific period of time, the system blocks the user until a manager
identifies whether or not the transactions were legitimate)
• Unusual activity trigger (a system locks when it recognizes unusual activity)
Validation Rules
A validation rule checks that data falls within the parameters defined by the database designer. A
validation rule helps to ensure the completeness, accuracy and consistency of data. The criteria used in a
validation rule include the following:
• Size – checks the number of characters in a data item
• Format – checks that the data conforms to a specified format
• Consistency – checks for the consistency of codes in related data items
• Range – checks that data lies within a minimum and maximum value
• Check digit – provides for an extra calculation to generate a check digit for error detection
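Several of these criteria can be sketched in a few lines. The record fields, the limits, and the phone format are hypothetical choices made for the example, and the Luhn algorithm is used here as one well-known check digit scheme; the text above does not prescribe a particular one.

```python
import re

def luhn_valid(number: str) -> bool:
    """Check digit test: double every second digit from the right, sum, mod 10."""
    if not number.isdigit():
        return False
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:           # every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def validate_record(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    if not 1 <= len(record["name"]) <= 50:                       # size
        errors.append("name: size")
    if not re.fullmatch(r"\d{3}-\d{3}-\d{4}", record["phone"]):  # format
        errors.append("phone: format")
    if not 1 <= record["quantity"] <= 100:                       # range
        errors.append("quantity: range")
    if not luhn_valid(record["card"]):                           # check digit
        errors.append("card: check digit")
    return errors

good = {"name": "Terry Smith", "phone": "555-010-0199",
        "quantity": 3, "card": "79927398713"}
bad = {"name": "", "phone": "5550100199",
       "quantity": 0, "card": "79927398710"}

print(validate_record(good))   # []
print(validate_record(bad))    # four violations, one per rule
```

A consistency check would compare related fields against each other (for example, a postal code against a city) and is omitted here for brevity.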
Data Type Validation
Data type validation is the simplest data validation and verifies that the data a user enters is consistent
with the type of characters expected. For example, a phone number would not contain alpha characters.
Common database data types include integer, string, and decimal.
Input Validation
One of the most vulnerable aspects of database integrity management is controlling the data input
process. Many well-known attacks run against a database and insert malformed data. The attack can
confuse, crash, or make the application divulge too much information to the attacker. Attackers use
automated input attacks.
For example, users fill out a form via a Web application to subscribe to a newsletter. A database
application automatically generates and sends email confirmations. When users receive their email
confirmations with a URL link to confirm their subscription, attackers modify the URL link. These
modifications include changing the username, email address, or subscription status. The modified link
returns to the server hosting the application. If the Web server does not verify that the email address or
other account information submitted matches the subscription information, the server accepts bogus
information. Hackers can automate the attack to flood the Web application with thousands of invalid
subscribers to the newsletter database.
Anomaly Verification
Anomaly detection refers to identifying patterns in data that do not conform to expected behavior. These
non-conforming patterns are anomalies, outliers, exceptions, aberrations, or surprises in different
database applications. Anomaly detection and verification is an important countermeasure or safeguard
for detecting fraud. Database anomaly detection can identify credit card and insurance fraud.
Database anomaly detection can protect data from massive destruction or changes.
Anomaly verification requires verification of data requests or modifications when a system detects unusual
or surprising patterns. An example of this is a credit card with two transactions in vastly different request
locations in a short time. If a transaction request from New York City occurs at 10:30 a.m. and a second
request comes from Chicago at 10:35 a.m., the system triggers a verification of the second transaction.
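The credit card example can be sketched as a simple rule. The Transaction fields and the one-hour threshold are assumptions for illustration; a production system would more likely use geographic distance and richer behavioral signals.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Transaction:
    card_id: str
    city: str
    timestamp: datetime

# Assumed threshold: two cities cannot plausibly be reached within this window.
MIN_TRAVEL_TIME = timedelta(hours=1)

def needs_verification(previous: Transaction, current: Transaction) -> bool:
    """Flag the current transaction if the same card appears in a different
    city sooner than the assumed minimum travel time."""
    if previous.card_id != current.card_id:
        return False
    different_city = previous.city != current.city
    too_soon = abs(current.timestamp - previous.timestamp) < MIN_TRAVEL_TIME
    return different_city and too_soon

first = Transaction("card-1", "New York City", datetime(2024, 1, 15, 10, 30))
second = Transaction("card-1", "Chicago", datetime(2024, 1, 15, 10, 35))
next_day = Transaction("card-1", "Chicago", datetime(2024, 1, 16, 10, 35))

print(needs_verification(first, second))     # True: five minutes apart
print(needs_verification(first, next_day))   # False: a day apart
```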
A second example occurs when an unusual number of email address modifications happen in an unusual
number of database records. Since email can be used to launch DoS attacks, the email modification of hundreds of
records could indicate that an attacker is using an organization’s database as a tool for his or her DoS
attack.
Entity Integrity
A database is like an electronic filing system. Maintaining proper filing is critical in maintaining the
trustworthiness and usefulness of the data within the database. Tables, records, fields, and data within
each field make up a database. In order to maintain the integrity of the database filing system, users must
follow certain rules. Entity integrity is an integrity rule, which states that every table must have a primary
key and that the column or columns chosen to be the primary key must be unique and not NULL. Null in
a database signifies missing or unknown values. Entity integrity enables proper organization of data for
that record as shown in the figure.
Referential Integrity
Another important concept is the relationship between different filing systems or tables. The basis of
referential integrity is foreign keys. A foreign key in one table references a primary key in a second table.
The primary key for a table uniquely identifies entities (rows) in the table. Referential integrity maintains
the integrity of foreign keys.
Domain Integrity
Domain integrity ensures that all the data items in a column fall within a defined set of valid values. Each
column in a table has a defined set of values, such as the set of valid credit card numbers, social
security numbers, or email addresses. Limiting the value assigned to an instance of that column (an
attribute) enforces domain integrity. Domain integrity enforcement can be as simple as choosing the
correct data type, length, or format for a column.
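Entity, referential, and domain integrity can all be declared as constraints and left to the database to enforce. The sketch below uses an in-memory SQLite database from Python's standard library; the table and column names are hypothetical, and the email pattern is deliberately simplistic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,                         -- entity integrity
    email       TEXT NOT NULL CHECK (email LIKE '%_@_%')     -- domain integrity
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,                         -- entity integrity
    customer_id INTEGER NOT NULL
                REFERENCES customers(customer_id),           -- referential integrity
    amount      REAL NOT NULL CHECK (amount > 0)             -- domain integrity
);
""")

def rejected(sql: str) -> bool:
    """Run one statement; report whether the database refused it."""
    try:
        with conn:                      # commits on success, rolls back on error
            conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

with conn:
    conn.execute("INSERT INTO customers VALUES (1, 'alice@example.com')")
    conn.execute("INSERT INTO orders VALUES (1, 1, 25.0)")

duplicate_key = rejected("INSERT INTO customers VALUES (1, 'bob@example.com')")
orphan_order = rejected("INSERT INTO orders VALUES (2, 99, 10.0)")
delete_parent = rejected("DELETE FROM customers WHERE customer_id = 1")
negative_amount = rejected("INSERT INTO orders VALUES (3, 1, -5.0)")

print(duplicate_key, orphan_order, delete_parent, negative_amount)
```

Declaring the rules in the schema means every application that touches the tables is held to them, rather than relying on each program to check on its own.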
Summary
This chapter discussed how the art of integrity ensures that data remains unchanged by anyone or
anything over its entire life cycle. This chapter began by discussing the types of data integrity controls.
Hashing algorithms, password salting, and keyed-hash message authentication code (HMAC) are
important concepts for the cyber heroes to use in the implementation of digital signatures and
certificates. These tools provide a way for cybersecurity specialists to verify the authenticity of messages
and documents. The chapter concluded with a discussion of database integrity enforcement. Having a
well-controlled and well-defined data integrity system increases the stability, performance, and
maintainability of a database system.