Chapter 10. Security and Privacy
Security is vital to the practice of data engineering. This should be
blindingly obvious, but we’re constantly amazed at how often data
engineers view security as an afterthought. We believe that security is the
first thing a data engineer needs to think about in every aspect of their job
and every stage of the data engineering lifecycle. You deal with sensitive
data, information, and access daily. Your organization, customers, and
business partners expect these valuable assets to be handled with the utmost
care and concern. One security breach or a data leak can leave your
business dead in the water; your career and reputation are ruined if it’s your
fault.
Security is a key ingredient for privacy. Privacy has long been critical to
trust in the corporate information technology space; engineers directly or
indirectly handle data related to people’s private lives. This includes
financial information, data on private communications (emails, texts, phone
calls), medical history, educational records, and job history. A company that
leaks or misuses this information can find itself a pariah when the
breach comes to light.
Increasingly, privacy is a matter of significant legal importance. For
example, the Family Educational Rights and Privacy Act (FERPA) went
into effect in the US in the 1970s; the Health Insurance Portability and
Accountability Act (HIPAA) followed in the 1990s; GDPR was passed in
Europe in the mid-2010s. Several US-based privacy bills have passed or
will pass soon. This is just a tiny sampling of privacy-related statutes (and, we
believe, just the beginning). The penalties for violating any of these
laws can be significant, even devastating, to a business. And because data
systems are woven into the fabric of education, health care, and business,
data engineers handle sensitive data related to each of these laws.
A data engineer’s exact security and privacy responsibilities will vary
significantly between organizations. At a small startup, a data engineer may
do double duty as a data security engineer. A large tech company will have
armies of security engineers and security researchers. Even in this situation,
data engineers will often be able to identify security practices and
technology vulnerabilities within their own teams and systems that they can
report and mitigate in collaboration with dedicated security personnel.
Because security and privacy are critical to data engineering (security being
an undercurrent), we want to spend some more time covering security and
privacy. In this chapter, we lay out some things data engineers should
consider around security, particularly in people, processes, and technology
(in that order). This isn’t a complete list, but it lays out the major things we
wish would improve, based on our experience.
People
The weakest link in security and privacy is you. Security is often
compromised at the human level, so conduct yourself as if you’re always a
target. At any given time, a bot or human actor is trying to steal your
credentials and sensitive information. This is our reality, and it’s not going
away. Take a defensive posture with everything you do online and offline.
Exercise the power of negative thinking and always be paranoid.
The Power of Negative Thinking
In a world obsessed with positive thinking, negative thinking is distasteful.
However, American surgeon Atul Gawande wrote a 2007 op-ed in the New
York Times on precisely this subject. His central thesis is that positive
thinking can blind us to the possibility of terrorist attacks or medical
emergencies and deter preparation. Negative thinking allows us to consider
disastrous scenarios and act to prevent them.
Data engineers should actively think through the scenarios for data
utilization and collect sensitive data only if there is an actual need
downstream. The best way to protect private and sensitive data is to avoid
ingesting this data in the first place.
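One way to put this into practice is to allowlist fields at ingestion so that anything without a known downstream need is dropped before it is stored. The following is a minimal sketch, not a prescribed implementation; the field names and the `minimize` helper are hypothetical and exist only for illustration.

```python
# Hypothetical allowlist: the only fields a downstream consumer has asked for.
ALLOWED_FIELDS = {"order_id", "order_total", "country"}


def minimize(record: dict) -> dict:
    """Keep only allowlisted fields; everything else is never ingested."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


raw_event = {
    "order_id": 1001,
    "order_total": 42.5,
    "country": "DE",
    "email": "pat@example.com",          # sensitive, no downstream need
    "card_number": "4111111111111111",   # sensitive, no downstream need
}

clean_event = minimize(raw_event)
print(clean_event)
```

An allowlist fails safe: a new sensitive field added upstream is dropped by default, whereas a blocklist would let it through until someone noticed.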
Data engineers should think about the attack and leak scenarios with any
data pipeline or storage system they utilize. When deciding on security
strategies, ensure that your approach delivers proper security and not just
the illusion of safety.
Always Be Paranoid
Always exercise caution when someone asks you for your credentials.
When in doubt—and you should always be in extreme doubt when asked
for credentials—hold off and get second opinions from your coworkers and
friends. Confirm with other people that the request is indeed legitimate. A
quick chat or phone call is cheaper than a ransomware attack triggered
through an email click. Trust nobody at face value when asked for
credentials, sensitive data, or confidential information, even when the
request comes from a coworker.
You are also the first line of defense in respecting privacy and ethics. Are
you uncomfortable with sensitive data you’ve been tasked to collect? Do
you have ethical questions about the way data is being handled in a project?
Raise your concerns with colleagues and leadership. Ensure that your work
is both legally compliant and ethical.
Processes
When people follow regular security processes, security becomes part of the
job. Make security a habit, regularly practice real security, exercise the
principle of least privilege, and understand the shared responsibility model
in the cloud.
Security Theater Versus Security Habit
With our corporate clients, we see a pervasive focus on compliance (with
internal rules, laws, recommendations from standards bodies), but not
enough attention to potentially bad scenarios. Unfortunately, this creates an
illusion of security but often leaves gaping holes that would be evident with
a few minutes of reflection.
Security needs to be simple and effective enough to become habitual
throughout an organization. We’re amazed at the number of companies with
security policies in the hundreds of pages that nobody reads, the annual
security policy review that people immediately forget, all in service of checking a box
for a security audit. This is security theater, where security is done to the
letter of compliance (SOC 2, ISO 27001, and related) without real
commitment.
Instead, pursue the spirit of genuine and habitual security; bake a security
mindset into your culture. Security doesn’t need to be complicated. For
example, at our company, we run security training and policy review at least
once a month to ingrain this into our team’s DNA and update each other on
security practices we can improve. Security must not be an afterthought for
your data team. Everyone is responsible and has a role to play. Security must be
the priority for you and everyone else you work with.
Active Security
Returning to the idea of negative thinking, active security entails thinking
about and researching security threats in a dynamic and changing world.
Rather than simply deploying scheduled simulated phishing attacks, you
can take an active security posture by researching successful phishing
attacks and thinking through your organizational security vulnerabilities.
Rather than simply adopting a standard compliance checklist, you can think
about internal vulnerabilities specific to your organization and incentives
employees might have to leak or misuse private information.
We have more to say about active security in “Technology”.
The Principle of Least Privilege
The principle of least privilege means that a person or system should be
given only the privileges and data they need to complete the task at hand
and nothing more. Often, we see an antipattern in the cloud: a regular user
is given administrative access to everything, when that person may need
just a handful of IAM roles to do their work. Giving someone carte blanche
administrative access is a huge mistake and should never happen under the
principle of least privilege.
Instead, provide the user (or group they belong to) the IAM roles they need
when they need them. When these roles are no longer needed, take them
away. The same rule applies to service accounts. Treat humans and
machines the same way: give them only the privileges and data they need to
do their jobs, and only for the timespan when needed.
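The idea of granting a role only for the timespan when it is needed can be sketched as a deny-by-default registry of expiring grants. This is an in-memory illustration under our own assumptions, not a real cloud IAM API; `Grant`, `GrantRegistry`, and the role names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    principal: str        # a human user or a service account
    role: str             # hypothetical role name
    expires_at: datetime  # the grant is invalid at or after this instant


class GrantRegistry:
    """In-memory stand-in for an IAM system; not a real cloud API."""

    def __init__(self) -> None:
        self._grants: list[Grant] = []

    def grant(self, principal: str, role: str, ttl: timedelta, now: datetime) -> Grant:
        """Give a principal one role for a limited time."""
        new_grant = Grant(principal, role, now + ttl)
        self._grants.append(new_grant)
        return new_grant

    def is_allowed(self, principal: str, role: str, now: datetime) -> bool:
        """Deny by default; allow only if a matching, unexpired grant exists."""
        return any(
            g.principal == principal and g.role == role and now < g.expires_at
            for g in self._grants
        )

    def revoke_expired(self, now: datetime) -> int:
        """Remove expired grants and return how many were removed."""
        before = len(self._grants)
        self._grants = [g for g in self._grants if now < g.expires_at]
        return before - len(self._grants)


now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
registry = GrantRegistry()
# Humans and service accounts go through the same path.
registry.grant("etl-service-account", "warehouse.reader", timedelta(hours=2), now)

print(registry.is_allowed("etl-service-account", "warehouse.reader", now))
print(registry.is_allowed("etl-service-account", "warehouse.reader", now + timedelta(hours=3)))
print(registry.is_allowed("etl-service-account", "warehouse.admin", now))
```

Only the first check succeeds: the second falls outside the two-hour window, and the third asks for a role that was never granted. In a real deployment the equivalent behavior would come from your cloud provider's IAM features rather than from application code.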
Of course, the principle of least privilege is also critical to privacy. Your
users and customers expect that people will look at their sensitive data only
when necessary. Make sure that this is the case. Implement column, row,
and cell-level access controls around sensitive data; consider masking PII
and other sensitive data and create views that contain only the information
the viewer needs to access. Some data must be retained, but should be
accessed only in an emergency. Put this data behind a break-glass process:
users can access it only after going through an emergency approval process
to fix a problem, query critical historical information, etc. Access is
revoked immediately once the work is done.
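Column-level masking can be illustrated with a small role-aware view function. This is a rough sketch under stated assumptions: the roles, column names, and the `CLEAR_COLUMNS` policy are hypothetical, and in practice you would usually rely on your warehouse's built-in access controls and masking features rather than application code.

```python
# Hypothetical policy: which columns each role may see in the clear.
# Any column not listed for a role is masked; unknown roles see everything masked.
CLEAR_COLUMNS = {
    "support_agent": {"customer_id", "country"},
    "billing_analyst": {"customer_id", "country", "email"},
}


def mask(value: object) -> str:
    """Hide a value, keeping the last four characters only if it is longer than four."""
    text = str(value)
    if len(text) <= 4:
        return "*" * len(text)
    return "*" * (len(text) - 4) + text[-4:]


def view_for(role: str, row: dict) -> dict:
    """Return a copy of the row with every non-permitted column masked."""
    allowed = CLEAR_COLUMNS.get(role, set())
    return {col: (val if col in allowed else mask(val)) for col, val in row.items()}


row = {
    "customer_id": 7,
    "country": "US",
    "email": "pat@example.com",
    "ssn": "123-45-6789",
}

print(view_for("support_agent", row))
print(view_for("billing_analyst", row))
print(view_for("intern", row))  # unknown role: nothing is shown in the clear
```

The policy lists what each role may see rather than what it may not, so a newly added column is masked for everyone until someone deliberately grants access to it.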
Shared Responsibility in the Cloud
Security is a shared responsibility in the cloud. The cloud vendor is
responsible for ensuring the physical security of its data center and
hardware. At the same time, you are responsible for the security of the
applications and systems you build and maintain in the cloud. Most cloud
security breaches continue to be caused by end users, not the cloud vendor.
Breaches occur because of unintended misconfigurations, mistakes,
oversights, and sloppiness.
Always Back Up Your Data
Data disappears. Sometimes it’s a dead hard drive or server; in other cases,
someone might accidentally delete a database or an object storage bucket. A
bad actor can also lock away data. Ransomware attacks are widespread
these days. Some insurance companies are reducing payouts in the event of
an attack, leaving you on the hook both to recover your data and pay the
bad actor who’s holding it hostage. You need to back up your data regularly,
both for disaster recovery and for continuity of business operations if a version
of your data is compromised in a ransomware attack. Additionally, test the
restoration of your data backups on a regular basis.
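A restore test can be as simple as restoring a backup into a scratch location and comparing checksums against the original. The sketch below uses local files and Python's standard library purely for illustration; the function names and file layout are our own assumptions, and a real pipeline would target your actual database or object storage.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a file's SHA-256 digest, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy the source file into the backup directory and return the copy's path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)
    return target


def restore_and_verify(backup: Path, expected_sha256: str) -> bool:
    """Restore into a scratch directory and check the restored bytes match."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / backup.name
        shutil.copy2(backup, restored)
        return sha256_of(restored) == expected_sha256


with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "orders.csv"
    source.write_text("order_id,total\n1001,42.50\n")
    checksum = sha256_of(source)  # recorded at backup time

    backup = back_up(source, Path(workdir) / "backups")
    print(restore_and_verify(backup, checksum))  # intact backup

    backup.write_text("corrupted")  # simulate a damaged backup
    print(restore_and_verify(backup, checksum))
```

The first check passes and the second fails, which is the point of the exercise: a backup that has never been restored and verified is only assumed to work.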
Data backup doesn’t strictly fit under security and privacy practices; it goes
under the larger heading of disaster prevention, but it’s adjacent to security,
especially in the era of ransomware attacks.
An Example Security Policy
This section presents a sample security policy regarding credentials,
devices, and sensitive information. Notice that we don’t overcomplicate
things; instead, we give people a short list of practical actions they can take
immediately.