Considerations for Data Access in the Lakehouse
Zachary Friedman
Product Manager at Immuta
Agenda
■ Introduction to Lakehouse Concepts for Governance
■ Role-Based Access Control (RBAC) vs. Attribute-Based Access Control (ABAC)
■ Enterprise-Grade Authorization in Databricks SQL Analytics
Data Governance meets the Lakehouse
What is a Lakehouse?
What is a Lakehouse?
■ Let’s do a (brief) history lesson
■ Late 1980s: the Data Warehouse
■ Early 2010s: the Data Lake
■ The Roaring ’20s: the Data Lakehouse
Key Features of the Lakehouse
Basic Key Attributes
■ Transaction support
■ Schema enforcement and governance
■ BI support
■ Separate storage from compute
■ Support for diverse workloads
Enterprise-Grade Features
■ Scalable security and access control management
■ Additional data governance capabilities such as auditing and lineage
■ Data discovery tools such as data catalogs
Diving Deeper
Key Concepts for Authorization in the Lakehouse
● Role-Based Access Control (RBAC)
● Attribute-Based Access Control (ABAC)
● Enforcement Point
Role-Based Access Control
Role-Based Access Control (RBAC)
To manage access to resources, group permissions into roles, and assign those roles to users
■ User-Role relationships
■ Role-Permission relationships
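The two relationships can be sketched as a pair of lookups. This is a minimal in-memory illustration, not how any particular engine stores them; the users alice and bob, the roles, and the sales.fct_sales name are hypothetical.

```python
# Hypothetical data: which roles each user holds (User-Role relationship).
USER_ROLES = {
    "alice": {"finance"},
    "bob": {"sales"},
}

# Hypothetical data: which (action, resource) pairs each role grants
# (Role-Permission relationship).
ROLE_PERMISSIONS = {
    "finance": {("SELECT", "accounting.profit_loss_statement")},
    "sales": {("SELECT", "sales.fct_sales")},
}


def is_allowed(user, action, resource):
    """Return True if any role held by the user grants the permission."""
    roles = USER_ROLES.get(user, set())
    return any(
        (action, resource) in ROLE_PERMISSIONS.get(role, set())
        for role in roles
    )
```

Access is never attached to a user directly: changing what finance may read, or who is in finance, touches only one of the two mappings.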
Role-Based Access Control (RBAC)
Define a User-Role relationship in Databricks SQL Analytics
■ Manage groups using the Admin Console, Groups API, or SCIM API
■ Add users to groups and remove users from them
Role-Based Access Control (RBAC)
Define a Role-Permission relationship in Databricks SQL Analytics
■ Define the access that a role grants to a user
■ At a high level this can be implemented in terms of the is_member() function
Attribute-Based Access Control
Attribute-Based Access Control (ABAC)
Represent fine-grained or dynamic permissions based on who the user is and their relationship to the resource they want to access.
■ User relationship to the resource can be expressed as a JOIN on user attributes and values of a resource column
Access Control Dimensions in SQL Analytics
Access Control Dimensions
■ Table: A user can access sales data, but not financial data
■ Row: A user can access a particular sales opportunity, or a sales opportunity matching certain conditions
■ Column: A user can access only certain fields of a record, and we can mask the values of a column depending on the user trying to access it
Me
Just now
You’re going to need a framework to manage all of these access controls across your Enterprise.
Requirements for Enterprise-Grade Access Controls Framework
Table-level policies
Individuals can be granted access to query tables and views by virtue of:
● membership in a group (role-based)
● possession of an attribute (attribute-based)
● request and approval by an admin
● public access
● individual user selection
● access for a specified period of time
● access only for a specific purpose
Row-level policies
Individuals can be allowed to see rows in a dataset based on:
● membership in a group that matches the value in a given column
● possession of an attribute that matches the value in a given column
● a filter on a time column, so users are entitled to query only rows that meet a specific recency requirement
Column-level policies
Different users see different values in specific columns by virtue of the roles, attributes, and purposes discussed above; examples include:
● Masking a column to NULL
● Masking a column using hashing
● Masking a column to a constant string
● Other advanced privacy-enhancing technologies (PETs) and Differential Privacy
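The first three masking styles can be sketched as plain functions. This is a minimal illustration using only the Python standard library; the function names and the REDACTED placeholder are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def mask_null(value):
    """Mask to NULL: the caller sees no value at all."""
    return None


def mask_hash(value):
    """Mask by hashing: equal inputs map to equal digests, so joins and
    group-bys still work, but the original value is not shown."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def mask_constant(value, replacement="REDACTED"):
    """Mask to a constant string, regardless of the input."""
    return replacement
```

An unsalted hash of a low-cardinality column can be reversed by hashing every candidate value, so a real policy would likely add a secret salt or key.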
Framework for Managing Table-level Access Controls
RBAC
Users who are part of the Active Directory group called finance are allowed to read profit and loss data.
Provided we’ve kept our groups in sync between our corporate directory and Databricks, using either the Admin Console, Groups API, or SCIM API, we can solve this requirement simply with:
GRANT SELECT ON TABLE accounting.profit_loss_statement TO finance;
ABAC
Users with the attribute executive are allowed to read sales data.
This one is a bit more complex. First, we need to store a (user, name, value) triple in some sort of attributes table. Next, we need to create a secure view on top of the original table, since a GRANT cannot take a WHERE clause as its principal, only a user or group.
Solving for ABAC in our Framework
Users with the attribute executive are allowed to read sales data.
Solving for ABAC in our Framework
Restrict the user to only be able to view their own personal attributes.
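One way to sketch this restriction is a view over the attributes table that only returns the caller's own rows. The example below runs on SQLite through Python's sqlite3 module as a stand-in, not on Databricks; the session_context table is a hypothetical substitute for the engine's current-user function, and the table, view, and user names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- (user, name, value) triples, as described in the talk.
    CREATE TABLE user_attributes (username TEXT, name TEXT, value TEXT);
    INSERT INTO user_attributes VALUES
        ('alice', 'executive', 'true'),
        ('bob',   'territory', 'EU');

    -- Hypothetical stand-in for the engine's current-user function.
    CREATE TABLE session_context (username TEXT);

    -- Each caller sees only the attributes recorded for themselves.
    CREATE VIEW my_attributes AS
        SELECT a.name, a.value
        FROM user_attributes AS a
        JOIN session_context AS s ON s.username = a.username;
""")


def attributes_for(username):
    """Simulate a session for `username` and read the restricted view."""
    conn.execute("DELETE FROM session_context")
    conn.execute("INSERT INTO session_context VALUES (?)", (username,))
    query = "SELECT name, value FROM my_attributes ORDER BY name, value"
    return conn.execute(query).fetchall()
```

Users would be granted access to the view only, never to the underlying attributes table.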
Solving for ABAC in our Framework
Putting it all together. Users with the attribute executive are allowed to read sales data.
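A minimal sketch of the combined pattern: a secure view over the sales table that returns rows only when the caller holds the executive attribute. As before, SQLite via Python's sqlite3 is a stand-in for the real engine, session_context is a hypothetical replacement for the current-user function, and the view name sales_for_executives and the sample users are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fct_sales (sale_id INTEGER, amount INTEGER, territory TEXT);
    INSERT INTO fct_sales VALUES
        (1, 1000000, 'US-EAST'),
        (3, 175000,  'EU');

    CREATE TABLE user_attributes (username TEXT, name TEXT, value TEXT);
    INSERT INTO user_attributes VALUES ('alice', 'executive', 'true');

    -- Hypothetical stand-in for the engine's current-user function.
    CREATE TABLE session_context (username TEXT);

    -- All rows if the caller has the executive attribute, otherwise none.
    CREATE VIEW sales_for_executives AS
        SELECT f.sale_id, f.amount, f.territory
        FROM fct_sales AS f
        WHERE EXISTS (
            SELECT 1
            FROM user_attributes AS a
            JOIN session_context AS s ON s.username = a.username
            WHERE a.name = 'executive'
        );
""")


def visible_sale_ids(username):
    """Simulate a session for `username` and list the sale_ids it can read."""
    conn.execute("DELETE FROM session_context")
    conn.execute("INSERT INTO session_context VALUES (?)", (username,))
    query = "SELECT sale_id FROM sales_for_executives ORDER BY sale_id"
    return [row[0] for row in conn.execute(query)]
```

The attribute check lives inside the view, so SELECT on the view can be granted broadly while the base table stays locked down.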
Managing Row-level Access Controls
A user can access a particular sales opportunity, or a sales opportunity matching certain conditions.
■ Let’s consider a sales dataset that has a territory column; we want each user to see only the rows whose territory value matches the value of that user’s territory attribute
fct_sales
sale_id  amount   territory
1        1000000  US-EAST
2        150000   US-EAST
3        175000   EU
4        800000   APAC
5        50000    US-WEST
6        75000    US-CENTRAL
7        50000    US-EAST
Row-level ABAC
A user can access a particular sales opportunity, or a sales opportunity matching certain conditions.
Row-level ABAC
A user can access a particular sales opportunity, or a sales opportunity matching certain conditions.
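A row-level policy of this kind can be sketched as a JOIN between the fact table and the caller's territory attribute. The example uses the fct_sales rows from the talk, but runs on SQLite via Python's sqlite3 as a stand-in; session_context is a hypothetical replacement for the current-user function, and the user alice with territory US-EAST is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fct_sales (sale_id INTEGER, amount INTEGER, territory TEXT);
    INSERT INTO fct_sales VALUES
        (1, 1000000, 'US-EAST'), (2, 150000, 'US-EAST'), (3, 175000, 'EU'),
        (4, 800000, 'APAC'), (5, 50000, 'US-WEST'), (6, 75000, 'US-CENTRAL'),
        (7, 50000, 'US-EAST');

    CREATE TABLE user_attributes (username TEXT, name TEXT, value TEXT);
    INSERT INTO user_attributes VALUES ('alice', 'territory', 'US-EAST');

    -- Hypothetical stand-in for the engine's current-user function.
    CREATE TABLE session_context (username TEXT);

    -- A row is visible when its territory equals one of the caller's
    -- territory attribute values.
    CREATE VIEW sec_fct_sales AS
        SELECT f.sale_id, f.amount, f.territory
        FROM fct_sales AS f
        JOIN user_attributes AS a
          ON a.name = 'territory' AND a.value = f.territory
        JOIN session_context AS s
          ON s.username = a.username;
""")


def visible_sale_ids(username):
    """Simulate a session for `username` and list the sale_ids it can read."""
    conn.execute("DELETE FROM session_context")
    conn.execute("INSERT INTO session_context VALUES (?)", (username,))
    query = "SELECT sale_id FROM sec_fct_sales ORDER BY sale_id"
    return [row[0] for row in conn.execute(query)]
```

For alice this yields sales 1, 2, and 7, the US-EAST rows. Duplicate attribute rows for the same user would duplicate sales rows, so the attributes table would need a uniqueness constraint in practice.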
sec_fct_sales
visible  sale_id  amount   territory
YES      1        1000000  US-EAST
YES      2        150000   US-EAST
NO       3        175000   EU
NO       4        800000   APAC
NO       5        50000    US-WEST
NO       6        75000    US-CENTRAL
YES      7        50000    US-EAST
Column-level Masking
Only executives can see the amount of a sale.
sec_fct_sales
visible  sale_id  amount   territory
YES      1        1000000  US-EAST
YES      2        150000   US-EAST
NO       3        175000   EU
NO       4        800000   APAC
NO       5        50000    US-WEST
NO       6        75000    US-CENTRAL
YES      7        50000    US-EAST
sec_fct_sales (for user without the executive attribute)
visible  sale_id  amount  territory
YES      1        NULL    US-EAST
YES      2        NULL    US-EAST
NO       3        NULL    EU
NO       4        NULL    APAC
NO       5        NULL    US-WEST
NO       6        NULL    US-CENTRAL
YES      7        NULL    US-EAST
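The masking rule can be sketched as a CASE expression inside the view: the amount is returned only when the caller has the executive attribute, and NULL otherwise. This runs on SQLite via Python's sqlite3 as a stand-in for the real engine; session_context is a hypothetical replacement for the current-user function, the sample users are assumptions, and the row-level territory filter is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fct_sales (sale_id INTEGER, amount INTEGER, territory TEXT);
    INSERT INTO fct_sales VALUES
        (1, 1000000, 'US-EAST'),
        (2, 150000,  'US-EAST');

    CREATE TABLE user_attributes (username TEXT, name TEXT, value TEXT);
    INSERT INTO user_attributes VALUES ('alice', 'executive', 'true');

    -- Hypothetical stand-in for the engine's current-user function.
    CREATE TABLE session_context (username TEXT);

    -- amount is shown to executives and masked to NULL for everyone else.
    CREATE VIEW sec_fct_sales AS
        SELECT
            f.sale_id,
            CASE
                WHEN EXISTS (
                    SELECT 1
                    FROM user_attributes AS a
                    JOIN session_context AS s ON s.username = a.username
                    WHERE a.name = 'executive'
                ) THEN f.amount
                ELSE NULL
            END AS amount,
            f.territory
        FROM fct_sales AS f;
""")


def amounts_seen_by(username):
    """Simulate a session for `username` and return (sale_id, amount) pairs."""
    conn.execute("DELETE FROM session_context")
    conn.execute("INSERT INTO session_context VALUES (?)", (username,))
    query = "SELECT sale_id, amount FROM sec_fct_sales ORDER BY sale_id"
    return conn.execute(query).fetchall()
```

Swapping the ELSE branch for a hash or a constant gives the other masking styles without changing the shape of the view.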
Thanks for coming to my talk. My name is Zachary and I’m a product manager at Immuta, which provides an Enterprise-grade access controls platform to Data teams just like this. AMA!
Thank You!
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
