Databricks SQL:
Why the Best Serverless Data Warehouse is a Lakehouse
Matthieu Lamairesse - Sr. Solutions Architect
Reda Khouani - Sr. Specialist Solutions Architect
©2023 Databricks Inc. — All rights reserved
Introductions
Reda Khouani, Sr. Specialist Solutions Architect
Matthieu Lamairesse, Sr. Solutions Architect
Agenda
1. Why the Best Data Warehouse is a Lakehouse (with Serverless)
2. New Features:
- Querying
- Ingestion and Transformation
- Performance and Data Management
3. Demo of selected new features
Every company
wants to become a
Data+AI company
Two incompatible architectures get in the way
Data Warehouse for BI  |  Data Lake for AI
[Chart: Data + AI Maturity Curve — competitive advantage grows as organizations move from Clean Data, Reports, and Ad Hoc Queries ("What happened?") through Data Exploration and Predictive Modeling ("What will happen?") to Prescriptive Analytics and Automated Decision Making]
Customers started using our Spark clusters for SQL
The origins of the lakehouse
SQL Usage Growth on Databricks (2018-2020)
1 Delta Lake enabled robust data
management on the data lake.
2 Customers increasingly use SQL to
directly query data lake data.
3 All the data is going to the lake. Only a
portion gets into the data warehouse.
Data Warehousing on the Lakehouse
Powered by Databricks SQL
Databricks SQL (DB SQL) is a serverless data warehouse on the Databricks Lakehouse Platform that
lets you run all your SQL and BI applications at scale with up to 12x better price/performance,
a unified governance model, open formats and APIs, and your tools of choice - no lock-in.
Best price/performance | Built-in governance | Rich ecosystem | Break down silos
Databricks SQL Momentum
Built from the ground up for best performance
Lightning fast analytics for all queries
[Charts: 100TB TPC-DS Price/Performance (lower is better); 10GB TPC-DS @ 32 Concurrent Streams, Queries/Hr (higher is better)]
Databricks sets official data warehousing performance record
Learn more: https://dbricks.co/benchmark
Photon
The query engine for lakehouse systems
The SIGMOD 2022 Best
Industry Paper Award
is awarded annually
to one paper based on
“the combination of
real-world impact,
innovation, and quality
of the presentation.”
One source of truth for all your data
Open format Delta Lake as the foundation
Delta Lake adds quality, reliability, and
performance to your existing data lakes, and
provides one common data management
framework for data, analytics and AI use cases.
Seamless integration with Unity Catalog
Unified governance for data and AI
Securely discover, access and
collaborate on trusted data and AI assets,
leveraging AI to boost productivity and
unlock the full potential of the lakehouse
environment.
More info in the session: What’s New in Unity Catalog (15:30–16:10, Salon St Monge)
Best of the Lake and the Warehouse
Go from BI to AI effortlessly to uncover new insights
Build and train state-of-the-art LLMs and machine learning models on your most complete data, remove silos, and democratize AI across your organization.
The best data warehouse is a lakehouse
Powered by Databricks SQL
1 Data, analytics, and AI in one place
2 World-class performance with data lake economics
3 One source of truth for all your data
New Features
Querying
Ingest, transform, and query with any tool
A first-class SQL experience
• Ingest: self-served data ingestion from cloud storage, local files, or business-critical applications
• Transform: a familiar toolkit to discover and transform all your data in-place using standard SQL
• Query, Manage & Govern: query the freshest data in SQL, and build apps and dashboards with any tool powered by the lakehouse ({REST:API})
Data Consumption
Query from any tool
• Connect existing BI tools and dashboards, or brand-new ones, to the freshest data using OAuth or personal access tokens (PATs)
• Leverage your favorite SQL workbench or IDE to find new insights
• Build custom data apps powered by the lakehouse with tools and languages you already know
{REST:API}
And more…
Integrated Analytical Tooling
Collaboratively query, explore, and transform data in-place
• Discover data, explore database schema,
and query data using ANSI SQL
• Save, share, and reuse queries across teams
to get to results faster
• Up next: Integrated SQL authoring assistant
• Build interactive visualizations and
dashboards
• Stay up to date with alerts and automatic
refresh schedules
Dashboards vNext (Private Preview)
Simple and Beautiful
Simplified content model, new visualization library, and SQL-optional UX
Optimized for Distribution
Publish and Share to Org
Platform Integration
Unity Catalog powered dataset search and
lineage
Python User Defined Functions (UDFs) (Private Preview)
Run Python UDFs from an isolated execution environment
Integrate machine learning models and custom logic, and bring the flexibility of Python right into Databricks SQL!

CREATE FUNCTION redact(a STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
import json
keys = ["email", "phone"]
obj = json.loads(a)
for k in obj:
  if k in keys:
    obj[k] = "REDACTED"
return json.dumps(obj)
$$;
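The UDF body on this slide is ordinary Python and can be exercised outside Databricks; a minimal sketch wrapping the same logic in a plain function (the standalone `redact` function here is for illustration, not the SQL registration itself):

```python
import json

def redact(a: str) -> str:
    """Replace the values of sensitive keys in a JSON object with 'REDACTED'."""
    keys = ["email", "phone"]
    obj = json.loads(a)
    for k in obj:
        if k in keys:
            obj[k] = "REDACTED"
    return json.dumps(obj)

# Only the sensitive fields are masked; other keys pass through unchanged.
print(redact('{"name": "Ada", "email": "ada@example.com"}'))
# → {"name": "Ada", "email": "REDACTED"}
```

Once the function is registered in Databricks SQL, it can be called inline, e.g. in a SELECT over a column of JSON strings.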
Write SQL to get insight from unstructured text data via LLMs (Public Preview)
Sample use cases
● Extract top product issues from
call center transcripts—without
manual tagging!
● Tag customers as a potential
churn risk based on customer
support chat logs
● Generate customized
product descriptions for ad
campaigns—automatically
● Read product reviews to
understand buying
decision criteria
…many more…
Databricks Assistant (Private Preview)
AI assistant with contextual understanding of your data, natively within the Notebook, SQL editor, and file editor
Generates and auto-completes code and queries
Explains and fixes issues
Integrates with Unity Catalog, offering contextual
results relevant to your data assets
Lakehouse Federation (Private Preview)
Discover, query, and govern all your data - no matter where it lives
● Unified view into all your data
● Unified engine for all your data and use cases
● Unified governance across all data sources
Sign up @ databricks.com/qfpreview
Wednesday, June 28 @4:30 PM | Lakehouse Federation: Access and Governance of External Data Sources from Unity Catalog
Ingestion and Transformation
Data Ingest: Streaming Tables (Public Preview)
Efficiently and continuously land data in the bronze layer
Enable continuous, scalable ingestion from any data source, including cloud storage, message buses (Event Hubs, Kafka, Kinesis), and more
CREATE STREAMING TABLE report
AS SELECT SUM(profit)
FROM cloud_files(prod.sales)
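Conceptually, a streaming table processes only data it has not seen before, tracking what it already ingested. An illustrative Python sketch of that checkpointing idea (the `ingest_new_files` helper and file names are hypothetical, not a Databricks API):

```python
def ingest_new_files(available_files, processed, table):
    """Append rows only from files not yet seen; record them in the checkpoint.

    available_files: {filename: list_of_rows} currently in cloud storage.
    processed: set of filenames already ingested (the 'checkpoint').
    table: list accumulating ingested rows (the streaming table's contents).
    """
    for name, rows in available_files.items():
        if name not in processed:      # skip files handled by earlier runs
            table.extend(rows)
            processed.add(name)
    return table

# First run sees two files; the second run picks up only the new one.
processed, table = set(), []
ingest_new_files({"f1.json": [1, 2], "f2.json": [3]}, processed, table)
ingest_new_files({"f1.json": [1, 2], "f2.json": [3], "f3.json": [4]}, processed, table)
print(table)  # → [1, 2, 3, 4]
```

The real engine handles this incrementally and at scale; the sketch only shows why re-running the ingestion does not duplicate rows.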
Materialized Views (Public Preview)
Speed up queries with pre-computed results
Accelerate end-user queries and reduce infrastructure costs with efficient, incremental computation
• Accelerate BI dashboards and ETL queries
• Streaming: build MVs on top of live tables
• Easy ELT: simplify reporting by cleaning, enriching, and denormalizing the base tables

sales: prod | loc | txn | price          store_info: loc | mgr   | city
       i1   | l1  | tx1 | 11                         l1  | Alice | WA
       i2   | l2  | tx2 | 24                         l2  | Bob   | SF
       i3   | l3  | tx3 | 7                          l3  | Annie | NY

sales_report: city | total_rev
              SF   | 24
              NY   | 7
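The pre-computed `sales_report` result is just an aggregation over a join of the two base tables; a minimal Python sketch of the same computation, using the table contents from the slide:

```python
# Base tables from the slide, as in-memory Python structures.
sales = [
    {"prod": "i1", "loc": "l1", "txn": "tx1", "price": 11},
    {"prod": "i2", "loc": "l2", "txn": "tx2", "price": 24},
    {"prod": "i3", "loc": "l3", "txn": "tx3", "price": 7},
]
store_info = {"l1": {"mgr": "Alice", "city": "WA"},
              "l2": {"mgr": "Bob", "city": "SF"},
              "l3": {"mgr": "Annie", "city": "NY"}}

# Join sales to store_info on loc, then sum price per city --
# exactly what the materialized view pre-computes and keeps fresh.
sales_report = {}
for row in sales:
    city = store_info[row["loc"]]["city"]
    sales_report[city] = sales_report.get(city, 0) + row["price"]

print(sales_report)  # → {'WA': 11, 'SF': 24, 'NY': 7}
```

(The slide shows only the SF and NY rows of the result; the WA row follows from the same data.) The point of the MV is that this work happens once, incrementally, instead of on every dashboard query.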
Materialized Views and Streaming Tables
The best data warehouse gets the best of data engineering
Enable your analysts. SQL and data analysts can
easily ingest, clean, and enrich data to quickly
meet the needs of your business.
Speed up BI dashboards. Create MVs to accelerate SQL analytics and BI reports by pre-computing results ahead of time.
Move to real-time analytics. Combine MVs with streaming tables to create fully incremental data pipelines for real-time use cases.
More info in the session: Delta Live Tables A to Z: Best Practices for Modern Data Pipelines
DBSQL MVs & STs on Databricks Lakehouse
How do MVs and STs fit in the lakehouse architecture?
1. MVs and STs are created in Databricks SQL
2. MVs and STs are refreshed on DLT clusters
3. MVs and STs are monitored in Workflows
4. MVs and STs are managed by Unity Catalog
5. MVs and STs store data in Delta Lake
Workflows + Databricks SQL
Orchestrate your SQL queries, dashboards, alerts, and more
Schedule and automate your
Databricks SQL production workloads
• Easily orchestrate sophisticated
workflows with multiple dependencies
• Enhanced monitoring and observability
with proven reliability in production
• Up next: Native integration with
dashboards, SQL queries and alerts
Performance
Intelligent Workload Management (Private and Public Preview)
Efficient compute utilization
Workload Management is about efficient compute utilization: when and where to run a query, when to scale up or down, controls for cancelling an execution, etc.
[Chart: Mixed Workloads Query Latency in seconds — lower is better]
• Statement timeouts at workspace and query level already available
• Additional ongoing investment in:
  • Intelligent auto-scaling, adaptive routing & remote result cache
  • Automatic data layout optimization
DBSQL System Tables (Private Preview)
Bronze tables providing visibility into platform activity
What statements were run, by whom, and when? → statement_history
How and when did warehouses scale? → warehouse_events, warehouses
What was I billed for? → billing
Sign up @ tinyurl.com/sys-tables
Demo:
- Lakeview (Dashboards vNext)
- Databricks Assistant
- LLM Functions
Resources
Bill Inmon
Q/A
Thank You!