
Best Serverless Data Warehouse: Lakehouse

The document discusses how Databricks SQL provides a serverless data warehouse on the Databricks Lakehouse Platform. It highlights key benefits such as built-in governance, a rich ecosystem, and breaking down data silos. Highlighted new features include seamless integration with tools for querying, ingesting, and transforming data, integrated analytical tooling, and Python user-defined functions.

Databricks SQL: Why the Best Serverless Data Warehouse is a Lakehouse
Matthieu Lamairesse - Sr. Solutions Architect

Reda Khouani - Sr. Specialist Solutions Architect

©2023 Databricks Inc. — All rights reserved


Introductions

Reda Khouani, Sr. Specialist Solutions Architect
Matthieu Lamairesse, Sr. Solutions Architect


Agenda

1. Why the Best Data Warehouse is a Lakehouse (with Serverless)

2. New Features:
   - Querying
   - Ingestion and Transformation
   - Performance and Data Management

3. Demo of selected new features


Every company wants to become a Data+AI company


Two incompatible architectures get in the way
Data Warehouse for BI vs. Data Lake for AI

[Figure: Data + AI Maturity Curve. Competitive advantage grows from "What happened?" (Clean Data, Reports, Ad Hoc Queries, Data Exploration) to "What will happen?" (Predictive Modeling, Prescriptive Analytics, Automated Decision Making)]
Customers started using our Spark clusters for SQL
The origins of the lakehouse

[Figure: SQL Usage Growth on Databricks (2018-2020)]

1. Delta Lake enabled robust data management on the data lake.
2. Customers increasingly use SQL to directly query data lake data.
3. All the data is going to the lake. Only a portion gets into the data warehouse.


Data Warehousing on the Lakehouse
Powered by Databricks SQL

Databricks SQL (DB SQL) is a serverless data warehouse on the Databricks Lakehouse Platform that
lets you run all your SQL and BI applications at scale with up to 12x better price/performance,
a unified governance model, open formats and APIs, and your tools of choice, with no lock-in.

• Best price/performance
• Built-in governance
• Rich ecosystem
• Break down silos


Databricks SQL Momentum



Built from the ground up for best performance
Lightning fast analytics for all queries

[Figure: two benchmark charts. Left: 100TB TPC-DS Price/Performance (lower is better). Right: 10GB TPC-DS @ 32 Concurrent Streams, in queries/hr (higher is better)]

Databricks sets official data warehousing performance record

Learn more: https://dbricks.co/benchmark
Photon
The query engine for lakehouse systems

The SIGMOD 2022 Best Industry Paper Award is awarded annually to one paper based on “the combination of real-world impact, innovation, and quality of the presentation.”


One source of truth for all your data
Open format Delta Lake as the foundation

Delta Lake adds quality, reliability, and performance to your existing data lakes, and provides one common data management framework for data, analytics, and AI use cases.


Seamless integration with Unity Catalog
Unified governance for data and AI

Securely discover, access, and collaborate on trusted data and AI assets, leveraging AI to boost productivity and unlock the full potential of the lakehouse environment.

More info in the session: What’s New in Unity Catalog (15h30-16h10, Salon St Monge)
Best of the Lake and the Warehouse
Go from BI to AI effortlessly to uncover new insights

Build and train state-of-the-art LLMs & machine learning models on your most complete data, remove silos, and democratize AI across your organization.


The best data warehouse is a lakehouse
Powered by Databricks SQL

1. Data, analytics, and AI in one place
2. World-class performance with data lake economics
3. One source of truth for all your data


New Features



Querying



Ingest, transform, and query with any tool
A first-class SQL experience
• Ingest: Self-served data ingestion from cloud storage, local files, or business-critical applications
• Query: Query the freshest data in SQL, and build apps and dashboards with any tool powered by the lakehouse {REST:API}
• Transform: A familiar toolkit to discover and transform all your data in-place using standard SQL
• Govern & Manage
Data Consumption
Query from any tool

Connect existing BI tools and dashboards, or brand-new ones, to the freshest data using OAuth or personal access tokens (PATs)

Leverage your favorite SQL workbenches or IDEs to find new insights

Build custom data apps powered by the lakehouse with tools and languages you already know {REST:API}

And more…
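For the {REST:API} route, a custom data app can submit SQL over HTTP. The minimal sketch below only builds the request and does not send it. It assumes the Databricks SQL Statement Execution API (`POST /api/2.0/sql/statements`) with a PAT passed as a bearer token. The host, token, and warehouse ID are placeholders, and the field names should be checked against the API reference before use.

```python
import json
import urllib.request


def build_statement_request(
    host: str, token: str, warehouse_id: str, statement: str
) -> urllib.request.Request:
    """Build, but do not send, a request that runs `statement` on a SQL warehouse.

    Assumes the SQL Statement Execution API endpoint and field names; all
    argument values used below are placeholders.
    """
    payload = {
        "warehouse_id": warehouse_id,  # ID of the SQL warehouse to run on
        "statement": statement,        # the SQL text to execute
        "wait_timeout": "30s",         # wait up to 30s for a synchronous result
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # PAT (or OAuth access token)
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_statement_request(
        "example.cloud.databricks.com", "dapi-PLACEHOLDER", "abc123", "SELECT 1"
    )
    print(req.get_method(), req.full_url)
```

To execute it, pass the request to `urllib.request.urlopen`. Building the request separately keeps the example testable without network access.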


Integrated Analytical Tooling
Collaboratively query, explore, and transform data in-place

• Discover data, explore database schema, and query data using ANSI SQL
• Save, share, and reuse queries across teams to get to results faster
• Up next: Integrated SQL authoring assistant
• Build interactive visualizations and dashboards
• Stay up to date with alerts and automatic refresh schedules


Dashboards vNext [Private Preview]

Simple and Beautiful
Simplified content model, new visualization library, and SQL-optional user experience

Optimized for Distribution
Publish and Share to Org

Platform Integration
Unity Catalog-powered dataset search and lineage


Python User Defined Functions (UDFs) [Private Preview]

Run Python UDFs from an isolated execution environment

Integrate machine learning models and custom logic, and bring the flexibility of Python right into Databricks SQL!

CREATE FUNCTION redact(a STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
import json
# Top-level JSON keys whose values should be masked
keys = ["email", "phone"]
obj = json.loads(a)
for k in obj:
    if k in keys:
        obj[k] = "REDACTED"
# Return the masked object serialized back to a JSON string
return json.dumps(obj)
$$;
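The UDF body is ordinary Python, so its logic can be unit-tested locally before the function is registered. The sketch below copies the body into a plain function; the name `redact_local` is hypothetical. Like the original, it masks only top-level keys and assumes the input is a JSON object string. A SQL NULL input or nested fields would need extra handling.

```python
import json

# Same key list as the redact UDF body.
REDACTED_KEYS = ["email", "phone"]


def redact_local(a: str) -> str:
    """Plain-Python copy of the redact UDF body, for local testing."""
    obj = json.loads(a)
    for k in obj:  # iterate over top-level keys only
        if k in REDACTED_KEYS:
            obj[k] = "REDACTED"
    return json.dumps(obj)


if __name__ == "__main__":
    print(redact_local('{"name": "Ann", "email": "ann@example.com"}'))
    # prints {"name": "Ann", "email": "REDACTED"}
```

Keeping the key list in one named constant makes it easy to keep the local copy in sync with the registered function.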


Write SQL to get insight from unstructured text data via LLMs [Public Preview]

Sample use cases
● Extract top product issues from call center transcripts, without manual tagging!
● Tag customers as a potential churn risk based on customer support chat logs
● Automatically generate customized product descriptions for ad campaigns
● Read product reviews to understand buying decision criteria
…many more…


Databricks Assistant [Private Preview]

AI assistant with contextual understanding of your data, available natively within the notebook, SQL editor, and file editor

Generates and auto-completes code and queries

Explains and fixes issues

Integrates with Unity Catalog, offering contextual results relevant to your data assets


Lakehouse Federation [Private Preview]

Discover, query, and govern all your data, no matter where it lives

● Unified view into all your data
● Unified engine for all your data and use cases
● Unified governance across all data sources

Sign up @ databricks.com/qfpreview

Wednesday, June 28 @4:30 PM | Lakehouse Federation: Access and Governance of External Data Sources from Unity Catalog
Ingestion and Transformation


Data Ingest: Streaming Tables [Public Preview]

Efficiently and continuously land data in the bronze layer

Enable continuous, scalable ingestion from any data source, including cloud storage, message buses (Event Hubs, Kafka, Kinesis), and more

CREATE STREAMING TABLE report
AS SELECT SUM(profit)
FROM cloud_files(prod.sales
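A toy model shows the incremental idea behind a streaming table. Each refresh processes only files it has not seen before, tracked in a checkpoint, and folds them into a running `SUM(profit)`. This is a hypothetical Python sketch, not how Databricks implements streaming tables. The class name, the file names, and the in-memory landing zone are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class StreamingSumTable:
    """Toy streaming table that maintains SUM(profit) over arriving files."""

    total_profit: float = 0.0
    # Checkpoint: names of files already ingested, so they are never re-read.
    processed_files: Set[str] = field(default_factory=set)

    def refresh(self, landing_zone: Dict[str, List[float]]) -> int:
        """Ingest only new files from the landing zone; return how many were read."""
        new_files = sorted(f for f in landing_zone if f not in self.processed_files)
        for name in new_files:
            self.total_profit += sum(landing_zone[name])
            self.processed_files.add(name)
        return len(new_files)


if __name__ == "__main__":
    zone = {"sales_001.json": [10.0, 5.0]}
    table = StreamingSumTable()
    table.refresh(zone)          # reads sales_001.json
    zone["sales_002.json"] = [2.5]
    table.refresh(zone)          # reads only sales_002.json
    print(table.total_profit)    # 17.5
```

The checkpoint makes each refresh cost proportional to newly arrived data rather than the full history.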


Materialized Views [Public Preview]

Speed up queries with pre-computed results

Accelerate end-user queries and reduce infrastructure costs with efficient, incremental computation
• Accelerate BI dashboards and ETL queries
• Streaming: build MVs on top of live tables
• Easy ELT: Simplify reporting by cleaning, enriching, and denormalizing the base tables

Example: base tables sales and store_info, and the materialized view sales_report

sales
prod | loc | txn | price
i1 | l1 | tx1 | 11
i2 | l2 | tx2 | 24
i3 | l3 | tx3 | 7

store_info
loc | mgr | city
l1 | Alice | WA
l2 | Bob | SF
l3 | Annie | NY

sales_report
city | total_rev
SF | 24
NY | 7
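The sales_report example can be sketched as an aggregate that joins sales to store_info and sums price per city. The aggregate is refreshed incrementally: new sales rows only adjust the totals of the cities they touch, so nothing is recomputed from scratch. This Python sketch uses the slide's sample rows. It assumes sales is append-only and store_info does not change. The function names and the extra sale row are hypothetical, and real MV refresh logic handles far more cases.

```python
from typing import Dict, Iterable, Tuple

# Sample rows from the slide: sales(prod, loc, txn, price), store_info(loc -> (mgr, city)).
SALES = [("i1", "l1", "tx1", 11), ("i2", "l2", "tx2", 24), ("i3", "l3", "tx3", 7)]
STORE_INFO = {"l1": ("Alice", "WA"), "l2": ("Bob", "SF"), "l3": ("Annie", "NY")}

SaleRow = Tuple[str, str, str, int]


def apply_new_sales(
    report: Dict[str, int],
    new_sales: Iterable[SaleRow],
    store_info: Dict[str, Tuple[str, str]],
) -> Dict[str, int]:
    """Incremental refresh: add each new sale's price to its city's total_rev."""
    updated = dict(report)
    for _prod, loc, _txn, price in new_sales:
        city = store_info[loc][1]  # join on loc to find the city
        updated[city] = updated.get(city, 0) + price
    return updated


def full_refresh(
    sales: Iterable[SaleRow], store_info: Dict[str, Tuple[str, str]]
) -> Dict[str, int]:
    """Full recomputation is an incremental refresh starting from an empty report."""
    return apply_new_sales({}, sales, store_info)


if __name__ == "__main__":
    report = full_refresh(SALES, STORE_INFO)
    report = apply_new_sales(report, [("i4", "l2", "tx4", 5)], STORE_INFO)
    print(report)
```

The property that matters is that an incremental refresh gives the same result as a full refresh over all rows, for the append-only case assumed here.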


Materialized Views and Streaming Tables
The best data warehouse gets the best of data engineering

Enable your analysts. SQL and data analysts can easily ingest, clean, and enrich data to quickly meet the needs of your business.
Speed up BI dashboards. Create MVs to accelerate SQL analytics and BI reports by pre-computing results ahead of time.
Move to real-time analytics. Combine MVs with streaming tables to create fully incremental data pipelines for real-time use cases.

More info in the session: Delta Live Tables A to Z: Best Practices for Modern Data Pipelines
DBSQL MVs & STs on Databricks Lakehouse
How do MVs and STs fit in the lakehouse architecture?

1. MVs and STs are created in Databricks SQL
2. MVs and STs are refreshed on DLT clusters
3. MVs and STs are monitored in Workflows
4. MVs and STs are managed by Unity Catalog
5. MVs and STs store data in Delta Lake


Workflows + Databricks SQL
Orchestrate your SQL queries, dashboards, alerts, and more

Schedule and automate your Databricks SQL production workloads
• Easily orchestrate sophisticated workflows with multiple dependencies
• Enhanced monitoring and observability with proven reliability in production
• Up next: Native integration with dashboards, SQL queries and alerts
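Orchestrating tasks with multiple dependencies means running each task only after its upstream tasks have finished. A minimal sketch with Python's standard `graphlib` takes hypothetical tasks (an MV refresh, a query, a dashboard refresh, an alert check) and groups them into waves that could run in parallel. It illustrates the scheduling idea only and is not the Workflows API.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave depends only on earlier waves.

    `dependencies` maps each task to the set of tasks it must wait for.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstreams are all done
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock dependents
    return waves


if __name__ == "__main__":
    # Hypothetical task names for a SQL workflow.
    tasks = {
        "refresh_sales_mv": set(),
        "run_kpi_query": {"refresh_sales_mv"},
        "refresh_dashboard": {"run_kpi_query"},
        "check_revenue_alert": {"run_kpi_query"},
    }
    for i, wave in enumerate(execution_waves(tasks), start=1):
        print(i, wave)
```

Here the dashboard refresh and the alert check share one upstream, so they land in the same wave and can run concurrently.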


Performance



Intelligent Workload Management [Private and Public Preview]

Efficient compute utilization

Workload Management is about efficient compute utilization: when and where to run a query, when to scale up or down, controls for cancelling an execution, etc.

[Figure: Mixed Workloads Query Latency (Seconds), lower is better]

• Statement timeouts at workspace and query level already available
• Additional ongoing investment in:
  • Intelligent auto-scaling, adaptive routing & remote result cache
  • Automatic Data Layout Optimization
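Statement timeouts at two levels can be sketched as a precedence rule plus a deadline check. The Python below assumes that a query-level value, when set, overrides the workspace default. It models a running statement as a sequence of steps checked against an injectable clock. The class and function names are hypothetical, and this is not how a warehouse enforces timeouts internally.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class TimeoutPolicy:
    """Hypothetical two-level timeout settings, in seconds (None = no limit)."""

    workspace_timeout_s: Optional[float] = None

    def effective(self, query_timeout_s: Optional[float]) -> Optional[float]:
        # Assumption: a query-level timeout, when present, overrides the workspace one.
        return query_timeout_s if query_timeout_s is not None else self.workspace_timeout_s


class StatementTimeout(Exception):
    """Raised when a statement is cancelled for exceeding its timeout."""


def run_with_timeout(
    steps: Iterable[Callable[[], None]],
    timeout_s: Optional[float],
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run a statement's steps, cancelling before any step that starts past the deadline.

    Returns the number of steps completed.
    """
    start = clock()
    completed = 0
    for step in steps:
        if timeout_s is not None and clock() - start > timeout_s:
            raise StatementTimeout(f"cancelled after {completed} steps")
        step()
        completed += 1
    return completed


if __name__ == "__main__":
    policy = TimeoutPolicy(workspace_timeout_s=3600)
    limit = policy.effective(query_timeout_s=5)
    print(limit)
    print(run_with_timeout([lambda: None] * 3, limit))
```

Injecting the clock keeps the cancellation logic deterministic to test. A real engine would also interrupt work already in progress, not only between steps.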
DBSQL System Tables [Private Preview]

Bronze tables providing visibility into platform activity

Questions they answer:
• What statements were run, by whom & when?
• How & when did warehouses scale?
• What was I billed for?

Tables:
> warehouse_events
> warehouses
> statement_history
> billing

Sign up @ tinyurl.com/sys-tables
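A question like "what statements were run, by whom & when?" comes down to grouping and ordering rows of a statement-history table. The Python sketch below does this over in-memory records. The record fields (`executed_by`, `start_time`, `statement_text`) are hypothetical stand-ins, not the actual system table schema, which in practice you would query with SQL.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class StatementRecord:
    """Hypothetical row shape for a statement-history table."""

    executed_by: str
    start_time: datetime
    statement_text: str


def statements_by_user(
    rows: Iterable[StatementRecord],
) -> Dict[str, List[Tuple[datetime, str]]]:
    """Answer 'who ran what, and when': statements per user, oldest first."""
    grouped: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for row in rows:
        grouped[row.executed_by].append((row.start_time, row.statement_text))
    for entries in grouped.values():
        entries.sort(key=lambda entry: entry[0])  # order by start time
    return dict(grouped)


if __name__ == "__main__":
    rows = [
        StatementRecord("bob@example.com", datetime(2023, 6, 28, 9, 5), "SELECT * FROM sales"),
        StatementRecord("alice@example.com", datetime(2023, 6, 28, 9, 0), "SELECT 1"),
        StatementRecord("bob@example.com", datetime(2023, 6, 28, 8, 55), "SHOW TABLES"),
    ]
    for user, history in statements_by_user(rows).items():
        print(user, [text for _, text in history])
```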


Demo:
- Lakeview (Dashboards vNext)
- Databricks Assistant
- LLM Functions


Resources

Bill Inmon


Q/A



Thank You!
