Architecting Snowflake for High Concurrency and High Performance
Robert Hardaway
Sr. Solutions Architect, Kyligence
Agenda
• Snowflake, Scaling, and Concurrency
• Kyligence Overview – Built for Scale
• Query Optimization – Pre-Compute
• Unified Semantic – Business Expression
• AI Augmentation – Adaptive Learning
• DEMO
© Kyligence Inc. 2020, Confidential.
Snowflake Is on Fire
Net Retention Rate 158%
NPS 71
Market Cap $105B
YOY Growth 121%
Why People Like Snowflake
The First Cloud Native Analytics Platform
• Separates Storage from Compute
• Elastic Scale
• Scale-up
• Scale-out
• Load-and-go design
Snowflake Coarse-Grained Hardware Scaling
• Scale Up for High Data Volumes
• Scale Out for Higher Concurrency
What if…
…you could achieve predictable, sub-second response times:
• For 90+% of your queries
• Against petabytes of data
• Supporting 100s to 1,000s of concurrent users
Pre-computation delivers predictable results: Lookups vs. Computation
[Chart: Response Time vs. Data Volume. On-line computation grows as O(N); precomputation stays flat at O(1).]
The Basic Query Process
[Query plan diagram: Table + Table → Join → Filter → Agg → Sort]
No pre-calculation: every calculation is performed at query time.
Querying Precomputed Results
[Query plan diagram: the original plan (Tables → Join → Filter → Agg → Sort) is replaced by Cube (pre-aggregated data) → Filter → Sort]
With pre-calculation, results require less I/O, less computation, and lower latency.
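The contrast between the two query plans can be sketched in a few lines of Python. This is a minimal illustration, not Kyligence's or Apache Kylin's implementation: the `sales` and `items` tables, their columns, and the function names are all hypothetical. The on-line path joins, filters, aggregates, and sorts on every request, so its work grows with the number of raw rows. The precomputed path runs the join and aggregation once and then only filters and sorts the much smaller pre-aggregated result.

```python
from collections import defaultdict

# Hypothetical fact and dimension tables (illustrative data, not from the slides).
sales = [
    {"item_id": 1, "year": 2019, "revenue": 100.0},
    {"item_id": 2, "year": 2019, "revenue": 50.0},
    {"item_id": 1, "year": 2020, "revenue": 70.0},
    {"item_id": 3, "year": 2020, "revenue": 30.0},
]
items = {1: "laptop", 2: "phone", 3: "phone"}  # item_id -> category


def query_online(year):
    """Join, filter, aggregate, and sort at query time; cost grows with len(sales)."""
    totals = defaultdict(float)
    for row in sales:                            # full scan of the fact table
        if row["year"] == year:                  # filter
            category = items[row["item_id"]]     # join
            totals[category] += row["revenue"]   # aggregate
    return sorted(totals.items())                # sort


def build_cube():
    """Run the join and aggregation once, ahead of time, keyed by (year, category)."""
    cube = defaultdict(float)
    for row in sales:
        cube[(row["year"], items[row["item_id"]])] += row["revenue"]
    return dict(cube)


def query_cube(cube, year):
    """Filter and sort the pre-aggregated result instead of the raw rows."""
    return sorted((cat, total) for (y, cat), total in cube.items() if y == year)


cube = build_cube()
print(query_online(2020))
print(query_cube(cube, 2020))
```

Both calls return the same rows. The size of `cube` depends on the number of distinct (year, category) combinations rather than on the number of raw sales rows, which is why response time stays roughly flat as data volume grows.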
Kyligence Overview – Built for Scale
Apache Kylin
Top Level Project
• The only open-source distributed OLAP platform
Award Winning
• InfoWorld’s Bossies 2015 & 2016 (Best of Open Source Software Awards)
Sub-Second Interactive Query
• Large scale, high concurrency, multi-dimensional, sub-second query latency
1,000+ Organizations
• Adopted by more than 1,000 organizations globally
Kyligence Cloud Architecture
[Architecture diagram; labels as extracted]
• Consumers: Any BI Tool, Machine Learning, SaaS & Apps (CRM, HCM, SCM), Data as a Service
• Business areas: Finance, Marketing, Sales
• Low Latency Queries
• Platform components: Unified Semantic Layer, AI Augmented Engine, Precomputation Layer (Distributed Cubes & Table Indexes), Multi-Dimensional Modeling, Security & Governance
• Sources: Any Data Platform (Data Warehouse, Streaming Data, Data Lake), Any Cloud
Modern OLAP
• From large cubes in a single machine to cubes distributed in a cluster
• One logical cube, processed by a distributed framework
Pre-computation: Flattening the Big-O Curve
[Chart: Response Time vs. Data Volume. On-line computation (MPP, Inverted Index, Memory DB) grows as O(N); pre-computation stays flat at O(1).]
Functional Features:
• Ensures predictable query performance
• Does not degrade when data volume surges
• Computes once, reuses many times
• Optimizes resource usage
Functional Values:
• Provides stable, high-performance, high-concurrency services
• Saves resources and development costs
Query Optimization – Pre-Compute
Query Routing & Prioritization
• Maximizes overall system performance
• Multiple data source support
• High-performance columnar storage engine
• Distributed query engine
• Comprehensive raw queries (Table Index)
Introducing the Cuboid
[OLAP cube: cuboid lattice for four dimensions (time, item, location, supplier)]
0-D (apex) cuboid: no dimensions
1-D cuboids: time | item | location | supplier
2-D cuboids: time, item | time, location | time, supplier | item, location | item, supplier | location, supplier
3-D cuboids: time, item, location | time, item, supplier | time, location, supplier | item, location, supplier
4-D (base) cuboid: time, item, location, supplier
• Cuboid = one combination of dimensions
• Cube = all combinations of dimensions (all cuboids)
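The lattice can be generated mechanically: with n dimensions there are 2^n cuboids, one per subset of the dimensions. The sketch below enumerates them for the four dimensions on the slide and then picks a cuboid to answer a query. The selection rule (the lowest-dimensional materialized cuboid that contains every dimension the query needs) and the function names are assumptions made for illustration, not a description of how Kyligence or Apache Kylin actually chooses a cuboid.

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")


def enumerate_cuboids(dimensions):
    """Return every cuboid, grouped by level (number of dimensions it contains)."""
    return {
        level: [frozenset(combo) for combo in combinations(dimensions, level)]
        for level in range(len(dimensions) + 1)
    }


def pick_cuboid(materialized, query_dims):
    """Hypothetical rule: choose the smallest materialized cuboid covering the query."""
    needed = frozenset(query_dims)
    candidates = [cuboid for cuboid in materialized if needed <= cuboid]
    return min(candidates, key=len, default=None)


lattice = enumerate_cuboids(DIMENSIONS)
for level, cuboids in lattice.items():
    print(f"{level}-D: {len(cuboids)} cuboid(s)")
total = sum(len(cuboids) for cuboids in lattice.values())
print("total:", total)

# Suppose only three cuboids were actually built (an illustrative choice).
materialized = [
    frozenset(DIMENSIONS),                      # 4-D base cuboid
    frozenset({"time", "item", "location"}),    # one 3-D cuboid
    frozenset({"time"}),                        # one 1-D cuboid
]
chosen = pick_cuboid(materialized, ["time", "location"])
print(sorted(chosen))
```

For four dimensions the levels hold 1, 4, 6, 4, and 1 cuboids, 16 in total. The count doubles with each added dimension, which is why deciding which cuboids to materialize matters in practice.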
Unified Semantic – Business Expression
Unified Business Definitions
Unified Data Sources
• Integrate data sources from cloud DW, data lake, and RDBMS
• Intelligently route each query to the best engine for that query
Data Modeling
• Generate data models by learning from SQL patterns
• Intelligently recommend dimensions and measures
• Identify critical business scenarios and recommend indexes
Business Semantics
• Sum(revenue): Sales MTD, Sales YTD, Sales LM, Sales LY, Sales YOY, Sales MOM
• Business users can define business metrics that translate underlying data into business language
Self-Service Analysis
• Ad-hoc analysis by business users in Marketing, Sales, and Finance
• Business users analyze a unified dataset
• Queries fully reuse the pre-calculated data model, boosting analytic performance
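As a rough illustration of how business metrics such as Sales MTD, Sales YTD, and Sales YOY can all be derived from one base measure, Sum(revenue), here is a small Python sketch. The data, the function names, and the exact metric definitions are assumptions: in particular, YOY is taken here to mean year-to-date growth against the same date one year earlier, which the slides do not specify.

```python
from datetime import date

# Hypothetical daily revenue totals (illustrative data only).
revenue_by_day = {
    date(2019, 3, 10): 80.0,
    date(2020, 1, 15): 40.0,
    date(2020, 2, 20): 60.0,
    date(2020, 3, 5): 25.0,
    date(2020, 3, 10): 75.0,
}


def sum_revenue(start, end):
    """Base measure: Sum(revenue) over an inclusive date range."""
    return sum(v for d, v in revenue_by_day.items() if start <= d <= end)


def sales_mtd(as_of):
    """Month to date: first day of the month through as_of."""
    return sum_revenue(as_of.replace(day=1), as_of)


def sales_ytd(as_of):
    """Year to date: January 1 through as_of."""
    return sum_revenue(date(as_of.year, 1, 1), as_of)


def sales_yoy(as_of):
    """Assumed definition: YTD growth vs. the same date last year (None if no base).

    Note: as_of.replace(year=...) raises ValueError for Feb 29; not handled here.
    """
    prior = sales_ytd(as_of.replace(year=as_of.year - 1))
    return (sales_ytd(as_of) - prior) / prior if prior else None


today = date(2020, 3, 10)
print(sales_mtd(today), sales_ytd(today), sales_yoy(today))
```

Because every metric is a thin wrapper around the same `sum_revenue` measure, each team sees numbers computed from one shared definition rather than from separately maintained queries.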
Enterprise Governance
• Rich Vocabulary
• Single Source of Truth
• Accessible Audit
• Governance Platform Integration (REST)
AI Augmentation – Adaptive Learning
Self-Learning
• Query history in summary
• Intelligent modeling and indexing
• One-click query acceleration
• Track usage to help manage models
• Make resource decisions based on facts
Learning to Improve Performance
Adaptive Model
• The model adapts quickly as the business changes
• Unified semantic definitions
• Unlimited dimensions and measures
Learning to Reduce Operational Effort
DEMO
Kyligence Cloud: Extreme Analytics
Supports All Data Platforms
• Data Warehouse
• Data Lake
• Streaming Data
• Cloud Storage
Intelligent Precomputation
• Multi-dimensional cubes and table indexes
• AI-assisted query optimization and data modeling
Supports All BI Tools
• Excel
• Power BI
• Tableau
• MicroStrategy
Massive Concurrency
• 1,000s of concurrent queries
• 10s-100s of analysts running multi-threaded programs
Powered by Apache Kylin
Journey of Apache Kylin
• Sept 2013: Project Initiated
• Oct 2014: Officially Open Source
• Nov 2014: Apache Incubator Project
• Sept 2015: InfoWorld Best Open Source Big Data Tool Award
• Nov 2015: Apache Top-Level Project
• Mar 2016: Kyligence Inc. Founded
Kyligence = Kylin + Intelligence
Mission: Develop an AI-Assisted SQL/OLAP Platform for Interactive Analytics at Unprecedented Scale
• Founded in 2016 by the creators of Apache Kylin
• Named one of CRN's top 10 big data startups in 2018
• Global Presence: San Jose, Shanghai, Beijing, Seattle, New York
• VCs: Fidelity International, Shunwei Capital, Broadband Capital, Coatue, Redpoint, Cisco
Contact Us
Kyligence Inc
• http://kyligence.io
• info@kyligence.io
• Twitter: @Kyligence
Apache Kylin
• http://kylin.apache.org
• dev@kylin.apache.org
• Twitter: @ApacheKylin
