Query Compilation in Impala
Alexander Behm | Software Engineer
May 2014 @ Impala User Group
Flow of a SQL Query
[Diagram: Client -> SQL Text -> Impala Frontend (Java), "Compile Query" -> Executable Plan -> Impala Backend (C++), "Execute Query" -> Query Results -> Client. The compile step is the focus of this talk.]
Query Compilation
[Diagram: Client -> SQL Text -> Query Compiler -> Executable Plan. Inside the Query Compiler: SQL Parsing -> Parse Tree -> Semantic Analysis -> Parse Tree + Analyzer -> Query Planning.]
Query Parsing
SELECT c1, SUM(c2)
FROM t1 JOIN t2 USING(id)
WHERE c3 > 10 GROUP BY c1
SelectStmt
  SelectList: ColRef, AggExpr(ColRef)
  TableRefs: TableRef, TableRef with UsingClause(ColRef)
  WhereClause: BinaryPredicate(ColRef, IntLiteral)
  GroupByClause: ColRef
• Applies SQL grammar, reports syntax errors
• Produces parse tree capturing syntactic structure of query
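As a rough illustration of what such a parse tree holds, the sketch below builds the tree for the example query by hand. The class names mirror the node labels on the slide, but the fields and constructors are assumptions made for this example and are not Impala's actual frontend classes (which are written in Java).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ColRef:
    name: str


@dataclass
class IntLiteral:
    value: int


@dataclass
class AggExpr:
    func: str
    arg: ColRef


@dataclass
class BinaryPredicate:
    op: str
    left: ColRef
    right: IntLiteral


@dataclass
class TableRef:
    table: str
    using: List[ColRef] = field(default_factory=list)  # USING(...) columns, if any


@dataclass
class SelectStmt:
    select_list: List[Union[ColRef, AggExpr]]
    table_refs: List[TableRef]
    where: Optional[BinaryPredicate]
    group_by: List[ColRef]


# Hand-built tree for:
#   SELECT c1, SUM(c2) FROM t1 JOIN t2 USING(id) WHERE c3 > 10 GROUP BY c1
parse_tree = SelectStmt(
    select_list=[ColRef("c1"), AggExpr("SUM", ColRef("c2"))],
    table_refs=[TableRef("t1"), TableRef("t2", using=[ColRef("id")])],
    where=BinaryPredicate(">", ColRef("c3"), IntLiteral(10)),
    group_by=[ColRef("c1")],
)
```

The tree only records syntax: nothing here knows whether t1 exists or what type c3 has.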
Semantic Analysis…
• Precondition: Query is syntactically valid. Analysis operates on parse tree.
• Consults table metadata
  • Do t1 and t2 exist? Does c1 exist in t1 or t2 (or both -> error)? Does id exist in t1 and t2?
  • Does the user have privileges to SELECT from t1?
• Checks type compatibility of expressions, adds implicit casts
  • c3 > 10 -> c3 > cast(10 as bigint)
• SQL rules (semantic, not syntactic)
  • Does c1 appear in the GROUP BY clause?
SELECT c1, SUM(c2)
FROM t1 JOIN t2 USING(id)
WHERE c3 > 10 GROUP BY c1
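The metadata and type checks can be sketched in a few lines. This is a toy model only: the CATALOG contents, the column types, and the helper names resolve_column and add_implicit_cast are assumptions invented for illustration, not Impala APIs.

```python
class AnalysisError(Exception):
    """Raised when a syntactically valid query fails semantic analysis."""


# Hypothetical table metadata: table name -> {column name: type}.
CATALOG = {
    "t1": {"id": "INT", "c1": "STRING", "c3": "BIGINT"},
    "t2": {"id": "INT", "c2": "DOUBLE"},
}


def resolve_column(column, tables):
    """Find the single table that owns an unqualified column reference."""
    for table in tables:
        if table not in CATALOG:
            raise AnalysisError(f"unknown table: {table}")
    owners = [t for t in tables if column in CATALOG[t]]
    if not owners:
        raise AnalysisError(f"unknown column: {column}")
    if len(owners) > 1:
        raise AnalysisError(f"ambiguous column: {column}")
    return owners[0], CATALOG[owners[0]][column]


def add_implicit_cast(column_type, literal):
    """Wrap an integer literal in a cast when compared with a BIGINT column."""
    if column_type == "BIGINT" and isinstance(literal, int):
        return f"CAST({literal} AS BIGINT)"
    return str(literal)


table, col_type = resolve_column("c3", ["t1", "t2"])
rewritten = f"c3 > {add_implicit_cast(col_type, 10)}"
```

With this catalog, an unqualified reference to id would be rejected as ambiguous because both tables define it, which is the "or both -> error" case from the slide.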
… Semantic Analysis
• Expression substitution for views
  • Resolve column references against base tables
• Preparation for Planning
  • Register state in analyzer for correct predicate assignment during planning
  • Register predicates (WHERE, HAVING, ON, USING, etc.)
  • Register outer-joined tables
  • Compute value-transfer graph and equivalence classes for predicate inference
  • (…)
• Postcondition: Query is valid. An executable plan can be produced.
SELECT c1, SUM(c2)
FROM (SELECT dept AS c1, revenue AS c2,
             month AS c3 FROM t1) AS v
WHERE c3 > 10 GROUP BY c1

After expression substitution:

SELECT dept, SUM(revenue)
FROM t1
WHERE month > 10
GROUP BY dept
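Expression substitution for the view above amounts to replacing each view alias with the base-table expression it stands for. A minimal sketch, assuming expressions are encoded as nested tuples of the form (operator, operand, ...); that encoding and the substitute helper are made up for this example.

```python
# Alias map taken from the inline view: SELECT dept AS c1, revenue AS c2, month AS c3
VIEW_ALIASES = {"c1": "dept", "c2": "revenue", "c3": "month"}


def substitute(expr, aliases):
    """Recursively replace view aliases with base-table column names."""
    if isinstance(expr, str):
        # Operators and function names pass through because they are not aliases.
        return aliases.get(expr, expr)
    if isinstance(expr, tuple):
        return tuple(substitute(part, aliases) for part in expr)
    return expr  # literals such as 10 are left untouched


where_clause = substitute((">", "c3", 10), VIEW_ALIASES)
select_list = [substitute(e, VIEW_ALIASES) for e in ["c1", ("SUM", "c2")]]
```

After substitution every column reference points at t1 directly, so later phases never need to know the view existed.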
Query Planning: Goals
• Generate executable plan (“tree” of operators)
  • Maximize scan locality using DN block metadata
  • Minimize data movement
  • Full distribution of operators
• Query operators
  • Scan, HashJoin, HashAggregation, Union, TopN, Exchange
Query Planning: Overview
[Diagram: Semantic Analysis -> Parse Tree + Analyzer -> Query Planner -> Executable Plan. Inside the Query Planner: Walk Parse Tree -> Single-node Plan -> Parallelize & Fragment.]
Query Planning: Single-Node Plan
• Four major functions:
  1. Converts the parse tree into a plan tree
  2. Assigns predicates to the lowest possible plan node
  3. Optimizes join order
  4. Prunes irrelevant columns
Parse Tree -> Single-Node Plan Tree
[Diagram: plan tree with TopN at the root, above Agg, above two stacked HashJoin nodes whose inputs are Scan: t1, Scan: t2, and Scan: t3]
SELECT t1.dept, SUM(t2.revenue)
FROM LargeHdfsTable t1
JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online' AND t1.id > 10
GROUP BY t1.dept
HAVING COUNT(t2.revenue) > 10
ORDER BY revenue LIMIT 10
Predicate Assignment & Inference
SELECT t1.dept, SUM(t2.revenue)
FROM LargeHdfsTable t1
JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online' AND t1.id > 10
GROUP BY t1.dept
HAVING COUNT(t2.revenue) > 10
ORDER BY revenue LIMIT 10
[Diagram: the single-node plan tree (TopN, Agg, two HashJoins, scans of t1, t2, t3) annotated with predicates. Aggregation: COUNT(t2.revenue) > 10. Join predicates: t1.id2 = t3.id and t1.id1 = t2.id. Scan predicates: id1 > 10, category = 'Online', id > 10, with the label "Inferred Predicate".]
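The "lowest plan node" rule can be sketched as a walk down the plan tree: a predicate sinks into a child as long as that child's subtree still covers every table the predicate references. The PlanNode class, the join shape (t1 joined with t2, then t3), and the assign_predicate helper are assumptions for illustration; the sketch ignores outer joins, aggregation, and predicate inference.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    name: str
    tables: frozenset = frozenset()  # tables read directly by this node
    children: list = field(default_factory=list)
    predicates: list = field(default_factory=list)

    def covered_tables(self):
        """All tables produced by this node's subtree."""
        covered = set(self.tables)
        for child in self.children:
            covered |= child.covered_tables()
        return covered


def assign_predicate(node, predicate, referenced_tables):
    """Push the predicate to the lowest node that sees all referenced tables."""
    for child in node.children:
        if referenced_tables <= child.covered_tables():
            return assign_predicate(child, predicate, referenced_tables)
    node.predicates.append(predicate)
    return node


scan_t1 = PlanNode("Scan: t1", frozenset({"t1"}))
scan_t2 = PlanNode("Scan: t2", frozenset({"t2"}))
scan_t3 = PlanNode("Scan: t3", frozenset({"t3"}))
join_12 = PlanNode("HashJoin", children=[scan_t1, scan_t2])
join_123 = PlanNode("HashJoin", children=[join_12, scan_t3])

assign_predicate(join_123, "t3.category = 'Online'", {"t3"})
assign_predicate(join_123, "t1.id > 10", {"t1"})
assign_predicate(join_123, "t1.id1 = t2.id", {"t1", "t2"})
assign_predicate(join_123, "t1.id2 = t3.id", {"t1", "t3"})
```

Single-table predicates end up at the scans, and each join predicate lands at the first join that has both of its tables available.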
Join-Order Optimization
• Inner joins are commutative and associative
  • Query results correct independent of execution order
  • Query execution costs vary dramatically!
    • Hash table sizes, network transfers, #hash lookups
• Join-order optimization
  • Impala only considers left-deep join trees
    • (Right join input is a table, not another join)
  • Find cheapest valid join order
  • Relies heavily on table and column statistics
• Limitation: Choice of join order independent of join strategy
Invalid Join Orders
SELECT t1.dept, SUM(t2.revenue)
FROM LargeHdfsTable t1
JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online' AND t1.id > 10
GROUP BY t1.dept
HAVING COUNT(t2.revenue) > 10
ORDER BY revenue LIMIT 10
No explicit or implicit predicate between t2 and t3, so any order that joins t2 and t3 directly is invalid
Join-Order Optimization
[Diagram: six left-deep join trees (two stacked HashJoins over three scans), one per join order]
Order: t1, t2, t3
Order: t1, t3, t2
Order: t2, t1, t3
Order: t2, t3, t1
Order: t3, t1, t2
Order: t3, t2, t1
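The six orders and the validity rule can be checked mechanically. The sketch below assumes that a left-deep order is valid when every newly added table shares a join predicate with at least one table already joined; the JOIN_EDGES set encodes the two ON clauses of the example query, and is_valid is a hypothetical helper.

```python
from itertools import permutations

# Join predicates from the example query: t1.id1 = t2.id and t1.id2 = t3.id.
JOIN_EDGES = {frozenset({"t1", "t2"}), frozenset({"t1", "t3"})}


def is_valid(order):
    """True if each table joins against something already on the left side."""
    joined = {order[0]}
    for table in order[1:]:
        if not any(frozenset({table, other}) in JOIN_EDGES for other in joined):
            return False  # would need a join without a predicate
        joined.add(table)
    return True


all_orders = list(permutations(["t1", "t2", "t3"]))
valid_orders = [order for order in all_orders if is_valid(order)]
invalid_orders = [order for order in all_orders if not is_valid(order)]
```

Under this rule the two orders that start by joining t2 with t3 are rejected, leaving four candidates.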
Join-Order Optimization
• Impala’s Implementation:
  1. Heuristic
    • Order tables descending by size
    • Best plan typically has largest table on the left (if valid)
  2. Plan enumeration & costing
    • Generate all possible join orders starting from a given left-most table (starting with largest one)
    • Ignore invalid join orders
    • Estimate intermediate result sizes (key!)
    • Choose plan that minimizes intermediate result sizes
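The two steps can be combined into a small sketch. Everything numeric here is an assumption: the row counts and per-predicate selectivities are invented, the intermediate size is modeled as |left| x |right| x selectivity, and the cost is simply the sum of intermediate sizes. It illustrates the procedure on the slide and is not Impala's actual cost model.

```python
from fractions import Fraction
from itertools import permutations


def intermediate_sizes(order, rows, selectivity):
    """Estimated size after each join, or None if the order is invalid."""
    joined = [order[0]]
    size = Fraction(rows[order[0]])
    sizes = []
    for table in order[1:]:
        factors = [selectivity[frozenset({table, other})]
                   for other in joined
                   if frozenset({table, other}) in selectivity]
        if not factors:
            return None  # no join predicate connects this table yet
        size *= rows[table]
        for factor in factors:
            size *= factor
        sizes.append(size)
        joined.append(table)
    return sizes


def choose_join_order(rows, selectivity):
    """Largest table first; among valid orders, minimize intermediate sizes."""
    by_size = sorted(rows, key=rows.get, reverse=True)
    for leftmost in by_size:
        rest = [t for t in by_size if t != leftmost]
        best = None
        for perm in permutations(rest):
            order = (leftmost, *perm)
            sizes = intermediate_sizes(order, rows, selectivity)
            if sizes is None:
                continue
            cost = sum(sizes)
            if best is None or cost < best[0]:
                best = (cost, order)
        if best is not None:
            return best[1]
    return None


# Hypothetical statistics for the running example.
SELECTIVITY = {
    frozenset({"t1", "t2"}): Fraction(1, 1000),
    frozenset({"t1", "t3"}): Fraction(1, 100),
}
order_a = choose_join_order({"t1": 1000, "t2": 10000, "t3": 10}, SELECTIVITY)
order_b = choose_join_order({"t1": 100000, "t2": 10000, "t3": 10}, SELECTIVITY)
```

With t2 as the largest table only one valid order starts with it. When t1 is the largest, the sketch prefers joining the small t3 first because that keeps the first intermediate result small.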
Query Planning: Overview
[Diagram: Semantic Analysis -> Parse Tree + Analyzer -> Query Planner -> Executable Plan. Inside the Query Planner: Walk Parse Tree -> Single-node Plan -> Parallelize & Fragment.]
Query Planning: Distributed Plans
• Distributed Aggregation
  • Pre-aggregation where data is first materialized
  • Merge-aggregation partitioned by grouping columns
  • Distinct aggregation: additional level of pre- and merge aggregation
• Distributed Top-N
  • Initial Top-N where data is first materialized
  • Final Top-N at coordinator
• Distributed Union
  • Pre-aggregation/Top-N placed into plans of each union operand
  • Union-operand plans executed in parallel, merged via exchange
• Above strategies are currently fixed in Impala
  • Independent of column/table stats
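The two-phase aggregation pattern can be sketched for a SUM grouped by one key. The functions pre_aggregate and merge_aggregate, the sample rows, and the use of Python's hash() as the partitioning function are all assumptions for illustration; the point is only that each node reduces its local data before anything is sent over the network.

```python
from collections import defaultdict


def pre_aggregate(local_rows):
    """Phase 1: each node sums its own (group_key, value) rows."""
    partial = defaultdict(int)
    for key, value in local_rows:
        partial[key] += value
    return dict(partial)


def merge_aggregate(partials, num_partitions):
    """Phase 2: route partial sums by hash of the group key, then merge."""
    partitions = [defaultdict(int) for _ in range(num_partitions)]
    for partial in partials:
        for key, subtotal in partial.items():
            # All partial results for one key meet in the same partition.
            partitions[hash(key) % num_partitions][key] += subtotal
    merged = {}
    for partition in partitions:
        merged.update(partition)
    return merged


node_a = [("online", 5), ("retail", 2), ("online", 1)]
node_b = [("retail", 3), ("online", 4)]
partials = [pre_aggregate(node_a), pre_aggregate(node_b)]
totals = merge_aggregate(partials, num_partitions=3)
```

Here five input rows shrink to four partial rows before the exchange; on real data with few distinct groups the reduction is far larger.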
Query Planning: Distributed Joins
• Broadcast Join
  • Join is co-located with left input
  • Broadcast right input to all nodes executing join
  • Build hash table on right input, streaming probe from left input
  • -> Preferred for small right side (relative to left side)
• Partitioned Join
  • Both tables hash-partitioned on join columns
  • Same build/probe procedure as above
  • -> Preferred for joins where both left and right side are large
• Cost-based decision based on table/column stats
  • Minimize required network transfer
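One plausible way to compare the two strategies by network transfer is sketched below. The formulas are an assumption, not Impala's exact costing: a broadcast join ships the whole right input to every node, while a partitioned join ships each input across the network roughly once. The function name and the byte counts are hypothetical.

```python
def choose_join_strategy(left_bytes, right_bytes, num_nodes):
    """Pick the strategy with the smaller estimated network transfer."""
    broadcast_cost = right_bytes * num_nodes       # right side copied to every node
    partitioned_cost = left_bytes + right_bytes    # both sides repartitioned once
    return "BROADCAST" if broadcast_cost <= partitioned_cost else "PARTITIONED"


# Huge fact table joined with a small dimension table on 10 nodes.
small_right = choose_join_strategy(10**12, 10**7, num_nodes=10)

# Two large inputs on 10 nodes.
large_right = choose_join_strategy(10**12, 5 * 10**11, num_nodes=10)
```

Because both estimates depend on input sizes, missing or stale table statistics can flip the decision.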
Query Planning: Distributed Plans
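[Diagram: the single-node plan (TopN, Agg, two HashJoins, scans of t1, t2, t3) and the distributed plan derived from it. Labels in the distributed plan: operators HashJoin (x2), Pre-Agg, MergeAgg, TopN (x2); exchanges labeled hash t2.id, hash t1.id1, Broadcast, hash t1.custid, and Merge; execution locations labeled at HDFS DN, at HBase RS, and at coordinator.]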
Explain Example: TPCDS Q42
SELECT d.d_year, i.i_category_id, i.i_category, SUM(ss_ext_sales_price) AS total_sales
FROM store_sales ss
JOIN date_dim d
ON (ss.ss_sold_date_sk = d.d_date_sk)
JOIN item i
ON (ss.ss_item_sk = i.i_item_sk)
WHERE i.i_manager_id = 1 AND d.d_moy = 12 AND d.d_year = 1998
GROUP BY d.d_year, i.i_category_id, i.i_category
ORDER BY total_sales DESC, d_year, i_category_id, i_category
LIMIT 100
Explain Example: TPCDS Q42
+-----------------------------------------------------+
| Explain String                                      |
+-----------------------------------------------------+
| Estimated Per-Host Requirements: Memory=0B VCores=0 |
|                                                     |
| 06:TOP-N [LIMIT=100]                                |
| 05:AGGREGATE [FINALIZE]                             |
| 04:HASH JOIN [INNER JOIN]                           |
| |--02:SCAN HDFS [tpcds1000gb.item i]                |
| 03:HASH JOIN [INNER JOIN]                           |
| |--01:SCAN HDFS [tpcds1000gb.date_dim d]            |
| 00:SCAN HDFS [tpcds1000gb.store_sales ss]           |
+-----------------------------------------------------+
set explain_level=0;
set num_nodes=1;
Explain Example: TPCDS Q42
+---------------------------------------------------------------------+
| Explain String                                                      |
+---------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=3.76GB VCores=3             |
|                                                                     |
| 12:TOP-N [LIMIT=100]                                                |
| 11:EXCHANGE [PARTITION=UNPARTITIONED]                               |
| 06:TOP-N [LIMIT=100]                                                |
| 10:AGGREGATE [MERGE FINALIZE]                                       |
| 09:EXCHANGE [PARTITION=HASH(d.d_year,i.i_category_id,i.i_category)] |
| 05:AGGREGATE                                                        |
| 04:HASH JOIN [INNER JOIN, BROADCAST]                                |
| |--08:EXCHANGE [BROADCAST]                                          |
| |  02:SCAN HDFS [tpcds1000gb.item i]                                |
| 03:HASH JOIN [INNER JOIN, BROADCAST]                                |
| |--07:EXCHANGE [BROADCAST]                                          |
| |  01:SCAN HDFS [tpcds1000gb.date_dim d]                            |
| 00:SCAN HDFS [tpcds1000gb.store_sales ss]                           |
+---------------------------------------------------------------------+
set explain_level=0;
set num_nodes=0;
Explain Example: TPCDS Q42
| …
| 03:HASH JOIN [INNER JOIN, BROADCAST]                         |
| |  hash predicates: ss.ss_sold_date_sk = d.d_date_sk         |
| |  hosts=10 per-host-mem=511B                                |
| |  tuple-ids=0,1 row-size=40B cardinality=8251124389         |
| |                                                            |
| |--07:EXCHANGE [BROADCAST]                                   |
| |  |  hosts=3 per-host-mem=0B                                |
| |  |  tuple-ids=1 row-size=16B cardinality=29                |
| |  |                                                         |
| |  01:SCAN HDFS [tpcds1000gb.date_dim d, PARTITION=RANDOM]   |
| |     partitions=1/1 size=9.77MB                             |
| |     predicates: d.d_moy = 12, d.d_year = 1998              |
| |     table stats: 73049 rows total                          |
| |     column stats: all                                      |
| |     hosts=3 per-host-mem=48.00MB                           |
| |     tuple-ids=1 row-size=16B cardinality=29                |
| |                                                            |
| 00:SCAN HDFS [tpcds1000gb.store_sales ss, PARTITION=RANDOM]  |
|    partitions=1823/1823 size=1.10TB                          |
|    table stats: 8251124389 rows total                        |
|    column stats: all                                         |
|    hosts=10 per-host-mem=3.75GB                              |
|    tuple-ids=0 row-size=24B cardinality=8251124389           |
+--------------------------------------------------------------+
set explain_level=2;
set num_nodes=0;
Conclusion
• Cost-based choice of join order and strategy
  • Critical for performance
  • Relies on table and column stats
• Other plan optimizations currently independent of stats
  • Likely to expand plan choices in the future
  • Likely to increase reliance on stats
• Helpful Impala commands
  • compute stats
  • show table/column stats
  • explain query/insert stmt
  • set explain_level=[0-3]
  • set num_nodes=1 -> show single-node plan
Try It Out!
• Questions/comments?
• Download: cloudera.com/impala
• Email: impala-user@cloudera.org
• Join: groups.cloudera.org