Apache Calcite Overview 
Julian Hyde
Page 1 © Hortonworks Inc. 2014 
Kylin Meetup (eBay, San Jose) 
December 4th, 2014
Apache Calcite 
Apache Incubator project since May 2014
• Originally named Optiq
Query planning framework
• Relational algebra, rewrite rules, cost model
• Extensible
Packaging
• Library (JDBC server optional)
• Community-authored rules, adapters
Adoption
• Embedded: Lingual (SQL interface to Cascading), Apache Drill, Apache Hive, Apache Kylin
• Adapters: Splunk, Spark, MongoDB, JDBC, CSV, JSON, Web tables, In-memory, Phoenix
Conventional DB architecture 
Calcite architecture 
Expression tree
SELECT p."product_name", COUNT(*) AS c
FROM "splunk"."splunk" AS s
JOIN "mysql"."products" AS p
ON s."product_id" = p."product_id"
WHERE s."action" = 'purchase'
GROUP BY p."product_name"
ORDER BY c DESC
Plan, root first (the splunk scan runs in Splunk, the products scan in MySQL):
sort (key: c DESC)
  group (key: product_name, agg: count)
    filter (condition: action = 'purchase')
      join (key: product_id)
        scan (table: splunk)
        scan (table: products)
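To make the shape of the expression tree concrete, here is a minimal sketch that builds one reading of the slide's plan as plain Java objects and prints it root first. The `Node` class and `render` method are hypothetical stand-ins invented for this illustration; they are not Calcite's `RelNode` API, and the operator order is an assumption based on the labels on the slide.

```java
import java.util.Arrays;
import java.util.List;

public class PlanTreeSketch {
    /** Hypothetical operator node: an operator name, a detail string, and inputs. */
    static final class Node {
        final String op;
        final String detail;
        final List<Node> inputs;

        Node(String op, String detail, Node... inputs) {
            this.op = op;
            this.detail = detail;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Renders the tree root first, indenting each level by two spaces. */
    static String render(Node node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.op).append(" (").append(node.detail).append(")\n");
        for (Node input : node.inputs) {
            sb.append(render(input, depth + 1));
        }
        return sb.toString();
    }

    /** The unoptimized plan: the filter sits above the join. */
    static Node unoptimizedPlan() {
        Node join = new Node("join", "key: product_id",
            new Node("scan", "table: splunk"),
            new Node("scan", "table: products"));
        Node filter = new Node("filter", "condition: action = 'purchase'", join);
        Node group = new Node("group", "key: product_name, agg: count", filter);
        return new Node("sort", "key: c DESC", group);
    }

    public static void main(String[] args) {
        System.out.print(render(unoptimizedPlan(), 0));
    }
}
```

Each operator takes the output of its inputs, so the query is evaluated bottom-up: both scans feed the join, and the sort at the root produces the final result.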
Expression tree (optimized)
SELECT p."product_name", COUNT(*) AS c
FROM "splunk"."splunk" AS s
JOIN "mysql"."products" AS p
ON s."product_id" = p."product_id"
WHERE s."action" = 'purchase'
GROUP BY p."product_name"
ORDER BY c DESC
Plan, root first (the filter now runs inside Splunk, below the join; the products scan runs in MySQL):
sort (key: c DESC)
  group (key: product_name, agg: count)
    join (key: product_id)
      filter (condition: action = 'purchase')
        scan (table: splunk)
      scan (table: products)
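A rough way to see why running the filter before the join helps is to count the rows that reach the join in each plan. The sketch below does that arithmetic; the row counts and the 1% selectivity are made-up numbers for illustration only, and the two methods are hypothetical helpers, not Calcite's cost model (`RelOptCost`) or its metadata providers.

```java
public class JoinInputCostSketch {
    /** Rows fed into the join when the filter runs after the join. */
    static double joinInputRowsFilterAbove(double leftRows, double rightRows) {
        return leftRows + rightRows;
    }

    /**
     * Rows fed into the join when the filter runs on the left input first.
     * selectivity is the assumed fraction of left rows that pass the filter.
     */
    static double joinInputRowsFilterBelow(double leftRows, double rightRows,
                                           double selectivity) {
        return leftRows * selectivity + rightRows;
    }

    public static void main(String[] args) {
        // Assumed statistics, chosen only for illustration.
        double splunkRows = 1000000;  // events in the splunk table
        double productRows = 1000;    // rows in the products table
        double selectivity = 0.01;    // share of events with action = 'purchase'

        System.out.println("filter above join: "
            + joinInputRowsFilterAbove(splunkRows, productRows));
        System.out.println("filter below join: "
            + joinInputRowsFilterBelow(splunkRows, productRows, selectivity));
    }
}
```

Under these assumptions the join sees roughly a hundredth as many rows once the filter is pushed down, and the filtering itself is done by Splunk rather than by the engine performing the join.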
Defining a rule 
class FilterIntoJoinRule extends RelOptRule {
  public FilterIntoJoinRule() {
    // Match a Filter whose input is a Join with any inputs.
    super(
        operand(Filter.class,
            operand(Join.class, any())));
  }

  public void onMatch(RelOptRuleCall call) {
    Filter filter = call.rel(0);  // the matched Filter
    Join join = call.rel(1);      // the Join beneath it
    Filter newFilter = ...;
    Join newJoin = ...;
    call.transformTo(newJoin);    // register the equivalent expression
  }
}
Diagram: a Filter over a Join of R1 and R2 is rewritten to a Filter' and a Join' over the same inputs R1 and R2.
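The rule on the slide elides how the new filter and join are built. As a self-contained illustration of the idea, the sketch below applies the same rewrite to a toy tree. Everything here is hypothetical: `Rel`, `Scan`, `Filter`, `Join` and `pushFilterIntoJoin` are invented for this example and do not reproduce Calcite's classes, and the sketch assumes an inner join and a filter that reads columns from a single named table.

```java
public class FilterIntoJoinSketch {
    /** Hypothetical relational node; a stand-in for a real plan operator. */
    interface Rel {
        String describe();
        boolean produces(String table);  // does this subtree read the table?
    }

    static final class Scan implements Rel {
        final String table;
        Scan(String table) { this.table = table; }
        public String describe() { return "scan(" + table + ")"; }
        public boolean produces(String t) { return table.equals(t); }
    }

    static final class Filter implements Rel {
        final String table;      // table whose column the condition reads
        final String condition;
        final Rel input;
        Filter(String table, String condition, Rel input) {
            this.table = table;
            this.condition = condition;
            this.input = input;
        }
        public String describe() {
            return "filter[" + condition + "](" + input.describe() + ")";
        }
        public boolean produces(String t) { return input.produces(t); }
    }

    static final class Join implements Rel {
        final String key;
        final Rel left;
        final Rel right;
        Join(String key, Rel left, Rel right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
        public String describe() {
            return "join[" + key + "](" + left.describe() + ", "
                + right.describe() + ")";
        }
        public boolean produces(String t) {
            return left.produces(t) || right.produces(t);
        }
    }

    /**
     * If rel is a Filter directly over a Join, moves the filter onto the join
     * input that supplies its column. Otherwise returns rel unchanged.
     * Only valid for inner joins (an assumption of this sketch).
     */
    static Rel pushFilterIntoJoin(Rel rel) {
        if (!(rel instanceof Filter)) return rel;
        Filter filter = (Filter) rel;
        if (!(filter.input instanceof Join)) return rel;
        Join join = (Join) filter.input;
        if (join.left.produces(filter.table)) {
            Filter newFilter = new Filter(filter.table, filter.condition, join.left);
            return new Join(join.key, newFilter, join.right);
        }
        if (join.right.produces(filter.table)) {
            Filter newFilter = new Filter(filter.table, filter.condition, join.right);
            return new Join(join.key, join.left, newFilter);
        }
        return rel;
    }

    /** Filter over join, as in the unoptimized example query. */
    static Rel examplePlan() {
        Rel join = new Join("product_id", new Scan("splunk"), new Scan("products"));
        return new Filter("splunk", "action = 'purchase'", join);
    }

    public static void main(String[] args) {
        Rel before = examplePlan();
        System.out.println(before.describe());
        System.out.println(pushFilterIntoJoin(before).describe());
    }
}
```

The structure mirrors the slide's rule: the `instanceof` checks play the role of the operand pattern, and the returned tree plays the role of the expression passed to `transformTo`. A real planner would register the new expression as an equivalent alternative and let the cost model choose, rather than replacing the tree outright.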
Calcite – APIs and SPIs 
Relational algebra 
RelNode (operator) 
• Scan 
• Filter 
• Project 
• Union 
• Aggregate 
• … 
RelDataType (type) 
RexNode (expression) 
RelTrait (physical property) 
• RelConvention (calling-convention) 
• RelCollation (sortedness) 
• TBD (bucketedness/distribution)
JDBC driver
Cost, statistics 
RelOptCost 
RelOptCostFactory 
RelMetadataProvider 
• RelMdColumnUniqueness
• RelMdDistinctRowCount 
• RelMdSelectivity 
SQL parser 
SqlNode 
SqlParser 
SqlValidator 
Transformation rules 
RelOptRule 
• MergeFilterRule 
• PushAggregateThroughUnionRule
• RemoveCorrelationForScalarProjectRule
• 100+ more 
Unification (materialized view) 
Column trimming 
Metadata 
Schema 
Table 
Function 
• TableFunction 
• TableMacro
Thank you! 
@julianhyde 
http://calcite.incubator.apache.org/
