3.3 Viewing JOIN Operator Execution
This section describes the detailed trace information for operators such as Join and Aggregate. From the trace you can determine whether an operator ran in parallel, which specific method it used, and other details.
The Join operator currently has the following methods (a small dispatch sketch follows the list):
Sort-Merge Join: mainly used for non-equi joins, e.g. t1.a > t2.b;
Hash Join: mainly used for equi-joins, e.g. t1.a = t2.a; internally subdivided into three methods: RR, One-Pass, and Hybrid;
NestLoop Join: used when none of the other methods applies; falls back to a nested-loop join;
Complex Join: mainly used for joins containing OR; each side of the OR can be executed as a Sort-Merge Join or a Hash Join.
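The selection among these methods can be pictured with the small dispatch sketch below. This is only an illustration of the rules just listed, not GBase's actual planner logic; choose_join_method and the textual conditions are hypothetical.

# Illustrative only: pick a join method from a simplified textual condition,
# following the rules listed above. Not GBase's real planner code.
def choose_join_method(condition: str) -> str:
    cond = condition.lower()
    if " or " in cond:
        return "complex join"        # OR between join conditions
    if any(op in cond for op in ("<", ">", "!=")):
        return "sort-merge join"     # non-equi join
    if "=" in cond:
        return "hash join"           # equi-join: RR / One-Pass / Hybrid internally
    return "nested-loop join"        # fallback when nothing else applies

if __name__ == "__main__":
    for c in ("t1.a = t2.a", "t1.a > t2.b", "(t1.a = t2.c) or (t1.b = t2.d)"):
        print(c, "->", choose_join_method(c))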
3.3.1 Sort-Merge Join
BEGIN Join
cnd(0):
cnd(0): t1.i < t2.j
prepare to use sort-merge
Joining sorters created fo
inner join (T0 - T1), rows.
cnd(0) Done(time used: 0.0
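To see why sorted inputs help a non-equi condition such as t1.i < t2.j, the sketch below sorts both sides and uses a binary search to find, for each right-hand value, the left-hand rows that qualify. It is a conceptual model only, not the engine's implementation; the function name is made up.

import bisect

# Conceptual sort-merge style join for the non-equi condition t1.i < t2.j:
# sort both sides, then every left value strictly smaller than a right value matches.
def sort_merge_lt(left_i, right_j):
    left_sorted = sorted(left_i)
    for j in sorted(right_j):
        pos = bisect.bisect_left(left_sorted, j)   # rows left_sorted[:pos] satisfy i < j
        for i in left_sorted[:pos]:
            yield (i, j)

if __name__ == "__main__":
    print(list(sort_merge_lt([5, 1, 9, 3], [4, 8])))
    # [(1, 4), (3, 4), (1, 8), (3, 8), (5, 8)]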
Equi-joins use hash join. There are three kinds of hash join; in the common case the Hybrid hash join is chosen.
BEGIN Join
cnd(0):
cnd(0): lineitem.L_ORDERKEY = orders.O_ORDERKEY
prepare to use hash join
An equi-join uses a hash join.
op buffer size: 16777216, tuple width: 16. op buffer can hold 1048576 rows
The operator buffer is 16777216 bytes and the join columns take 16 bytes per row, so the buffer can hold 1048576 rows.
Begin one-pass hash partitioning: divide 1500000 tuples into 6 parts. mat_buf_size = 2796200
Seeing 'one-pass hash partitioning' indicates that a one-pass hash join is being used.
mat partition thread: divide 1500000 tuples into 6 parts, min 241835 tuples, max 258974 tuples, avg 250000 tuples.
Finish one-pass hash partitioning: divide 1500000 tuples into 6 parts. mat_buf_size = 2796200. (time used: 2.104s)
Begin one-pass hash partitioning: divide 6001215 tuples into 6 parts. mat_buf_size = 2796200
The right side of the join is partitioned.
Finish one-pass hash partitioning: divide 6001215 tuples into 6 parts. mat_buf_size = 2796200. (time used: 5.589s)
Finish One-Pass Hash Join preparation: divided each side into 6 partitions
Hash tree is used, size: 16777216
Begin serial rowid merge-sorting: 6001215 rows
The join result is sorted; 'serial' in this line shows that the join ran serially.
Finish serial rowid merge-sorting: 6001215 rows. (time used: 5.806s)
inner join(T0 - T1), using hash join, produced 6001215 rows.
cnd(0) Done(time used: 39.754s)
END Join(time used: 39.755s)
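The one-pass behaviour seen in this trace can be modeled roughly as follows: both sides are split into the same number of partitions by hashing the join key, and each pair of matching partitions is then joined with an in-memory hash table. A minimal sketch under that reading; the partition count and helper names are illustrative.

from collections import defaultdict

# Minimal one-pass hash join sketch: partition both inputs by a hash of the
# join key ("one-pass hash partitioning"), then join each partition pair
# with an in-memory hash table.
def one_pass_hash_join(left, right, key_left, key_right, parts=6):
    def partition(rows, key):
        buckets = [[] for _ in range(parts)]
        for row in rows:
            buckets[hash(key(row)) % parts].append(row)
        return buckets

    for lp, rp in zip(partition(left, key_left), partition(right, key_right)):
        table = defaultdict(list)                 # build side of this partition
        for lrow in lp:
            table[key_left(lrow)].append(lrow)
        for rrow in rp:                           # probe side of this partition
            for lrow in table.get(key_right(rrow), []):
                yield lrow, rrow

if __name__ == "__main__":
    orders = [{"O_ORDERKEY": k} for k in (1, 2, 3)]
    lineitem = [{"L_ORDERKEY": k} for k in (1, 1, 3, 4)]
    pairs = list(one_pass_hash_join(orders, lineitem,
                                    lambda r: r["O_ORDERKEY"],
                                    lambda r: r["L_ORDERKEY"]))
    print(len(pairs))  # 3 matching pairs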
BEGIN Join
prepare to use hash join
op buffer size: 16777216, tuple width: 16. op buffer can hold 1048576 rows
Begin one-pass hash partitioning: divide 1500000 tuples into 48 parts. mat_buf_size = 43688
(2)mat partition thread: divide 187500 tuples into 48 parts, min 3463 tuples, max 4331 tuples, avg 3906 tuples.
The extra (2) prefix is the sub-thread number and shows that the partitioning runs in parallel; a serial run would print only one such line.
(6)mat partition thread: divide 187500 tuples into 48 parts, min 3680 tuples, max 4191 tuples, avg 3906 tuples.
(3)mat partition thread: divide 187500 tuples into 48 parts, min 3413 tuples, max 4400 tuples, avg 3906 tuples.
(5)mat partition thread: divide 187500 tuples into 48 parts, min 3553 tuples, max 4320 tuples, avg 3906 tuples.
(7)mat partition thread: divide 187500 tuples into 48 parts, min 3636 tuples, max 4169 tuples, avg 3906 tuples.
(0)mat partition thread: divide 187500 tuples into 48 parts, min 3454 tuples, max 4393 tuples, avg 3906 tuples.
(4)mat partition thread: divide 187500 tuples into 48 parts, min 3669 tuples, max 4147 tuples, avg 3906 tuples.
(1)mat partition thread: divide 187500 tuples into 48 parts, min 3350 tuples, max 4408 tuples, avg 3906 tuples.
Finish one-pass hash partitioning: divide 1500000 tuples into 48 parts. mat_buf_size = 43688. (time used: 0.620s)
Begin rowid merge-sorting: 8 inputs, 6001215 rows
If this were serial, the line would read 'Begin serial rowid merge-sorting'.
Finish rowid merge-sorting: 8 inputs, 6001215 rows(time used: 1.340s)
Parallel join done. Merged tuple: 6001215 rows
This also indicates parallel execution.
cnd(0) Done(time used: 8.627s)
END Join(time used: 8.631s)
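The 'rowid merge-sorting: 8 inputs' step combines the sorted row-id streams produced by the worker threads into one ordered stream; conceptually it is a k-way merge. A sketch using Python's heapq.merge, with made-up input data:

import heapq

# k-way merge of the sorted rowid streams produced by the parallel join
# threads ("rowid merge-sorting: 8 inputs" in the trace).
def merge_rowid_streams(streams):
    return list(heapq.merge(*streams))

if __name__ == "__main__":
    worker_outputs = [[1, 4, 9], [2, 3, 10], [5, 6, 7, 8]]   # e.g. 3 workers
    print(merge_rowid_streams(worker_outputs))
    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]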
BEGIN Join
cnd(0):
cnd(0): lineitem.L_ORDERKEY = orders.O_ORDERKEY
prepare to use hash join
RR is also a kind of hash join.
Hash tree is used, size: 16777216
using multi pass hash join: 4 pass, 1500000 rows join 6001215 rows
created hash on 493447/1500000 rows already.
disable multi pass hash join
scanned 2555904 rows, and produced 1974112 rows already.
created hash on 986894/1500000 rows already.
scanned 4653056 rows, and produced 3947396 rows already.
created hash on 1480341/1500000 rows already.
scanned 3576383 rows, and produced 5922493 rows already.
created hash on
scanned 655360 rows, and produced 6001215 rows already.
skipped 52.44% DCs by smart index.
inner join (T0 - T1), using hash join, produced 6001215 rows.
cnd(0) Done(time used: 17.236s)
END Join(time used: 17.236s)
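The multi-pass (RR) hash join above builds the hash table on only part of the build side at a time (4 passes here) and rescans the probe side once per pass. A rough sketch of that idea; chunk size and names are chosen for illustration only.

from collections import defaultdict

# Rough multi-pass ("RR") hash join sketch: consume the build side in fixed-size
# chunks; for each chunk ("created hash on ... rows") rescan the whole probe side
# ("scanned ... rows").
def multi_pass_hash_join(build, probe, key_build, key_probe, chunk_rows):
    for start in range(0, len(build), chunk_rows):
        table = defaultdict(list)
        for row in build[start:start + chunk_rows]:
            table[key_build(row)].append(row)
        for prow in probe:
            for brow in table.get(key_probe(prow), []):
                yield brow, prow

if __name__ == "__main__":
    orders = list(range(10))                                # build side keys
    lineitem = [k for k in range(10) for _ in range(2)]     # probe side, 2 rows per key
    out = list(multi_pass_hash_join(orders, lineitem,
                                    lambda r: r, lambda r: r, chunk_rows=3))
    print(len(out))  # 20 matched pairs across 4 passes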
BEGIN Join
cnd(0):
cnd(0): lineitem.L_ORDERKEY = orders.O_ORDERKEY
prepare to use hash join
divide to 8 blocks(using round-robin): 786432, 720896, 786432, 720896, 786432, 720896, 786432, 692799.
The data is split into blocks for a parallel join.
(5)Hash tree is used, size: 2097152
The (5) is the sub-thread number; the join is executed in parallel.
(6)Hash tree is used, size: 2097152
(0)Hash tree is used, size: 2097152
(3)Hash tree is used, size: 2097152
(2)Hash tree is used, size: 2097152
(7)Hash tree is used, size: 2097152
(1)Hash tree is used, size: 2097152
Parallel join done. Merged tuple: 6001215 rows
This indicates a parallel join.
cnd(0) Done(time used: 19.728s)
END Join(time used: 19.728s)
The join finishes here. Of the three hash join variants, only the RR hash join does not require a sort.
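The 'divide to 8 blocks(using round-robin)' line shows the input being cut into one block per worker before the parallel RR hash join; presumably each thread joins its own block and the per-block results keep their original order, which fits the note above that this variant needs no sort. A small sketch of a round-robin split; the chunk size is arbitrary, the real engine works on its own storage units.

# Round-robin split of an input into per-thread blocks, as suggested by
# "divide to 8 blocks(using round-robin)".
def round_robin_blocks(rows, workers=8, chunk=4):
    blocks = [[] for _ in range(workers)]
    for n, start in enumerate(range(0, len(rows), chunk)):
        blocks[n % workers].extend(rows[start:start + chunk])
    return blocks

if __name__ == "__main__":
    blocks = round_robin_blocks(list(range(100)), workers=8, chunk=4)
    print([len(b) for b in blocks])   # [16, 12, 12, 12, 12, 12, 12, 12]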
prepare to use hash join
traverse dim: 1500000 tuples, match dim: 6001215 tuples
In general the smaller table is taken as the traverse table and the larger table as the match table.
traverse tuple: total_size = 3, key_size = 3, tuple_size = 0, other_size = 0, pseudo_size = 0
The row width participating in the join is 3 bytes: the hash (join key) columns take 3 bytes and the other columns 0 bytes.
match tuple: total_size = 3, key_size = 3, tuple_size = 0, other_size = 0, pseudo_size = 0
Hybrid hash join: data will be split into 3 partitions
The 'Hybrid hash join' marker: the join runs in 3 rounds; the first round is done in memory, while the later rounds spill their data to disk first.
traverse dim: 1500000 tuples, match dim: 6001215 tuples
BEGIN hybrid hash join: partition(-1/3): traverse dim: 1500000 tuples, match dim: 6001215 tuples
partition(-1/3) denotes the first round, which is performed in memory.
Traverse: begin scanning from 0 to 1500000
The hash table is built.
HashTable: mem: total 16109940, desired 33000000, block_size 1048576
This prints the operator buffer actually used (total), the memory desired, and the size of one memory allocation block.
Sub 0: use 8388608+8008176; buckets: total 2097152, use 341165, avg 1.47, longest 9; rows: add 500511; count only table.
Only a single Sub 0 appears here, which indicates serial execution. 'use 8388608+8008176' means the hash buckets occupy 8388608 bytes and the hash columns occupy 8008176 bytes. There are 2097152 buckets in total, of which 341165 are used; the average collision chain length is 1.47 and the longest chain is 9; 500511 rows were loaded. The avg value is normally between 1 and 2; if it is much larger, the data distribution is poor.
JoinTables: mem: exp. 16777216; rows: exp. 1500000, cap. 720896, add 500511
Traverse finished. (time used: 1.323s)
Hash table construction is complete; the Match phase begins.
(time used: 3.291s)
(time used: 4.619s)
The Match phase is complete; the commit phase begins.
Commit: begin ...
END hybrid hash join: partition(-1/3):
The first round (-1/3) is complete.
finish hash partitioning:
Because the first round partitions the data while it joins, the statistics for the other 2 rounds can only be printed after the first round has finished.
partition(0): traverse dim: 0 tuples, match dim: 0 tuples
partition(1): traverse dim: 499688 tuples, match dim: 1999733 tuples
partition(2): traverse dim: 499801 tuples, match dim: 1998529 tuples
END hybrid hash join: partition(0/3): (time used: 0.006s)
This round is normally empty. The exception is when the operator buffer in the first round cannot hold the whole traverse table: hash-table building is then aborted, all first-round data is spilled to disk first, and this partition is processed in the same way as the later rounds.
END hybrid hash join: partition(1/3): (time used: 0.825s)
The following round finishes.
END hybrid hash join: partition(2/3): (time used: 0.820s)
The last round finishes.
join: found 6001215 rows. (inner: 6001215, traverse outer: 0, match outer: 0)
inner join(T0 - T1), using hash join, produced 0 rows.
cnd(0) Done(time used: 6.294s)
END Join(time used: 6.294s)
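The hybrid behaviour traced above (one partition joined directly in memory as round -1, the remaining partitions spilled and processed one at a time) can be sketched roughly as follows. The hash partition function and the 3-way split are illustrative, not the engine's actual hashing.

from collections import defaultdict

# Rough hybrid hash join sketch: rows that hash to partition 0 are joined at once
# in memory (the "-1" round); other partitions are set aside ("spilled") and
# joined afterwards, one partition pair at a time.
def hybrid_hash_join(traverse, match, key_t, key_m, partitions=3):
    part = lambda k: hash(k) % partitions

    table, spilled_t = defaultdict(list), defaultdict(list)
    for row in traverse:                       # build phase of round -1
        if part(key_t(row)) == 0:
            table[key_t(row)].append(row)
        else:
            spilled_t[part(key_t(row))].append(row)

    spilled_m = defaultdict(list)
    for row in match:                          # probe phase of round -1
        if part(key_m(row)) == 0:
            for trow in table.get(key_m(row), []):
                yield trow, row
        else:
            spilled_m[part(key_m(row))].append(row)

    for p in range(1, partitions):             # later rounds: one spilled pair each
        ptable = defaultdict(list)
        for trow in spilled_t.get(p, []):
            ptable[key_t(trow)].append(trow)
        for mrow in spilled_m.get(p, []):
            for trow in ptable.get(key_m(mrow), []):
                yield trow, mrow

if __name__ == "__main__":
    orders = list(range(20))
    lineitem = [k for k in range(20) for _ in range(2)]
    print(len(list(hybrid_hash_join(orders, lineitem, lambda r: r, lambda r: r))))  # 40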
3.3.2.6 Parallel Hybrid Hash Join
prepare to use hash join
traverse dim: 1500000 tuples, match dim: 6001215 tuples
traverse tuple: total_size = 7, key_size = 3, tuple_size = 4, other_size = 0, pseudo_size = 0
match tuple: total_size = 7, key_size = 3, tuple_size = 4, other_size = 0, pseudo_size = 0
divide to 8 blocks(using round-robin): 786432, 720896, 786432, 720896, 786432, 720896, 786432, 692799.
Hybrid hash join: data will be split into 2 partitions
BEGIN hybrid hash join: partition(-1/2) : traverse dim: 1500000 tuples, match dim: 6001215 tuples
(5)Traverse: hash table is full.
The hash table is full; when this appears in round -1 it means the remaining data will be pushed into round 0.
(1)HashTable: mem: total 2009770, desired 3375000, block_size 1572864
(1)Sub 0: use 1048576+1121100; buckets: total 262144, use 78921, avg 1.18, longest 5; rows: add 93425
Multiple Sub threads indicate that this is a parallel run.
(1)Sub 1: use 1048576+1123404; buckets: total 262144, use 78908, avg 1.19, longest 5; rows: add 93617
(1)JoinTables: mem: exp. 16777216; rows: exp. 1500000, cap. 6990504, add 655933
END hybrid hash join: partition(-1/2): (time used: 4.759s)
finish hash partitioning:
partition(0): traverse dim: 750122 tuples, match dim: 3000792 tuples
partition(1): traverse dim: 749878 tuples, match dim: 3000423 tuples
BEGIN hybrid hash join: partition(0/2): traverse dim: 750122 tuples, match dim: 3000792 tuples
( 4)HashTable: mem: total 2009770, desired 3375000, block_size 1572864
( 7)Match finished. (time used: 1.901s)
Whenever multiple passes occur in any partition, the traverse -> match -> commit sequence is executed multiple times.
(0)Match finished. (time used: 1.529s)
END hybrid hash join: partition(0/2): (time used: 3.800s)
END hybrid hash join: partition(1/2): (time used: 4.211s)
join: found 6001215 rows. (inner: 6001215, traverse outer: 0, match outer: 0)
Begin rowid merge-sorting: 8 inputs, 6001215 rows
The result is sorted after the join.
Finish rowid merge-sorting: 8 inputs, 6001215 rows(time used: 1.628s)
Parallel join done. Merged tuple: 6001215 rows
The parallel join is finished.
inner join(T0 - T1), using hash join, produced 6001215 rows.
cnd(0) Done(time used: 14.462s)
When none of the joins above can be used, for whatever reason, execution falls back to the most primitive method, the Nested-Loop Join. When 'nested-loop join' appears in the trace, this join is being used. This mode should be avoided whenever possible; a minimal sketch of the idea follows.
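A minimal nested-loop join sketch: every outer row is compared with every inner row, so the cost grows with the product of the two input sizes, which is why this fallback is so expensive.

# Minimal nested-loop join: compare every outer row with every inner row.
def nested_loop_join(outer, inner, predicate):
    for o in outer:
        for i in inner:
            if predicate(o, i):
                yield o, i

if __name__ == "__main__":
    t1, t2 = [1, 2, 3], [2, 3, 4]
    # an arbitrary predicate that no other join method could handle
    print(list(nested_loop_join(t1, t2, lambda a, b: (a + b) % 5 == 0)))
    # [(1, 4), (2, 3), (3, 2)]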
The trace below shows a complex join. Complex join is not itself a join algorithm.
cnd(0):
cnd(0): (t1.a = t2.c) or (t1.b = t2.d)
prepare to use complex join
When an OR condition is present, a complex join is generally chosen: the two sides of the OR are joined separately and the OR of the results is then taken.
cnd(0): t1.a = t2.c
prepare to use hash join
First the hash join for the left side of the OR is performed.
cnd(0) Done(time used: 0.003s)
cnd(0): t1.b = t2.d
prepare to use hash join
Then the hash join for the right side of the OR is performed.
cnd(0) Done(time used: 0.002s)
BEGIN Or(1 rows Or 1 rows)
The OR of the left and right results is taken.
END Or (serial): produced 1 rows (time used: 0.000s)
cnd(0) Done(time used: 0.007s)
END Join(time used: 0.007s)
The whole complex join is finished.
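Conceptually, the complex join above runs one hash join per OR branch and then merges the two result sets so that a row pair matching both branches appears only once. A sketch under that interpretation; the helper names and row-position representation are illustrative.

from collections import defaultdict

def hash_join_positions(left, right, key_l, key_r):
    table = defaultdict(list)
    for i, lrow in enumerate(left):
        table[key_l(lrow)].append(i)
    for j, rrow in enumerate(right):
        for i in table.get(key_r(rrow), []):
            yield (i, j)                       # matched (left, right) row positions

# Complex join sketch: join each OR branch separately, then take the union of
# the matched row pairs (the "Or" step in the trace) so duplicates collapse.
def complex_join_or(left, right):
    branch1 = hash_join_positions(left, right, lambda r: r["a"], lambda r: r["c"])
    branch2 = hash_join_positions(left, right, lambda r: r["b"], lambda r: r["d"])
    return set(branch1) | set(branch2)

if __name__ == "__main__":
    t1 = [{"a": 1, "b": 9}]
    t2 = [{"c": 1, "d": 9}]                    # satisfies both branches, returned once
    print(complex_join_or(t1, t2))             # {(0, 0)}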
3.3.4 JOIN-Related Parameters
gbase_buffer_hj
The operator buffer size; corresponds to 'JoinTables: mem: exp. xxx' in the trace.
_gbase_hybrid_hash_join
0: do not use; 1: use (other methods will not be considered); 2: internal debugging, not recommended.
_gbase_one_pass_hash_join
Takes effect when _gbase_hybrid_hash_join is 0; 1: use one-pass, 0: use RR.
gbase_parallel_execution
1: parallel; 0: serial.
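As a usage illustration only: assuming the server is reached through a MySQL-protocol driver such as pymysql and that these variables can be changed per session with SET (both are assumptions to verify against your version's documentation), a session could be configured roughly like this before running a traced query.

import pymysql  # assumption: the server speaks the MySQL protocol

# Illustrative only: switch on hybrid hash join and parallel execution for one
# session, then run a join whose trace can be read as described in this section.
# Host, port and credentials are placeholders.
conn = pymysql.connect(host="127.0.0.1", port=5258, user="root", password="", database="tpch")
with conn.cursor() as cur:
    # Parameter names come from section 3.3.4; whether they are settable at
    # session level is an assumption to check.
    cur.execute("SET _gbase_hybrid_hash_join = 1")
    cur.execute("SET gbase_parallel_execution = 1")
    cur.execute("SELECT COUNT(*) FROM lineitem JOIN orders "
                "ON lineitem.L_ORDERKEY = orders.O_ORDERKEY")
    print(cur.fetchone())
conn.close()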