3.3 Viewing JOIN Operator Execution
This section describes the detailed trace information for operators such as Join and Aggregate. From the trace you can determine whether an operator ran in parallel, which specific method it used, and other details.
The Join operator currently has the following methods (a small dispatch sketch follows the list):
Sort-Merge Join: mainly used for non-equi joins, e.g. t1.a > t2.b;
Hash Join: mainly used for equi-joins, e.g. t1.a = t2.a; internally subdivided into three methods: RR, One-Pass, and Hybrid;
NestLoop Join: used when none of the other methods applies; falls back to a nested-loop join;
Complex Join: mainly used for joins containing OR; each side of the OR can be executed as a Sort-Merge Join or a Hash Join.
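The selection among these methods can be pictured with the small dispatch sketch below. This is only an illustration of the rules just listed, not GBase's actual planner logic; choose_join_method and the textual conditions are hypothetical.

# Illustrative only: pick a join method from a simplified textual condition,
# following the rules listed above. Not GBase's real planner code.
def choose_join_method(condition: str) -> str:
    cond = condition.lower()
    if " or " in cond:
        return "complex join"        # OR between join conditions
    if any(op in cond for op in ("<", ">", "!=")):
        return "sort-merge join"     # non-equi join
    if "=" in cond:
        return "hash join"           # equi-join: RR / One-Pass / Hybrid internally
    return "nested-loop join"        # fallback when nothing else applies

if __name__ == "__main__":
    for c in ("t1.a = t2.a", "t1.a > t2.b", "(t1.a = t2.c) or (t1.b = t2.d)"):
        print(c, "->", choose_join_method(c))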
3.3.1 Sort-Merge Join
BEGIN Join
cnd(0):
cnd(0): t1.i < t2.j
prepare to use sort-merge
Joining sorters created fo
inner join (T0 - T1), rows.
cnd(0) Done(time used: 0.0
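To see why sorted inputs help a non-equi condition such as t1.i < t2.j, the sketch below sorts both sides and uses a binary search to find, for each right-hand value, the left-hand rows that qualify. It is a conceptual model only, not the engine's implementation; the function name is made up.

import bisect

# Conceptual sort-merge style join for the non-equi condition t1.i < t2.j:
# sort both sides, then every left value strictly smaller than a right value matches.
def sort_merge_lt(left_i, right_j):
    left_sorted = sorted(left_i)
    for j in sorted(right_j):
        pos = bisect.bisect_left(left_sorted, j)   # rows left_sorted[:pos] satisfy i < j
        for i in left_sorted[:pos]:
            yield (i, j)

if __name__ == "__main__":
    print(list(sort_merge_lt([5, 1, 9, 3], [4, 8])))
    # [(1, 4), (3, 4), (1, 8), (3, 8), (5, 8)]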
Equi-joins use hash join. There are three kinds of hash join; in the common case the Hybrid hash join is chosen.
BEGIN Join
cnd(0):
cnd(0): lineitem.L_ORDERKEY = orders.O_ORDERKEY
prepare to use hash join
An equi-join uses a hash join.
op buffer size: 16777216, tuple width: 16. op buffer can hold 1048576 rows
The operator buffer is 16777216 bytes and the join columns take 16 bytes per row, so the buffer can hold 1048576 rows.
Begin one-pass hash partitioning: divide 1500000 tuples into 6 parts. mat_buf_size = 2796200
Seeing 'one-pass hash partitioning' indicates that a one-pass hash join is being used.
mat partition thread: divide 1500000 tuples into 6 parts, min 241835 tuples, max 258974 tuples, avg 250000 tuples.
Finish one-pass hash partitioning: divide 1500000 tuples into 6 parts. mat_buf_size = 2796200. (time used: 2.104s)
Begin one-pass hash partitioning: divide 6001215 tuples into 6 parts. mat_buf_size = 2796200
The right side of the join is partitioned.
Finish one-pass hash partitioning: divide 6001215 tuples into 6 parts. mat_buf_size = 2796200. (time used: 5.589s)
Finish One-Pass Hash Join preparation: divided each side into 6 partitions
Hash tree is used, size: 16777216
Begin serial rowid merge-sorting: 6001215 rows
The join result is sorted; 'serial' in this line shows that the join ran serially.
Finish serial rowid merge-sorting: 6001215 rows. (time used: 5.806s)
inner join(T0 - T1), using hash join, produced 6001215 rows.
cnd(0) Done(time used: 39.754s)
END Join(time used: 39.755s)
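The one-pass behaviour seen in this trace can be modeled roughly as follows: both sides are split into the same number of partitions by hashing the join key, and each pair of matching partitions is then joined with an in-memory hash table. A minimal sketch under that reading; the partition count and helper names are illustrative.

from collections import defaultdict

# Minimal one-pass hash join sketch: partition both inputs by a hash of the
# join key ("one-pass hash partitioning"), then join each partition pair
# with an in-memory hash table.
def one_pass_hash_join(left, right, key_left, key_right, parts=6):
    def partition(rows, key):
        buckets = [[] for _ in range(parts)]
        for row in rows:
            buckets[hash(key(row)) % parts].append(row)
        return buckets

    for lp, rp in zip(partition(left, key_left), partition(right, key_right)):
        table = defaultdict(list)                 # build side of this partition
        for lrow in lp:
            table[key_left(lrow)].append(lrow)
        for rrow in rp:                           # probe side of this partition
            for lrow in table.get(key_right(rrow), []):
                yield lrow, rrow

if __name__ == "__main__":
    orders = [{"O_ORDERKEY": k} for k in (1, 2, 3)]
    lineitem = [{"L_ORDERKEY": k} for k in (1, 1, 3, 4)]
    pairs = list(one_pass_hash_join(orders, lineitem,
                                    lambda r: r["O_ORDERKEY"],
                                    lambda r: r["L_ORDERKEY"]))
    print(len(pairs))  # 3 matching pairs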
BEGIN Join
prepare to use hash join
op buffer size: 16777216, tuple width: 16. op buffer can hold 1048576 rows
Begin one-pass hash partitioning: divide 1500000 tuples into 48 parts. mat_buf_size = 43688
(2)mat partition thread: divide 187500 tuples into 48 parts, min 3463 tuples, max 4331 tuples, avg 3906 tuples.
The extra (2) prefix is the sub-thread number and shows that the partitioning runs in parallel; a serial run would print only one such line.
(6)mat partition thread: divide 187500 tuples into 48 parts, min 3680 tuples, max 4191 tuples, avg 3906 tuples.
(3)mat partition thread: divide 187500 tuples into 48 parts, min 3413 tuples, max 4400 tuples, avg 3906 tuples.
(5)mat partition thread: divide 187500 tuples into 48 parts, min 3553 tuples, max 4320 tuples, avg 3906 tuples.
(7)mat partition thread: divide 187500 tuples into 48 parts, min 3636 tuples, max 4169 tuples, avg 3906 tuples.
(0)mat partition thread: divide 187500 tuples into 48 parts, min 3454 tuples, max 4393 tuples, avg 3906 tuples.
(4)mat partition thread: divide 187500 tuples into 48 parts, min 3669 tuples, max 4147 tuples, avg 3906 tuples.
(1)mat partition thread: divide 187500 tuples into 48 parts, min 3350 tuples, max 4408 tuples, avg 3906 tuples.
Finish one-pass hash partitioning: divide 1500000 tuples into 48 parts. mat_buf_size = 43688. (time used: 0.620s)
Begin rowid merge-sorting: 8 inputs, 6001215 rows
If this were serial, the line would read 'Begin serial rowid merge-sorting'.
Finish rowid merge-sorting: 8 inputs, 6001215 rows(time used: 1.340s)
Parallel join done. Merged tuple: 6001215 rows
This also indicates parallel execution.
cnd(0) Done(time used: 8.627s)
END Join(time used: 8.631s)
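The 'rowid merge-sorting: 8 inputs' step combines the sorted row-id streams produced by the worker threads into one ordered stream; conceptually it is a k-way merge. A sketch using Python's heapq.merge, with made-up input data:

import heapq

# k-way merge of the sorted rowid streams produced by the parallel join
# threads ("rowid merge-sorting: 8 inputs" in the trace).
def merge_rowid_streams(streams):
    return list(heapq.merge(*streams))

if __name__ == "__main__":
    worker_outputs = [[1, 4, 9], [2, 3, 10], [5, 6, 7, 8]]   # e.g. 3 workers
    print(merge_rowid_streams(worker_outputs))
    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]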
BEGIN Join
cnd(0):
cnd(0): lineitem.L_ORDERKEY = orders.O_ORDERKEY
prepare to use hash join
RR is also a kind of hash join.
Hash tree is used, size: 16777216
using multi pass hash join: 4 pass, 1500000 rows join 6001215 rows
created hash on 493447/1500000 rows already.
disable multi pass hash join
scanned 2555904 rows, and produced 1974112 rows already.
created hash on 986894/1500000 rows already.
scanned 4653056 rows, and produced 3947396 rows already.
created hash on 1480341/1500000 rows already.
scanned 3576383 rows, and produced 5922493 rows already.
created hash on
scanned 655360 rows, and produced 6001215 rows already.
skipped 52.44% DCs by smart index.
inner join (T0 - T1), using hash join, produced 6001215 rows.
cnd(0) Done(time used: 17.236s)
END Join(time used: 17.236s)
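The multi-pass (RR) hash join above builds the hash table on only part of the build side at a time (4 passes here) and rescans the probe side once per pass. A rough sketch of that idea; chunk size and names are chosen for illustration only.

from collections import defaultdict

# Rough multi-pass ("RR") hash join sketch: consume the build side in fixed-size
# chunks; for each chunk ("created hash on ... rows") rescan the whole probe side
# ("scanned ... rows").
def multi_pass_hash_join(build, probe, key_build, key_probe, chunk_rows):
    for start in range(0, len(build), chunk_rows):
        table = defaultdict(list)
        for row in build[start:start + chunk_rows]:
            table[key_build(row)].append(row)
        for prow in probe:
            for brow in table.get(key_probe(prow), []):
                yield brow, prow

if __name__ == "__main__":
    orders = list(range(10))                                # build side keys
    lineitem = [k for k in range(10) for _ in range(2)]     # probe side, 2 rows per key
    out = list(multi_pass_hash_join(orders, lineitem,
                                    lambda r: r, lambda r: r, chunk_rows=3))
    print(len(out))  # 20 matched pairs across 4 passes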
BEGIN Join
cnd(0):
cnd(0): lineitem.L_ORDERKEY = orders.O_ORDERKEY
prepare to use hash join
divide to 8 blocks(using round-robin): 786432, 720896, 786432, 720896, 786432, 720896, 786432, 692799.
The data is split into blocks for a parallel join.
(5)Hash tree is used, size: 2097152
The (5) is the sub-thread number; the join is executed in parallel.
(6)Hash tree is used, size: 2097152
(0)Hash tree is used, size: 2097152
(3)Hash tree is used, size: 2097152
(2)Hash tree is used, size: 2097152
(7)Hash tree is used, size: 2097152
(1)Hash tree is used, size: 2097152
Parallel join done. Merged tuple: 6001215 rows
This indicates a parallel join.
cnd(0) Done(time used: 19.728s)
END Join(time used: 19.728s)
The join finishes here. Of the three hash join variants, only the RR hash join does not require a sort.
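The 'divide to 8 blocks(using round-robin)' line shows the input being cut into one block per worker before the parallel RR hash join; presumably each thread joins its own block and the per-block results keep their original order, which fits the note above that this variant needs no sort. A small sketch of a round-robin split; the chunk size is arbitrary, the real engine works on its own storage units.

# Round-robin split of an input into per-thread blocks, as suggested by
# "divide to 8 blocks(using round-robin)".
def round_robin_blocks(rows, workers=8, chunk=4):
    blocks = [[] for _ in range(workers)]
    for n, start in enumerate(range(0, len(rows), chunk)):
        blocks[n % workers].extend(rows[start:start + chunk])
    return blocks

if __name__ == "__main__":
    blocks = round_robin_blocks(list(range(100)), workers=8, chunk=4)
    print([len(b) for b in blocks])   # [16, 12, 12, 12, 12, 12, 12, 12]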
prepare to use hash join
traverse dim: 1500000 tuples, match dim: 6001215 tuples
In general the smaller table is taken as the traverse table and the larger table as the match table.
traverse tuple: total_size = 3, key_size = 3, tuple_size = 0, other_size = 0, pseudo_size = 0
The row width participating in the join is 3 bytes: the hash (join key) columns take 3 bytes and the other columns 0 bytes.
match tuple: total_size = 3, key_size = 3, tuple_size = 0, other_size = 0, pseudo_size = 0
Hybrid hash join: data will be split into 3 partitions
The 'Hybrid hash join' marker: the join runs in 3 rounds; the first round is done in memory, while the later rounds spill their data to disk first.
traverse dim: 1500000 tuples, match dim: 6001215 tuples
BEGIN hybrid hash join: partition(-1/3): traverse dim: 1500000 tuples, match dim: 6001215 tuples
partition(-1/3) denotes the first round, which is performed in memory.
Traverse: begin scanning from 0 to 1500000
The hash table is built.
HashTable: mem: total 16109940, desired 33000000, block_size 1048576
This prints the operator buffer actually used (total), the memory desired, and the size of one memory allocation block.
Sub 0: use 8388608+8008176; buckets: total 2097152, use 341165, avg 1.47, longest 9; rows: add 500511; count only table.
Only a single Sub 0 appears here, which indicates serial execution. 'use 8388608+8008176' means the hash buckets occupy 8388608 bytes and the hash columns occupy 8008176 bytes. There are 2097152 buckets in total, of which 341165 are used; the average collision chain length is 1.47 and the longest chain is 9; 500511 rows were loaded. The avg value is normally between 1 and 2; if it is much larger, the data distribution is poor.
JoinTables: mem: exp. 16777216; rows: exp. 1500000, cap. 720896, add 500511
Traverse finished. (time used: 1.323s)
Hash table construction is complete; the Match phase begins.
(time used: 3.291s)
(time used: 4.619s)
The Match phase is complete; the commit phase begins.
Commit: begin ...
END hybrid hash join: partition(-1/3):
The first round (-1/3) is complete.
finish hash partitioning:
Because the first round partitions the data while it joins, the statistics for the other 2 rounds can only be printed after the first round has finished.
partition(0): traverse dim: 0 tuples, match dim: 0 tuples
partition(1): traverse dim: 499688 tuples, match dim: 1999733 tuples
partition(2): traverse dim: 499801 tuples, match dim: 1998529 tuples
END hybrid hash join: partition(0/3): (time used: 0.006s)
This round is normally empty. The exception is when the operator buffer in the first round cannot hold the whole traverse table: hash-table building is then aborted, all first-round data is spilled to disk first, and this partition is processed in the same way as the later rounds.
END hybrid hash join: partition(1/3): (time used: 0.825s)
The following round finishes.
END hybrid hash join: partition(2/3): (time used: 0.820s)
The last round finishes.
join: found 6001215 rows. (inner: 6001215, traverse outer: 0, match outer: 0)
inner join(T0 - T1), using hash join, produced 0 rows.
cnd(0) Done(time used: 6.294s)
END Join(time used: 6.294s)
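The hybrid behaviour traced above (one partition joined directly in memory as round -1, the remaining partitions spilled and processed one at a time) can be sketched roughly as follows. The hash partition function and the 3-way split are illustrative, not the engine's actual hashing.

from collections import defaultdict

# Rough hybrid hash join sketch: rows that hash to partition 0 are joined at once
# in memory (the "-1" round); other partitions are set aside ("spilled") and
# joined afterwards, one partition pair at a time.
def hybrid_hash_join(traverse, match, key_t, key_m, partitions=3):
    part = lambda k: hash(k) % partitions

    table, spilled_t = defaultdict(list), defaultdict(list)
    for row in traverse:                       # build phase of round -1
        if part(key_t(row)) == 0:
            table[key_t(row)].append(row)
        else:
            spilled_t[part(key_t(row))].append(row)

    spilled_m = defaultdict(list)
    for row in match:                          # probe phase of round -1
        if part(key_m(row)) == 0:
            for trow in table.get(key_m(row), []):
                yield trow, row
        else:
            spilled_m[part(key_m(row))].append(row)

    for p in range(1, partitions):             # later rounds: one spilled pair each
        ptable = defaultdict(list)
        for trow in spilled_t.get(p, []):
            ptable[key_t(trow)].append(trow)
        for mrow in spilled_m.get(p, []):
            for trow in ptable.get(key_m(mrow), []):
                yield trow, mrow

if __name__ == "__main__":
    orders = list(range(20))
    lineitem = [k for k in range(20) for _ in range(2)]
    print(len(list(hybrid_hash_join(orders, lineitem, lambda r: r, lambda r: r))))  # 40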
3.3.2.6 Parallel Hybrid Hash Join
prepare to use hash join
traverse dim: 1500000 tuples, match dim: 6001215 tuples
traverse tuple: total_size = 7, key_size = 3, tuple_size = 4, other_size = 0, pseudo_size = 0
match tuple: total_size = 7, key_size = 3, tuple_size = 4, other_size = 0, pseudo_size = 0
divide to 8 blocks(using round-robin): 786432, 720896, 786432, 720896, 786432, 720896, 786432, 692799.
Hybrid hash join: data will be split into 2 partitions
BEGIN hybrid hash join: partition(-1/2) : traverse dim: 1500000 tuples, match dim: 6001215 tuples
(5)Traverse: hash table is full.
The hash table is full; when this appears in round -1 it means the remaining data will be pushed into round 0.
(1)HashTable: mem: total 2009770, desired 3375000, block_size 1572864
(1)Sub 0: use 1048576+1121100; buckets: total 262144, use 78921, avg 1.18, longest 5; rows: add 93425
Multiple Sub threads indicate that this is a parallel run.
(1)Sub 1: use 1048576+1123404; buckets: total 262144, use 78908, avg 1.19, longest 5; rows: add 93617
(1)JoinTables: mem: exp. 16777216; rows: exp. 1500000, cap. 6990504, add 655933
END hybrid hash join: partition(-1/2): (time used: 4.759s)
finish hash partitioning:
partition(0): traverse dim: 750122 tuples, match dim: 3000792 tuples
partition(1): traverse dim: 749878 tuples, match dim: 3000423 tuples
BEGIN hybrid hash join: partition(0/2): traverse dim: 750122 tuples, match dim: 3000792 tuples
( 4)HashTable: mem: total 2009770, desired 3375000, block_size 1572864
( 7)Match finished. (time used: 1.901s)
Whenever multiple passes occur in any partition, the traverse -> match -> commit sequence is executed multiple times.
(0)Match finished. (time used: 1.529s)
END hybrid hash join: partition(0/2): (time used: 3.800s)
END hybrid hash join: partition(1/2): (time used: 4.211s)
join: found 6001215 rows. (inner: 6001215, traverse outer: 0, match outer: 0)
Begin rowid merge-sorting: 8 inputs, 6001215 rows
The result is sorted after the join.
Finish rowid merge-sorting: 8 inputs, 6001215 rows(time used: 1.628s)
Parallel join done. Merged tuple: 6001215 rows
The parallel join is finished.
inner join(T0 - T1), using hash join, produced 6001215 rows.
cnd(0) Done(time used: 14.462s)
When none of the joins above can be used, for whatever reason, execution falls back to the most primitive method, the Nested-Loop Join. When 'nested-loop join' appears in the trace, this join is being used. This mode should be avoided whenever possible; a minimal sketch of the idea follows.
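A minimal nested-loop join sketch: every outer row is compared with every inner row, so the cost grows with the product of the two input sizes, which is why this fallback is so expensive.

# Minimal nested-loop join: compare every outer row with every inner row.
def nested_loop_join(outer, inner, predicate):
    for o in outer:
        for i in inner:
            if predicate(o, i):
                yield o, i

if __name__ == "__main__":
    t1, t2 = [1, 2, 3], [2, 3, 4]
    # an arbitrary predicate that no other join method could handle
    print(list(nested_loop_join(t1, t2, lambda a, b: (a + b) % 5 == 0)))
    # [(1, 4), (2, 3), (3, 2)]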
The trace below shows a complex join. Complex join is not itself a join algorithm.
cnd(0):
cnd(0): (t1.a = t2.c) or (t1.b = t2.d)
prepare to use complex join
When an OR condition is present, a complex join is generally chosen: the two sides of the OR are joined separately and the OR of the results is then taken.
cnd(0): t1.a = t2.c
prepare to use hash join
First the hash join for the left side of the OR is performed.
cnd(0) Done(time used: 0.003s)
cnd(0): t1.b = t2.d
prepare to use hash join
Then the hash join for the right side of the OR is performed.
cnd(0) Done(time used: 0.002s)
BEGIN Or(1 rows Or 1 rows)
The OR of the left and right results is taken.
END Or (serial): produced 1 rows (time used: 0.000s)
cnd(0) Done(time used: 0.007s)
END Join(time used: 0.007s)
The whole complex join is finished.
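Conceptually, the complex join above runs one hash join per OR branch and then merges the two result sets so that a row pair matching both branches appears only once. A sketch under that interpretation; the helper names and row-position representation are illustrative.

from collections import defaultdict

def hash_join_positions(left, right, key_l, key_r):
    table = defaultdict(list)
    for i, lrow in enumerate(left):
        table[key_l(lrow)].append(i)
    for j, rrow in enumerate(right):
        for i in table.get(key_r(rrow), []):
            yield (i, j)                       # matched (left, right) row positions

# Complex join sketch: join each OR branch separately, then take the union of
# the matched row pairs (the "Or" step in the trace) so duplicates collapse.
def complex_join_or(left, right):
    branch1 = hash_join_positions(left, right, lambda r: r["a"], lambda r: r["c"])
    branch2 = hash_join_positions(left, right, lambda r: r["b"], lambda r: r["d"])
    return set(branch1) | set(branch2)

if __name__ == "__main__":
    t1 = [{"a": 1, "b": 9}]
    t2 = [{"c": 1, "d": 9}]                    # satisfies both branches, returned once
    print(complex_join_or(t1, t2))             # {(0, 0)}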
3.3.4 JOIN-Related Parameters
gbase_buffer_hj
The operator buffer size; corresponds to 'JoinTables: mem: exp. xxx' in the trace.
_gbase_hybrid_hash_join
0: do not use; 1: use (other methods will not be considered); 2: internal debugging, not recommended.
_gbase_one_pass_hash_join
Takes effect when _gbase_hybrid_hash_join is 0; 1: use one-pass, 0: use RR.
gbase_parallel_execution
1: parallel; 0: serial.
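As a usage illustration only: assuming the server is reached through a MySQL-protocol driver such as pymysql and that these variables can be changed per session with SET (both are assumptions to verify against your version's documentation), a session could be configured roughly like this before running a traced query.

import pymysql  # assumption: the server speaks the MySQL protocol

# Illustrative only: switch on hybrid hash join and parallel execution for one
# session, then run a join whose trace can be read as described in this section.
# Host, port and credentials are placeholders.
conn = pymysql.connect(host="127.0.0.1", port=5258, user="root", password="", database="tpch")
with conn.cursor() as cur:
    # Parameter names come from section 3.3.4; whether they are settable at
    # session level is an assumption to check.
    cur.execute("SET _gbase_hybrid_hash_join = 1")
    cur.execute("SET gbase_parallel_execution = 1")
    cur.execute("SELECT COUNT(*) FROM lineitem JOIN orders "
                "ON lineitem.L_ORDERKEY = orders.O_ORDERKEY")
    print(cur.fetchone())
conn.close()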