StarRocks: paginated queries return inconsistent results depending on the sort column

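This article looks at inconsistent ordering in StarRocks paginated queries caused by order by, and proposes a fix: raise the distinctness of the sort columns.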


CREATE TABLE `orders` (
  `o_orderkey` int(11) NOT NULL COMMENT "",
  `o_orderdate` date NOT NULL COMMENT "",
  `o_custkey` int(11) NOT NULL COMMENT "",
  `o_orderstatus` varchar(1) NOT NULL COMMENT "",
  `o_totalprice` decimal64(15, 2) NOT NULL COMMENT "",
  `o_orderpriority` varchar(15) NOT NULL COMMENT "",
  `o_clerk` varchar(15) NOT NULL COMMENT "",
  `o_shippriority` int(11) NOT NULL COMMENT "",
  `o_comment` varchar(79) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`o_orderkey`, `o_orderdate`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`o_orderkey`) BUCKETS 96
PROPERTIES (
"replication_num" = "1",
"colocate_with" = "tpch2",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"fast_schema_evolution" = "true",
"compression" = "LZ4"
);
mysql> select sum(o_totalprice) from (select o_totalprice from orders order by o_orderkey limit 5000, 100) as a;
+-------------------+
| sum(o_totalprice) |
+-------------------+
|       14504084.88 |
+-------------------+
1 row in set (0.08 sec)

mysql> select sum(o_totalprice) from (select o_totalprice from orders order by o_orderkey limit 5000, 100) as a;
+-------------------+
| sum(o_totalprice) |
+-------------------+
|       14504084.88 |
+-------------------+
1 row in set (0.09 sec)

mysql> select sum(o_totalprice) from (select o_totalprice from orders order by o_totalprice limit 5000, 100) as a;
+-------------------+
| sum(o_totalprice) |
+-------------------+
|          94215.72 |
+-------------------+
1 row in set (0.18 sec)

mysql> select sum(o_totalprice) from (select o_totalprice from orders order by o_totalprice limit 5000, 100) as a;
+-------------------+
| sum(o_totalprice) |
+-------------------+
|          94215.72 |
+-------------------+
1 row in set (0.14 sec)

mysql> select sum(o_totalprice) from (select o_totalprice from orders order by o_orderdate limit 5000, 100) as a;
+-------------------+
| sum(o_totalprice) |
+-------------------+
|       16296704.62 |
+-------------------+
1 row in set (0.17 sec)

mysql> select sum(o_totalprice) from (select o_totalprice from orders order by o_orderdate limit 5000, 100) as a;
+-------------------+
| sum(o_totalprice) |
+-------------------+
|       16303416.82 |
+-------------------+
1 row in set (0.15 sec)

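The results show that the accumulated sum is consistent when sorting by o_orderkey (14504084.88 in both runs) and by o_totalprice (94215.72 in both runs), but differs between runs when sorting by o_orderdate (16296704.62, then 16303416.82). So why does the result change from one run to the next when the query uses order by o_orderdate?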

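The cause is that StarRocks stores data in a distributed fashion. When a paginated statement runs, one job dispatches multiple tasks to multiple BEs, which read the tablets concurrently, and the partial results are then merged. Because the column named in order by has low distinctness, many rows share the same sort value, and the order in which the nodes return those tied rows differs from run to run. A given row can therefore be hit on both page N and page N+1, so the rows that land on a page change between runs and the accumulated sum is inconsistent.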
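One way to see how distinct each candidate sort column is would be to compare its distinct-value count with the total row count. The query below is only a sketch against the `orders` table defined above; the output aliases are illustrative names, and no result is shown because it depends on the loaded data.

```sql
-- Sketch: compare the number of distinct values per candidate sort column
-- with the total row count. A column whose distinct count is far below
-- total_rows produces many ties under order by.
select count(*)                     as total_rows,
       count(distinct o_orderkey)   as distinct_orderkey,
       count(distinct o_orderdate)  as distinct_orderdate,
       count(distinct o_totalprice) as distinct_totalprice
from orders;
```

A column whose distinct count equals total_rows gives a total order on its own; any other column needs a tie-breaker to page deterministically.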

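Solution: raise the distinctness of the columns listed in order by. When creating the table, we added a column that behaves like an id column in MySQL. In paginated queries, in addition to the columns the business needs to sort on, we append this id column to order by, which guarantees that the queried data is ordered as a whole.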
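Applied to the `orders` table above, the idea could look like the sketch below. The table has no dedicated id column, so this example assumes `o_orderkey` is unique (as it is in standard TPC-H data) and uses it as the tie-breaker. That is an assumption about the data, not something the DDL enforces, since the table uses a DUPLICATE KEY model.

```sql
-- Sketch: same paginated query as before, with a unique tie-breaker
-- appended to the sort key. o_orderkey is ASSUMED unique here and stands
-- in for the id-like column described above.
select sum(o_totalprice)
from (
    select o_totalprice
    from orders
    order by o_orderdate, o_orderkey  -- ties on o_orderdate are broken by o_orderkey
    limit 5000, 100
) as a;
```

If the tie-breaker really is unique, no two rows compare equal on the sort key, so each page covers the same rows on every run regardless of the order in which the BEs return their data.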

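Reference: 京东到家 x StarRocks:高效支撑海博数据中台多维数据分析 (JD Daojia x StarRocks: efficiently supporting multidimensional data analysis on the Haibo data middle platform), StarRocks practical experience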
