
feat(storage): improve optimize and recluster #11850

Merged: 2 commits merged on Jul 17, 2023

@zhyass (Member) commented Jun 25, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

  1. Add a LIMIT option for recluster:
     ALTER TABLE [IF EXISTS] <name> RECLUSTER [FINAL] [WHERE condition] [LIMIT <segment_count>]

     The LIMIT option sets the maximum number of segments to be reclustered; Databend selects the newest segments. The default segment_count limit is max_threads * 4.

  2. Add a memory usage limit for recluster.

  3. Sort the block metas by cluster_statistics during compact.

  4. After OPTIMIZE TABLE <name> COMPACT, run a recluster.

  5. Optimize recluster: serialize blocks in parallel during recluster.
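A minimal sketch of two of the ideas listed above: choosing at most LIMIT of the newest segments (defaulting to max_threads * 4), and ordering block metas by their cluster statistics. The `Segment` and `BlockMeta` types, the `seq` field standing in for segment age, and the simplification of cluster statistics to `Vec<i64>` min/max keys are all hypothetical; this is not Databend's actual implementation.

```rust
/// Hypothetical, simplified segment descriptor (not Databend's real type).
/// `seq` stands in for commit order: a larger value means a newer segment.
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    location: String,
    seq: u64,
}

/// Hypothetical block descriptor carrying per-block cluster statistics:
/// the min and max cluster-key values, simplified here to Vec<i64>.
#[derive(Debug, Clone, PartialEq)]
struct BlockMeta {
    location: String,
    cluster_min: Vec<i64>,
    cluster_max: Vec<i64>,
}

/// Default LIMIT described in the PR: max_threads * 4.
fn default_segment_limit(max_threads: usize) -> usize {
    max_threads * 4
}

/// Pick at most `limit` segments, newest first; fall back to the default
/// limit when no LIMIT clause is given.
fn select_recluster_segments(
    segments: &[Segment],
    limit: Option<usize>,
    max_threads: usize,
) -> Vec<Segment> {
    let limit = limit.unwrap_or_else(|| default_segment_limit(max_threads));
    let mut picked = segments.to_vec();
    // Newest (largest seq) first.
    picked.sort_by(|a, b| b.seq.cmp(&a.seq));
    picked.truncate(limit);
    picked
}

/// Order block metas by cluster statistics (min key, then max key), so
/// blocks with neighbouring key ranges end up next to each other.
fn sort_block_metas(blocks: &mut [BlockMeta]) {
    blocks.sort_by(|a, b| {
        a.cluster_min
            .cmp(&b.cluster_min)
            .then_with(|| a.cluster_max.cmp(&b.cluster_max))
    });
}
```

With max_threads = 2 and no LIMIT clause, `select_recluster_segments` returns the 8 newest segments; `Vec<i64>` compares lexicographically, which fits a multi-column cluster key such as (id, insert_time).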

create table test_order (
    id bigint,
    id1 bigint,
    id2 bigint,
    id3 bigint,
    id4 bigint,
    id5 bigint,
    id6 bigint,
    id7 bigint,
    
    s1 varchar,
    s2 varchar,
    s3 varchar,
    s4 varchar,
    s5 varchar,
    s6 varchar,
    s7 varchar,
    s8 varchar,
    s9 varchar,
    s10 varchar,
    s11 varchar,
    s12 varchar,
    s13 varchar,
    
    d1 DECIMAL(20, 8),
    d2 DECIMAL(20, 8),
    d3 DECIMAL(20, 8),
    d4 DECIMAL(20, 8),
    d5 DECIMAL(20, 8),
    d6 DECIMAL(30, 8),
    d7 DECIMAL(30, 8),
    d8 DECIMAL(30, 8),
    d9 DECIMAL(30, 8),
    d10 DECIMAL(30, 8),
    
    insert_time datetime,
    insert_time1 datetime,
    insert_time2 datetime,
    insert_time3 datetime,
    
    i int
) CLUSTER BY(id,insert_time);


create table random_source (
    id bigint,
    id1 bigint,
    id2 bigint,
    id3 bigint,
    id4 bigint,
    id5 bigint,
    id6 bigint,
    id7 bigint,
    
    s1 varchar,
    s2 varchar,
    s3 varchar,
    s4 varchar,
    s5 varchar,
    s6 varchar,
    s7 varchar,
    s8 varchar,
    s9 varchar,
    s10 varchar,
    s11 varchar,
    s12 varchar,
    s13 varchar,
    
    d1 DECIMAL(20, 8),
    d2 DECIMAL(20, 8),
    d3 DECIMAL(20, 8),
    d4 DECIMAL(20, 8),
    d5 DECIMAL(20, 8),
    d6 DECIMAL(30, 8),
    d7 DECIMAL(30, 8),
    d8 DECIMAL(30, 8),
    d9 DECIMAL(30, 8),
    d10 DECIMAL(30, 8),
    
    insert_time datetime,
    insert_time1 datetime,
    insert_time2 datetime,
    insert_time3 datetime,
    
    i int
) Engine = Random;

insert into test_order select * from random_source limit 50000000;

Before this PR:

mysql> optimize table test_order compact;
Query OK, 32000000 rows affected (3 min 10.62 sec)

After this PR:

mysql> optimize table test_order compact;
Query OK, 32000000 rows affected (55.94 sec)
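
For scale, the two reported timings for `optimize table test_order compact` (32,000,000 rows affected) work out as follows; this is plain arithmetic on the numbers above, not an additional measurement:

```latex
\frac{3\,\mathrm{min}\;10.62\,\mathrm{s}}{55.94\,\mathrm{s}}
  = \frac{190.62\,\mathrm{s}}{55.94\,\mathrm{s}} \approx 3.41\times,
\qquad
\frac{32{,}000{,}000\ \text{rows}}{190.62\,\mathrm{s}} \approx 1.68\times 10^{5}\ \text{rows/s}
\;\to\;
\frac{32{,}000{,}000\ \text{rows}}{55.94\,\mathrm{s}} \approx 5.72\times 10^{5}\ \text{rows/s}
```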

The recluster pipeline is as follows:
┌──────────┐     ┌───────────────┐     ┌─────────┐
│FuseSource├────►│CompoundBlockOp├────►│SortMerge├────┐
└──────────┘     └───────────────┘     └─────────┘    │
┌──────────┐     ┌───────────────┐     ┌─────────┐    │     ┌──────────────┐     ┌─────────┐
│FuseSource├────►│CompoundBlockOp├────►│SortMerge├────┤────►│MultiSortMerge├────►│Resize(N)├───┐
└──────────┘     └───────────────┘     └─────────┘    │     └──────────────┘     └─────────┘   │
┌──────────┐     ┌───────────────┐     ┌─────────┐    │                                        │
│FuseSource├────►│CompoundBlockOp├────►│SortMerge├────┘                                        │
└──────────┘     └───────────────┘     └─────────┘                                             │
┌──────────────────────────────────────────────────────────────────────────────────────────────┘
│         ┌──────────────┐
│    ┌───►│SerializeBlock├───┐
│    │    └──────────────┘   │
│    │    ┌──────────────┐   │    ┌─────────┐    ┌────────────────┐     ┌─────────────────┐     ┌──────────┐
└───►│───►│SerializeBlock├───┤───►│Resize(1)├───►│SerializeSegment├────►│TableMutationAggr├────►│CommitSink│
     │    └──────────────┘   │    └─────────┘    └────────────────┘     └─────────────────┘     └──────────┘
     │    ┌──────────────┐   │
     └───►│SerializeBlock├───┘
          └──────────────┘
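
The fan-out/fan-in shape of this pipeline (Resize(N) into N parallel SerializeBlock steps, then Resize(1) into a single SerializeSegment) can be sketched with plain threads and a channel. This is an illustration under assumptions, not Databend's processor framework: `Block`, `SerializedBlock`, `serialize_block`, and `recluster_serialize` are hypothetical names, "serializing" only encodes rows as little-endian bytes, and Resize(N) is modeled as round-robin distribution.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a sorted block coming out of MultiSortMerge.
#[derive(Debug, Clone)]
struct Block {
    id: usize,
    rows: Vec<i64>,
}

/// Hypothetical result of serializing one block: its id and encoded size.
#[derive(Debug)]
struct SerializedBlock {
    id: usize,
    bytes: usize,
}

/// Stand-in for encoding and writing a block file.
fn serialize_block(b: &Block) -> SerializedBlock {
    let encoded: Vec<u8> = b.rows.iter().flat_map(|v| v.to_le_bytes()).collect();
    SerializedBlock { id: b.id, bytes: encoded.len() }
}

/// Resize(N) -> N x SerializeBlock -> Resize(1) -> SerializeSegment.
fn recluster_serialize(blocks: Vec<Block>, n: usize) -> Vec<SerializedBlock> {
    let n = n.max(1);
    // Resize(N): spread blocks round-robin over N workers.
    let mut buckets: Vec<Vec<Block>> = (0..n).map(|_| Vec::new()).collect();
    for (i, b) in blocks.into_iter().enumerate() {
        buckets[i % n].push(b);
    }
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::with_capacity(n);
    for bucket in buckets {
        let tx = tx.clone();
        // SerializeBlock: each worker serializes its share in parallel.
        handles.push(thread::spawn(move || {
            for b in &bucket {
                tx.send(serialize_block(b)).expect("receiver alive");
            }
        }));
    }
    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    // Resize(1): a single consumer gathers every serialized block.
    let mut metas: Vec<SerializedBlock> = rx.into_iter().collect();
    for h in handles {
        h.join().expect("worker panicked");
    }
    // SerializeSegment: build the segment's block list in a stable order.
    metas.sort_by_key(|m| m.id);
    metas
}
```

The expensive per-block encoding runs on N threads, while the segment is still assembled by one consumer, so its block list comes out in a single deterministic order.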

Closes #11799

@zhyass marked this pull request as draft June 25, 2023 05:06
@mergify bot added the pr-feature label (this PR introduces a new feature to the codebase) Jun 25, 2023
@zhyass requested reviews from lichuang and dantengsky June 25, 2023 14:30
@zhyass marked this pull request as ready for review June 25, 2023 14:32
@zhyass requested a review from flaneur2020 June 25, 2023 14:33

@ZhiHanZ (Collaborator) commented Jun 26, 2023

I wonder about the mechanism: does the purge still need to store candidates in memory and delete all candidates in an old segment?

@zhyass force-pushed the improve_clustering branch from 4e26c8c to 9430699 June 27, 2023 10:06
@zhyass added the ci-benchmark label (Benchmark: run all test) Jun 27, 2023
@databendlabs deleted a comment from the github-actions bot Jun 27, 2023
@zhyass (Member, Author) commented Jun 27, 2023

> I wonder about the mechanism: does the purge still need to store candidates in memory and delete all candidates in an old segment?

It will select and purge the oldest snapshots.
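
A minimal sketch of selecting the oldest snapshots for purging under an optional limit. `SnapshotRef`, `timestamp_ms`, and `select_snapshots_to_purge` are hypothetical names; the sketch also assumes the newest snapshot is the table's current state and must never be selected.

```rust
/// Hypothetical snapshot reference: file location plus creation timestamp.
#[derive(Debug, Clone, PartialEq)]
struct SnapshotRef {
    location: String,
    timestamp_ms: u64,
}

/// Pick up to `limit` of the oldest snapshots to purge (every eligible one
/// when `limit` is None). The newest snapshot is assumed to be the table's
/// current state and is always excluded.
fn select_snapshots_to_purge(
    mut snapshots: Vec<SnapshotRef>,
    limit: Option<usize>,
) -> Vec<SnapshotRef> {
    // Oldest first.
    snapshots.sort_by_key(|s| s.timestamp_ms);
    // Remove the newest (current) snapshot from the candidates.
    snapshots.pop();
    if let Some(limit) = limit {
        snapshots.truncate(limit);
    }
    snapshots
}
```

Bounding each pass by a limit keeps the candidate set held in memory small, at the cost of needing several passes to reclaim a long history.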

@zhyass added and removed the ci-benchmark label (Benchmark: run all test) Jun 27, 2023
@databendlabs deleted a comment from the github-actions bot Jun 28, 2023
@zhyass removed the ci-benchmark label (Benchmark: run all test) Jun 28, 2023
@BohuTANG (Member)

Summary (by llmchain.rs)

  • Added support for a "limit" parameter in OPTIMIZE TABLE and ALTER TABLE RECLUSTER statements
    The code changes added support for the "limit" parameter in the ReclusterTable action, the OptimizeTableStmt struct, and the Display implementation of OptimizeTableStmt. The changes also added parsing of the "limit" parameter in OptimizeTableStmt and new test cases for the OPTIMIZE TABLE statement with different options and limits.

  • Changed the "dry_run_limit" parameter to a "dry_run" boolean parameter
    The code changes replaced the dry_run_limit parameter with a dry_run boolean parameter in the vacuum_handler.rs and handler.rs files. The changes also added a constant DRY_RUN_LIMIT and changed the signature of do_vacuum to take a boolean dry_run instead of an optional dry_run_limit.

  • Modified function signatures to take a Vec<DataBlock> instead of a slice &[DataBlock]
    The code changes modified the signatures of compact_final in transform_block_compact.rs, transform_block_compact_for_copy.rs, and transform_compact.rs to take a Vec<DataBlock> instead of a slice &[DataBlock]. The changes also modified the consume_event function to check whether the output port can push before pushing the next data block.

  • Added support for TransformSerializeBlock and TransformSerializeSegment
    The code changes added support for TransformSerializeBlock and TransformSerializeSegment in various files, including transform_append.rs, transform_serialize_data.rs, and replace.rs. The changes also removed the unused AppendTransform and added a new module, mutation_meta.

  • Added a limit parameter to various functions
    The code changes added a limit parameter to the compact method in table.rs, the do_purge function in gc.rs, and the do_recluster function in recluster.rs. The changes also modified do_recluster to use the limit parameter to cap the number of segment locations processed at a time.

  • Changed various function parameters and return types
    The code changes replaced the dry_run_limit parameter with a boolean dry_run parameter in various files, including vacuum_handler.rs, handler.rs, and gc.rs. The changes also changed the return type of the apply_delete function in merge_into_mutator.rs and the signature of the compact_table function in hive_table.rs.
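
Two of the signature changes summarized above can be illustrated with a short sketch: a vacuum entry point that takes `dry_run: bool` and caps its preview with a `DRY_RUN_LIMIT` constant, and a `compact_final` that takes `Vec<DataBlock>` by value. The `DataBlock` type here is a hypothetical stand-in, the constant's value of 1000 is an assumption (the summary does not state it), and the bodies are illustrative rather than Databend's code.

```rust
/// Assumed cap on how many candidate files a dry run reports; the PR adds a
/// DRY_RUN_LIMIT constant, but its real value is not given in the summary.
const DRY_RUN_LIMIT: usize = 1000;

/// Hypothetical stand-in for Databend's DataBlock: a single i64 column.
#[derive(Debug, Clone, PartialEq)]
struct DataBlock {
    rows: Vec<i64>,
}

/// Sketch of a vacuum entry point taking `dry_run: bool` instead of
/// `dry_run_limit: Option<usize>`. A dry run returns the files that would
/// be removed, capped at DRY_RUN_LIMIT; a real run moves them into
/// `removed` (a stand-in for deleting them) and returns None.
fn do_vacuum(orphan_files: Vec<String>, dry_run: bool, removed: &mut Vec<String>) -> Option<Vec<String>> {
    if dry_run {
        Some(orphan_files.into_iter().take(DRY_RUN_LIMIT).collect())
    } else {
        removed.extend(orphan_files);
        None
    }
}

/// Taking Vec<DataBlock> by value (rather than &[DataBlock]) lets the
/// function move each block's rows into the output without cloning them.
fn compact_final(blocks: Vec<DataBlock>) -> DataBlock {
    let total: usize = blocks.iter().map(|b| b.rows.len()).sum();
    let mut rows = Vec::with_capacity(total);
    for block in blocks {
        rows.extend(block.rows);
    }
    DataBlock { rows }
}
```

With a borrowed slice, `compact_final` would have to clone every row to build its output; taking ownership turns that into a move.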

@zhyass requested a review from sundy-li June 28, 2023 06:27
@zhyass changed the title from "feat(storage): add limit for optimize" to "feat(storage): improve optimize and reclusters" Jun 28, 2023
@zhyass changed the title from "feat(storage): improve optimize and reclusters" to "feat(storage): improve optimize and recluster" Jun 28, 2023
@dantengsky added the ci-cloud label (Build docker image for cloud test) Jun 28, 2023

@flaneur2020 (Member)

> I wonder about the mechanism: does the purge still need to store candidates in memory and delete all candidates in an old segment?

IMO, purge does not need to scan the world; it only needs to tail the older snapshot, segment, and block files.

If I understand correctly, the block files have some kind of time ordering: if a block file's creation time is earlier than the earliest active snapshot and the block is not included in that snapshot, then it can be safely purged.
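
The purge rule proposed here can be written as a single predicate. A sketch, assuming a hypothetical `BlockFile` type whose `created_ms` timestamp is accurate; whether such a timestamp can be trusted is the open question in this thread.

```rust
use std::collections::HashSet;

/// Hypothetical block file descriptor; `created_ms` is assumed to be an
/// accurate creation timestamp in milliseconds.
struct BlockFile {
    location: String,
    created_ms: u64,
}

/// A block is purgeable only if it was created before the earliest active
/// snapshot AND that snapshot does not reference it.
fn can_purge(
    block: &BlockFile,
    earliest_active_snapshot_ms: u64,
    referenced: &HashSet<String>,
) -> bool {
    block.created_ms < earliest_active_snapshot_ms && !referenced.contains(&block.location)
}
```

The time check matters for new, unreferenced blocks: such a block may belong to a write whose snapshot has not been committed yet, so being unreferenced alone does not make it an orphan.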

@zhyass (Member, Author) commented Jun 29, 2023

> IMO, purge does not need to scan the world; it only needs to tail the older snapshot, segment, and block files.

> If I understand correctly, the block files have some kind of time ordering: if a block file's creation time is earlier than the earliest active snapshot and the block is not included in that snapshot, then it can be safely purged.

This can be used to purge orphan blocks in VACUUM TABLE.

However, we cannot guarantee the accuracy of a file's creation time, and a block is always created before the snapshot that references it.

@zhyass force-pushed the improve_clustering branch from 2cc0705 to 94f4377 July 5, 2023 13:29
@BohuTANG added and removed the ci-cloud label (Build docker image for cloud test) Jul 12, 2023
@github-actions (Contributor)

Docker Image for PR

  • tag: pr-11850-f689982

Note: this image tag is only available for internal use; please check the internal docs for more details.

Labels: ci-cloud (Build docker image for cloud test), pr-feature (this PR introduces a new feature to the codebase)

Development: successfully merging this pull request may close these issues:

  • Feature: allow specifying a LIMIT clause in the OPTIMIZE TABLE ... PURGE statement

7 participants