MongoDB --- Manual sharding


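I recently came across a discussion about manual sharding on Google Groups. I haven't tried it in practice myself yet, but the approach looks workable. Google Groups can only be reached from mainland China through a VPN, so I am reposting the thread here for easy reference.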


The question from Zer0:

-----------------------------

Sorry for my English.
I've read all the documents on the home page and searched many other sites,
but I still cannot configure manual sharding.
Someone said "moveChunk" is the manual way. That's OK, but what I want is more than that.
For example, take a document as follows:

{"name":"John", "age":21}

The shard key is "name:1".
How can I configure shard1 to hold documents where "name" starts with A-O, and shard2 to hold names from P-Z?
No auto sharding and no auto rebalancing at all.

Thanks so much


Alberto Lerner's first reply (mostly pointers to the official documentation, which is also what the script below mainly relies on):

-----------------------------------

You can split at any point you like, even at non-existing keys:
http://www.mongodb.org/display/DOCS/Splitting+Chunks

If you want to move a chunk manually:
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingA...

And you can stop the balancer:
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingA...
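The sketch below puts those three pieces together for Zer0's A-O / P-Z question. It is plain JavaScript that builds the admin command documents and checks which chunk a given name would fall into.

A few names are assumptions:
- The namespace `test.people` and the shard names `shard1`/`shard2` are hypothetical placeholders.
- The command fields (`split`/`middle`, `moveChunk`/`find`/`to`) are the same ones Alvin's script uses.

In the mongo shell you would stop the balancer first and then pass each document to `db.adminCommand(...)`.

```javascript
// Sketch only: "test.people", "shard1" and "shard2" are hypothetical names.
const ns = "test.people";   // collection sharded on { name: 1 }
const splitAt = "P";        // chunk 1: [MinKey, "P"), chunk 2: ["P", MaxKey)

// Admin commands to run, in order, against a mongos (balancer stopped first).
const commands = [
  { split: ns, middle: { name: splitAt } },
  { moveChunk: ns, find: { name: "A" }, to: "shard1" },     // chunk below "P"
  { moveChunk: ns, find: { name: splitAt }, to: "shard2" }, // chunk from "P" up
];

// Which shard a name lands on once the commands above have run.
// JS string comparison matches MongoDB's default binary order for ASCII text.
function shardForName(name) {
  return name < splitAt ? "shard1" : "shard2";
}

commands.forEach((cmd) => console.log(JSON.stringify(cmd)));
console.log(shardForName("John"));  // shard1
console.log(shardForName("Zoe"));   // shard2
console.log(shardForName("alice")); // shard2: lowercase letters sort after "Z"
```

Note that "A-O" really means "everything below P" here. Lowercase or non-ASCII names do not split neatly by first letter, so normalizing the key (for example upper-casing) before insert may be worth considering. If a chunk already lives on its target shard, the corresponding moveChunk is unnecessary.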


Alvin Richards provided an example script:

----------------------------------------

Here's a script I use from the mongo shell.

You will need to change:
-- number of shards
-- min and max values of the shard key
-- value delta between chunks
-- collection name

-Alvin

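use admin

// Left-pad a number with zeros, e.g. pad(3, 4) -> "0003", so that
// j % shards maps to shard names like "shard0003".
function pad(number, length) {
    var str = '' + number;
    while (str.length < length) {
        str = '0' + str;
    }
    return str;
}

var shards = 5;                          // number of shards
var min_value = -2061389163;             // first split point
var max_value = 2061389163;              // split points stay below this value
var inc = 40000000;                      // shard-key distance between split points
var collection_name = "scaleout.blogs";  // namespace; the shard key is { ts : 1 }

// For each split point: split at ts = i, then move the chunk that now starts
// at i (located via ts = i + 1) to the next shard, round-robin.
for (var j = 0, i = min_value; i < max_value; i += inc, j++) {
    db.runCommand({ split : collection_name, middle : { ts : i } });
    db.runCommand({ moveChunk : collection_name, find : { ts : i + 1 },
                    to : "shard" + pad(j % shards, 4) });
}

db.printShardingStatus()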
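Before touching a cluster, you can preview what the loop will do by generating the same split/move sequence as plain data.

This is a sketch with two assumptions:
- `presplitPlan` is a hypothetical helper.
- The `"shard" + 4-digit index` naming only mirrors the script's assumption about shard names. Check `db.printShardingStatus()` for the real names.

```javascript
// Hypothetical helper mirroring the loop above: returns the split and
// moveChunk command documents it would send, without contacting a server.
function presplitPlan(ns, key, minValue, maxValue, inc, shardNames) {
  const plan = [];
  for (let j = 0, i = minValue; i < maxValue; i += inc, j++) {
    plan.push({ split: ns, middle: { [key]: i } });
    // The chunk that now starts at i is located via the key value i + 1.
    plan.push({ moveChunk: ns, find: { [key]: i + 1 },
                to: shardNames[j % shardNames.length] });
  }
  return plan;
}

// Shard names as the original script builds them: "shard0000" ... "shard0004".
const shardNames = [0, 1, 2, 3, 4].map((n) => "shard" + String(n).padStart(4, "0"));
const plan = presplitPlan("scaleout.blogs", "ts",
                          -2061389163, 2061389163, 40000000, shardNames);

console.log(plan.length / 2 + " split points");
console.log(JSON.stringify(plan.slice(0, 2)));
```

With the script's values this yields 104 split points. They are spread round-robin, so each of the five shards ends up with about 21 of the new chunks.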


-------------------------------------------

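Another Google Groups discussion, "fastest way to import a large dataset", also recommends doing manual sharding first and then running several mongoimport processes, each loading data into its corresponding shard. If you're interested, read it there (VPN required).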


Here are the first few posts of that discussion:


tcurdt

---------------------

Hey there,

we are having big trouble importing a large dataset into mongo in a reasonable time.
We have a 6 node sharded cluster and we tried a couple of different approaches.

The dataset consists of 1.4B small documents, with an average size of about 70 bytes.
The fastest import we have seen took 24 hours.

We would have thought that a mongos per machine with a couple of mongoimports per node should give the best results. But oddly enough, that's not faster; it's actually slower than a single mongoimport for the whole cluster.

Right now I am wondering if there is a way to import the pre-sharded documents into the shard databases using the --dbpath option and then adjust the config database accordingly. Would that work? ...and be faster?
Should the indexes be created beforehand or after?

cheers,
Torsten


Nat

-------------------------

What is your shard key?
- Indexing after the import is better than indexing beforehand
- If you already preshard the data, turn the balancer off first
- You should break up the import data the same way you preshard, and use mongoimport to load each part
- Your data should be sorted by shard key if possible
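Nat's advice has two parts: break the import data up the same way as the pre-split, and keep it sorted by shard key. The sketch below does both. It assigns each document to the chunk whose [min, max) range contains its key, then sorts each bucket.

This is a sketch under stated assumptions:
- `chunkIndex` and `partitionForImport` are hypothetical helpers.
- The split points 100 and 200 are made up.
- Writing each bucket to its own file for a separate mongoimport process is left out.

```javascript
// Index of the chunk containing `value`, given ascending split points.
// Chunk 0 is [MinKey, s0), chunk k is [s(k-1), s(k)), the last is [s(n-1), MaxKey).
function chunkIndex(value, splitPoints) {
  let lo = 0, hi = splitPoints.length;
  while (lo < hi) {                 // binary search: count split points <= value
    const mid = (lo + hi) >> 1;
    if (splitPoints[mid] <= value) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Bucket documents per chunk, then sort each bucket by shard key.
function partitionForImport(docs, keyOf, splitPoints) {
  const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
  const buckets = Array.from({ length: splitPoints.length + 1 }, () => []);
  for (const doc of docs) buckets[chunkIndex(keyOf(doc), splitPoints)].push(doc);
  for (const b of buckets) b.sort((a, c) => cmp(keyOf(a), keyOf(c)));
  return buckets;
}

// Example: numeric shard key "ts", pre-split at 100 and 200.
const docs = [{ ts: 250 }, { ts: 5 }, { ts: 150 }, { ts: 100 }, { ts: 99 }];
const buckets = partitionForImport(docs, (d) => d.ts, [100, 200]);
// Each bucket would become one JSON-lines file for one mongoimport process.
buckets.forEach((b, k) => console.log(k, b.map((d) => JSON.stringify(d)).join("\n")));
```

Chunk boundaries are treated as min-inclusive and max-exclusive, so a key equal to a split point (100 above) belongs to the chunk that starts there.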


Torsten Curdt

-------------------------------

> What is your shard key?

We tried _id (ObjectIds) as well as our preferred keys

> - Index after is better than index before hand

So far we have been trying to index while importing.
We can give that another try.

> - If you already preshard the data, turn the balancer off first

I would shut down config server and mongos for the import.
Is that what you mean?

> - You should break the import data in the same way that you preshard

Of course.

> and use mongoimport to load them up
> - Your data should be sorted by shard key if possible

OK

Biggest question: will it be worth it?

cheers,
Torsten


Nat

-----------------------------

- If you use ObjectId as a shard key, you won't be able to scale the import. The maximum speed is limited by the speed of one machine.
- You can leave your config server and mongos up and do the import via mongos.
- To turn off the balancer:
   > use config
   > db.settings.update({_id:"balancer"},{$set : {stopped:true}}, true)


Torsten Curdt

------------------------------------

> - If you use ObjectId as a shard key, you won't be able to scale the
> import. The maximum speed is limited by the speed of one machine.

Why is that?
The ObjectIds should be quite different across the machines and so hopefully fall into different chunks.

> - You can leave your config server and mongos up and do the import via
> mongos.

I'm confused; that's what I was doing before:

mongo1: shardsrv mongos 2*mongoimport configsrv
mongo2: shardsrv mongos 2*mongoimport configsrv
mongo3: shardsrv mongos 2*mongoimport configsrv
mongo4: shardsrv mongos 2*mongoimport
mongo5: shardsrv mongos 2*mongoimport
mongo6: shardsrv mongos 2*mongoimport

Or do you mean...

Splitting up the pre-sharded dataset across the nodes, then turning off balancing, but using mongos instead of --dbpath? Wouldn't --dbpath be faster? Wouldn't writes still get routed to other shards with mongos?

> - To turn off balancer,
>   > use config
>   > db.settings.update({_id:"balancer"},{$set : {stopped:true}}, true)

Ah ... OK.
cheers,
Torsten



Nat

----------------------

- ObjectId is keyed by timestamp first.
- You can use --dbpath, but you have to take mongod offline. I just recommended another way that doesn't require taking down mongod. Since you will run mongoimport on data split by shard key, mongos should route each mongoimport's requests to a single server.
- Do you have mongostat, iostat, and db.stats() output from the import process?
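Nat's first point can be made concrete. The first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds, so ObjectIds generated later compare greater. With such a monotonically increasing shard key, essentially every new insert goes to the chunk with the highest range, which lives on one shard at a time.

The sketch below works on hex strings, so no driver is needed. The split points are made-up values for illustration.

```javascript
// Seconds since the Unix epoch, stored in the first 4 bytes (8 hex chars).
function objectIdSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

// MongoDB orders ObjectIds byte by byte; for 24-char lowercase hex strings,
// plain string comparison gives the same order.
function chunkFor(hex, splitHexes) {
  let k = 0;
  while (k < splitHexes.length && splitHexes[k] <= hex) k++;
  return k; // 0 = [MinKey, split0), splitHexes.length = [lastSplit, MaxKey)
}

const sample = "507f1f77bcf86cd799439011";
console.log(new Date(objectIdSeconds(sample) * 1000).toISOString()); // 2012-10-17T21:13:27.000Z

// Hypothetical split points taken from older data, plus ids minted later.
const splits = ["4f0000000000000000000000", "500000000000000000000000"];
const newIds = ["507f1f77bcf86cd799439011", "507f1f78000000000000abcd"];
console.log(newIds.map((id) => chunkFor(id, splits))); // [ 2, 2 ]: both in the last chunk
```

Torsten is right that ids from different machines differ in their later bytes. The leading timestamp still dominates the comparison, though, so those differences do not spread new inserts across older chunks.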


Torsten Curdt

------------------------------------

> - ObjectId is keyed by timestamp first.

True ... but even with our preferred sharding key [user, time] it doesn't behave much better.

> - You can use --dbpath but you have to take mongod offline.

That's fine.

> I just recommended another way without taking down mongod. As you will
> perform mongoimport splitted by shard key, mongos should route
> requests to one server per mongoimport.

But doesn't that depend on what chunks are configured in the config server?

> - Do you have mongostat, iostat, db.stats() during import process?

Certainly. With the current non-pre-sharded import...

- mongostat shows looong "holes" with no ops at all. I assume that's the balancer, but I'm not sure. The numbers were much better at the beginning of the import.

- iostat shows quite uneven activity across the nodes.

- We are monitoring db.stats() over time. The following graph shows the object count:

  https://skitch.com/tcurdt/rpti6/import-speed



Nat

--------------------------

If you use the shard key [user, time] and turn off the balancer, you should see better results. Can you post your iostat and mongostat output?
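As Torsten asks, whether each mongoimport hits a single shard depends on the chunk ranges recorded in the config servers. The sketch below shows that lookup for a compound shard key { user: 1, time: 1 }: keys are compared field by field, and each chunk owns [min, max).

The following are invented for illustration:
- The chunk table and the shard names. The real table lives in the `config.chunks` collection.
- The MinKey/MaxKey stand-ins.

```javascript
// Compare two compound keys field by field, like an index on { user: 1, time: 1 }.
function compareKeys(a, b) {
  for (const f of ["user", "time"]) {
    if (a[f] < b[f]) return -1;
    if (a[f] > b[f]) return 1;
  }
  return 0;
}

const MIN = { user: "", time: -Infinity };       // stand-in for MinKey (sketch only)
const MAX = { user: "\uffff", time: Infinity };  // stand-in for MaxKey (sketch only)

// Hypothetical chunk table: each chunk covers [min, max) and lives on one shard.
const chunks = [
  { min: MIN, max: { user: "h", time: 0 }, shard: "shard0000" },
  { min: { user: "h", time: 0 }, max: { user: "p", time: 0 }, shard: "shard0001" },
  { min: { user: "p", time: 0 }, max: MAX, shard: "shard0002" },
];

// What mongos conceptually does: find the chunk whose range contains the key.
function routeTo(doc) {
  const c = chunks.find((ch) => compareKeys(ch.min, doc) <= 0 && compareKeys(doc, ch.max) < 0);
  if (!c) throw new Error("no chunk covers this key");
  return c.shard;
}

// An import file holding only users from "h" up to (not including) "p" goes to one shard.
console.log(["harry", "kate", "olga"].map((u) => routeTo({ user: u, time: 1300000000 })));
```

If the import files are cut along the same boundaries as the chunks, each mongoimport talks to one shard. If the chunk table differs from the way the files were cut, the writes fan out again, which is Torsten's concern.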


Eliot Horowitz

------------------------------------

What version are you on?
You should shard on user,time as you want to do.
The slowness is probably caused by chunk migrations.

2 main options:
 - try 1.7.5
 - pre-split the collection into a lot of chunks, let the balancer move them around, then insert.
   This will prevent migrations.

I would not mess with --dbpath or turning off the balancer; that's much more complicated than you need.
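Eliot's second option needs split points that spread the data evenly. One common way is to take evenly spaced quantiles from a sorted sample of shard-key values. This technique is an assumption here, not something from the thread.

Each resulting point would serve as the `user` part of a `split` command's `middle` key before the import, and the balancer would then distribute the still-empty chunks. The sample values and `splitPointsFromSample` are hypothetical.

```javascript
// Pick up to `numChunks - 1` split points at evenly spaced positions of a
// sorted sample, so each chunk should receive a similar share of the data.
function splitPointsFromSample(sample, numChunks) {
  const sorted = [...sample].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  const points = [];
  for (let k = 1; k < numChunks; k++) {
    const p = sorted[Math.floor((k * sorted.length) / numChunks)];
    // Skip duplicates: splitting twice at the same key would create an empty chunk.
    if (points.length === 0 || points[points.length - 1] !== p) points.push(p);
  }
  return points;
}

// Hypothetical sample of "user" values read from the import files.
const userSample = ["dave", "amy", "zed", "kim", "bob", "lou",
                    "pat", "sue", "eve", "ned", "tom", "gus"];
const userSplitPoints = splitPointsFromSample(userSample, 4);
console.log(userSplitPoints); // [ 'eve', 'lou', 'sue' ]
// Each point p would be the `user` part of a split command's `middle` key.
```

With these 12 sample values, each of the four resulting chunks covers exactly three of them. On real data the balance is only as good as the sample.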


...... More than 60 comments in total; if you're interested, read the full thread on Google Groups (VPN required).

