MongoDB --- Manual sharding

License: CC 4.0 BY-SA

This post covers ways to do manual sharding in MongoDB: how to assign chunks by hand with a script, and how to turn off the balancer when importing a large dataset so the import runs faster.

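I recently came across a discussion about manual sharding on a Google Group. I have not tried it myself yet, but the approach looks workable. Google Groups is only reachable from here by getting past the firewall, so I am pasting the thread below for easy reference.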

The question asked by Zer0:

-----------------------------

Sorry for my English.
I've read all the documents on the home page and searched many other
sites, but I still cannot configure manual sharding.
Someone said "moveChunk" is the manual way. That's OK, but what I want
is more than that.
For example, take a document like this:

{"name":"John", "age":21}

The shard key is "name:1".
How can I configure shard1 to hold documents where "name" starts with
A-O, and shard2 to hold names from P-Z?
No auto sharding, no auto rebalancing at all.

Thanks so much

The first reply, from Alberto Lerner (mostly pointers to the official documentation, which is also what the script further down mainly relies on):

-----------------------------------

You can split at any point you like, even at non-existing keys:
http://www.mongodb.org/display/DOCS/Splitting+Chunks

If you want to move a chunk manually:
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingA...

And you can stop the balancer:
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingA...
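
Applied to Zer0's example, these three pointers could be combined roughly as follows. This is a minimal sketch, not a tested recipe: the namespace `test.people` and the shard names `shard1` and `shard2` are assumptions, and `buildNameRangePlan` and `shardForName` are hypothetical helpers that only build and inspect plain command objects. In the mongo shell each object would be passed to `db.runCommand` while connected to the admin database through mongos, with the balancer stopped first.

```javascript
// Hypothetical helper: build the admin commands for a two-range layout.
// Nothing is sent to a server here; the function only returns plain objects.
function buildNameRangePlan(ns, splitAt, lowShard, highShard) {
  return [
    // Cut the key space in two at `splitAt`; the key does not have to exist.
    { split: ns, middle: { name: splitAt } },
    // The empty string sorts before every other string, so it selects the lower chunk.
    { moveChunk: ns, find: { name: "" }, to: lowShard },
    // The split point is the first key of the upper chunk.
    { moveChunk: ns, find: { name: splitAt }, to: highShard }
  ];
}

// Mirrors the resulting layout for plain ASCII names: keys below the
// split point live on the low shard, everything else on the high shard.
function shardForName(name, splitAt, lowShard, highShard) {
  return name < splitAt ? lowShard : highShard;
}

const plan = buildNameRangePlan("test.people", "P", "shard1", "shard2");
plan.forEach(function (cmd) { console.log(JSON.stringify(cmd)); });
console.log(shardForName("John", "P", "shard1", "shard2"));
```

The comparison is case-sensitive, so with a split point of "P" a name that starts with a lowercase letter sorts after "Z" and ends up in the upper range. If that matters, the names would need to be normalised before they are used as the shard key.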

Alvin Richards then posted an example script:

----------------------------------------

Here's a script I use from the mongo shell.

You will need to change
-- number of shards
-- min and max values of the shard key
-- value delta between chunks
-- collection name

-Alvin

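use admin

// Left-pad a number with zeros, e.g. pad(3, 4) returns "0003".
function pad(number, length) {
    var str = '' + number;
    while (str.length < length) {
        str = '0' + str;
    }
    return str;
}

var shards = 5;                 // number of shards
var min_value = -2061389163;    // lowest shard key value
var max_value = 2061389163;     // highest shard key value
var inc = 40000000;             // key distance between chunks
var collection_name = "scaleout.blogs";

// Split at every boundary and hand the new chunks to the shards round-robin
// (shard0000, shard0001, ...).
for (var j = 0, i = min_value; i < max_value; i += inc, j++) {
    db.runCommand({ split: collection_name, middle: { ts: i } });
    db.runCommand({ moveChunk: collection_name, find: { ts: i + 1 },
                    to: "shard" + pad(j % shards, 4) });
}

db.printShardingStatus()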
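
Before running a script like this against a cluster, it can help to dry-run the loop and see which split points and shard names it would produce. The sketch below reimplements only the loop arithmetic in plain JavaScript. `planChunks` is a hypothetical helper, and the `shard0000`-style names are taken from the script's `pad` call; they have to match the shard names registered in the cluster.

```javascript
// Hypothetical dry run of the loop above: no commands are sent anywhere.
function planChunks(minValue, maxValue, inc, shards) {
  const plan = [];
  for (let j = 0, i = minValue; i < maxValue; i += inc, j++) {
    plan.push({
      splitAt: i,                                            // key passed to `split`
      shard: "shard" + String(j % shards).padStart(4, "0")   // target of `moveChunk`
    });
  }
  return plan;
}

const plan = planChunks(-2061389163, 2061389163, 40000000, 5);
console.log(plan.length);
console.log(plan[0], plan[plan.length - 1]);

// Count how many chunks each shard would receive.
const perShard = {};
plan.forEach(function (p) { perShard[p.shard] = (perShard[p.shard] || 0) + 1; });
console.log(perShard);
```

With the values from the script the loop runs 104 times, so the round-robin assignment gives `shard0000` through `shard0003` 21 chunks each and `shard0004` 20. The chunk below `min_value` is never moved and stays on whichever shard already holds it.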

-------------------------------------------

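Another discussion on the same Google Group, "fastest way to import a large dataset", also suggests sharding manually first and then running several mongoimport processes, each loading its data into the matching shard. If you are interested, the full thread is worth reading (again, it is behind the firewall).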

The first few posts of that thread are pasted below:

tcurdt

---------------------

Hey there,

we are having big trouble importing a large dataset into mongo in a
reasonable time.
We have a 6 node sharded cluster and we tried a couple of different
approaches.

The dataset consists of 1.4B small documents. Average size about 70
bytes.
Fastest import we have seen was 24 hours.

We would have thought that a mongos per machine with a couple of
mongoimports per node should give the best results. But oddly enough -
that's not faster - it's rather slower than a single mongoimport for
the whole cluster.

Right now I am wondering if there is a way to import the pre-sharded
documents into the shard databases using the --dbpath option and then
adjust the config database accordingly. Would that work? ...and be
faster?
Indexes beforehand or after?

cheers,
Torsten

Nat

-------------------------

What is your shard key?
- Index after is better than index before hand
- If you already preshard the data, turn the balancer off first
- You should break the import data in the same way that you preshard
and use mongoimport to load them up
- Your data should be sorted by shard key if possible
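
Nat's last two points, splitting the input the same way the collection is pre-sharded and sorting it by shard key, can be sketched as a small pre-processing step. Everything here is illustrative: `partitionByBoundaries` is a hypothetical helper, the field name `ts` is borrowed from Alvin's script, the boundaries are made-up numbers, and the sort assumes a numeric key. Each resulting bucket would be written to its own file and loaded with a separate mongoimport.

```javascript
// Hypothetical pre-processing step: bucket documents by chunk boundary and
// sort each bucket by the shard key. `boundaries` must be sorted ascending.
// Bucket 0 holds keys below boundaries[0]; bucket k holds keys in
// [boundaries[k-1], boundaries[k]); the last bucket holds the rest.
function partitionByBoundaries(docs, key, boundaries) {
  const buckets = [];
  for (let b = 0; b <= boundaries.length; b++) buckets.push([]);
  docs.forEach(function (doc) {
    let idx = 0;
    while (idx < boundaries.length && doc[key] >= boundaries[idx]) idx++;
    buckets[idx].push(doc);
  });
  // Numeric shard key assumed here; other key types need another comparator.
  buckets.forEach(function (bucket) {
    bucket.sort(function (a, b) { return a[key] - b[key]; });
  });
  return buckets;
}

const docs = [{ ts: 250 }, { ts: 5 }, { ts: 100 }, { ts: 120 }, { ts: 40 }];
console.log(JSON.stringify(partitionByBoundaries(docs, "ts", [100, 200])));
```

The lower bound of each bucket is inclusive and the upper bound exclusive, which matches how a chunk created by a split at a given key starts at that key.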

Torsten Curdt

-------------------------------

> What is your shard key?

We tried _id (ObjectIds) as well as our preferred keys

> - Index after is better than index before hand

So far we have been trying to index while importing.
We can give that another try.

> - If you already preshard the data, turn the balancer off first

I would shut down config server and mongos for the import.
Is that what you mean?

> - You should break the import data in the same way that you preshard

Of course.

> and use mongoimport to load them up
> - Your data should be sorted by shard key if possible

Biggest question: will it be worth it?

cheers,
Torsten

Nat

-----------------------------

- If you use ObjectId as a shard key, you won't be able to scale the
import. The maximum speed is limited by the speed of one machine.
- You can leave your config server and mongos up and do the import via
mongos.
- To turn off balancer,
> use config
> db.settings.update({_id:"balancer"},{$set : {stopped:true}}, true)
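
The same update can be wrapped in a small function so the balancer can be switched back on after the import. This is a sketch under assumptions: `setBalancerStopped` is a hypothetical helper, and its `settings` argument is any object with a shell-style `update(query, update, upsert)` method, such as `db.getSiblingDB("config").settings` in the mongo shell. The in-memory stand-in exists only so the example runs without a cluster. Later shell versions also ship helpers for this, such as `sh.stopBalancer()`.

```javascript
// Same update as above, wrapped so it can also re-enable the balancer.
// `settings` is assumed to expose a shell-style update(query, update, upsert).
function setBalancerStopped(settings, stopped) {
  return settings.update({ _id: "balancer" }, { $set: { stopped: stopped } }, true);
}

// In-memory stand-in for config.settings, only so this sketch runs without a cluster.
// It understands just the $set upsert used above.
function makeFakeSettings() {
  const docs = {};
  return {
    docs: docs,
    update: function (query, update, upsert) {
      if (!docs[query._id]) {
        if (!upsert) return;
        docs[query._id] = { _id: query._id };
      }
      Object.assign(docs[query._id], update.$set);
    }
  };
}

const settings = makeFakeSettings();
setBalancerStopped(settings, true);    // before the import
console.log(JSON.stringify(settings.docs.balancer));
setBalancerStopped(settings, false);   // after the import
console.log(JSON.stringify(settings.docs.balancer));
```

The third argument `true` is the upsert flag: it creates the `balancer` document if the settings collection does not contain one yet.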

Torsten Curdt

------------------------------------

> - If you use ObjectId as a shard key, you won't be able to scale the
> import. The maximum speed is limited by the speed of one machine.

Why is that?
The ObjectIds should be quite different across the machines and so
hopefully fall into different chunks.

> - You can leave your config server and mongos up and do the import via
> mongos.

Confused - that's what I was doing before.

mongo1: shardsrv mongos 2*mongoimport configsrv
mongo2: shardsrv mongos 2*mongoimport configsrv
mongo3: shardsrv mongos 2*mongoimport configsrv
mongo4: shardsrv mongos 2*mongoimport
mongo5: shardsrv mongos 2*mongoimport
mongo6: shardsrv mongos 2*mongoimport

Or do you mean...

Splitting up the pre-sharded dataset across the nodes. Then turn off
balancing. But instead of using --dbpath use mongos? Wouldn't --dbpath
be faster? Wouldn't writes still get routed to other shards with
mongos?

> - To turn off balancer,
> > use config
> > db.settings.update({_id:"balancer"},{$set : {stopped:true}}, true)

Ah ... OK.
cheers,
Torsten

Nat

----------------------

- ObjectId is keyed by timestamp first.
- You can use --dbpath but you have to take mongod offline. I just
recommended another way without taking down mongod. As you will
perform mongoimport splitted by shard key, mongos should route
requests to one server per mongoimport.
- Do you have mongostat, iostat, db.stats() during import process?
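
Nat's first point explains why an ObjectId shard key funnels an import into one machine: the leading four bytes of an ObjectId are a big-endian timestamp in seconds, so ids created at about the same time sort next to each other no matter which machine produced them. The sketch below decodes that prefix from the 24-character hex form. `objectIdSeconds` is a hypothetical helper and the three sample ids are made up.

```javascript
// The first 8 hex characters (4 bytes) of an ObjectId are seconds since the Unix epoch.
function objectIdSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

// Made-up ids: two from different machines created in the same second,
// and one created an hour (0xe10 = 3600 seconds) later.
const ids = [
  "4d6e5b10aaaaaa1111000001",
  "4d6e6920cccccc3333000001",
  "4d6e5b10bbbbbb2222000001"
];

// Lowercase hex strings of equal length sort in the same order as the raw bytes.
ids.sort();
ids.forEach(function (id) {
  console.log(id, new Date(objectIdSeconds(id) * 1000).toISOString());
});
```

After sorting, the two ids from the same second sit side by side even though their machine bytes differ, and the later id comes last. New ids therefore keep landing in whichever chunk covers the top of the key range.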

Torsten Curdt

------------------------------------

> - ObjectId is keyed by timestamp first.

True ... but even with our preferred sharding key [user, time] it
doesn't behave much better.

> - You can use --dbpath but you have to take mongod offline.

That's fine.

> I just recommended another way without taking down mongod. As you will
> perform mongoimport splitted by shard key, mongos should route
> requests to one server per mongoimport.

But doesn't that depend on what chunks are configured in the config server?

> - Do you have mongostat, iostat, db.stats() during import process?

Certainly. With the current non-pre-sharded import...

- mongostat shows looong "holes" with no ops at all. I assume that's
the balancer - but not sure. numbers were much better in the beginning
of the import.

- iostat shows quite uneven activity across the nodes.

- db.stats() we are monitoring over time. the following shows the
objects graphed:

https://skitch.com/tcurdt/rpti6/import-speed

Nat

--------------------------

If you use the sharding key [user, time] and turn off the balancer, you
should see better results. Can you post the iostat and mongostat results?


Eliot Horowitz

------------------------------------

What version are you on?
You should shard on user,time as you want to do.
The speed is probably because of migrations.

2 main options:
- try 1.7.5
- pre-split the collection into a lot of chunks, let the balancer
move them around, then insert.
this will prevent migrates.

I would not mess with --dbpath or turning off the balancer, that's
much more complicated than you need to do.
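
Eliot's second option, pre-splitting before inserting, could be scripted along the following lines. This is a sketch under assumptions: `buildPreSplitCommands` is a hypothetical helper, the namespace `mydb.events` and the numeric user ids are made up, and `minKey` stands for whatever lowest-value marker the shell provides (`MinKey` in the mongo shell) so that each split point names both fields of the [user, time] key. The function only builds `split` commands. No `moveChunk` is issued, because in this variant the balancer stays on and distributes the still-empty chunks itself.

```javascript
// Hypothetical helper: one `split` command per user id boundary.
// `minKey` is passed in so the sketch also runs outside the mongo shell.
function buildPreSplitCommands(ns, userBoundaries, minKey) {
  return userBoundaries.map(function (user) {
    return { split: ns, middle: { user: user, time: minKey } };
  });
}

// Evenly spaced, made-up user id boundaries: 1000, 2000, ..., 9000.
const boundaries = [];
for (let u = 1000; u < 10000; u += 1000) boundaries.push(u);

// "<MinKey>" is only a printable placeholder; in the shell pass MinKey instead.
const commands = buildPreSplitCommands("mydb.events", boundaries, "<MinKey>");
console.log(commands.length);
console.log(JSON.stringify(commands[0]));
```

Nine split points produce ten chunks. A real data set would want boundaries chosen from the actual distribution of user values rather than evenly spaced numbers.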

... The thread runs to more than 60 comments. If you are interested, the rest is on the Google Group (behind the firewall).