InlineSkipList
RocksDB uses a skip list as the basic in-memory data structure. General background on skip lists is widely available elsewhere.
Data structures
class MemTableRep {
Allocator* allocator_; // memory allocator
};
struct InlineSkipList<Comparator>::Node {
// next_[0] is the lowest level link (level 0). Higher levels are
// stored _earlier_, so level 1 is at next_[-1].
std::atomic<Node*> next_[1];
const char* Key() const {
return reinterpret_cast<const char*>(&next_[1]); } // key
};
struct InlineSkipList<Comparator>::Splice {
// The invariant of a Splice is that prev_[i+1].key <= prev_[i].key <
// next_[i].key <= next_[i+1].key for all i. That means that if a
// key is bracketed by prev_[i] and next_[i] then it is bracketed by
// all higher levels. It is _not_ required that prev_[i]->Next(i) ==
// next_[i] (it probably did at some point in the past, but intervening
// or concurrent operations might have inserted nodes in between).
int height_ = 0;
Node** prev_;
Node** next_;
};
class InlineSkipList {
Allocator* const allocator_; // memory allocator
Comparator const compare_; // key comparator
Node* const head_; // head
std::atomic<int> max_height_; // Height of the entire list
Splice* seq_splice_; // cached splice used for sequential inserts, allocated by allocator_; its prev_/next_ arrays hold one Node* per level (see the Splice and Node comments above)
};
class SkipListRep : public MemTableRep {
InlineSkipList<const MemTableRep::KeyComparator&> skip_list_; // skip_list_ stores the encoded key-value entries
const MemTableRep::KeyComparator& cmp_; // key comparator
};
class MemTable {
struct KeyComparator : public MemTableRep::KeyComparator {
const InternalKeyComparator comparator;
};
KeyComparator comparator_; // key comparator, used to order keys
std::unique_ptr<MemTableRep> table_; // the actual memtable representation
};
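The class definitions above show the ownership chain: MemTable holds table_ (a MemTableRep), and the SkipListRep implementation of it contains an InlineSkipList. The skip list itself is built from Nodes that are linked at every level. A Splice is not a list element: for each level it records the prev_ and next_ nodes that bracket an insert position. Moving between levels only requires indexing a Node's next_ array (level 0 is next_[0], level 1 is next_[-1], and so on).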
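To make the role of the splice concrete, here is a minimal sketch of how per-level prev/next brackets can be computed and then used to link a new node. It is not RocksDB's implementation: SketchNode, SketchSplice, FindSplice and InsertWithSplice are hypothetical names, keys are plain ints, each node keeps its links in a std::vector instead of the inline layout, and there is no concurrency.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified node: next[i] is the successor at level i.
struct SketchNode {
  int key;
  std::vector<SketchNode*> next;
  SketchNode(int k, int height) : key(k), next(height, nullptr) {}
};

// Per-level bracket around a key, mirroring the prev_/next_ arrays of Splice.
struct SketchSplice {
  std::vector<SketchNode*> prev;
  std::vector<SketchNode*> next;
};

// Walk from the top level down. At each level, advance while the successor's
// key is smaller than target, then record the bracketing pair for that level.
SketchSplice FindSplice(SketchNode* head, int target) {
  const int height = static_cast<int>(head->next.size());
  SketchSplice s{std::vector<SketchNode*>(height, nullptr),
                 std::vector<SketchNode*>(height, nullptr)};
  SketchNode* x = head;
  for (int level = height - 1; level >= 0; --level) {
    while (x->next[level] != nullptr && x->next[level]->key < target) {
      x = x->next[level];
    }
    s.prev[level] = x;
    s.next[level] = x->next[level];
  }
  return s;
}

// Link a node into every level it participates in, using the recorded
// brackets. The node's height must not exceed the head's height.
void InsertWithSplice(SketchNode* node, const SketchSplice& s) {
  for (std::size_t level = 0; level < node->next.size(); ++level) {
    node->next[level] = s.next[level];
    s.prev[level]->next[level] = node;
  }
}
```

In this sketch the head node acts as a sentinel whose key is never compared. A bracket found at a low level is always contained in the bracket at every higher level, which is the invariant stated in the Splice comment.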
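SkipList data access
Switching between levels only requires indexing a Node's next_ array. The actual key bytes start at the address of next_[1], immediately after the Node, as shown in the figure below, taken from https://zhuanlan.zhihu.com/p/444460663.
The key line of code is the pointer cast in Insert: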
bool InlineSkipList<Comparator>::Insert(const char* key, Splice* splice,
bool allow_partial_splice_fix) {
// On insert, step the key pointer back by one Node to recover the Node header; the key is then reachable at &next_[1].
Node* x = reinterpret_cast<Node*>(const_cast<char*>(key)) - 1;
...
}
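The following is a minimal sketch of the inline layout that makes this cast work: the higher-level links are placed before the Node, the key bytes directly after it, and the caller only ever receives the key pointer. SketchNode, AllocateKey, NodeFromKey and FreeKey are hypothetical names. The sketch uses malloc rather than RocksDB's Allocator, and passes the height explicitly instead of stashing it in the node, so treat it as an illustration of the layout only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Same shape as the Node above: next_[0] is level 0, level n sits at
// next_[-n], and the key bytes begin at &next_[1].
struct SketchNode {
  std::atomic<SketchNode*> next_[1];
  const char* Key() const { return reinterpret_cast<const char*>(&next_[1]); }
  SketchNode* Next(int level) {
    return (&next_[0] - level)->load(std::memory_order_acquire);
  }
  void SetNext(int level, SketchNode* n) {
    (&next_[0] - level)->store(n, std::memory_order_release);
  }
};
static_assert(sizeof(SketchNode) == sizeof(std::atomic<SketchNode*>),
              "the layout assumes a Node is exactly one link wide");

// Memory layout: [link h-1] ... [link 1] [Node = link 0] [key bytes]
// Returns a pointer to the key area, or nullptr if allocation fails.
char* AllocateKey(std::size_t key_size, int height) {
  const std::size_t prefix = sizeof(std::atomic<SketchNode*>) * (height - 1);
  char* raw =
      static_cast<char*>(std::malloc(prefix + sizeof(SketchNode) + key_size));
  if (raw == nullptr) return nullptr;
  SketchNode* x = reinterpret_cast<SketchNode*>(raw + prefix);
  for (int i = 0; i < height; ++i) {
    new (&x->next_[0] - i) std::atomic<SketchNode*>(nullptr);
  }
  return const_cast<char*>(x->Key());
}

// The cast from Insert: the Node header sits one Node before the key.
SketchNode* NodeFromKey(const char* key) {
  return reinterpret_cast<SketchNode*>(const_cast<char*>(key)) - 1;
}

// Release a block obtained from AllocateKey with the same height.
void FreeKey(char* key, int height) {
  const std::size_t prefix = sizeof(std::atomic<SketchNode*>) * (height - 1);
  std::free(reinterpret_cast<char*>(NodeFromKey(key)) - prefix);
}
```

Because the key is stored inline, a node needs no separate key pointer, and the key pointer alone is enough to find all of the node's links. Indexing next_ with a negative offset relies on the hand-built layout, which is the same trick the original Node uses.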
Memtable Key
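Key encoding
The skip list has no notion of key-value pairs, so the key and value are encoded together into a single entry. A SkipList key consists of the following fields:
- key_size: the size of the key, stored as a varint32 (byte-granular, at most 5 bytes). In MemTable::Add this is the internal key size, that is, the user key length plus the 8-byte trailer.
- key: the user key bytes.
- seq_num + type: the sequence number assigned when the key is written (it increases with every write) and the key's type, packed together into 8 bytes.
- value_size: the size of the value, stored as a varint32 (byte-granular, at most 5 bytes).
- value: the actual value bytes.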
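The layout above can be sketched as a small standalone encoder. This is an illustration under stated assumptions, not RocksDB's code: SketchPutVarint32, SketchPutFixed64 and EncodeEntry are hypothetical helpers written here, key_size is assumed to count the user key plus the 8-byte trailer, and the trailer is assumed to be (seq << 8) | type written in little-endian order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Varint32: 7 payload bits per byte, high bit set when more bytes follow.
// A 32-bit value therefore takes between 1 and 5 bytes.
void SketchPutVarint32(std::string* dst, uint32_t v) {
  while (v >= 128) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Fixed-width 64-bit value, least significant byte first.
void SketchPutFixed64(std::string* dst, uint64_t v) {
  for (int i = 0; i < 8; ++i) {
    dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
  }
}

// Entry layout: key_size | user key | seq_num + type | value_size | value
std::string EncodeEntry(const std::string& user_key, uint64_t seq,
                        uint8_t type, const std::string& value) {
  std::string out;
  SketchPutVarint32(&out, static_cast<uint32_t>(user_key.size() + 8));
  out.append(user_key);
  SketchPutFixed64(&out, (seq << 8) | type);  // low byte carries the type
  SketchPutVarint32(&out, static_cast<uint32_t>(value.size()));
  out.append(value);
  return out;
}
```

For a 3-byte user key and a 2-byte value, both lengths fit in a single varint byte, so the whole entry is 1 + 3 + 8 + 1 + 2 = 15 bytes.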
The key encoding logic is as follows:
Status MemTable::Add(SequenceNumber s, ValueType type,
const Slice& key, /* user key */
const Slice& value,
const ProtectionInfoKVOS64* kv_prot_info,
bool allow_concurrent,
MemTablePostProcessInfo* post_process_info, void** hint) {
.