Thesis introduction "RECIPE : Converting Concurrent DRAM Indexes to Persistent-Memory Indexes. "

Se Kwon Lee et al, “RECIPE:
Converting Concurrent DRAM
Indexes to Persistent-Memory
Indexes”
SOSP 2019
データベースコア輪読会、第2回、2020/04/21
東京⼤学⼤学院情報理⼯学研究科吉岡弘隆
hyoshiok@tkl.iis.u-tokyo.ac.jp @hyoshiok
1

概要
• DRAM IndexをPersistent-Memory (PM 不揮発性メモリ) Index
に変換することを提案している
• DRAM IndexをPM Indexの変換する時の条件を⽰し、事例とし
て次のデータ構造をPMを利⽤したものに変換した。B+ tree、
Trie、radix tree、Hash
• 変換は30⾏から200⾏で⼩規模。Intel DC Persistent Memory
で評価した。その結果、最⼤で5.2倍の性能向上を確認した。
2

はじめに
• Introduction
• Background
• Motivation
• The RECIPE Approach
• Testing Crash recovery of PM
• Case Studies
• Evaluation
• Discussion
• Related Work
• conclusion
3

Introduction
• Persistent Memory （PM、不揮発性メモリ）
• Intel DC Persistent Memory, 2019 April
• 様々な研究がされている
• FAST & FAIR, Level Hashing, CCeH, NV-Tree, wB+Tree, WOART, FPTree
• PM向けのindexの設計は複雑である。（バグの温床）
• RECIPE; DRAM indexが正しければ、RECIPEアプローチで正しく変換
したものも正しい。
• 例えば、以前に挿⼊したキーの値が失われていないならば、検索はそれを返す。
• consistencyはcrash recovery と関連性が深い
• DRAM indexをPM indexに変換するのはPM indexをゼロから作るより
複雑ではない。障害回復⽤の新規のアルゴリズムが必要となるわけで
はない。
4

RECIPEによるPM Index構築の利点
• DRAM indexを変更するので複雑ではない
• もとのDRAM indexが⾼性能ならば、その性能をそのまま受け
継ぐ
• ５つのDRAM indexをPM indexに変換してみた
5
DRAM
index
Data structure RECIPE
condition
lines lines core lines
modified
CLHT Hash Table #1 12.6K 2.8K 30(1%)
HOT Trie #1 36K 2K 38(2%)
BwTree B+Tree #2 13K 5.2K 85(1.6%)
ART Radix Tree #3 4.5K 1.5K 52(3.4%)
Masstree B+Tree & Trie #3 25K 2.2K 200(9%)

Background
• DRAM Indexの基本操作
• insert(key, value)
• update(key, value)
• lookup(key)
• range_query(key1, key2)
• delete(key)
• Structural Modification Operation (SMO)
• データそのものの変更ではなくて、索引の作成、更新など
• 性能
• 正しさ
6

Concurrency, Isolation
• ⼀貫性をたもつためにblocking（ロックなど）、non blocking
の⽅式がある。
• PM
• DRAMとstorageの中間の性質を持つ
• intel x86の場合8 byteの単位で書く、 “mfence”は格納の順番を保証する
• x86にはキャッシュフラッシュ⽤に”clflush”, “clwb”, “clflushopt”命令がある
• Crash Consistent PM Index
• PM indexはDRAMより⼤容量化できる。レイテンシーもDRAM程度
• DRAMはクラッシュ（電源断、カーネルクラッシュなど）からのリカ
バリが必要。PMは不必要ないしは軽量。多くのPM index が提案され
ている。
• 電源断からの復帰。フラッシュをしておく
7

補⾜資料、clflush vs clflushopt
8
Intel 64 and IA-32 Architectures Optimization Reference Manual,
Chapter 8, Order Number: 248966-042b September 2019

Motivation & The RECIPE Approach
• 既存のPM Indexとの⽐較
• FAST & FAIR
• PM B+Tree
• 設計と実装にバグを発⾒した
• 性能も低かった
• CCEH
• PM Hash table
• ２つのバグを発⾒した
• PM Indexの設計は難しい。
• principled design （原理原則に基づいた設計）
• テスト
9

The RECIPE Approach
• 障害⼀貫性を持った原理原則に基づいた設計
• もしDRAM indexの設計が正しいのならば、それを変換したPM index
も正しい。
• 条件１
• アトミックな格納で更新する
• PMへの変更。キャッシュ・フラッシュ、メモリフェンス命令を挿⼊
• すべてのダーティーデータはPMにフラッシュされる
• 条件２
• writerはinconsistencyを修正する(writerがinconsistencyを発⾒した場
合)
• reader/writerともにnon-blockingの場合
10

The RECIPE Approach
• 条件3
• writerはinconsistencyを修正しない
• readerがnon-blockingでwriterがblockingの場合
• PMへの変更。writerがinconsistencyを発⾒できる機構を導⼊する。
inconsistencyを修正する補助機能（helper mechanism）を導⼊する
11

PM Indexのクラッシュリカバリテスト
• クラシュリカバリテストは下記をテストする
• ⼀貫性のある状態に復元しているか
• 永続性のあるデータを消失していないか
• どの段階でクラッシュさせるか
• テスト空間が膨⼤になるので⼯夫が必要
• 少数のアトミックな格納に注⽬
• 各アトミックな格納後にクラッシュさせてテストする
• 他のPMテストツールはランダムにクラッシュさせている
• ⼀貫性のテスト
• 負荷をかける
• PMリカバリ
• 結果確認
12

事例研究
DRAM index reader writer Non-SMO SMO
CLHT Non-Blocking Blocking #1 #1
HOT Non-Blocking Blocking #1 #1
BwTree Non-Blocking Non-Blocking #1 #2
ART Non-Blocking Blocking #1 #3
Masstree Non-Blocking Blocking #1 #3
13
Table 2. Categorizing conversion actions

Trie: Height Optimized Trie (HOT)
• 空間効率性に優れた trie
• Non-SMO
• copy-on-write
• 挿⼊削除はアトミック。親のポインタを交換することによって原⼦性
を確保
• SMO
• prefixビットがミスマッチした時のツールがある
• PMへの変更法
• #1、アトミックなポインタの交換。store命令とフラッシュ命令で正し
さが保証される
• P-HOT、38⾏変更
14

Hash Table: Cache-Line Hash Table CLHT
• キャッシュフレンドリーなハッシュテーブル。各バケットは 64
bytes。non-blocking readerのためアトミックは key-value を
持つ
• Non-SMO
• SMO
• PMへの変更
• HOT同様に、挿⼊、削除、リハッシュは単⼀のアトミック store
• 30⾏変更（全体2.8K）
15

B+ TREE: BwTree
• B+Treeの拡張
• Non-SMO
• SMO
• PMへの変更
• キャッシュへのフラッシュとメモリフェンス
• 85⾏変更（全体5.2K）
16

Radix Tree: Adaptive Radix Tree (ART)
• Radix Treeの実装
• Non-SMO
• SMO
• PMへの変更
• Condition #1
• 52⾏変更（全体1.5K）
17

Hybrid Index: Masstree
• キャッシュフレンドリーなTrieとB+treeの拡張
• Non-SMO
• SMO
• PMへの変更
• Condition #3
• 200⾏変更（全体2.2K）
18

評価
• 実験環境
• Intel Optane DC Persistent Memory Module (DCPMM), 768GB
• DRAM 375GB
• 2 socket, 96 core machine (CPU型番は不明)
• 32MB Last Level Cache (LLC)
• Linux Kernel 4.17, Fedora
• Filesystem, ext4 DAX, App Direct mode
• NUMA node で single socketに pin して計測
• フラッシュは clwb を利⽤
• libvmmaloc library, PMDKを利⽤
• perfを利⽤してハードウェアイベントを計測
• workload は YCSB (Yahoo! Cloud Serving Benchmark)
19

関連研究
• 過去5年間、15件のPM indexが提案されている
• 3件、open source
• FAST & FAIR, CCEH, Level Hashing
• RECIPEは principled approach (他はad hoc)
• testing
• ランダムないしは虱潰し型のテスト
• RECIPEは効率的で強⼒。
23

おわりに
• https://github.com/utsaslab/RECIPE
• 通常のデータ構造をPM対応にするメソッドについて⾔及し、
実際に変更をして、その効果を⽰した。
• 本研究に従って、既存のB+Treeの実装やKVS実装に⼿を⼊れ
てみたくなった。
• 他者の研究についてもオープンソースの場合、深堀りできる。
ベンチマークを取るだけでなく、バグの発⾒などもするので貢
献度は⾼いと思った。先⾏研究が丸裸になるイメージだ。
24

Thesis introduction "RECIPE : Converting Concurrent DRAM Indexes to Persistent-Memory Indexes. "

More Related Content

Similar to Thesis introduction "RECIPE : Converting Concurrent DRAM Indexes to Persistent-Memory Indexes. "

More from Hiro Yoshioka

Thesis introduction "RECIPE : Converting Concurrent DRAM Indexes to Persistent-Memory Indexes. "