ceph workshop

GridKA School 2015

Diana Gudu
Uros Stevanovic
September 8, 2015
Karlsruhe Institute of Technology
introduction round
diana gudu

∙ PhD researcher in Computer Science @KIT (SCC)


∙ distributed multi-agent framework for trading cloud resources

∙ working in HBP @SCC on cloud storage


∙ MSc in Computational Science and Engineering @TU Munich
∙ BSc in Computer Science @Polytechnic University of Bucharest

2
uros stevanovic

∙ working in AARC project @KIT (SCC)


∙ PhD @KIT (IPE): 2010-2015
∙ building a custom smart camera framework
∙ using FPGAs
∙ implementing image processing algorithms

∙ studied Electrical Engineering @University of Belgrade

3
you

Your turn!

4
evolution of storage

[Diagram build-up, one step per slide: one human, one computer, one disk → the same computer with many disks → more and more humans sharing one big, expensive computer → many computers, each with its own disks → a storage appliance sitting between the users and their computers]

(image: ©trumpiot.co)

6
storage appliance

Oracle ZFS 7420 storage appliance (image: http://www.e-business.com/zfs-7420-storage-appliance)

7
future of storage

∙ Support and maintenance → enterprise subscription (optional)
∙ Proprietary software → open-source software
∙ Proprietary hardware → commodity hardware

[Diagram: the proprietary appliance stack replaced by a larger pool of commodity computers and disks]

8
ceph

Philosophy

∙ open-source
∙ community focused
∙ software-defined
∙ scale-out hardware, no single point of failure
∙ self-managing
∙ failure is normal

History

∙ 2004: PhD thesis at UCSC
∙ 2006: project is open-sourced
∙ 2010: included in the Linux kernel
∙ 2012: integrated into CloudStack
∙ 2014: Red Hat acquisition

10
ceph architecture

∙ Ceph Object Gateway: objects; S3- and Swift-compatible
∙ Ceph Block Device: virtual disks; virtual block device
∙ Ceph Filesystem: files and directories; POSIX-compliant

All built on the Ceph storage cluster: a reliable, easy to manage, distributed object store

11
ceph architecture

∙ RADOSGW (REST clients): bucket-based REST gateway; S3- and Swift-compatible
∙ RBD (hosts/VMs): virtual block device; Linux kernel client; QEMU/KVM driver
∙ CephFS (FS clients): POSIX-compliant; Linux kernel client; FUSE support
∙ librados (applications): allows apps direct access to RADOS; support for C, C++, Java, Python, Ruby, PHP
∙ RADOS: a reliable, autonomous, distributed object store consisting of self-healing, self-managing, intelligent storage nodes

11
rados

[Diagram: a RADOS cluster of many OSDs and a small, odd number of MONs (here three)]

12
ceph daemons

OSD
∙ serve objects to clients
∙ one per disk
∙ backend: btrfs, xfs, ext4
∙ peer-to-peer replication and recovery
∙ write-ahead journal

MON
∙ maintain cluster state and membership
∙ vote for distributed decision-making
∙ small, odd number

13
data placement
hotels

http://free-stock-illustration.com/hotel+key+card

15
hotels

http://2.bp.blogspot.com/-o-rlIrv094E/TXxj8D-B2LI/AAAAAAAAGh8/VEbrbHpxVxo/s1600/DSC02213.JPG
15
hotels

∙ What if the hotel had 1 billion rooms? Or ∞?


#13,565,983

15
hotels

What if the hotel changed constantly?

http://waltonian.com/news/eastern-library-renovations-continue/

15
hotels

Scale-up everything?

http://www.millenniumhotels.com/content/dam/global/en/the-heritage-hotel-manila/images/cons-photographics-lobby-reception-desk%2003062011_34-basicB-preview-2048.jpg

15
hotels

∙ The hotel itself must assign people to rooms, instead of a centralized place
∙ The hotel should grow itself organically
∙ Deterministic placement algorithm
∙ Intelligent nodes

15
crush

object id + pool → hash → placement group → CRUSH → OSDs

Example: obj=’foo’, pool=’bar’
∙ hash(’foo’) % 256 = 0x23
∙ pool ’bar’ has id 5
∙ placement group: 5.23
∙ crush(5.23) = [2, 14, 29]
∙ the object is stored on osd.2, osd.14 and osd.29

16
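To make the pipeline above concrete, here is a minimal Python sketch of the idea — not Ceph's actual CRUSH implementation: the object name is hashed into a placement group, and a hash-seeded, repeatable pseudo-random choice picks the OSDs, so every client computes the same mapping without asking a central server. The pool id, PG count and OSD list are made-up values.

import hashlib
import random

POOLS = {'bar': 5}        # hypothetical pool name -> pool id
PG_NUM = 256              # placement groups per pool (assumed)
OSDS = list(range(32))    # hypothetical flat list of OSD ids

def object_to_pg(pool, obj):
    # Deterministic: hash the object name, modulo the number of placement groups.
    h = int(hashlib.md5(obj.encode()).hexdigest(), 16)
    return POOLS[pool], h % PG_NUM

def pg_to_osds(pool_id, pg_id, replicas=3):
    # Stand-in for CRUSH: a repeatable pseudo-random pick of OSDs, seeded by
    # the placement group, so every client agrees without a central lookup.
    rng = random.Random(pool_id * PG_NUM + pg_id)
    return rng.sample(OSDS, replicas)

pool_id, pg = object_to_pg('bar', 'foo')
print('pg %d.%x -> osds %s' % (pool_id, pg, pg_to_osds(pool_id, pg)))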
crush

Controlled Replication Under Scalable Hashing

∙ Pseudo-random placement algorithm
∙ Repeatable, deterministic
∙ Statistically uniform distribution
∙ Stable mapping: minimal data migration
∙ Rule-based configuration, topology aware

Example hierarchy: rack bucket → host buckets → osd buckets

17
ceph clients
librados

∙ direct access to RADOS for applications


∙ C, C++, Python, Java, Erlang, PHP
∙ native socket access, no HTTP overhead

19
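As an illustration of librados, here is a minimal sketch using the Python bindings (the rados module shipped with Ceph). It assumes a reachable cluster, a readable /etc/ceph/ceph.conf with a usable keyring, and an existing pool named 'data' — adjust these to your setup.

import rados

# Connect using the local Ceph configuration (paths and pool name are assumptions).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
print('connected, fsid:', cluster.get_fsid())

ioctx = cluster.open_ioctx('data')         # assumes a pool called 'data' exists

# Write an object and read it back -- direct RADOS access, no HTTP in between.
ioctx.write_full('hello-object', b'hello gridka')
print(ioctx.read('hello-object'))

ioctx.close()
cluster.shutdown()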
radosgw

∙ RESTful API
∙ unified object namespace
∙ S3 and Swift compatible
∙ user database and access control
∙ usage accounting, billing

20
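Because radosgw exposes an S3-compatible API, standard S3 clients work against it. A hedged sketch using the boto library: the endpoint, port and credentials below are placeholders for a gateway and a user created with radosgw-admin.

import boto
import boto.s3.connection

# Endpoint and credentials are placeholders for your own gateway and user.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='radosgw.example.com',
    port=7480,
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('gridka-demo')      # buckets live in the unified namespace
key = bucket.new_key('hello.txt')
key.set_contents_from_string('hello object gateway')
print([k.name for k in bucket.list()])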
rbd

∙ Storage of disk images in RADOS
∙ Images are striped across the cluster
∙ Decoupling of VMs from hosts
∙ Thin provisioning
∙ physical storage only used once you begin writing
∙ Snapshots, copy-on-write clones
∙ Support in QEMU, KVM

21
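A small sketch of creating and writing to an RBD image from Python via the rbd bindings on top of librados. The pool name 'rbd' and the image name are assumptions; in practice images are usually attached to VMs through the QEMU/KVM driver or the kernel client rather than written directly like this.

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')               # pool name 'rbd' is an assumption

# Create a thin-provisioned 1 GiB image; space is only consumed as data is written.
rbd.RBD().create(ioctx, 'demo-image', 1024 ** 3)

image = rbd.Image(ioctx, 'demo-image')
image.write(b'hello block device', 0)           # write at offset 0
print(image.size())
image.close()

ioctx.close()
cluster.shutdown()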
CephFS

[Diagram: FS clients send metadata operations to the MDS daemons and data I/O directly to the OSDs; MONs track cluster state]

22
CephFS

Metadata Server (MDS)

∙ Manages metadata for the POSIX-compliant filesystem
∙ directory hierarchy
∙ file metadata: owner, timestamps, mode, etc.
∙ Stores metadata in RADOS
∙ Multiple MDSs for HA and load balancing

23
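For completeness, a sketch of acting as a CephFS client from Python using the cephfs (libcephfs) bindings: metadata operations such as mkdir go through the MDS, while file data goes to the OSDs. Paths, the config file and the exact binding signatures are assumptions and may differ between Ceph releases.

import os
import cephfs

# Config path and directory/file names are assumptions.
fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
fs.mount()

fs.mkdir('/gridka', 0o755)                       # metadata operation, handled by the MDS

fd = fs.open('/gridka/hello.txt', os.O_CREAT | os.O_WRONLY, 0o644)
fs.write(fd, b'hello cephfs', 0)                 # file data goes to the OSDs
fs.close(fd)

fs.unmount()
fs.shutdown()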
dynamic subtree partitioning

[Diagram: the directory tree is split into subtrees, each managed by a different MDS; busy subtrees can be moved between MDS daemons to balance load]

24
tutorial
overview

∙ Deploy a Ceph cluster


∙ Basic operations with the storage cluster
∙ Data placement: CRUSH
∙ Ceph Filesystem
∙ Block storage: RBD
∙ Advanced topics: erasure coding
∙ Troubleshooting challenge

26
cluster set-up

∙ ceph-5: admin node
∙ ceph-1: MON, OSD (/dev/vdb)
∙ ceph-2: MON, OSD (/dev/vdb)
∙ ceph-3: MON, OSD (/dev/vdb)
∙ ceph-4: MDS, OSD (/dev/vdb)

27
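Once the cluster above is deployed, a quick sanity check from the admin node is to query the monitors. A hedged sketch with the Python rados bindings (the same information is normally read with the ceph CLI); it assumes /etc/ceph/ceph.conf and an admin keyring are present on the node.

import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Ask the monitors for the overall cluster status (roughly 'ceph status').
ret, outbuf, errs = cluster.mon_command(
    json.dumps({'prefix': 'status', 'format': 'json'}), b'')
print(json.loads(outbuf))

# Capacity numbers and the pools currently served by the OSDs.
print(cluster.get_cluster_stats())
print(cluster.list_pools())

cluster.shutdown()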
Questions?

28
