Ceph, Storage For CERN Cloud

Ceph is an open-source distributed storage system utilized at CERN, providing reliable storage on commodity hardware with a focus on data consistency and self-healing capabilities. It serves as the storage backbone for various IT services at CERN, including code repositories, virtualization, and analytics, with a history of significant expansions and integrations with OpenStack. The infrastructure includes a mix of HDD and flash servers, and is monitored using tools like Prometheus and Grafana.


Ceph

Infrastructure Storage at CERN

Enrico Bocchi, CERN IT, Storage
Pictet visit, 27 September 2024
What is Ceph?
• Distributed Storage System, Open Source
• Reliable storage out of unreliable components:
  - Runs on commodity hardware (IP networks, HDDs/SSDs/NVMes)
  - Favors data consistency and correctness over performance and availability
• Elastic and self-healing:
  - Scale up or out online and under load (or similarly shrink)
  - Self-recovery from HW failures, re-establishing the desired redundancy

[Architecture diagram] Three access layers on top of a common core:
  - RBD (Virtual Disks): accessed via QEMU/libvirt, kRBD, iSCSI
  - RadosGW (Objects): accessed via S3, Swift
  - CephFS (Files & Directories): accessed via FUSE, kernel client
All of them build on LIBRADOS (Low-Level Storage API) and the RADOS Storage Layer (Replication || Erasure Coding).
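As a minimal sketch of what this layering looks like from a client (not CERN-specific tooling), the Python librados bindings (python3-rados) can talk to RADOS directly; the pool name and configuration file path below are illustrative assumptions.

    # Minimal librados sketch: store and read back one RADOS object.
    # Assumes python3-rados, a reachable cluster, and an existing pool.
    import rados

    # Connect using the standard cluster configuration and client keyring.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        # I/O context bound to one pool (replicated or erasure-coded).
        ioctx = cluster.open_ioctx('test-pool')   # hypothetical pool name
        try:
            # Write a whole object, then read it back.
            ioctx.write_full('hello-object', b'stored reliably by RADOS')
            print(ioctx.read('hello-object'))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

RBD, RadosGW, and CephFS are higher-level services implemented on top of this same object interface.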

Pictet visit, 27/09/2024 2


What does Ceph do for us?
• Storage backbone underpinning CERN’s IT Cloud and Services:
  - Code repositories, Container Registries, GitOps, Agile Infrastructure
  - Document / Web Hosting
  - Monitoring: OpenSearch, Kafka, Grafana, InfluxDB, Kibana
  - Analytics: HTCondor, Slurm, Jupyter Notebooks, Spark
  - Virtualization of other Storage: NFS, AFS, CVMFS, …

Application                     Layout               Size (raw)   Clusters
Blocks (OS Cinder/Glance)       HDD, 3x replica      25.1 PB      5
                                Flash, EC 4+2        976 TB       2
File System                     HDD, 3x replica      13.4 PB      5
  (OS Manila, K8s/OKD, HPC)     Flash, 3x replica    1.7 PB       4
Objects (S3, Swift, Backups)    HDD, EC 4+2          28.2 PB      2
                                Multi-site, EC 4+2   3.6 PB       1
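The "Size (raw)" column counts physical capacity before redundancy. As a quick worked example (my arithmetic, not from the slide, and ignoring metadata and failure headroom), usable capacity is roughly raw/3 for 3x replication and raw*4/6 for EC 4+2:

    # Rough usable-capacity estimate from raw capacity (illustrative only).
    def usable_replica(raw_pb, copies=3):
        return raw_pb / copies          # 3x replica stores every byte three times

    def usable_ec(raw_pb, k=4, m=2):
        return raw_pb * k / (k + m)     # EC k+m stores k data + m parity chunks

    print(usable_replica(25.1))  # ~8.4 PB usable out of 25.1 PB raw (HDD block clusters)
    print(usable_ec(28.2))       # ~18.8 PB usable out of 28.2 PB raw (HDD object clusters)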

Pictet visit, 27/09/2024 3


What does Ceph do for us?

Pictet visit, 27/09/2024 4


Service History
• 2013: 300 TB proof of concept (replica 4!) → 1 cluster, 3 PB in production for RBD
• 2016: 3 PB to 6 PB expansion with no downtime
• 2018: S3 + CephFS in production
• 2020: S3 backup cluster in a 2nd location
• 2021: RBD Storage Availability Zones
• 2022: CephFS cluster physical move with no downtime
• 2023: Kernel RBD in production
• 2024: New datacenter!

• 19 production clusters: “don’t put all your eggs in the same basket”
• 5 additional clusters in the new datacenter
• Exotic cluster configurations (under evaluation):
  - Cross-DC stretch clusters
  - S3 multi-site object replication

Pictet visit, 27/09/2024 5


Integration with OpenStack
• OpenStack is the entry point for compute and storage resources:
  - Ceph Blocks → Cinder volumes + Glance images for VMs
  - S3 Objects → Keystone as vault for authentication keypairs
  - CephFS → Manila FileShares
• IaaS components are self-service to end-users
• Example of Block storage provisioning:
  - Quota is subject to our (Cloud+Ceph) approval, which is also an opportunity to guide users

Volume Type   QoS                          Pool Type            AZs
standard      80 MB/s, 100 IOPS            3x replicas          3 zones
io1           120 MB/s, 500 IOPS           3x replicas          3 zones
io2           300 MB/s, 1000 IOPS          EC 4+2, full-flash   -
io3           300 MB/s, 5 IOPS per GB      EC 4+2, full-flash   -
              (min 500, max 2000)
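As a hedged illustration of the self-service flow (a generic openstacksdk sketch, not CERN's exact tooling), requesting a volume with one of the volume types above could look like this; the cloud name, volume name, and size are assumptions.

    # Self-service block storage sketch using openstacksdk.
    # Assumes credentials configured via clouds.yaml; names/sizes are illustrative.
    import openstack

    conn = openstack.connect(cloud='example-cloud')   # hypothetical cloud entry

    # Request a 100 GB volume with the "io1" volume type from the table above.
    volume = conn.block_storage.create_volume(
        name='scratch-disk',
        size=100,
        volume_type='io1',
    )
    print(volume.id, volume.status)

Behind the scenes, Cinder maps the chosen volume type to the corresponding Ceph pool (replicated or EC) and applies the QoS limits shown in the table.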

Pictet visit, 27/09/2024 6


A few words on Hardware and Network
• 2 main hardware types for any of blocks, file system**, and objects:
  - HDD server:
    - Frontend with a handful of SSD devices (OS, Ceph journals), 25 Gbps NIC, SAS controller
    - 2x JBOD with 24 enterprise CMR HDDs
  - Full-flash server:
    - 2U node with 10x NVMe (was SATA SSDs)
  - Cores and memory depend on the number of drives

• Network:
  - Ceph supports cluster vs. public (i.e., anything else) networks, IPv4 or IPv6 + TCP
  - It may be network hungry when doing major rebalancing or recovery operations

** CephFS at scale needs extra care:
  - Metadata is stored on a dedicated pool, which loves to be on flash drives
  - The metadata server (MDS) requires memory: consider 64+ GB per MDS; MDSs can scale out horizontally (see the sketch below)
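For illustration, scaling CephFS metadata handling is typically done by raising the number of active MDS ranks and the per-MDS cache budget with the standard ceph CLI; a minimal sketch, wrapped in Python for consistency, where the file-system name and values are assumptions.

    # Sketch: scale CephFS metadata capacity (fs name "cephfs" and the
    # values are illustrative; requires admin capabilities on the cluster).
    import subprocess

    def ceph(*args):
        # Thin wrapper around the ceph CLI; raises if the command fails.
        subprocess.run(['ceph', *args], check=True)

    # Allow two active MDS ranks so metadata load is spread horizontally.
    ceph('fs', 'set', 'cephfs', 'max_mds', '2')

    # Give each MDS a larger metadata cache (here 32 GiB, expressed in bytes).
    ceph('config', 'set', 'mds', 'mds_cache_memory_limit', str(32 * 1024**3))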

Pictet visit, 27/09/2024 7


A few words on Monitoring
• Metrics + Logs
  - Prometheus node exporter
  - Ceph Prometheus module
  - OpenStack exporter for metrics integration
  - Prometheus local – last 48h
  - Thanos store (on S3) for long-term archival
  - Grafana for visualization
• Several homegrown scripts remain for custom metrics (latency, PSI, S3 checks, …)
• Logs? Fluentbit, Kafka, Logstash, OpenSearch
[Figure: 30 days on our main Block Storage cluster]
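As a hedged example of where the Ceph metrics come from: the manager's built-in Prometheus module exposes cluster metrics over HTTP (port 9283 by default), which Prometheus then scrapes. The sketch below enables the module and pulls one sample manually; the mgr hostname is an assumption.

    # Sketch: enable the Ceph mgr Prometheus module and fetch the metrics page.
    # The hostname is illustrative; 9283 is the module's default port.
    import subprocess
    import urllib.request

    subprocess.run(['ceph', 'mgr', 'module', 'enable', 'prometheus'], check=True)

    with urllib.request.urlopen('http://ceph-mgr.example.org:9283/metrics') as resp:
        for line in resp.read().decode().splitlines():
            if line.startswith('ceph_health_status'):
                print(line)   # 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = HEALTH_ERR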

Pictet visit, 27/09/2024 8


Learn more about Ceph
• Website: ceph.io
• Mailing List: [email protected]
• Community Google Calendar – Monthly User+Dev Meetup + Tech Talks
• Cephalocon – Flagship yearly conference, at CERN in December!

Pictet visit, 27/09/2024 9


Discussion

Enrico Bocchi
[email protected]

Pictet visit, 27/09/2024 10
