Introduction to
Cassandra.yaml
& friends
Hi, I’m Edward Capriolo.
@edwardcapriolo
https://www.linkedin.com/in/edwardcapriolo
http://www.slideshare.net/edwardcapriolo
Consultant
The Last Pickle
Cassandra user since v0.6.5
White Plains, NY USA
We help people deliver and improve Apache
Cassandra-based solutions, with staff in 5 countries
and over 50 years of combined experience.
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassandra Summit 2016
This talk is the 'gateway' talk…
Many 'picklers' (TLP staff) are covering in depth,
in other talks, some points I will only touch on
quickly here.
Section Overview
1. Key configuration settings
2. Configuration outside of the yaml
3. Multi-system configuration settings
4. Advanced settings
5. Exotic settings
Basic setup
1. $ wget <apache-cassandra*.tar.gz>
2. $ tar -xf <apache-cassandra*.tar.gz>
3. $ apache-cassandra*/bin/cassandra
Result:
Web scale distributed storage
Drop Mic.
Well almost…
We have to do a bit of configuration.
Before we dive into config
cqlsh> CREATE KEYSPACE test WITH replication =
... {'class': 'SimpleStrategy', 'replication_factor' : 1};
cqlsh> USE test;
cqlsh:test> CREATE COLUMNFAMILY trip (src varchar,
... dest varchar, PRIMARY KEY (src,dest));
cqlsh:test> INSERT INTO trip (src, dest) VALUES ('ny', 'ca');
cqlsh:test> SELECT * FROM trip;
src | dest
-----+------
ny | ca
cqlsh:test> INSERT INTO trip (src, dest) VALUES ('fl', 'ca');
cqlsh:test> SELECT * FROM trip;
src | dest
-----+------
fl | ca
ny | ca
Single Data Center
Multiple Data Center
Where does the data go?
data_file_directories:
- /var/lib/cassandra/data
1. User data is stored in all listed directories
2. Do: use fast-seeking storage (SSD)
3. Do: leave ample free space (30% overhead)
4. Don't: store on a SAN
Commit log storage
commitlog_directory: /var/lib/cassandra/commitlog
1. Stores unflushed mutations (writes/deletes)
2. Don't: assume these are log4j-type logs
3. Do: use a dedicated disk if possible
4. Do: provide at least 10GB (size it for your write velocity)
OK, we now know where
(most of) the data goes…
How do clients connect?
Default port binding
1. Cassandra does not bind to 0.0.0.0
2. 127.0.0.1 is not web scale
3. 7000 is the “Storage Port” for inter-node traffic
4. 9042 is the “Native Port” for client traffic
Native transport
start_native_transport: true (default)
native_transport_port: 9042 (default)
listen_address: localhost
1. Change listen_address (and rpc_address, which the native transport binds to) to a reachable address
2. Do: consider transport security
3. Do: consider network routing performance
4. Don't: put nodes on a public network. EVAR
Outside the yaml file…
cassandra-env.sh (& friends)
1. JVM and startup params defined outside the YAML
2. Newer versions of C* use jvm.options
Memory usage
#MAX_HEAP_SIZE="1G"
#HEAP_NEWSIZE="100M"
1. Default heap: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
2. Do: set lower when experimenting on a workstation
3. Do: leave ample free memory for disk cache
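The default heap formula on this slide is easier to read with a few concrete machine sizes plugged in. The following is a minimal Python sketch of that arithmetic only; the function name default_max_heap_mb is hypothetical, and the real calculation lives in cassandra-env.sh, which may differ between versions.

```python
def default_max_heap_mb(system_memory_mb):
    """Evaluate max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)) in megabytes.

    Hypothetical helper for illustration; it mirrors the formula on the
    slide, not any particular version of cassandra-env.sh.
    """
    half_capped = min(system_memory_mb // 2, 1024)      # min(1/2 ram, 1024MB)
    quarter_capped = min(system_memory_mb // 4, 8192)   # min(1/4 ram, 8GB)
    return max(half_capped, quarter_capped)


if __name__ == "__main__":
    for ram_mb in (1024, 2048, 16384, 65536):
        print(ram_mb, "MB RAM ->", default_max_heap_mb(ram_mb), "MB heap")
```

Small machines get half their RAM (up to 1024MB); larger machines get a quarter of their RAM, capped at 8GB, which leaves the rest for the disk cache.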
JMX
1. bin/nodetool uses JMX to administer Cassandra
2. All management tools require a password if one is set
Check out Nate’s talk on Securing Cassandra to learn more
Multi-node configurations
Phi convict threshold
# phi_convict_threshold: 8
1. Threshold for failure detector
2. False positives make nodes appear down to peers
3. Do: raise to 10-12 for flaky WAN networks
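To get a feel for what raising the threshold from 8 to 12 buys, here is a rough Python sketch of a phi-style suspicion score. It assumes heartbeat gaps follow an exponential distribution, so phi is the negative base-10 log of the chance that a healthy node stays silent this long. This is a simplified model for intuition, not Cassandra's actual failure detector code, and both function names are hypothetical.

```python
import math


def phi(seconds_since_last_heartbeat, mean_interval_seconds):
    """Suspicion level under an assumed exponential model of heartbeat gaps.

    P(gap > t) = exp(-t / mean), so phi = -log10(P) = t / (mean * ln 10).
    """
    return seconds_since_last_heartbeat / (mean_interval_seconds * math.log(10))


def silence_before_conviction(threshold, mean_interval_seconds):
    """Seconds of silence needed for phi to reach the given threshold."""
    return threshold * mean_interval_seconds * math.log(10)


if __name__ == "__main__":
    # Assumed mean heartbeat interval of 1 second, purely for illustration.
    for threshold in (8, 10, 12):
        seconds = silence_before_conviction(threshold, 1.0)
        print("threshold", threshold, "-> about", round(seconds, 1), "s of silence")
```

Under these assumptions a higher threshold simply tolerates a longer silence before a peer is marked down, which is why it helps on flaky WAN links at the cost of slower detection of real failures.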
Defining network topology
# endpoint_snitch: SimpleSnitch
1. Snitch with config data determines topology
2. Do: use SimpleSnitch for single switch/LAN
3. Consider: Multi DC to start
Gossiping Property File Snitch
conf/cassandra-rackdc.properties
dc=dc1
rack=rack1
1. Information is propagated around the cluster
2. DC may not be physical but is a replication unit
3. Rack has impact on replication copies
4. Don't: change rack unless you understand the impact
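Why rack names affect where copies land can be shown with a toy placement routine. The Python sketch below is a deliberately simplified illustration of rack-aware placement within one datacenter: walk the ring, prefer nodes in racks not yet used, then fall back to skipped nodes. It is an assumption-laden sketch, not the actual NetworkTopologyStrategy implementation, and place_replicas and the node/rack names are hypothetical.

```python
def place_replicas(ring, replication_factor):
    """Pick replicas from `ring`, a list of (node, rack) pairs in ring order.

    Simplified illustration: nodes in racks that have not been used yet are
    preferred; nodes in already-used racks are only taken if needed.
    """
    replicas, used_racks, skipped = [], set(), []
    for node, rack in ring:
        if len(replicas) == replication_factor:
            break
        if rack not in used_racks:
            replicas.append(node)
            used_racks.add(rack)
        else:
            skipped.append(node)  # same rack as an existing replica
    for node in skipped:
        if len(replicas) == replication_factor:
            break
        replicas.append(node)
    return replicas


if __name__ == "__main__":
    ring = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2"), ("n4", "rack2")]
    print(place_replicas(ring, 2))
    # Relabel n3 into rack1 and the chosen nodes change.
    relabeled = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack1"), ("n4", "rack2")]
    print(place_replicas(relabeled, 2))
```

In this toy model, relabeling a single node's rack changes which nodes are expected to hold the data, which is the reason for the warning about changing rack on a live cluster.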
Internode communications
internode_compression: all | dc | none
inter_dc_tcp_nodelay: false
server_encryption_options:
    internode_encryption: none
internode_authenticator: o.a.c.auth.AllowAllInternodeAuthenticator
1. WAN traffic can benefit from the reduced size that compression gives
2. These settings govern how server nodes communicate with each other
Broadcast address
broadcast_address: 1.2.3.4
listen_on_broadcast_address: false
broadcast_rpc_address: 1.2.3.4
1. Gossip a specific address (not the bind address)
2. Useful in NAT and cloud environments
Advanced settings
Write path
http://www.toadworld.com/platforms/nosql/w/wiki/11621.an-introduction-to-apache-cassandra
Memtables
#memtable_flush_writers: 1
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
#memtable_cleanup_threshold: 0.11
1. Default: one flush writer per data directory
2. 1 / (1 + 1) = .5
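The default threshold arithmetic can be checked for a few writer counts. This is a small Python sketch of the formula quoted in the yaml comment; the helper names are hypothetical, and the flush-trigger calculation assumes the threshold is applied as a plain ratio of the permitted memtable space, which is a simplification.

```python
def default_cleanup_threshold(memtable_flush_writers):
    """1 / (memtable_flush_writers + 1), as in the cassandra.yaml comment."""
    return 1.0 / (memtable_flush_writers + 1)


def flush_trigger_mb(permitted_space_mb, cleanup_threshold):
    """Occupancy (MB) at which a flush would start, assuming the threshold
    is a simple ratio of the permitted memtable space (a simplification)."""
    return permitted_space_mb * cleanup_threshold


if __name__ == "__main__":
    for writers in (1, 2, 8):
        threshold = default_cleanup_threshold(writers)
        print(writers, "writers -> threshold", round(threshold, 2),
              "-> flush near", round(flush_trigger_mb(2048, threshold)), "MB of 2048")
```

With one flush writer the threshold is .5; the commented-out 0.11 in the yaml corresponds to 8 flush writers, since 1 / (8 + 1) is roughly 0.11.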
.5 of what you ask?
#If omitted, both set to 1/4 the heap
#memtable_heap_space_in_mb: 2048
#memtable_offheap_space_in_mb: 2048
1. The next setting (memtable_allocation_type) dictates how much of each memory type is used
#heap_buffers: on heap nio buffers
#offheap_buffers: off heap nio buffers
#offheap_objects: off heap objects
#memtable_allocation_type: heap_buffers
2. Depending on the column values, buffers or objects may be better
Trickle fsync
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
1. Optimization to periodically fsync large files
2. Designed to prevent latency spikes in the read path
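The idea behind trickle fsync can be sketched outside Cassandra: instead of letting the operating system accumulate a large pile of dirty pages and flush them all at once, the writer forces them out every so many kilobytes. The Python sketch below illustrates that pattern only; write_with_trickle_fsync is a hypothetical name and this is not how Cassandra's writers are implemented.

```python
import os
import tempfile


def write_with_trickle_fsync(path, chunks, interval_kb=10240):
    """Write chunks sequentially, calling fsync every `interval_kb` kilobytes.

    Returns how many interval-triggered fsyncs happened (the final fsync at
    the end of the file is not counted).
    """
    interval_bytes = interval_kb * 1024
    unsynced = 0
    interval_syncs = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            unsynced += len(chunk)
            if unsynced >= interval_bytes:
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to write dirty pages now
                interval_syncs += 1
                unsynced = 0
        f.flush()
        os.fsync(f.fileno())           # final sync for the tail of the file
    return interval_syncs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "big-file.db")
        count = write_with_trickle_fsync(target, [b"x" * 1024] * 10, interval_kb=4)
        print("interval fsyncs:", count)
```

Many small syncs spread the disk work out, so reads are less likely to stall behind one large flush of dirty buffers.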
Include image of compaction here
Compaction
https://www.instaclustr.com/blog/2016/01/27/apache-cassandra-compaction/
Compaction
concurrent_compactors: 1
compaction_throughput_mb_per_sec: 16
1. Control resources used by compaction
2. Compaction throughput can be changed at runtime (nodetool setcompactionthroughput)
3. Generally keep concurrent_compactors greater than 1 and less than 8
Disk Failure settings
disk_failure_policy: stop
commit_failure_policy: stop
1. stop_paranoid: shut down gossip and client transports even for single-sstable errors, kill the JVM for errors during startup
2. die: shut down gossip and Thrift and kill the JVM, so the node can be replaced
Hints
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_directory: /var/lib/cassandra/hints
hints_flush_period_in_ms: 10000
max_hints_file_size_in_mb: 128
hints_compression: LZ4Compressor
1. Hints were recently redesigned (again, and again)
2. Don't: tune too high and overwhelm a recovering node
3. Don't: tune too low and end up with out-of-sync data
Disk optimization strategy
#disk_optimization_strategy: ssd
1. Tip for those with rotational disks: consider the spinning value instead of ssd
Exotic settings
Auto bootstrap
auto_bootstrap: true (hidden variable)
1. “Bootstrapping” here means: should the joining node attempt to acquire data from other nodes, or start up empty?
2. Setting it to false can be used when bringing on a new datacenter
3. It can also be used when working around streaming/join issues
Backup*Ish options
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
1. Enable alongside external backup-like tools
2. Creates hard-link files the operator must clean up
3. Enabling and not cleaning up will cause the disk to fill up
4. Truncate/drop makes a snapshot
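The hard-link point explains why forgotten snapshots fill disks: a hard link costs nothing when it is created, but it keeps the file's blocks alive after the original name is deleted. The Python sketch below demonstrates that behavior with an ordinary file standing in for an sstable; the snapshot helper and the file names are hypothetical, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def snapshot(sstable_path, snapshot_dir):
    """Hard-link a file into snapshot_dir and return the link's path.

    Illustrative stand-in for what a snapshot or incremental backup does;
    not Cassandra's own code.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    link_path = os.path.join(snapshot_dir, os.path.basename(sstable_path))
    os.link(sstable_path, link_path)  # second name for the same data blocks
    return link_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        sstable = os.path.join(data_dir, "example-Data.db")
        with open(sstable, "wb") as f:
            f.write(b"row data")
        link = snapshot(sstable, os.path.join(data_dir, "snapshots"))
        os.remove(sstable)  # e.g. the original is removed after compaction
        # The space is still in use until the snapshot link is removed too.
        print(os.path.exists(link), os.path.getsize(link))
```

Because the data is only freed once every link is gone, cleanup of old snapshot and backup directories has to be part of the operator's routine.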
Per operation default timeouts
read_request_timeout_in_ms: 5000
write_request_timeout_in_ms: 2000
request_timeout_in_ms: 10000
1. Each operation type has a different timeout
2. Applied on the coordinator, not the client
3. Previously there was only a global rpc_timeout
Commit Log sync
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
1. The alternative, batch mode, blocks the ack to clients until the commit log is synced
2. Commit log segments persist until their memtables are flushed
Thanks! @edwardcapriolo
