Cassandra Codebase 2011
Getting to know the codebase
Gary Dusbabek (@gdusbabek)
Questions?
Outline
• How to contribute
• Internals
• Some thoughts
How to Contribute
How to Contribute
• http://wiki.apache.org/cassandra/HowToContribute
• JIRA: “lhf” label (low-hanging fruit)
• Scratch your itch
How to Contribute
• Run the tests:
  • ant test
  • nosetests test/system/test_thrift_server.py
How to Contribute
• http://wiki.apache.org/cassandra/CodeStyle
• Avoid:
  • Reformatting whitespace
  • Renaming things everywhere
  • Unrelated changes
How to Contribute
• Use git
• Attach patches (git format-patch output) as JIRA attachments
• Group them sensibly
How to Contribute
• Someone will review your code, usually a committer
• Persistence helps
• Don’t get your feelings hurt
• It usually takes a few rounds
How to Contribute
• Participate!
• #cassandra-dev on freenode
• dev@cassandra.apache.org
Internals
Services
• Ring operations (StorageService)
• Storage operations (StorageProxy)
Startup Sequence
• bin/cassandra
• Finds cassandra.in.sh:
  • $CLASSPATH (mandatory)
  • $CASSANDRA_HOME
  • $CASSANDRA_CONF (mandatory)
• Executes $CASSANDRA_CONF/cassandra-env.sh
  • Sets heap sizes (GC tuning goes here!)
o.a.c.thrift.CassandraDaemon
AbstractCassandraDaemon
ACD.setup():
• Reads configuration: DatabaseDescriptor
• Loads schema: DD.loadSchemas()
• Scrubs directories
• Initializes storage (keyspaces + CFs)
• Commit log recovery: CL.recover()
• StorageService.initServer() -> StorageService.joinTokenRing()
Attn Tinkerers!
• Initialization of the transport is abstracted.
• Handy if you’re experimenting with transports/RPC.
• Just extend AbstractCassandraDaemon and make sure that class is started by bin/cassandra.
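The split described on the last two slides (common startup in an abstract base class, transport left to a subclass) is the template-method pattern. This is a minimal sketch that assumes hypothetical names (`AbstractDaemon`, `activate`, `startTransport`, `EchoDaemon`) rather than Cassandra's real `AbstractCassandraDaemon` API, and it records each startup step as a string instead of doing the work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AbstractCassandraDaemon: the base class owns the
// fixed startup order, subclasses only decide how clients connect.
abstract class AbstractDaemon {
    // Records each step so the ordering is observable in this sketch.
    protected final List<String> steps = new ArrayList<>();

    // Mirrors the order on the ACD.setup() slide; each step is a placeholder.
    protected void setup() {
        steps.add("read configuration");
        steps.add("load schema");
        steps.add("scrub directories");
        steps.add("initialize keyspaces and column families");
        steps.add("replay commit log");
        steps.add("join token ring");
    }

    // The only transport-specific piece.
    protected abstract void startTransport();

    // Template method: shared setup first, then the pluggable transport.
    public final void activate() {
        setup();
        startTransport();
    }

    public List<String> steps() {
        return steps;
    }
}

// An experimental transport only has to fill in the abstract hook.
class EchoDaemon extends AbstractDaemon {
    @Override
    protected void startTransport() {
        steps.add("start echo transport");
    }
}
```

Because `activate()` is final, an experimental transport cannot reorder or skip the shared setup; it only supplies the last step.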
o.a.c.thrift.CassandraServer
• Implements the Thrift interface methods (the API).
• Start here when trying to understand the read/write path and RPC.
Configuration
• DatabaseDescriptor
  • Side effect of ACD.setup()
  • Reads config settings from YAML
  • Defines system tables
  • Changes regularly
• I hate this code. Please fix it.
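"Side effect of ACD.setup()" suggests that configuration is loaded the first time the class is touched. One common way to get that behavior in Java is a static initializer; whether DatabaseDescriptor works exactly like this is an assumption here. The class `Descriptor`, its getters and the inline configuration text are hypothetical, and `java.util.Properties` syntax stands in for YAML because the JDK ships no YAML parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical stand-in for DatabaseDescriptor. Configuration is parsed in a
// static initializer, so it runs as a side effect of the first access to the class.
final class Descriptor {
    // Counts how many times the static initializer ran (it should be once).
    static int loadCount = 0;
    private static final Properties conf = new Properties();

    // Stand-in for cassandra.yaml, written in Properties syntax for this sketch.
    private static final String RAW =
        "cluster_name=Test Cluster\n" +
        "concurrent_reads=32\n";

    static {
        try {
            conf.load(new StringReader(RAW));
        } catch (IOException e) {
            // StringReader does not fail in practice, but load() declares it.
            throw new ExceptionInInitializerError(e);
        }
        loadCount++;
    }

    private Descriptor() {}

    static String getClusterName() {
        return conf.getProperty("cluster_name");
    }

    static int getConcurrentReads() {
        return Integer.parseInt(conf.getProperty("concurrent_reads", "8"));
    }
}
```

The catch is that any code which reads a single setting, including a unit test, triggers the whole load, which is one reason this style is awkward to test.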
Main Singletons
• StorageService
• StorageProxy
• MessagingService
• CompactionManager
• StageManager
• MigrationManager
Did you just say ‘singletons’?
Main Singletons
• StorageService
• StorageProxy
• MessagingService
• CompactionManager
• StageManager
• MigrationManager
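The services listed above follow the eager-singleton idiom: one instance created during class initialization and reached through a static field (Cassandra code refers to, for example, `StorageService.instance`). This is a minimal sketch; `RingService` and its operation counter are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical service written in the eager-singleton style: one instance per
// JVM, created when the class is initialized and reachable from anywhere.
final class RingService {
    public static final RingService instance = new RingService();

    // Example of process-wide mutable state that every caller shares.
    private final AtomicLong operations = new AtomicLong();

    // Private constructor prevents a second instance.
    private RingService() {}

    long recordOperation() {
        return operations.incrementAndGet();
    }
}
```

The trade-off: any code can reach the instance without wiring, but its state is shared across the whole JVM, so tests cannot easily substitute a fake or start from a clean slate.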
JMX MBeans
• Tooling is supplied by MBeans
• Anything that does measurable/configurable work is instrumented:
  • Thread pools
  • Compaction
  • Hinted handoff
  • Streaming
  • Storage
  • Commit log
StorageService
• initServer() -> joinTokenRing()
  • Starts gossip
  • Starts MessagingService
  • Negotiates bootstrap
• Many ring operations live here.
• Repository of ring topology
  • TokenMetadata (quasi-singleton via SS.tokenMetadata_)
  • The partitioner instance is also here
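Ring topology plus a partitioner is enough to answer "which nodes own this key?". A minimal sketch, assuming a hypothetical `TokenRing` class rather than TokenMetadata's real API: tokens are kept in a sorted map, a key's token is an MD5-based integer in the spirit of RandomPartitioner, the first node whose token is at or after the key's token is the primary replica (wrapping around), and further replicas are taken clockwise as in a simple, rack-unaware placement strategy.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified ring: token -> endpoint, kept sorted by token.
final class TokenRing {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(BigInteger token, String endpoint) {
        ring.put(token, endpoint);
    }

    // MD5-based token (absolute value of the digest read as a signed integer).
    static BigInteger tokenFor(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // First node whose token is >= the key's token (wrapping to the lowest
    // token), then further distinct nodes clockwise until rf are collected.
    List<String> replicasFor(BigInteger keyToken, int rf) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        BigInteger start = ring.ceilingKey(keyToken);
        if (start == null) {
            start = ring.firstKey(); // past the highest token: wrap around
        }
        // tailMap from start, then headMap up to start, walks the ring once clockwise.
        List<String> clockwise = new ArrayList<>(ring.tailMap(start, true).values());
        clockwise.addAll(ring.headMap(start, false).values());
        for (String endpoint : clockwise) {
            if (replicas.size() == rf) break;
            if (!replicas.contains(endpoint)) replicas.add(endpoint);
        }
        return replicas;
    }
}
```

For example, with nodes at tokens 100 (A), 200 (B) and 300 (C), a key with token 150 and a replication factor of 2 lands on B and C, while token 350 wraps around to A and B.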
MessagingService
• Verb handlers live here (initialized from SS).
  • Main event handlers; they haven’t changed much.
• Socket listener
  • 2 threads per ring node
• Message gateway
  • Messages are emitted from MessageProducer impls
  • MS.sendRR()
  • MS.sendOneWay()
  • MS.receive()
• Messages are versioned now (0.8)
  • IncomingTCPConnection
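The two send styles differ in who hears back: `sendOneWay` delivers a message and forgets it, while `sendRR` registers a callback under the message id so the reply can be routed to it. This sketch models that dispatch inside a single process; `Bus`, `Verb`, `Message` and every signature here are hypothetical, with no sockets, serialization or message versioning.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical verbs; RESPONSE marks replies to an earlier request.
enum Verb { MUTATION, READ, RESPONSE }

final class Message {
    final int id;
    final Verb verb;
    final String body;

    Message(int id, Verb verb, String body) {
        this.id = id;
        this.verb = verb;
        this.body = body;
    }
}

final class Bus {
    // Verb handlers registered at startup; a handler returns a reply body or null.
    private final Map<Verb, Function<Message, String>> handlers = new EnumMap<>(Verb.class);
    // Callbacks waiting for a response, keyed by the id of the request.
    private final Map<Integer, Consumer<String>> callbacks = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    void registerHandler(Verb verb, Function<Message, String> handler) {
        handlers.put(verb, handler);
    }

    // Fire and forget: any reply produced by the handler is dropped.
    void sendOneWay(Verb verb, String body) {
        receive(new Message(nextId.incrementAndGet(), verb, body));
    }

    // Request/response: remember the callback, then deliver the request.
    int sendRR(Verb verb, String body, Consumer<String> onResponse) {
        int id = nextId.incrementAndGet();
        callbacks.put(id, onResponse);
        String reply = receive(new Message(id, verb, body));
        if (reply != null) {
            // The reply reuses the request id so it finds its callback.
            receive(new Message(id, Verb.RESPONSE, reply));
        }
        return id;
    }

    // Dispatch: responses go to callbacks, everything else to verb handlers.
    String receive(Message m) {
        if (m.verb == Verb.RESPONSE) {
            Consumer<String> cb = callbacks.remove(m.id);
            if (cb != null) cb.accept(m.body);
            return null;
        }
        Function<Message, String> h = handlers.get(m.verb);
        return h == null ? null : h.apply(m);
    }
}
```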
StorageProxy
• Top level of all read/write operations
• Called from o.a.c.thrift.CassandraServer
• The write path changed because of counters
  • Notion of WritePerformer
• Eventually reaches Table and ColumnFamilyStore
• Further, to SSTable and related classes.
StageManager
• Fancy Java ThreadPoolExecutor
• SEDA: http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
• Consumes callables from a queue.
• Manages concurrency.
• Hasn’t changed much.
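In a SEDA design each stage is a thread pool with its own queue, so one kind of work (say, reads) cannot starve another. A minimal sketch with one `java.util.concurrent.ThreadPoolExecutor` per stage; the `Stage` values, the `Stages` class and the pool sizes are hypothetical, not StageManager's actual configuration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stage names; each stage gets its own pool and queue.
enum Stage { READ, MUTATION, GOSSIP }

final class Stages {
    private final Map<Stage, ThreadPoolExecutor> pools = new EnumMap<>(Stage.class);

    Stages(int readThreads, int mutationThreads) {
        pools.put(Stage.READ, newPool(readThreads));
        pools.put(Stage.MUTATION, newPool(mutationThreads));
        // A single thread runs gossip tasks in submission order.
        pools.put(Stage.GOSSIP, newPool(1));
    }

    // Fixed-size pool over an unbounded FIFO queue: the thread count caps the
    // stage's concurrency and excess work waits in the queue.
    private static ThreadPoolExecutor newPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 60, TimeUnit.SECONDS,
                                      new LinkedBlockingQueue<Runnable>());
    }

    <T> Future<T> submit(Stage stage, Callable<T> task) {
        return pools.get(stage).submit(task);
    }

    // Queue depth per stage is the kind of number worth exposing for monitoring.
    int pending(Stage stage) {
        return pools.get(stage).getQueue().size();
    }

    // Stops accepting work; queued tasks still run, then the threads exit.
    void shutdown() {
        for (ThreadPoolExecutor pool : pools.values()) {
            pool.shutdown();
        }
    }
}
```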
Adding API Methods
• Define the method + structures in the IDL
  • interface/cassandra.thrift
• Regenerate files
  • ant gen-thrift-java gen-thrift-py
• Implement stubs:
  • o.a.c.thrift.CassandraServer
• Create a system test
  • test/system/test_thrift_server.py
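The "implement stubs" step works because the Thrift compiler turns the IDL into a Java interface that `CassandraServer` implements, so the build fails until the new method has a body. This sketch hand-writes a stand-in for that generated interface; `ApiIface`, `ApiHandler` and the new method `describe_ring_size` are hypothetical, and the real generated code comes from `ant gen-thrift-java`.

```java
import java.util.List;

// Hand-written stand-in for the interface Thrift would generate from
// interface/cassandra.thrift. Adding a method to the IDL and regenerating
// adds a matching abstract method here.
interface ApiIface {
    String describe_version();

    // Hypothetical method newly added to the IDL for this example.
    int describe_ring_size();
}

// Stand-in for o.a.c.thrift.CassandraServer: it does not compile until every
// interface method, including the new one, is implemented.
final class ApiHandler implements ApiIface {
    private final List<String> ringMembers;

    ApiHandler(List<String> ringMembers) {
        this.ringMembers = ringMembers;
    }

    @Override
    public String describe_version() {
        return "sketch";
    }

    @Override
    public int describe_ring_size() {
        // A real handler would presumably ask StorageService; this just counts a list.
        return ringMembers.size();
    }
}
```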
Reading
• Socket -> CassandraServer
  • Permissions
  • Request validation
  • Marshalling
• ReadCommands are created in CS.multigetSliceInternal and passed to StorageProxy
  • 1 per key
Reading
• StorageProxy.read(), fetchRows()
• For each ReadCommand:
  • Determine endpoints
  • Local & remote branches
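The fan-out on the last two slides (one ReadCommand per key, then a local or remote branch per command) can be sketched as follows. `ReadCmd`, `ReadRouter` and the endpoint-lookup function are hypothetical; picking the first listed replica stands in for the proximity sorting a real coordinator would presumably do, and every key is assumed to have at least one endpoint.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified read command: one per requested key.
final class ReadCmd {
    final String keyspace;
    final String key;

    ReadCmd(String keyspace, String key) {
        this.keyspace = keyspace;
        this.key = key;
    }
}

final class ReadRouter {
    // A multiget fans out into one single-key command per key.
    static List<ReadCmd> commandsFor(String keyspace, List<String> keys) {
        List<ReadCmd> cmds = new ArrayList<>();
        for (String key : keys) cmds.add(new ReadCmd(keyspace, key));
        return cmds;
    }

    // For each command, look up its endpoints and choose a branch: read locally
    // if this node is a replica, otherwise send to the first listed replica.
    static Map<String, String> route(List<ReadCmd> cmds, String localAddress,
                                     Function<String, List<String>> endpointsForKey) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (ReadCmd cmd : cmds) {
            List<String> endpoints = endpointsForKey.apply(cmd.key);
            String target = endpoints.contains(localAddress)
                ? "local"
                : "remote:" + endpoints.get(0);
            plan.put(cmd.key, target);
        }
        return plan;
    }
}
```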
Reading
• StorageProxy local
  • The READ stage executes a LocalReadRunnable
  • True read vs. digest
• Table, ColumnFamilyStore
  • CFS.getTopLevelColumns
  • Make QueryFilter
  • Query memtables
  • Query SSTables
  • Coalesce in iterators
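"Coalesce in iterators" amounts to a k-way merge: memtables and SSTables each yield columns sorted by name, and the merge emits one column per name, keeping the one with the highest timestamp. This is a sketch under simplifying assumptions: `Col` and `Coalesce` are hypothetical, sources are in-memory lists with no duplicate names inside a single source, and deletions (tombstones) and timestamp ties are not modeled (a tie keeps whichever column was seen first).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical column: name, value and write timestamp.
final class Col {
    final String name;
    final String value;
    final long timestamp;

    Col(String name, String value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }
}

final class Coalesce {
    // Current head of one source plus the rest of its iterator.
    private static final class Cursor {
        Col head;
        final Iterator<Col> rest;

        Cursor(Iterator<Col> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    // K-way merge of name-sorted sources; equal names are reconciled so the
    // highest timestamp wins.
    static List<Col> merge(List<List<Col>> sources) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.name.compareTo(b.head.name));
        for (List<Col> source : sources) {
            if (!source.isEmpty()) heap.add(new Cursor(source.iterator()));
        }
        List<Col> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            Col col = c.head;
            Col last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && last.name.equals(col.name)) {
                // Same column from another source: keep the newer write.
                if (col.timestamp > last.timestamp) out.set(out.size() - 1, col);
            } else {
                out.add(col);
            }
            // Advance this source and put it back if it has more columns.
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                heap.add(c);
            }
        }
        return out;
    }
}
```

A real implementation streams the merged columns instead of collecting them into a list, so only the heads of each source need to be in memory at once.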
Reading
• StorageProxy remote
  • Read command
  • Response handler
  • Send to remote nodes
• Read repair happens in SP.fetchRows().
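The "true read vs. digest" split means one replica returns the data and the others return only a hash of it; if the hashes disagree, the coordinator resolves the conflict and repairs stale replicas. This sketch hashes a trivial serialization with MD5 and resolves by newest timestamp; `Versioned`, `DigestRead` and the serialization format are hypothetical, and it assumes the follow-up round of full reads has already happened before `replicasToRepair` is called.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical replica answer: a value with its write timestamp.
final class Versioned {
    final String value;
    final long timestamp;

    Versioned(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    // Stand-in for serializing a row before hashing it.
    String serialize() {
        return value + "@" + timestamp;
    }
}

final class DigestRead {
    static byte[] digest(Versioned v) {
        try {
            return MessageDigest.getInstance("MD5")
                .digest(v.serialize().getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if every digest matches the digest of the data response, in which
    // case the data can go straight back to the client.
    static boolean digestsMatch(Versioned data, List<byte[]> digests) {
        byte[] expected = digest(data);
        for (byte[] d : digests) {
            if (!MessageDigest.isEqual(expected, d)) return false;
        }
        return true;
    }

    // On mismatch, given full responses from all replicas: the newest timestamp
    // wins, and every replica holding something else needs a repair write.
    static List<String> replicasToRepair(Map<String, Versioned> responses) {
        Versioned newest = null;
        for (Versioned v : responses.values()) {
            if (newest == null || v.timestamp > newest.timestamp) newest = v;
        }
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Versioned> e : responses.entrySet()) {
            if (!Arrays.equals(digest(e.getValue()), digest(newest))) stale.add(e.getKey());
        }
        return stale;
    }
}
```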
Writing
• CS.doInsert()
  • Marshalling; creates RowMutations (RMs)
• StorageProxy
  • Local/remote branch
  • SP.sendToHintedEndpoints()
• RowMutation
  • One key, (possibly several) CFs
• ColumnFamily
  • Collection of column modifications
Writing
• RM.apply() -> Table.apply()
  • Write to the commit log
  • Iterate over the RM’s CFs
  • CFS.apply()
  • Overwrites results on pre-existing column families
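The write path on the last two slides can be sketched with two small structures: a row mutation carrying one key and changes for several column families, and a local store that appends the mutation to a commit log before merging it into in-memory memtables. `Cell`, `Mutation` and `LocalStore` are hypothetical; the commit log is a list of strings rather than a synced file, and when an existing column is overwritten the newer timestamp wins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical cell: value plus write timestamp.
final class Cell {
    final String value;
    final long timestamp;

    Cell(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

// One row key with modifications for several column families (cf -> column -> cell).
final class Mutation {
    final String key;
    final Map<String, TreeMap<String, Cell>> families = new HashMap<>();

    Mutation(String key) {
        this.key = key;
    }

    Mutation add(String cf, String column, String value, long timestamp) {
        families.computeIfAbsent(cf, c -> new TreeMap<>()).put(column, new Cell(value, timestamp));
        return this;
    }
}

final class LocalStore {
    // Append-only commit log; a real one is a file synced to disk.
    final List<String> commitLog = new ArrayList<>();
    // Memtables: cf -> row key -> column -> cell.
    final Map<String, Map<String, TreeMap<String, Cell>>> memtables = new HashMap<>();

    // Log first for durability, then apply each column family's changes.
    void apply(Mutation m) {
        commitLog.add(m.key + ":" + m.families.keySet());
        for (Map.Entry<String, TreeMap<String, Cell>> e : m.families.entrySet()) {
            TreeMap<String, Cell> row = memtables
                .computeIfAbsent(e.getKey(), cf -> new HashMap<>())
                .computeIfAbsent(m.key, k -> new TreeMap<>());
            // Merge into any existing row: per column, the newer timestamp wins.
            for (Map.Entry<String, Cell> c : e.getValue().entrySet()) {
                row.merge(c.getKey(), c.getValue(),
                          (oldCell, newCell) -> newCell.timestamp >= oldCell.timestamp ? newCell : oldCell);
            }
        }
    }
}
```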
Writing
• The RM is serialized into a Message and sent to the other nodes.
• Waits for ACKs, depending on the consistency level (CL).
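How many ACKs the coordinator waits for depends on the consistency level. A minimal sketch using `java.util.concurrent.CountDownLatch`; the `Consistency` enum covers only ONE, QUORUM and ALL, `WriteAckWaiter` is a hypothetical name, and QUORUM is taken as a majority of the replication factor, floor(RF / 2) + 1.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical subset of write consistency levels.
enum Consistency { ONE, QUORUM, ALL }

final class WriteAckWaiter {
    private final CountDownLatch latch;

    WriteAckWaiter(Consistency cl, int replicationFactor) {
        this.latch = new CountDownLatch(blockFor(cl, replicationFactor));
    }

    // Number of replica ACKs the coordinator must see before answering.
    static int blockFor(Consistency cl, int rf) {
        switch (cl) {
            case ONE:    return 1;
            case QUORUM: return rf / 2 + 1;
            case ALL:    return rf;
            default:     throw new IllegalArgumentException(cl.toString());
        }
    }

    // Called by the response handler for each ACK from a replica.
    void ack() {
        latch.countDown();
    }

    // True if enough ACKs arrived in time; false means the write timed out from
    // the client's point of view (replicas may still apply it later).
    boolean await(long timeoutMillis) throws InterruptedException {
        return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

With RF = 3, QUORUM waits for 2 ACKs, so one slow or dead replica does not block the write, while ALL would.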
Challenges
Challenges
• Having an in-depth understanding of everything
  • Hard for hobbyists/part-timers
  • Outside of DataStax, little support for full-timers
• Still changing fast
  • Keeping up
Challenge: Lines of Code
• 0.4 (Sep 2009): 52 kloc
• 0.5 (Jan 2010): 59 kloc
• 0.6 (Apr 2010): 73 kloc
• 0.7 (Jan 2011): 122 kloc
• 0.8 (Jun 2011): 146 kloc
• Trunk (yesterday): 149 kloc
• Average: 4,500 lines per month
Challenges
• Codewise growing pains
  • Software maturity
  • Decisions made early on

Cassandra Codebase 2011

Editor's Notes

  • #2: Who was here last year? Very good presentations on data modeling and capacity planning.
  • #3: Turn it around. Ask questions first.
  • #16: Transport is still not initialized, though. DD getting loaded is just a side effect.
  • #19: This is actually a good exercise.
  • #23: Good place to extend and experiment on your own.