NoSQL Data Modeling
Concepts and Cases


Shashank Tiwari
blog: shanky.org | twitter: @tshanky
st@treasuryofideas.com
NoSQL?
NoSQL : Various Shapes and Sizes

• Document Databases


• Column-family Oriented Stores


• Key/value Data stores


• XML Databases


• Object Databases


• Graph Databases
Key Questions

• How do I model data for my application?


• How do I determine which one is right for me?


• Can I easily shift from one database to the other?


• Is there a standard way of storing, accessing, and querying data?
Agenda for this session

• Explore some of the main NoSQL products


• Understand how they are similar and different


• How best to use these products in the stack


Document Databases




• also GenieDB, SimpleDB
What is a document db?

• One that stores documents


• Popular options:


  • MongoDB -- C++


  • CouchDB -- Erlang


  • Also Amazon’s SimpleDB


• ...what exactly is a document?
In the real world




• (Source: http://guide.couchdb.org/draft/why.html)
In terms of JSON

• {name: "John Doe", zip: 10001}
What about db schema?

• Schema-less


• Different documents could be stored in a single collection
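
To make "schema-less" concrete, here is a minimal sketch in plain JavaScript. A plain array stands in for a collection (it is not MongoDB itself), and the second document, with its extra fields, is a made-up example that is not in the original slides.

```javascript
// A plain array stands in for a collection; no schema is declared anywhere.
const locationCollection = [];

// Two documents with different shapes live side by side.
locationCollection.push({ name: "John Doe", zip: 10001 });
locationCollection.push({
  name: "Acme Corp",               // hypothetical document for illustration
  zip: 94301,
  phones: ["555-0100"],            // field the first document does not have
  geo: { lat: 37.44, lon: -122.16 } // nested object, also optional
});

// A query on a shared field works across both shapes.
const inZip94301 = locationCollection.filter((doc) => doc.zip === 94301);

// A query on an optional field simply skips documents that lack it.
const withPhones = locationCollection.filter((doc) => Array.isArray(doc.phones));

console.log(inZip94301.length, withPhones.length);
```

The point of the sketch is that the application, not the store, decides which fields a document carries.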
Data types: MongoDB

• Essential JSON types:


• string


• integer


• boolean


• double
Data types: MongoDB (...cont)

• Additional JSON types


• null, array and object


• BSON types -- binary-encoded serialization of JSON-like documents


   • date, binary data, object id, regular expression and code


   • (Reference: bsonspec.org)
A BSON example: object id
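
As a rough illustration of what an object id carries, the sketch below splits the 24-character hex form into its parts. It uses the id that appears in the read examples in this deck. The function name `parseObjectId` is made up for this sketch and is not part of any driver. The meaning of the middle bytes has changed between driver generations, so they are left uninterpreted here.

```javascript
// Split a 24-character ObjectId hex string (12 bytes) into its parts.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  return {
    // Bytes 0-3: creation time in seconds since the Unix epoch (big-endian).
    timestampSeconds: parseInt(hex.slice(0, 8), 16),
    // Bytes 4-8: machine/process identifier or a random value,
    // depending on the driver generation; kept as raw hex.
    middle: hex.slice(8, 18),
    // Bytes 9-11: incrementing counter.
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = parseObjectId("4c97053abe67000000003857");
const created = new Date(parts.timestampSeconds * 1000);
console.log(parts, created.toISOString());
```

Because the timestamp comes first, object ids generated later sort after earlier ones, at one-second granularity.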
Data types: CouchDB

• Everything JSON


• Large objects: attachments
CRUD operations for documents

• Create


• Read


• Update


• Delete
MongoDB: Create Document

• use mydb


• w = {name: "John Doe", zip: 10001};


• db.location.save(w);
Create db and collection

• Lazily created


• Implicitly created


• use mydb


• db.collection.save(w)
MongoDB: Read Document

• db.location.find({zip: 10001});


• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe",
  "zip" : 10001 }
MongoDB: Read Document (...cont)

• db.location.find({name: "John Doe"});


• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe",
  "zip" : 10001 }
MongoDB: Update Document

• Atomic operations on single documents


• db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
CouchDB: RESTful

• Supports REST verbs: GET, HEAD, PUT, POST, DELETE


• Supports Replication


• Supports the notion of attachments


• Could work in offline modes and supports small footprint profiles
Sorted Ordered Column-family Datastores

• Sorted


• Ordered


• Distributed


• Map
Essential schema
Multi-dimensional View
A Map/Hash View

• {
    "row_key_1" : {
      "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
      "location" : { "zip" : "94301" }
    }
  }
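
The map view can be sketched as a map of maps: row key, then column family, then column. The class below is a toy in-memory model, not HBase or any real client API. `ToyColumnStore` and its methods are names invented for this sketch, and the second row is made-up data. The scan mimics the usual start-inclusive, stop-exclusive range over sorted row keys.

```javascript
// row key -> column family -> column -> value, kept as nested Maps.
class ToyColumnStore {
  constructor() {
    this.rows = new Map();
  }

  put(rowKey, family, column, value) {
    if (!this.rows.has(rowKey)) this.rows.set(rowKey, new Map());
    const families = this.rows.get(rowKey);
    if (!families.has(family)) families.set(family, new Map());
    families.get(family).set(column, value);
  }

  get(rowKey, family, column) {
    return this.rows.get(rowKey)?.get(family)?.get(column);
  }

  // Row keys in sorted order within [startKey, stopKey).
  scan(startKey, stopKey) {
    return [...this.rows.keys()]
      .sort()
      .filter((key) => key >= startKey && key < stopKey);
  }
}

const store = new ToyColumnStore();
store.put("row_key_2", "name", "first_name", "Happy"); // hypothetical second row
store.put("row_key_1", "name", "first_name", "Jolly");
store.put("row_key_1", "name", "last_name", "Goodfellow");
store.put("row_key_1", "location", "zip", "94301");

console.log(store.get("row_key_1", "location", "zip"));
console.log(store.scan("row_key_1", "row_key_3"));
```

A real store keeps rows physically sorted, so a range scan reads contiguous data. The sketch sorts on every scan only to stay short.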
Architectural View (HBase)
The Persistence Mechanism
Model Wrappers (The GAE Way)

• Python


  • Model, Expando, PolyModel


• Java


  • JDO, JPA
HBase Data Access

• Thrift + Avro


• Java API -- HTable, HBaseAdmin


• Hive (SQL like)


• MapReduce -- sink and/or source
Transactions

• Atomic row level


• GAE Entity Groups
Indexes

• Row ordered


• Secondary indexes


• GAE style multiple indexes


  • thinking from output to query
Use cases

• Many of Google’s products


• Facebook Messaging


• StumbleUpon


  • Open TSDB


• Mahalo, Ning, Meetup, Twitter, Yahoo!


• Lily -- open source CMS built on HBase & Solr
Brewer’s CAP Theorem




• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf


• http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
Distributed Systems & Consistency (case: success)
Distributed Systems & Consistency (case: failure)
Binding by Transactions
Consistency Spectrum
Inconsistency Window
RWN Math

• R – Number of nodes that are read from.


• W – Number of nodes that are written to.


• N – Total number of nodes in the cluster.




• In general: R < N and W < N for higher availability
R+W>N

• Easy to determine consistent state


• R + W = 2N


  • absolutely consistent, can provide ACID guarantee


• In all cases when R + W > N there is some overlap between read and write
  nodes.
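
The overlap claim can be checked by brute force for small clusters. The sketch below enumerates every possible read set of size R and every possible write set of size W and tests whether they always share a node. The helper names are made up for this sketch. It is a sanity check of the arithmetic, not a model of any particular database.

```javascript
// All k-element subsets of nodes 0..n-1, each encoded as a bitmask.
function subsets(n, k) {
  const out = [];
  for (let mask = 0; mask < (1 << n); mask++) {
    let bits = 0;
    for (let m = mask; m > 0; m >>= 1) bits += m & 1;
    if (bits === k) out.push(mask);
  }
  return out;
}

// True when every read set shares at least one node with every write set.
function alwaysOverlaps(n, r, w) {
  const reads = subsets(n, r);
  const writes = subsets(n, w);
  return reads.every((rs) => writes.every((ws) => (rs & ws) !== 0));
}

for (const [n, r, w] of [[3, 2, 2], [3, 1, 3], [3, 1, 1], [5, 3, 3], [5, 2, 3]]) {
  console.log(`N=${n} R=${r} W=${w}`, "overlap:", alwaysOverlaps(n, r, w), "R+W>N:", r + w > n);
}
```

For every combination the two columns agree: a read is guaranteed to touch at least one node holding the latest write exactly when R + W > N.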
R = 1, W = N

• more reads than writes


• W = N


  • 1 node failure = entire system unavailable
R = N, W = 1

• W = 1


 • Chance of data inconsistency quite high


• R = N


 • Read only possible when all nodes in the cluster are available
R = W = ceiling ((N + 1)/2)
Effective quorum for eventual consistency
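
As a worked example of the quorum formula, the sketch below computes R = W = ceiling((N + 1) / 2) for a few cluster sizes and reports how many node failures still leave a quorum reachable. The function names are invented for this sketch.

```javascript
// Smallest majority of n nodes: ceiling((n + 1) / 2).
function majorityQuorum(n) {
  return Math.ceil((n + 1) / 2);
}

// Summarize the R/W settings and failure tolerance for a cluster of n nodes.
function describeQuorum(n) {
  const q = majorityQuorum(n);
  return {
    n,
    r: q,
    w: q,
    overlaps: q + q > n,      // R + W > N always holds for a majority quorum
    toleratedFailures: n - q, // nodes that can be down while a quorum is still reachable
  };
}

for (const n of [3, 4, 5, 7]) {
  console.log(describeQuorum(n));
}
```

Note that going from 3 to 4 nodes raises the quorum size without raising the number of tolerated failures, which is one reason odd cluster sizes are common.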
Eventual consistency variants

• Causal consistency -- if A writes and then informs B, B always sees the
  updated value


• Read-your-writes consistency -- A writes a new value and never sees the old
  one


• Session consistency -- read-your-writes consistency within a client session


• Monotonic read consistency -- once a process has seen a new value, it never
  sees an earlier one


• Monotonic write consistency -- serialize writes by the same process
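
One way a client session can get read-your-writes and monotonic reads on top of lagging replicas is to remember the newest version it has seen and refuse anything older. The sketch below is a toy model under strong assumptions: a single writer, integer versions, and replication done by hand. `Replica` and `Session` are names invented for this sketch, not any product's API.

```javascript
// Each replica holds a version number and a value.
class Replica {
  constructor() {
    this.version = 0;
    this.value = null;
  }
  // Accept an update only if it is newer than what the replica already has.
  apply(version, value) {
    if (version > this.version) {
      this.version = version;
      this.value = value;
    }
  }
}

class Session {
  constructor(replicas) {
    this.replicas = replicas;
    this.minVersion = 0; // newest version this session has written or read
  }
  // Write to one replica and remember the version produced.
  write(replica, value) {
    const version = replica.version + 1; // single-writer assumption
    replica.apply(version, value);
    this.minVersion = Math.max(this.minVersion, version);
    return version;
  }
  // Read from the first replica that is at least as fresh as this session's history.
  read() {
    const fresh = this.replicas.find((r) => r.version >= this.minVersion);
    if (!fresh) throw new Error("no sufficiently fresh replica available");
    this.minVersion = fresh.version;
    return fresh.value;
  }
}

const a = new Replica();
const b = new Replica();               // b lags: it never receives the write below
const session = new Session([b, a]);   // b is tried first
session.write(a, "v1");
const seen = session.read();           // skips stale b and returns the value from a
const otherSession = new Session([b, a]);
const staleRead = otherSession.read(); // a different session may still see old data
console.log(seen, staleRead);
```

The guarantee is per session: the second session has no history, so the stale replica is an acceptable answer for it.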
Dynamo Techniques

• Consistent Hashing (Incremental scalability)


• Vector clocks (high availability for writes)


• Sloppy quorum and hinted handoff (recover from temporary failure)


• Gossip based membership protocol (periodic, pair wise, inter-process
  interactions, low reliability, random peer selection)


• Anti-entropy using Merkle trees


• (Source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
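
To illustrate the vector clock idea from the list above, here is a minimal sketch. A clock is modeled as a plain object from node id to counter. The function names and the node ids `sx`, `sy`, `sz` are assumptions of this sketch, and real systems add details such as clock truncation that are left out.

```javascript
// Return a copy of the clock with nodeId's counter advanced by one.
function increment(clock, nodeId) {
  return { ...clock, [nodeId]: (clock[nodeId] || 0) + 1 };
}

// Compare two clocks: "equal", "before", "after", or "concurrent".
function compare(a, b) {
  let aBehind = false;
  let bBehind = false;
  for (const id of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[id] || 0;
    const y = b[id] || 0;
    if (x < y) aBehind = true;
    if (x > y) bBehind = true;
  }
  if (aBehind && bBehind) return "concurrent";
  if (aBehind) return "before";
  if (bBehind) return "after";
  return "equal";
}

// Element-wise maximum, used after conflicting versions are reconciled.
function merge(a, b) {
  const out = { ...a };
  for (const [id, n] of Object.entries(b)) {
    out[id] = Math.max(out[id] || 0, n);
  }
  return out;
}

const v1 = increment({}, "sx");  // first write handled by node sx
const v2 = increment(v1, "sx");  // second write, also via sx
const v3 = increment(v2, "sy");  // one client updates v2 via sy
const v4 = increment(v2, "sz");  // another client updates v2 via sz
console.log(compare(v2, v3), compare(v3, v4), merge(v3, v4));
```

Versions v3 and v4 both descend from v2 but neither descends from the other, so the store cannot pick a winner by itself and must hand both to the client (or a merge rule) to reconcile.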
Consistent Hashing
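
A minimal sketch of a consistent hash ring with virtual nodes follows. The hash function is a 32-bit FNV-1a chosen only to keep the example dependency-free. The class name `HashRing`, the default of 64 virtual nodes, and the key names are assumptions of this sketch rather than Dynamo's actual parameters.

```javascript
// 32-bit FNV-1a hash over the string's UTF-16 code units.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class HashRing {
  constructor(vnodesPerNode = 64) {
    this.vnodesPerNode = vnodesPerNode;
    this.ring = []; // sorted list of { point, node }
  }

  // Place several virtual nodes for this physical node on the ring.
  addNode(node) {
    for (let i = 0; i < this.vnodesPerNode; i++) {
      this.ring.push({ point: fnv1a(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  removeNode(node) {
    this.ring = this.ring.filter((entry) => entry.node !== node);
  }

  // Walk clockwise to the first virtual node at or after the key's hash.
  lookup(key) {
    if (this.ring.length === 0) throw new Error("ring is empty");
    const h = fnv1a(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].node; // wrap around past the end
  }
}

const ring = new HashRing();
["A", "B", "C"].forEach((n) => ring.addNode(n));
const keys = Array.from({ length: 1000 }, (_, i) => `user:${i}`);
const ownersBefore = keys.map((k) => ring.lookup(k));
ring.addNode("D");
const ownersAfter = keys.map((k) => ring.lookup(k));
const moved = keys.filter((_, i) => ownersBefore[i] !== ownersAfter[i]);
console.log(`${moved.length} of ${keys.length} keys moved after adding D`);
```

This is the "incremental scalability" property: when a node joins, only the keys that now fall to the new node change owner, and everything else stays where it was.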
CouchDB MVCC Style




• (Source: http://guide.couchdb.org/draft/consistency.html)
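
The revision check behind MVCC-style updates can be sketched with a toy in-memory store. An update must quote the revision it was based on, and a stale revision is rejected as a conflict instead of silently overwriting newer data. `ToyRevStore` is a name invented for this sketch. Revisions are plain integers here for brevity, whereas real CouchDB revision strings have a different format, and the example document is made up.

```javascript
class ToyRevStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  get(id) {
    return this.docs.get(id);
  }

  // Create (no rev given) or update (rev must match the stored revision).
  put(id, body, rev) {
    const current = this.docs.get(id);
    const expected = current ? current.rev : undefined;
    if (rev !== expected) {
      return { ok: false, error: "conflict" };
    }
    const nextRev = current ? current.rev + 1 : 1;
    this.docs.set(id, { rev: nextRev, body });
    return { ok: true, rev: nextRev };
  }
}

const revStore = new ToyRevStore();
const created = revStore.put("john", { name: "John Doe", zip: 10001 });

// Two clients read revision 1, then both try to write.
const firstWriter = revStore.put("john", { name: "John Doe", zip: 94301 }, created.rev);
const secondWriter = revStore.put("john", { name: "Jane Doe", zip: 10001 }, created.rev);

console.log(firstWriter, secondWriter, revStore.get("john"));
```

The second writer loses without any lock being taken. It has to re-read the document, reapply its change to the latest revision, and try again.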
Key/value Stores

• Memcached


• Membase


• Redis


• Tokyo Cabinet


• Kyoto Cabinet


• Berkeley DB
Questions?




• blog: shanky.org | twitter: @tshanky


• st@treasuryofideas.com
