NoSQL
           What is it, and is it for you?

           Iraj Islam
           Rubayeet Islam
           Nurul Ferdous


                     NewsCred

Thursday, February 3, 2011
Agenda                                                   NewsCred



                    •        Part 1. Why NoSQL?

                    •        Part 2. NoSQL Use Cases

                    •        Part 3. Choosing a NoSQL Solution

                    •        Part 4. Understanding MongoDB

                    •        Part 5. Building a MongoDB App

                    •        Part 6. Scaling MongoDB

                    •        Questions



Who We Are                                   NewsCred




                Iraj Islam
                CTO/Co-founder, NewsCred


                Rubayeet Islam
                Senior Software Engineer, NewsCred


                Nurul Ferdous
                Senior Software Engineer, NewsCred




Our Story                                                NewsCred




                Launched 2008
                Founded by two Bangladeshis in 2008


                Funded by Investors in Twitter
                Floodgate Ventures (Twitter), Bessemer Cap. (LinkedIn)


                Top-tier Clients
                Yahoo!, Orange Telecom, Harvard U, The Daily Star, etc.




What We Do                     NewsCred



                             Domain Expertise
                             •   Big Data

                             •   Information Retrieval

                             •   Machine Learning

                             •   Semantic Web


                             Technologies
                             •   Apache Solr

                             •   MySQL/MongoDB

                             •   Python/Java



Part 1
           Why NoSQL?


                     NewsCred


What’s NoSQL?                                  NewsCred




                                   NoSQL
                             What’s with the weird name?




What’s NoSQL?                                NewsCred




                                NoSQL
                      Non-relational, web-scale database.




Why NoSQL?                                               NewsCred


                                         Web 1.0
                                  The read-intensive web

                •   Publishing Model
                •   Browsing
                •   Small Data
                •   Textual Content
                •   Personal Computer
                •   Search




Why NoSQL?                                                          NewsCred


                                    The Age of Big Data
                              Exabytes (10^18 bytes) of data stored per year

                              [Bar chart: x-axis 2006 to 2010; y-axis 0 to 1000 exabytes]


Why NoSQL?                                                  NewsCred


                                      Web 2.0+
                                The write-intensive web

                •   Semi-structured Data
                •   Real-time
                •   Big Data
                •   Semantic Web
                •   Ubiquity: any device, anywhere
                •   User-generated Content




Why NoSQL?                                          NewsCred


                             The MySQL Problem
                                  1. Default

                [Diagram: the data source writes and the user reads through the
                application, and both paths hit a single MySQL server.]

                •   Bottleneck, too much load!




Why NoSQL?                                                NewsCred


                             The MySQL Problem
                                2. Replication

                [Diagram: the application sends writes to the MySQL master and
                reads to the MySQL slaves.]

                •   Scalable reads!
                •   Bottleneck, writes won’t scale!

Why NoSQL?                                                NewsCred


                             The MySQL Problem
                                 3. Sharding

                [Diagram: in the application, the write and read paths each pass
                through a component labeled S before reaching MySQL.]

                •   Great, scalable writes!
                •   Development and maintenance costs just skyrocketed!
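
Manual sharding moves the routing logic into the application, which is where the extra development and maintenance cost comes from. A minimal sketch of that routing, assuming a hypothetical list of shard names and a simple hash-modulo scheme (neither comes from the slides):

```javascript
// Hypothetical shard names; a real deployment would hold connection settings here.
const SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2"];

// Simple deterministic string hash (illustrative only, not cryptographic).
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep the value as an unsigned 32-bit integer
  }
  return h;
}

// Every read and every write must first ask which shard owns the key.
function shardFor(key, shards = SHARDS) {
  return shards[hashKey(key) % shards.length];
}

console.log(shardFor("alice"));
console.log(shardFor("bob"));
```

With this scheme, adding a fourth shard changes `hashKey(key) % shards.length` for most keys, so existing rows would have to be migrated by hand. That rebalancing work is part of the maintenance cost.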

Why NoSQL?                                                   NewsCred


                                    The NoSQL Solution
                                       Design Goals


                             Semi-structure   >> Schema-free

                                  Big Data    >> Scalable reads/writes

                                 Real-time    >> High-performance

                                  Ubiquity    >> High-availability



NoSQL vs RDBMS                                      NewsCred



          NoSQL                               RDBMS
          • Schema-free                       • Relational schema
          • Scalable writes/reads             • Scalable reads
          • Auto high-availability            • Custom high-availability
          • Limited queries                   • Flexible queries
          • Eventual consistency *            • Consistency
          • BASE                              • ACID
            * Applies to most NoSQL systems


Is NoSQL For You?                                   NewsCred



          NoSQL                               RDBMS
          • Schema-free                       • Relational schema
          • Scalable writes/reads             • Scalable reads
          • Auto high-availability            • Custom high-availability
          • Limited queries                   • Flexible queries
          • Eventual consistency *            • Consistency
          • BASE                              • ACID
            * Applies to most NoSQL systems


Part 2
           NoSQL Use Cases


                     NewsCred


Who’s Using NoSQL?   NewsCred




NoSQL Use Cases                    NewsCred



                • Consumer Use Cases
                        • Facebook
                        • Twitter
                        • Netflix


                • Enterprise Use Cases
                        • Rackspace
                        • Trend Micro
                        • NewsCred




NoSQL Use Cases                                               NewsCred



                • Facebook
                        • HBase     - Facebook messages
                        • Scribe - Real-time click logs
                        • Hive      - SQL queries -> MapReduce jobs
                        • Hadoop
                             • Web analytics warehouse
                             • Distributed datastore
                             • MySQL backups




NoSQL Use Cases                                          NewsCred



                • Twitter
                        • Hadoop    - Analytics
                        • HBase     - People search
                        • Scribe    - Log collection framework
                        • FlockDB   - Social graph analysis




NoSQL Use Cases                                                NewsCred



                • Rackspace
                        • Cassandra - stat collection, mail and apps

                • Trend Micro
                        • HBase & Hadoop - reputation databases

                • NewsCred
                        • MongoDB
                          • API usage analytics
                          • Pixel tracking analytics
                          • Entity metadata storage


Demo
           NewsCred API Analytics


                     NewsCred


Part 3
           Choosing a NoSQL Solution


                     NewsCred


Choosing a NoSQL Solution                                                                                                    NewsCred

                [Diagram: a triangle with corners Consistency (C), Availability (A),
                and Partition tolerance (P); each system sits on the edge joining
                the two properties it provides.]

                •   Consistency: all clients have the same view of the data
                •   Availability: each client can always read and write
                •   Partition tolerance: the system works well despite physical network partitions

                •   CA: RDBMSs, MySQL, PostgreSQL, Aster Data, Greenplum, Vertica
                •   AP: Cassandra, Voldemort, CouchDB, Dynamo, SimpleDB, Tokyo Cabinet, Riak
                •   CP: BigTable, HyperTable, HBase, MongoDB, Scalaris, Berkeley DB, MemcacheDB, Redis



Consistent, Available (CA)                                 NewsCred


                             CA systems have trouble with partitions and
                                  deal with them through replication.

                  • Examples
                        • MySQL (relational)
                        • Aster Data (relational)
                        • Greenplum (relational)
                        • Vertica (column)




Available, Partition-Tolerant (AP)                       NewsCred


                         AP systems have trouble with consistency and achieve
                            “eventual consistency” through replication.

                  • Examples
                        • Cassandra (column/tabular)
                        • Dynamo (key-value)
                        • Voldemort (key-value)
                        • Tokyo Cabinet (key-value)
                        • CouchDB (document)
                        • SimpleDB (document)
                        • Riak (document)



Consistent, Partition-Tolerant (CP)                          NewsCred


                              CP systems have trouble with availability while
                             keeping data consistent across partitioned nodes.

                  • Examples
                        • MongoDB (document)
                        • BigTable (column/tabular)
                        • HyperTable (column/tabular)
                        • HBase (column/tabular)
                        • Redis (key-value)
                        • Scalaris (key-value)
                        • MemcacheDB (key-value)



HBase                                                                 NewsCred


             Selling point:
             Billions of rows, millions of columns

             Use when you need:
             Random, real-time access to Big Data

             Written in: Java
             License: Apache
             Type: Column/Tabular
             Protocol: HTTP/REST/Thrift
             Community Support: Good
             Learning Curve: High

             Users:
             Yahoo!, Facebook, Microsoft, Adobe, StumbleUpon, etc.

             [Diagram: CAP triangle]



Cassandra                                                              NewsCred


             Selling point:
             Best of Google BigTable and Amazon Dynamo

             Use when you need:
             To write more than you read (logging)

             Written in: Java
             License: Apache
             Type: Column/Tabular
             Protocol: Custom, binary (Thrift)
             Community Support: Great
             Learning Curve: Medium

             Users:
             Facebook, Twitter, Digg, Reddit, Rackspace, Cisco, SimpleGeo, Cloudkick, etc.

             [Diagram: CAP triangle]



Redis                                                                   NewsCred


             Selling point:
             Blazing fast, in-memory like memcached

             Use when you need:
             To manage rapidly changing data

             Written in: C/C++
             License: BSD
             Type: Key-value
             Protocol: Telnet-like
             Community Support: Good
             Learning Curve: Low

             Users:
             GitHub, Craigslist, Stack Overflow, Disqus, The Guardian UK, etc.

             [Diagram: CAP triangle]



MongoDB                                                               NewsCred


             Selling point:
             Best of NoSQL and RDBMS

             Use when you need:
             Dynamic queries and indexing on a Big DB

             Written in: C++
             License: AGPL
             Type: Document
             Protocol: Custom, binary (BSON)
             Community Support: Great
             Learning Curve: Low

             Users:
             NewsCred, Foursquare, GitHub, SourceForge, The New York Times, Etsy, Shutterfly, etc.

             [Diagram: CAP triangle]



Part 4
           Understanding MongoDB


                     NewsCred


Understanding MongoDB            NewsCred



                • Database == Database
                • Table == Collection
                • Row == Document




Understanding MongoDB   NewsCred



                • Mongo Shell




Understanding MongoDB   NewsCred



                • INSERT




Understanding MongoDB                                                   NewsCred



                • SELECT

                SELECT * FROM users WHERE X = 3 AND Y = 'abc';

                db.users.find({X: 3, Y: "abc"})



                SELECT * FROM users WHERE X = 3 AND Y = 'abc' ORDER BY X ASC;

                db.users.find({X: 3, Y: "abc"}).sort({X: 1})



                SELECT username, email FROM users WHERE X = 3 AND Y = 'abc';

                db.users.find({X: 3, Y: "abc"}, {username: true, email: true})
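
To make the query-document idea concrete, here is a plain-JavaScript sketch of what an equality query with a projection does. It is an illustration only, not how MongoDB is implemented: the `users` array, the `matches`, `project`, and `find` helpers are hypothetical, and the sketch ignores `_id`, query operators, and indexes.

```javascript
// Hypothetical in-memory "collection"; the X and Y fields mirror the slide.
const users = [
  { username: "ann", email: "ann@example.com", X: 3, Y: "abc" },
  { username: "bob", email: "bob@example.com", X: 3, Y: "xyz" },
  { username: "cat", email: "cat@example.com", X: 4, Y: "abc" },
];

// A document matches when every field in the query equals the document's value (implicit AND).
function matches(doc, query) {
  return Object.entries(query).every(([field, value]) => doc[field] === value);
}

// Keep only the fields that the projection marks as true.
function project(doc, fields) {
  const out = {};
  for (const name of Object.keys(fields)) {
    if (fields[name] && name in doc) out[name] = doc[name];
  }
  return out;
}

// Filter first, then apply the optional projection.
function find(docs, query, fields) {
  const hits = docs.filter((doc) => matches(doc, query));
  return fields ? hits.map((doc) => project(doc, fields)) : hits;
}

const found = find(users, { X: 3, Y: "abc" }, { username: true, email: true });
console.log(found);
```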




Understanding MongoDB                                                                 NewsCred



                • UPDATE
                db.collection.update(criteria, modifier, upsert, multi)


                criteria : Query that selects the record(s) to update
                modifier : $set, $inc, $unset, $push, $pop...
                upsert : Insert if no document matches, update otherwise
                multi : Update all documents matching the criteria


                UPDATE users SET X = 4, Y = 'abc' WHERE username = 'joegunchy';

                db.users.update({username: "joegunchy"}, {$set: {X: 4, Y: "abc"}}, false, true)
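
The interaction of the `upsert` and `multi` flags can be sketched in plain JavaScript. This is a simplified illustration under stated assumptions, not MongoDB's implementation: the `applyModifier` and `update` helpers and the `people` array are hypothetical, and only `$set` and `$inc` are handled.

```javascript
// Apply the two modifiers handled in this sketch; $unset, $push, $pop are omitted.
function applyModifier(doc, modifier) {
  for (const [field, value] of Object.entries(modifier.$set || {})) doc[field] = value;
  for (const [field, step] of Object.entries(modifier.$inc || {})) {
    doc[field] = (doc[field] || 0) + step;
  }
  return doc;
}

function update(docs, criteria, modifier, upsert = false, multi = false) {
  const isMatch = (doc) =>
    Object.entries(criteria).every(([field, value]) => doc[field] === value);
  const hits = docs.filter(isMatch);
  if (hits.length === 0) {
    // upsert: start a new document from the criteria, then apply the modifier
    if (upsert) docs.push(applyModifier({ ...criteria }, modifier));
    return;
  }
  // without multi, only the first matching document is changed
  for (const doc of multi ? hits : hits.slice(0, 1)) applyModifier(doc, modifier);
}

const people = [{ username: "joegunchy", X: 1, Y: "old" }];
update(people, { username: "joegunchy" }, { $set: { X: 4, Y: "abc" } }, false, true);
update(people, { username: "newuser" }, { $inc: { visits: 1 } }, true);
console.log(people);
```

The first call changes the existing document in place; the second finds no match and, because `upsert` is true, inserts a new document.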




Understanding MongoDB                                                              NewsCred



                • DELETE
                db.articles.remove({}) /* remove all */

                db.articles.remove({tag: 'sql'}) /* remove all articles with tag = 'sql' */

                db.articles.remove({tag: 'sql', $atomic: true}) /* block other ops while removing */




Understanding MongoDB                                          NewsCred



                • AGGREGATION
                > db.users.count()
                42

                > db.addresses.distinct('zipcode', {'city':'Dhaka'})
                [1000, 1100, 1204, 1205....]
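
As a rough illustration of what `count()` and `distinct()` compute, here is a plain-JavaScript sketch over a hypothetical `addresses` array. The `distinct` helper and the sample data are assumptions made for this example and do not come from the slides.

```javascript
// Hypothetical sample documents.
const addresses = [
  { city: "Dhaka", zipcode: 1000 },
  { city: "Dhaka", zipcode: 1205 },
  { city: "Dhaka", zipcode: 1000 },
  { city: "Sylhet", zipcode: 3100 },
];

// Collect the unique values of one field among documents matching an equality query.
function distinct(docs, field, query = {}) {
  const seen = new Set();
  for (const doc of docs) {
    const ok = Object.entries(query).every(([k, v]) => doc[k] === v);
    if (ok && field in doc) seen.add(doc[field]);
  }
  return [...seen];
}

const total = addresses.length; // plays the role of count()
const dhakaZips = distinct(addresses, "zipcode", { city: "Dhaka" });
console.log(total, dhakaZips);
```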




Understanding MongoDB                                                  NewsCred



                • Map/Reduce
                       • Programming model introduced by Google for processing large
                             datasets on clusters



                • MongoDB uses it for:
                       • Aggregation (Group By, Avg, Sum etc.)
                       • Batch processing jobs
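
The map and reduce steps can be sketched in plain JavaScript with a tiny single-process driver. In the mongo shell the map function reads the document from `this` and calls a global `emit`; here both are passed as arguments to keep the sketch self-contained. The `articles` data and the `mapTags`, `reduceCounts`, and `mapReduce` names are hypothetical.

```javascript
// Hypothetical documents: articles with tags.
const articles = [
  { title: "Intro to NoSQL", tags: ["nosql", "mongodb"] },
  { title: "Scaling MySQL", tags: ["mysql", "sql"] },
  { title: "Mongo sharding", tags: ["mongodb"] },
];

// Map: called once per document; emits (key, value) pairs.
function mapTags(doc, emit) {
  for (const tag of doc.tags) emit(tag, 1);
}

// Reduce: combines all values emitted for one key into a single value.
function reduceCounts(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Driver: group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

const tagCounts = mapReduce(articles, mapTags, reduceCounts);
console.log(tagCounts);
```

This is the equivalent of a `GROUP BY tag` with a `COUNT(*)`: the map step picks the grouping key, and the reduce step does the aggregation.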




Understanding MongoDB                       NewsCred



                • Map/Reduce Example

                  Document




                  We want to do something like...




Understanding MongoDB          NewsCred



                • Map/Reduce Example

                  Map




                  Reduce




Understanding MongoDB          NewsCred



                • Map/Reduce Example

                  Execute




Understanding MongoDB          NewsCred



                • Map/Reduce Example

                  Result




Part 5
           Building a MongoDB App


                     NewsCred


Part 6
           Scaling with MongoDB


                     NewsCred


Scaling with MongoDB               NewsCred



                • Scaling is a challenge

                • No silver bullet

                • Strategies
                       • Replication
                       • Replica Sets
                       • Auto-sharding


Scaling with MongoDB                               NewsCred


                                     Replication


                                         Master




                             Slave       Slave     Slave




Scaling with MongoDB                            NewsCred


                                    Replica Sets


                                            Secondary




                             User
                                                        Passive




                                            Primary




Scaling with MongoDB                                           NewsCred


                                 Replica Sets: Election

                [Diagram: a replica set with five nodes, A to E.]

                •   C: priority 1, synced 3 ms ago
                •   E: priority 1, synced 1 ms ago
                •   D: priority 0




Scaling with MongoDB                                                           NewsCred



                • Replica Sets: Network Partition
                             • Election Process initiated
                                 • When a node can’t reach primary
                                 • When primary can’t reach majority of nodes in set

                             • New primary is elected by majority of nodes in set

                             • Node with the most recent data gets priority

                             • Arbiter node used to break ties
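
The election rules above can be sketched as a small function. This is a deliberately simplified model, not MongoDB's actual election protocol: it only checks for a majority, excludes priority-0 members, and prefers the most recently synced node. The `members` snapshot is hypothetical; which nodes are unreachable and D's sync lag are assumptions made for the example.

```javascript
// Hypothetical snapshot of a five-member set; lagMs is the time since the last sync.
const members = [
  { name: "A", reachable: false, priority: 1, lagMs: 0 }, // assumed unreachable
  { name: "B", reachable: false, priority: 1, lagMs: 0 }, // assumed unreachable
  { name: "C", reachable: true, priority: 1, lagMs: 3 },
  { name: "D", reachable: true, priority: 0, lagMs: 2 }, // lag value is an assumption
  { name: "E", reachable: true, priority: 1, lagMs: 1 },
];

function electPrimary(set) {
  const visible = set.filter((m) => m.reachable);
  // No election without a strict majority of the whole set.
  if (visible.length * 2 <= set.length) return null;
  // Priority-0 members never become primary; among the rest, the freshest data wins.
  const candidates = visible.filter((m) => m.priority > 0);
  candidates.sort((a, b) => a.lagMs - b.lagMs);
  return candidates.length > 0 ? candidates[0].name : null;
}

const newPrimary = electPrimary(members);
console.log(newPrimary);
```

The majority check is what keeps the minority side of a network partition from electing a second primary.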




Scaling with MongoDB                                                     NewsCred



                • Auto-sharding
                             • Cluster handles sharding data and rebalancing
                               automatically

                             • No administrative headaches of manual sharding

                             • Application is oblivious to existence of shards




Scaling with MongoDB                                  NewsCred


                                              Auto-sharding




                             Big Collection




Scaling with MongoDB                 NewsCred


                             Auto-sharding

                                   User




                                 Router




Scaling with MongoDB                                      NewsCred


                                             Auto-sharding

               • Connect to a single server
                        • db = connect('localhost:27017')

               • Connect to a router
                        • db = connect('localhost:27017')



                                      User

                                                             MongoDB




Scaling with MongoDB                                                NewsCred



                • When to shard?
                             • Running out of disk space
                             • Write-intensive workload
                             • Need to keep a large chunk of data in memory


                • Don’t start out with a sharded collection!

                • Shard “if and when” you need to



Scaling with MongoDB                                                      NewsCred



                • Choosing a Shard Key
                             • Incremental
                                • Example: timestamps, e.g. 'created_at'
                                • Queries on the shard key are highly efficient

                             • Random
                                • Example: 'username'
                                • Writes are distributed across multiple shards
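
The trade-off between the two kinds of shard key can be seen in a toy model of range-based partitioning. The sketch below assumes fixed split points and a hypothetical helper `writesPerShard`; the sample keys are made up. It counts how many inserts land on each shard: with an incremental key every new value is larger than all existing split points, so all writes hit the last shard, while a key with no insertion order spreads them out.

```javascript
// Count how many writes each shard receives under range-based sharding.
// boundaries: ascending split points. Shard i holds keys below
// boundaries[i]; the last shard holds everything at or above the final one.
function writesPerShard(keys, boundaries) {
  const counts = new Array(boundaries.length + 1).fill(0);
  for (const key of keys) {
    let i = boundaries.findIndex((b) => key < b);
    if (i === -1) {
      i = boundaries.length; // beyond the last split point
    }
    counts[i] += 1;
  }
  return counts;
}

// Incremental key: new 'created_at' values always exceed the split points.
const createdAt = [1001, 1002, 1003, 1004, 1005, 1006];
console.log(writesPerShard(createdAt, [400, 800])); // [ 0, 0, 6 ]

// Random-looking key: usernames arrive in no particular order.
const usernames = ["alice", "zara", "mike", "bob", "nina", "tom"];
console.log(writesPerShard(usernames, ["h", "q"])); // [ 2, 2, 2 ]
```

The flip side, as the slide notes, is that a range query on an incremental key touches only the shard or shards holding that range.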




Scaling with MongoDB                                  NewsCred


                             Sharding + Replica Sets

                                           User




                                         Router




                                 P                    P




                             S       S            S       S




Questions?                                 NewsCred




                Iraj Islam
                iraj@newscred.com, @irajislam


                Rubayeet Islam
                rubayeet@newscred.com, @rubayeet


                Nurul Ferdous
                nurul@newscred.com, @ferdous



