10 NoSQL Databases - HBase Hive Cassandra

The document introduces NoSQL databases: it defines NoSQL, explains why NoSQL databases were created, and discusses the CAP theorem and how it relates to the ACID and BASE properties. It then covers HBase in detail, describing its data model, architecture, and how data is stored in HFiles on HDFS, followed by overviews of Cassandra and Hive.


NOSQL DATABASES

HBASE CASSANDRA HIVE


Dr. Emmanuel S. Pilli
Malaviya NIT Jaipur
What is NoSQL?
• Stands for Not Only SQL
• Class of non-relational data storage systems
• Usually do not require a fixed table schema nor
do they use the concept of joins
• All NoSQL offerings relax one or more of the
ACID properties (next … CAP theorem)
Why NoSQL?
• For data storage, an RDBMS cannot be the
be-all / end-all
• Just as there are different programming
languages, need to have other data storage tools
in the toolbox
• A NoSQL solution is more acceptable to a client
now than even a year ago
The CAP Theorem
• Consistency: once a writer has written, all readers will see that write.
• Availability: the system is available during software and hardware upgrades and node failures.
• Partition tolerance: the system can continue to operate in the presence of network partition failures.
• Theorem: you can have at most two of these three properties for any shared-data system.

ACID vs BASE
ACID
 Atomic
► All operations in a transaction succeed or every operation
is rolled back.
 Consistent
► On the completion of a transaction, the database is
structurally sound.
 Isolated
► Transactions do not contend with one another.
Contentious access to data is moderated by the database
so that transactions appear to run sequentially.
 Durable
► The results of applying a transaction are permanent,
even in the presence of failures.
BASE
 Basic Availability
► The database appears to work most of the time.
 Soft-state
► Stores don’t have to be write-consistent, nor do different
replicas have to be mutually consistent all the time.
 Eventual consistency
► Stores exhibit consistency at some later point (e.g., lazily
at read time).
CAP theorem
NoSQL Options – Key-Value Stores
• Extremely simple interface
• Data model: (key, value) pairs
• Operations: Insert(key,value), Fetch(key), Update(key,value), Delete(key)
• Some allow (non-uniform) columns within value
• Some allow Fetch on range of keys
• Examples
• Redis, Voldemort
• Memcached
• and 100s more
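The operations above form a very small interface. A minimal in-memory sketch in Java (not tied to Redis, Voldemort, or any particular product) illustrates the data model:

import java.util.HashMap;
import java.util.Map;

// Minimal in-memory sketch of the key-value interface listed above.
// Real stores (Redis, Memcached, ...) add persistence, distribution,
// replication and expiry on top of this basic contract.
public class KeyValueStoreSketch {
    private final Map<String, byte[]> store = new HashMap<>();

    public void insert(String key, byte[] value) { store.put(key, value); }    // Insert(key, value)
    public byte[] fetch(String key)              { return store.get(key); }    // Fetch(key)
    public void update(String key, byte[] value) { store.put(key, value); }    // Update(key, value)
    public void delete(String key)               { store.remove(key); }        // Delete(key)

    public static void main(String[] args) {
        KeyValueStoreSketch kv = new KeyValueStoreSketch();
        kv.insert("user:42", "Alice".getBytes());
        System.out.println(new String(kv.fetch("user:42")));   // prints Alice
        kv.delete("user:42");
    }
}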
NoSQL Options – Document Stores
►Like Key-Value Stores except the value is a document
• Data model: (key, document) pairs
• Document: JSON, XML, other semistructured formats
• Basic operations: Insert(key,document), Fetch(key), Update(key,document), Delete(key)
• Also Fetch based on document contents
►Example systems
• CouchDB, MongoDB, SimpleDB, …
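As a concrete illustration, a hedged sketch using the MongoDB Java driver (the connection string, database and collection names, and field names are assumptions for illustration only):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import java.util.Arrays;

// Store and fetch a JSON-like document; queries can match on document contents.
public class DocumentStoreSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("shop");
            MongoCollection<Document> orders = db.getCollection("orders");

            // Insert(key, document): the document carries its own (semi)structure
            orders.insertOne(new Document("orderId", 1001)
                    .append("customer", "Alice")
                    .append("items", Arrays.asList("book", "pen")));

            // Fetch based on document contents, not just the key
            Document found = orders.find(Filters.eq("customer", "Alice")).first();
            System.out.println(found.toJson());
        }
    }
}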
NoSQL Options – Column Stores
• Multi-dimensional map
• Not all entries are relevant each time
• Column families
• Examples
• Cassandra
• HBase
• Amazon SimpleDB
NoSQL Options – Graph Stores
• Data model: nodes and edges
• Nodes may have properties (including ID)
• Edges may have labels or roles
 Relational DBs can model graphs, but an edge
requires a join which is expensive
• Examples: Neo4j, FlockDB, Titan, Pregel, …
• RDF “triple stores” can map to graph databases
NoSQL Database Examples
• MongoDB: Open-source document database.
• CouchDB: Database that uses JSON for documents, JavaScript for MapReduce queries, and regular HTTP for an API.
• GemFire: Distributed data management platform providing dynamic scalability, high performance, and database-like persistence.
• Redis: Data structure server wherein keys can contain strings, hashes, lists, sets, and sorted sets.
• Cassandra: Database that provides scalability and high availability without compromising performance.
NoSQL Database Examples
• Memcached: Open-source, high-performance, distributed-memory object-caching system.
• Hazelcast: Open-source, highly scalable data distribution platform.
• HBase: Hadoop database, a distributed and scalable big data store.
• Mnesia: Distributed database management system that exhibits soft real-time properties.
• Neo4j: Open-source, high-performance, enterprise-grade graph database.
HBASE
Dr. Emmanuel S. Pilli
Malaviya NIT Jaipur
It all started when ..
• Google published its paper on BigTable.
• The paper put forward a new way of storing and
retrieving data
• A proven architecture as Google has been using
it for many of their successful applications
• Primarily geared for web scale data storage and
lookups
Why not an RDBMS for such web scale data?
• RDBMSs tend to scale poorly at large data sizes
• ~500GB (both read and write)
• Some RDBMSs support sharding but …
• Data rearrangement (resharding) becomes a problem
• If the size of shards gets imbalanced
• Requires a hashing function to push data into shards
• Code for access becomes complex
• First find the appropriate shard to locate your data
• Pretty much rigid schemas
• Why do we need dynamic schemas?
• Sparse nature of web scale data

• Sharding is a type of database partitioning that separates very large databases into smaller, faster, more easily managed parts called data shards. The word shard means a small part of a whole.
HBase
• Column-Oriented data store, known as “Hadoop
Database”
• Distributed – designed to serve large tables
• Billions of rows and millions of columns
• Supports random real-time CRUD operations (unlike
HDFS) - create, read, update, and delete
• Runs on a cluster of commodity hardware
• Server hardware, not laptop/desktops
• Open-source, written in Java, part of the Apache Hadoop ecosystem
• Type of “NoSQL” DB
• Does not provide SQL-based access
• Does not adhere to the Relational Model for storage
• Simple data model
• Dynamic control over data layout and format
Traditional RDBMS and HBase
HBase Data Model

► Map<byte[], Map<byte[], Map<byte[], Map<Long, byte[]>>>>
► i.e., a sorted map of: row key → column family → column qualifier → timestamp → cell value
HBase Data Model
• Data is stored in Tables
• Tables contain rows
• Rows are referenced by a unique key
• Key is an array of bytes – good news
• Anything can be a key: string, long and your own
serialized data structures
• Rows are made of columns, which are grouped into column families
• Data is stored in cells
• Identified by row x column-family x column
• Cell's content is also an array of bytes
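A short, hedged sketch with the HBase Java client shows how a cell is addressed by row key, column family, column qualifier and value, all as byte arrays (the table name "users" and the family/qualifier "user:first_name" are illustrative names, matching the example on the next slide):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Write and read a single cell: (row key x column family x column qualifier) -> value.
public class HBaseCellExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {

            Put put = new Put(Bytes.toBytes("row-001"));                 // row key (bytes)
            put.addColumn(Bytes.toBytes("user"),                         // column family
                          Bytes.toBytes("first_name"),                   // column qualifier
                          Bytes.toBytes("Alice"));                       // cell value (bytes)
            table.put(put);

            Get get = new Get(Bytes.toBytes("row-001"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("user"), Bytes.toBytes("first_name"));
            System.out.println(Bytes.toString(value));                   // prints Alice
        }
    }
}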
HBase Data Model
• Columns are grouped into families
• Labeled as “family:column”
• Example “user:first_name”
• A way to organize your data
• Various features are applied to families
• Compression
• In-memory option
• Stored together - in a file called HFile/StoreFile
• Family definitions are static
• Created with the table; should be rarely added and changed
• Limited to a small number of families
HBase Families
• Family name must be composed of printable
characters
• Not bytes, unlike keys and values
• Think of family:column as a tag for a cell value and NOT
as a spreadsheet
• Columns on the other hand are NOT static
• Create new columns at run-time
• Can scale to millions for a family
HBase timestamps
• Cells' values are versioned
• For each cell multiple versions are kept
• 3 by default
• Another dimension to identify your data
• Either assigned automatically by the region server or provided explicitly by the client
• Versions are stored in decreasing timestamp order
• Read the latest first – optimization to read the current
value
• You can specify how many versions are kept
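A hedged sketch (same illustrative table and column as before, HBase 1.x-style API) of reading several versions of one cell:

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionedReadExample {
    // Fetch up to 3 versions of one cell; they come back newest first.
    static void readVersions(Table table) throws IOException {
        Get get = new Get(Bytes.toBytes("row-001"));
        get.setMaxVersions(3);                      // a Get returns only the latest version by default
        Result result = table.get(get);
        List<Cell> cells = result.getColumnCells(Bytes.toBytes("user"), Bytes.toBytes("first_name"));
        for (Cell cell : cells) {
            System.out.println(cell.getTimestamp() + " -> " + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
}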
HBase Row Keys
• Rows are sorted lexicographically by key
• Compared on a binary level from left to right
• For example keys 1,2,3,10,15 will get sorted as
• 1, 10, 15, 2, 3
• Somewhat similar to Relational DB primary index
• Always unique
• Some, but minimal, secondary index support
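Because keys compare as bytes, numeric keys are often zero-padded so that lexicographic order matches numeric order; a small illustrative sketch:

// Keys "1", "10", "15", "2", "3" sort in that (lexicographic) order.
// Zero-padding to a fixed width restores the expected numeric ordering.
public class RowKeyPadding {
    public static void main(String[] args) {
        long[] ids = {1, 2, 3, 10, 15};
        for (long id : ids) {
            String rowKey = String.format("%010d", id);   // "0000000001", "0000000002", ...
            System.out.println(rowKey);
        }
    }
}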
HBase Architecture
• An HBase table is made up of regions
• Each region is defined by a start row key and an end row key.
• Table = SUM of regions
• Region = (tablename, startkey, endkey)
• Each region may live on a different node
• Each region is made up of HDFS blocks and files
• Point where HBase falls back on Hadoop
• These building blocks of storage are replicated by
Hadoop infrastructure (replication settings configurable)
Row distribution across region servers
HBase Architecture – Contd ..
• HBase Nodes are of two types
• RegionServer
• The actual node where the data is stored
• A Region server can hold more than one table.
• Master
• Manage Region Servers
• Load balancing of the Regions
• Moves regions as data is being inserted so that regions stay almost equally balanced.
• This aspect is what gives HBase an advantage over RDBMS sharding approaches.

• Special catalog tables exist inside HBase
• -ROOT-
• Keeps track of the location of the .META. table's regions
• .META.
• Stores the locations of user-table regions and the region servers hosting them
HBase Architecture Implementation
• HBase Master
• Administration of RegionServers

• HRegionServer
• Write Requests
• Read Requests
• Cache Flushes
• Compactions
• Region Splits

• HBase Client
• Caching for region lookups
Data Storage
• Data is stored in files called HFiles/StoreFiles
• Usually saved in HDFS
• HFile is basically a key-value map
• Keys are sorted lexicographically
• When data is added it's written to a log called
Write Ahead Log (WAL) and is also stored in memory
(memstore)
• Flush: when in-memory data exceeds a configured threshold it is flushed to an HFile
• Data persisted to HFile can then be removed from WAL
• Region Server continues serving read-writes during the flush
operations, writing values to the WAL and memstore
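The write path above can be summarised in a purely conceptual sketch (this is not HBase's actual code; the flush threshold is an arbitrary illustrative value):

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Conceptual sketch of the WAL + memstore + flush cycle described above.
class WritePathSketch {
    private final List<String> wal = new ArrayList<>();                     // stands in for the Write Ahead Log
    private final TreeMap<String, byte[]> memstore = new TreeMap<>();       // sorted, in-memory buffer
    private final List<TreeMap<String, byte[]>> hfiles = new ArrayList<>(); // stands in for flushed HFiles
    private static final int FLUSH_THRESHOLD = 1000;                        // illustrative only

    void put(String rowKey, byte[] value) {
        wal.add(rowKey);                       // 1. append to the log for durability
        memstore.put(rowKey, value);           // 2. buffer the write in memory
        if (memstore.size() >= FLUSH_THRESHOLD) {
            flush();                           // 3. flush once the memstore grows too large
        }
    }

    private void flush() {
        hfiles.add(new TreeMap<>(memstore));   // persist an immutable, key-sorted "HFile"
        memstore.clear();
        wal.clear();                           // flushed data can be dropped from the WAL
    }
}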
Data Storage
• HDFS doesn't support updates to an existing file
therefore HFiles are immutable
• Cannot remove key-values out of HFile(s)
• Over time more and more HFiles are created
• Delete marker is saved to indicate that a record
was removed
• These markers are used to filter the data - to “hide” the
deleted records
• At read time, data is merged from the contents of the HFiles and the memstore
Data Storage
• To control the number of HFiles and to keep the cluster well balanced, HBase periodically performs data compactions
• Minor Compaction: Smaller HFiles are merged into
larger HFiles (n-way merge)
• Fast - Data is already sorted within files
• Delete markers are not applied
• Major Compaction:
• For each region merges all the files within a column-family into
a single file
• Scan all the entries and apply all the deletes as necessary
HBase – Data Access
• HBase Shell
• list, get, put, disable, drop, alter, count, describe, scan, etc.
• Java Client API
• Table API
• Client API for data access, MapReduce
• Thrift Server
• Thrift compiler, Thrift Server and Thrift client
• REST API
• Stargate Servlet
• Avro Server
• Apache Avro is also a cross-language schema compiler
• http://avro.apache.org
• Requires running Avro Server
• HBql
• SQL like syntax for HBase
• http://www.hbql.com
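Of the options above, the Java Client API is the most direct; a brief hedged sketch of a table scan (the table "users" and family "user" are the same illustrative names used earlier):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
    // Scan all rows of one column family; rows come back in row-key order.
    static void scanUsers(Connection conn) throws IOException {
        try (Table table = conn.getTable(TableName.valueOf("users"))) {
            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("user"));           // restrict to one column family
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}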
HBase MapReduce constructs
When to use HBase
 Use HBase if…
– You need random write, random read, or both (but
not neither)
– You need to do many thousands of operations per
second on multiple TB of data
– Your access patterns are well-known and simple

 Don’t use HBase if…


– You only append to your dataset, and tend to read
the whole thing
– You primarily do ad-hoc analytics (ill-defined access
patterns)
– Your data easily fits on one beefy node
References
• An Excellent blog on HBase Architecture
• http://www.larsgeorge.com
• HBase Wiki
• http://wiki.apache.org/hadoop/Hbase
• Some presentations made on HBase
• http://wiki.apache.org/hadoop/HBase/HBasePresentations
CASSANDRA
Dr. Emmanuel S. Pilli
Malaviya NIT Jaipur
The History of Cassandra
Why Use Cassandra?
Cassandra Characteristics…
Column Oriented
Schema Free
Cassandra Use Case - Summary
What is Apache Cassandra?
Apache Cassandra is an open-source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented database.
Distributed and Decentralized
Distributed and Decentralized
Elastic Scalability
High Avalability and Fault Tolerance
Tunable Consistency

Cassandra enables us to tune consistency based on application requirements.
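A hedged sketch using the DataStax Java driver (3.x-style API; the contact point, keyspace, and table names are assumptions) of choosing a consistency level per request:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class TunableConsistencyExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("shop")) {

            // The same query can run at different consistency levels, trading
            // latency/availability against how many replicas must agree.
            Statement read = new SimpleStatement("SELECT * FROM orders WHERE order_id = 42");
            read.setConsistencyLevel(ConsistencyLevel.QUORUM);   // e.g. ONE, QUORUM, ALL, ...

            ResultSet rows = session.execute(read);
            System.out.println(rows.one());
        }
    }
}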
High Performance
Cassandra was designed specifically from the ground up to
take full advantage of multiprocessor / multicore machines
and to run across many dozens of these machines housed
in multiple data centres.

It scales consistently and seamlessly to hundreds of terabytes.
Shows exceptional performance under heavy loads.
Consistently shows very fast throughput for writes per second on a basic commodity workstation.
Where to Use Cassandra?
Use if your application has:
 Big Data (Billions of Records Rows & Columns)
 Very High Velocity Random Reads & Writes
 No Need for Multiple Secondary Indexes
 Low Latency

Use Cases:
 eCommerce Inventory Cache Use Cases
 Time Series / Events Use Cases
 Feed-Based Activity Use Cases
Where NOT to Use Cassandra?
Don’t Use if your application has:
 Secondary Indexes.
 Relational Data.
 Transactional (Rollback, Commit)
 Primary & Financial Records.
 Stringent Security & Authorization Needs On Data.
 Dynamic Queries on Columns.
 Searching Column Data.
 Low Latency.
HIVE
Dr. Emmanuel S. Pilli
Asst. Professor, CSE, MNIT Jaipur

Why Another Data Warehousing System?
• Problem: Data, data and more data
• Several TBs of data every day
• The Hadoop Experiment:
• Uses Hadoop File System (HDFS)
• Scalable/Available
• Problem
• Lacked Expressiveness
• Map-Reduce hard to program
• Solution : HIVE

What is Hive?

• A system for managing and querying unstructured data as if it were structured
• Uses Map-Reduce for execution
• HDFS for Storage

• Key Building Principles
• SQL as a familiar data warehousing tool
• Extensibility (Pluggable map/reduce scripts in the language of
your choice, Rich and User Defined Data Types, User Defined
Functions)
• Interoperability (Extensible Framework to support different file
and data formats)
• Performance
SQL vs HiveQL
HiveQL: Type System
• Primitive types
– Integers: TINYINT, SMALLINT, INT, BIGINT.
– Boolean: BOOLEAN.
– Floating point numbers: FLOAT, DOUBLE.
– String: STRING.
• Complex types
– Structs: {a INT; b INT}.
– Maps: M['group'].
– Arrays: ['a', 'b', 'c'], A[1] returns 'b'
• Functions
► SHOW functions
► DESCRIBE FUNCTION funname

Hive Data Model: Tables


• Managed Tables:
CREATE TABLE managed_table (dummy STRING);
• Hive manages the data
• Moves files to warehouse directory [During LOAD
operation]
• External Tables
CREATE EXTERNAL TABLE external_table (dummy
STRING);

Hive Data Model: Partitions


• Give extra structure to the data
• Useful for more efficient queries.
CREATE TABLE logs (ts BIGINT, line STRING)
PARTITIONED BY (dt STRING, country STRING);

LOAD DATA LOCAL INPATH 'input/hive/partitions/file1'


INTO TABLE logs
PARTITION (dt='2001-01-01', country='GB');

/user/hive/warehouse/logs/dt=2001-01-01/country=GB/file1
                                                  /file2
                                       /country=US/file3
                         /dt=2001-01-02/country=GB/file4
                                        /country=US/file5
                                                   /file6

Hive Data Model: Buckets


• To enable more efficient queries
• JOIN queries
• To make sampling more efficient
CREATE TABLE bucketed_users (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;
Examples – DDL Operations
CREATE TABLE sample (foo INT, bar STRING)
PARTITIONED BY (ds STRING);
SHOW TABLES '.*s';
DESCRIBE sample;
ALTER TABLE sample ADD COLUMNS (new_col
INT);
DROP TABLE sample;
Examples – DML Operations
LOAD DATA LOCAL INPATH './sample.txt'
OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24');

LOAD DATA INPATH '/user/falvariz/hive/sample.txt' OVERWRITE
INTO TABLE sample PARTITION (ds='2012-02-24');

Importing data
• INSERT OVERWRITE TABLE
INSERT OVERWRITE TABLE target
SELECT col1, col2
FROM source;

• Multitable Insert
FROM records2
INSERT OVERWRITE TABLE stations_by_year
SELECT year, COUNT(DISTINCT station)
GROUP BY year
INSERT OVERWRITE TABLE records_by_year
SELECT year, COUNT(1)
GROUP BY year

• Create Table as Select
CREATE TABLE target
AS
SELECT col1, col2
FROM source;

Querying data
• SELECT
SELECT foo FROM sample WHERE ds='2012-02-24';

• Sorting and Aggregating
FROM records2
SELECT year, temperature
DISTRIBUTE BY year
SORT BY year ASC, temperature DESC;
• JOINS:
• Inner Joins
SELECT sales.*, things.*
FROM sales JOIN things ON (sales.id = things.id);
• Outer Join

Joins…
• Semi joins
SELECT *
FROM things
WHERE things.id IN (SELECT id from sales);
We can rewrite it as follows:
SELECT *
FROM things LEFT SEMI JOIN sales ON (sales.id = things.id);
• Map joins
• If one table is small enough to fit in memory
SELECT /*+ MAPJOIN(things) */ sales.*, things.*
FROM sales JOIN things ON (sales.id = things.id);
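Such queries are usually typed into the Hive shell, but they can also be submitted programmatically; a hedged sketch via the HiveServer2 JDBC driver (the connection URL, empty credentials, and a running HiveServer2 are assumptions for a local, unsecured setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id)")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));   // first column of each joined row
            }
        }
    }
}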
Performance - Results
System Architecture and Components
References:
• Hadoop: The Definitive Guide
Tom White (Author)
O'Reilly Media; 3rd Edition (May 6, 2012)
• Programming Hive
Edward Capriolo, Dean Wampler, Jason Rutherglen (Authors)
O'Reilly Media; 1st Edition (October 2012)
Any Questions? Thank you!
