Cassandra Data Model Big Data Seminar

Apache Cassandra is a distributed NoSQL database designed for handling large datasets with high availability and no single point of failure. It utilizes a wide-column store model organized into keyspaces and tables, allowing for dynamic column addition and efficient data modeling. Cassandra is particularly beneficial for big data analytics due to its scalability, high write throughput, and fault-tolerant architecture.

Seminar

Reg. No.: 611723104085
Name: Roshni M. S.
Subject: Big Data Analytics
Introduction to Cassandra
• Apache Cassandra is a distributed NoSQL database.
• Designed to handle large amounts of data across many servers.
• Provides high availability with no single point of failure.
• Suitable for big data applications.
Cassandra Data Model Overview
• Cassandra uses a wide-column store model.
• Data is organized into keyspaces and tables.
• Tables consist of rows identified by a primary key.
• Columns can be added to a table later without rewriting existing rows, and a row stores only the columns it has values for.
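To make the keyspace, table, and row hierarchy concrete, here is a minimal Python sketch that models it with plain dictionaries. It is an illustration only and not how Cassandra stores data on disk. The `Table` class, its methods, and the sample names are hypothetical. The sketch shows a column being added after rows already exist, and rows that hold only the columns they were given.

```python
class Table:
    """Toy model of a table: a declared schema plus sparse rows keyed by primary key."""

    def __init__(self, primary_key, columns):
        self.primary_key = primary_key
        self.columns = set(columns) | {primary_key}
        self.rows = {}  # primary key value -> {column: value}

    def add_column(self, name):
        # Comparable in spirit to ALTER TABLE ... ADD; existing rows are untouched.
        self.columns.add(name)

    def insert(self, **values):
        unknown = set(values) - self.columns
        if unknown:
            raise ValueError(f"undefined columns: {sorted(unknown)}")
        key = values[self.primary_key]
        # Merge into any existing row: only the supplied columns are stored.
        self.rows.setdefault(key, {}).update(values)

    def select(self, key):
        row = self.rows.get(key)
        if row is None:
            return None
        # Columns without a stored value read back as None (null).
        return {col: row.get(col) for col in sorted(self.columns)}


# A keyspace is modelled here as a plain namespace of tables.
keyspace = {"student": Table("student_id", ["name", "age"])}
student = keyspace["student"]
student.insert(student_id=1, name="Asha")
student.add_column("department")
student.insert(student_id=2, name="Ravi", department="CSE")
```

Row 1 was written before `department` existed, so it simply has no value for that column; nothing had to be rewritten when the column was added.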
Key Data Modeling Concepts
• Keyspace: Top-level namespace that defines the replication strategy and replication factor.
• Table: Structure that holds data with a defined schema.
• Partition Key: Determines how data is distributed across nodes.
• Clustering Columns: Define the sort order of rows within a partition.
• Primary Key = Partition Key + Clustering Columns (clustering columns are optional).
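The role of the partition key and the keyspace's replication factor can be sketched as a toy token ring in Python. This is a simplified model under stated assumptions: MD5 stands in for Cassandra's default Murmur3 partitioner, each node gets a single token derived from its name (real clusters typically assign many tokens per node), and replicas are taken as the next nodes clockwise, similar in spirit to a simple replication strategy. The names `token`, `Ring`, and the node labels are hypothetical.

```python
import bisect
import hashlib


def token(partition_key):
    """Hash a partition key to an integer token.

    MD5 is a stand-in chosen for this sketch; Cassandra's default
    partitioner uses Murmur3.
    """
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring with one token per node."""

    def __init__(self, nodes):
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, partition_key, replication_factor=1):
        # Start at the first node whose token is >= the key's token,
        # wrapping around the ring, then walk clockwise for more replicas.
        start = bisect.bisect_left(self._tokens, token(partition_key))
        count = min(replication_factor, len(self._entries))
        return [
            self._entries[(start + i) % len(self._entries)][1]
            for i in range(count)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("student-42", replication_factor=3)
```

The same partition key always hashes to the same token, so every read and write for that key is routed to the same set of nodes.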
Example: Student Table
CREATE TABLE student (
    student_id UUID PRIMARY KEY,
    name TEXT,
    age INT,
    department TEXT
);

• Each student has a unique student_id.
• Data is partitioned by student_id.
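Because `student_id` is the whole primary key, each partition holds exactly one row. The following Python sketch, a simplified model with hypothetical helper names and made-up sample data, contrasts a lookup by partition key with a search on a non-key column.

```python
import uuid

students = {}  # partition key (student_id) -> the single row in that partition


def insert_student(name, age, department):
    student_id = uuid.uuid4()
    students[student_id] = {"name": name, "age": age, "department": department}
    return student_id


def get_by_id(student_id):
    # Single-partition read: the key leads straight to the row.
    return students.get(student_id)


def find_by_department(department):
    # No partition key supplied, so every partition has to be examined.
    return [sid for sid, row in students.items() if row["department"] == department]


asha = insert_student("Asha", 20, "CSE")
ravi = insert_student("Ravi", 21, "ECE")
```

In CQL, the second kind of query would need a secondary index, `ALLOW FILTERING`, or a separate table keyed by department, which is why Cassandra tables are usually designed around the queries they must serve.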
Example: Composite Primary Key
CREATE TABLE marks (
    student_id UUID,
    subject TEXT,
    score INT,
    PRIMARY KEY (student_id, subject)
);

• Partitioned by student_id, clustered by subject, so all of a student's marks live in one partition, sorted by subject.
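The behaviour of this composite key can be sketched in Python as a dictionary of partitions, each holding rows keyed by the clustering column. This is a minimal model, not Cassandra's storage engine; the `MarksTable` class and the sample values are hypothetical.

```python
from collections import defaultdict


class MarksTable:
    """Toy model of PRIMARY KEY (student_id, subject)."""

    def __init__(self):
        # student_id (partition key) -> {subject (clustering column): score}
        self._partitions = defaultdict(dict)

    def insert(self, student_id, subject, score):
        # Writing the same (student_id, subject) again overwrites the row,
        # mirroring the upsert behaviour of a CQL INSERT.
        self._partitions[student_id][subject] = score

    def select(self, student_id):
        # Rows in a partition come back ordered by the clustering column.
        rows = self._partitions.get(student_id, {})
        return [(student_id, subject, rows[subject]) for subject in sorted(rows)]


marks = MarksTable()
marks.insert("s1", "math", 90)
marks.insert("s1", "chemistry", 75)
marks.insert("s1", "math", 95)   # same primary key: replaces the earlier score
marks.insert("s2", "physics", 60)
```

Here `marks.select("s1")` returns the chemistry row before the math row, and the math score is 95 because the second write replaced the first.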
Benefits for Big Data Analytics
• Scalable architecture: capacity grows by adding nodes as datasets grow.
• High write throughput and low-latency reads for queries that target a single partition.
• Well suited to real-time analytics applications.
• Fault-tolerant and distributed by design.
Conclusion
• Cassandra is a powerful NoSQL database for big data.
• Its data model supports flexible and efficient storage.
• Widely used in industries requiring scalability and speed.
• Ideal choice for Big Data Analytics solutions.