Seminar
Reg. No.: 611723104085
Name: Roshni M S
Subject: Big Data Analytics
Introduction to Cassandra
• Apache Cassandra is a distributed NoSQL database.
• Designed to handle large amounts of data across many servers.
• Provides high availability with no single point of failure.
• Suitable for big data applications.
Cassandra Data Model Overview
• Cassandra uses a wide-column store model.
• Data is organized into keyspaces and tables.
• Tables consist of rows identified by a primary key.
• Rows are sparse: a row stores only the columns it has values for, and new columns can be added to a table later with ALTER TABLE.
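As a rough illustration of how columns can be added after a table exists, the sketch below uses a hypothetical course table (not part of the original slides) and assumes a keyspace is already selected with USE.

```cql
-- Hypothetical table used only for illustration.
CREATE TABLE course (
    course_id UUID PRIMARY KEY,
    title TEXT
);

-- Add a new column to the existing table; existing rows simply have no value for it.
ALTER TABLE course ADD credits INT;

-- A row may leave columns unset; only the supplied values are stored.
INSERT INTO course (course_id, title)
VALUES (uuid(), 'Big Data Analytics');
```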
Key Data Modeling Concepts
• Keyspace: Top-level namespace that defines the replication settings for its tables.
• Table: Structure that holds data with a defined schema.
• Partition Key: Determines how data is distributed across nodes.
• Clustering Columns: Define the sort order of rows within a partition.
• Primary Key = Partition Key + Clustering Columns (clustering columns are optional).
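A minimal sketch of a keyspace definition, showing where replication is configured. The keyspace name university and the replication factor of 3 are assumptions chosen for illustration; SimpleStrategy is typically used for single-datacenter or test setups.

```cql
-- Hypothetical keyspace; name and replication factor are illustrative.
CREATE KEYSPACE IF NOT EXISTS university
    WITH replication = {
        'class': 'SimpleStrategy',
        'replication_factor': 3
    };

-- Make it the default keyspace for the statements that follow.
USE university;
```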
Example: Student Table
CREATE TABLE student (
    student_id UUID PRIMARY KEY,
    name TEXT,
    age INT,
    department TEXT
);
• Each student has a unique student_id.
• Data is partitioned by student_id.
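A short sketch of writing and reading a row in the student table. The UUID and the sample values are made up for illustration. Because student_id is the partition key, a lookup by student_id can be routed directly to the nodes that own that partition.

```cql
-- Sample row; the UUID and values are illustrative only.
INSERT INTO student (student_id, name, age, department)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'Asha', 20, 'CSE');

-- Efficient read: restricts the full partition key.
SELECT name, age, department
FROM student
WHERE student_id = 123e4567-e89b-12d3-a456-426614174000;

-- By contrast, filtering on a non-key column such as department
-- is rejected by default unless ALLOW FILTERING is added:
-- SELECT * FROM student WHERE department = 'CSE';
```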
Example: Composite Primary Key
CREATE TABLE marks (
    student_id UUID,
    subject TEXT,
    score INT,
    PRIMARY KEY (student_id, subject)
);
• Partitioned by student_id; rows within each partition are clustered (sorted) by subject.
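The effect of the clustering column can be sketched with a few queries against the marks table. The UUID, subjects, and scores are made-up sample data.

```cql
-- Two rows in the same partition (same student_id), illustrative values.
INSERT INTO marks (student_id, subject, score)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'Compiler Design', 75);
INSERT INTO marks (student_id, subject, score)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'Big Data Analytics', 88);

-- All marks for one student, returned in ascending subject order
-- (the default clustering order), regardless of insertion order.
SELECT subject, score
FROM marks
WHERE student_id = 123e4567-e89b-12d3-a456-426614174000;

-- Range query on the clustering column within a single partition.
SELECT subject, score
FROM marks
WHERE student_id = 123e4567-e89b-12d3-a456-426614174000
  AND subject >= 'C';
```

The first query reads a single partition, and the second narrows it to a contiguous slice of that partition, which is the access pattern a composite primary key is designed for.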
Benefits for Big Data Analytics
• Scalable architecture suitable for growing datasets: capacity increases by adding nodes.
• High write throughput and low-latency reads, particularly for queries by partition key.
• Well suited to real-time analytics applications.
• Fault-tolerant and distributed by design.
Conclusion
• Cassandra is a powerful NoSQL database for big data.
• Its data model supports flexible and efficient storage.
• Widely used in industries requiring scalability and speed.
• Ideal choice for Big Data Analytics solutions.