DATA ANALYTICS
BIG DATA
Name: Keerthi S
Class: 5th Sem
BCA(B)
What is Big Data?
Big data describes large and diverse datasets that are huge in volume, grow rapidly over time, and are too complex to store or analyze efficiently with traditional data-processing tools.
Characteristics of Big Data
1. Volume: Massive amounts of data are generated continuously (e.g., terabytes and petabytes of data from various sources).
2. Velocity: Data is generated at high speed, often in real time (e.g., streaming data).
3. Variety: Data comes in different formats, from structured to unstructured (e.g., text, images, videos).
4. Veracity: The quality and accuracy of the data are crucial, as low-quality data can lead to misleading insights.
5. Value: The ability to extract meaningful insights and value from data is the ultimate goal of Big Data analysis.
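The veracity point can be made concrete with a small data-quality check that separates clean records from suspicious ones before analysis. This is a minimal sketch in plain Python; the record fields (id, age, email) and the validity rules are hypothetical assumptions chosen for illustration, not a standard rule set.

```python
# Minimal veracity (data-quality) check.
# The fields and rules below are hypothetical, chosen only for illustration.

def check_record(record):
    """Return a list of problems found in one record (empty list = clean)."""
    problems = []
    for field in ("id", "age", "email"):
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    age = record.get("age")
    if isinstance(age, (int, float)) and not 0 <= age <= 120:
        problems.append("age out of range")
    email = record.get("email")
    if email and "@" not in email:
        problems.append("malformed email")
    return problems


def veracity_report(records):
    """Split records into clean ones and flagged ones (with their problems)."""
    clean, flagged = [], []
    for rec in records:
        issues = check_record(rec)
        if issues:
            flagged.append((rec, issues))
        else:
            clean.append(rec)
    return clean, flagged


# Hypothetical input records.
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": 250, "email": "b@example.com"},
    {"id": 3, "age": 41, "email": ""},
]
clean, flagged = veracity_report(records)
print(len(clean), "clean,", len(flagged), "flagged")
for rec, issues in flagged:
    print(rec["id"], issues)
```

Flagged records can then be corrected, discarded, or reviewed, so that low-quality data does not distort later insights.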
Importance of Big Data
•Informed Decision-Making
•Improved Operational Efficiency
•Enhanced Customer Experience
•Competitive Advantage
•Predictive Analytics
•Risk Management
•Cost Reduction
•Innovation and Product Development
Structured data
• Highly organized, follows a strict format with a predefined schema.
• Stored in relational databases (SQL-based systems).
• Key Features:
• Data is arranged in rows and columns.
• Easy to search, query, and analyze using traditional tools (e.g., SQL queries).
• Examples:
• Customer Records: Names, addresses, phone numbers in a CRM system.
• Financial Transactions: Bank transactions including amounts, dates, and
account numbers.
• Inventory Data: Product IDs, prices, and quantities in retail databases.
• Employee Data: Payroll details, job titles, department information.
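Because structured data follows a fixed schema of rows and columns, one SQL query can search and aggregate it. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the transactions table, its columns, and its rows are hypothetical examples of financial transaction records.

```python
import sqlite3

# In-memory database; the table and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, account TEXT, amount REAL, tx_date TEXT)"
)
rows = [
    (1, "ACC-001", 250.0, "2024-01-05"),
    (2, "ACC-002", 75.5, "2024-01-05"),
    (3, "ACC-001", 120.0, "2024-01-06"),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)

# Every row has the same columns, so a single SQL query can aggregate them.
totals = conn.execute(
    "SELECT account, SUM(amount) AS total FROM transactions "
    "GROUP BY account ORDER BY account"
).fetchall()
for account, total in totals:
    print(account, total)
conn.close()
```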
Unstructured data
• Not organized in a predefined manner, no fixed structure.
• Not well suited to traditional relational databases; usually stored and processed with specialized tools (e.g., NoSQL databases, Hadoop).
• Key Features:
• Does not fit neatly into rows and columns.
• Requires more advanced techniques like text mining or machine learning for analysis.
• Examples:
• Social Media Content: Tweets, Facebook posts, Instagram images/videos.
• Multimedia Files: Images, videos, and audio recordings.
• Customer Feedback: Reviews, emails, chat logs in free-text format.
• Surveillance Data: CCTV footage, audio files from calls or meetings.
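Text mining turns free-text content such as customer reviews into something countable. As a minimal sketch, the Python example below tokenizes hypothetical reviews and counts word frequencies with collections.Counter. The stop-word list is a tiny illustrative assumption; real text-mining pipelines use larger lists and richer techniques (stemming, sentiment models, and so on).

```python
import re
from collections import Counter

# Hypothetical free-text reviews: no rows, columns, or schema.
reviews = [
    "Great product, fast delivery!",
    "Delivery was slow and the product arrived damaged.",
    "Great value. Would buy this product again.",
]

# Tiny, illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "and", "was", "this", "would", "a", "an"}


def tokenize(text):
    """Lower-case the text and keep only runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


word_counts = Counter(
    word
    for review in reviews
    for word in tokenize(review)
    if word not in STOP_WORDS
)
print(word_counts.most_common(3))
```

The most frequent terms give a first, rough view of what customers talk about.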
Semi-structured data
• Data that does not fit into a traditional relational database but still has some
organizational properties.
• Contains tags or markers that organize elements, but the structure can be
flexible.
• Key Features:
• Flexible schema, not as rigid as structured data but still organized in a
readable way.
• Can be processed more easily than unstructured data but requires
specialized tools.
• Examples:
• JSON and XML Files: Web APIs providing structured responses (e.g., weather
data, product catalogs).
• Emails: Contain unstructured text (body) but also structured metadata (sender,
timestamp, subject).
• Log Files: System logs that follow a certain structure but may vary in content.
• HTML Pages: Contain structured tags but can hold unstructured content
(e.g., text, images).
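JSON shows the flexible schema of semi-structured data: records share some keys, while others appear only in some records. A minimal sketch using Python's built-in json module follows; the product-catalog payload and the to_row helper are hypothetical examples, and .get() with default values handles fields that may be missing.

```python
import json

# Hypothetical product-catalog response: both entries have id, name and
# price, but only the second has "tags" and a nested "stock" object.
payload = """
[
  {"id": 101, "name": "Umbrella", "price": 12.5},
  {"id": 102, "name": "Raincoat", "price": 40.0,
   "tags": ["outdoor", "waterproof"],
   "stock": {"warehouse": "A", "qty": 7}}
]
"""


def to_row(item):
    """Flatten one record into a fixed tuple, filling missing fields with defaults."""
    tags = item.get("tags", [])
    qty = item.get("stock", {}).get("qty", 0)
    return (item["id"], item["name"], tags, qty)


products = json.loads(payload)
for item in products:
    print(to_row(item))
```

Flattening into fixed tuples like this is a common first step before loading semi-structured data into a table for analysis.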
Big data sources
1. Social Media
User-generated content: posts, images, videos.
Platforms: Facebook, Twitter, Instagram.
Provides insights into behaviour, preferences, and trends.
2. Sensors and IoT Devices
Data from smart devices, wearables, and industrial sensors.
Real-time data: temperature, heart rate, location.
3. Transactional Data
Financial transactions, retail sales, and e-commerce.
Sources: credit card payments, online orders.
4. Healthcare Records
Electronic Health Records (EHRs), clinical data.
Used for patient monitoring and personalized medicine.
5. Server and Application Logs
Log data generated by servers and applications.
Used for system monitoring and cybersecurity.
6. Multimedia Data
Images, audio, and video from sources like YouTube and CCTV.
Used for facial recognition and video analytics.
7. Public Data
Government data, research publications.
Used in policy-making and research.
8. GPS and Geospatial Data
Location data from smartphones and GPS devices.
Used for mapping, traffic monitoring, and logistics.
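A basic operation on GPS data in mapping and logistics is computing the distance between two latitude/longitude fixes. The sketch below uses the standard haversine great-circle formula in plain Python; the mean Earth radius of 6371 km is a common approximation, and the GPS trace is a hypothetical example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical GPS trace of a delivery vehicle: (lat, lon) fixes in order.
trace = [(12.9716, 77.5946), (12.9780, 77.6000), (12.9850, 77.6100)]

# Sum the distances between consecutive fixes to get the distance travelled.
total = sum(haversine_km(*p, *q) for p, q in zip(trace, trace[1:]))
print(f"Distance travelled: {total:.2f} km")
```

Applied to millions of location fixes, the same calculation feeds traffic monitoring and route analysis.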
Technologies Used in Big Data
•Hadoop: An open-source framework for distributed
storage and processing of large datasets across clusters.
•Apache Spark: A fast, in-memory data processing
engine for batch and real-time analytics.
•NoSQL Databases: Non-relational databases designed
to handle unstructured and semi-structured data.
•Apache Kafka: A distributed streaming platform for
building real-time data pipelines and streaming
applications.
•Data Warehousing Solutions: Systems optimized for
data analysis and reporting, such as Amazon Redshift and
Google BigQuery.
•Apache Flink: A stream processing framework that
enables high-throughput and low-latency data processing.
•Data Visualization Tools: Software solutions like
Tableau and Power BI used for creating visual
representations of data insights.
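Hadoop processes data with the MapReduce model: mappers emit key/value pairs from each input split, the framework groups the pairs by key (the shuffle), and reducers combine each group. The single-process Python sketch below imitates those three phases for a word count. It uses no Hadoop APIs; the function names and input splits are hypothetical and only show the shape of the computation that Hadoop distributes across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: combine the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for a split that would go to a separate mapper.
splits = ["big data needs big tools", "Spark and Hadoop process big data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(sorted(counts.items()))
```

In a real cluster, the map and reduce steps run in parallel on many machines, which is what lets the same logic scale to very large datasets.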
Applications of Big Data
Healthcare and Personalized Medicine
Financial Services and Fraud Detection
Retail and E-commerce
Social Media and Sentiment Analysis
Manufacturing and Supply Chain Optimization
Smart Cities and IoT
Telecommunications and Network Optimization
Education and Personalized Learning
Marketing and Customer Insights
Energy and Utilities Management
Big Data Use Case: Healthcare
Predictive Analytics for Patient Outcomes
Personalized Medicine
Medical Imaging Analysis
Electronic Health Records (EHR) Optimization
Wearables and IoT for Remote Monitoring
Drug Discovery and Development
Fraud Detection in Healthcare Claims
Operational Efficiency in
Big Data Use Case: Business
Customer Behaviour Analysis
Personalized Marketing
Supply Chain Optimization
Predictive Maintenance
Risk Management and Fraud Detection
Product Development and Innovation
Financial Analytics and Forecasting
Real-Time Business Decision-Making
Big Data Use Case: Education
•Personalized Learning
•Student Performance Analytics
•Curriculum Development
•Predictive Analytics for Student Success
•Optimizing Resource Allocation
•Early Warning Systems for Dropout Prevention
•Enhancing Student Engagement
•Data-Driven Decision-Making in Educational Institutions
Case study
•Company Overview
•Netflix: A global streaming service with over 230 million subscribers (as of 2024).
•Big Data Challenge
•Managing and analyzing vast amounts of user data to personalize recommendations, improve user
experience, and optimize content delivery.
•Big Data Solution
•Recommendation Engine: Netflix uses big data analytics to power its recommendation system,
analyzing viewing history, ratings, and search data.
•A/B Testing: Analyzes user interaction to optimize features like thumbnails, autoplay, and user interface
designs.
•Content Creation: Analyzes viewing patterns and preferences to guide the creation of original shows such as House of
Cards and Stranger Things.
•Technologies Used
•Apache Spark for real-time data processing.
•Hadoop for distributed storage and large-scale data processing.
•Machine Learning algorithms for recommendations.
•Results/Impact
•Increased user engagement with highly personalized recommendations (80% of content watched is from
recommendations).
•Enhanced content production strategies, leading to higher customer satisfaction and retention.
•Optimized streaming quality based on user data, reducing buffering issues and improving the viewing
experience.
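A recommendation engine of the kind described in this case study can be illustrated with item-based collaborative filtering: titles are compared by how similarly users rated them, and unseen titles are scored from the user's own ratings. The Python sketch below is a generic textbook technique, not Netflix's actual system. It uses plain cosine similarity on raw ratings, and the users, titles, ratings, and the recommend helper are all hypothetical.

```python
import math

# Hypothetical user -> {title: rating} data; not real Netflix data.
ratings = {
    "ana": {"Drama A": 5, "Thriller B": 4, "Comedy C": 1},
    "ben": {"Drama A": 4, "Thriller B": 5, "Sci-Fi D": 4},
    "chitra": {"Comedy C": 5, "Sci-Fi D": 2},
    "dev": {"Drama A": 5, "Sci-Fi D": 5},
}


def item_vectors(ratings):
    """Invert user->item ratings into item->{user: rating} vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user, ratings, top_n=2):
    """Score unseen items by a similarity-weighted average of the user's ratings."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue  # only recommend titles the user has not rated
        sims = [(cosine(vectors[candidate], vectors[item]), score)
                for item, score in seen.items()]
        total_sim = sum(s for s, _ in sims)
        if total_sim > 0:
            scores[candidate] = sum(s * r for s, r in sims) / total_sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


print(recommend("chitra", ratings))
```

Production systems add many refinements (mean-centred ratings, implicit signals such as viewing time, learned models), but the core idea of ranking unseen items by their similarity to what a user already liked is the same.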
Challenges of Big Data
Future of Big Data
Increased Use of AI and Machine Learning
Growth in Real-Time Analytics
Edge Computing
Enhanced Data Privacy and Security
Expansion of Cloud-Based Big Data Solutions
Rise of Data-as-a-Service (DaaS)
Wider Adoption Across Industries
Thank you