BIG
DATA
Basics
The BIGGER picture...
BIG DATA
PRESENTED BY: RONICA GUPTA, 16104048, [email protected]
SUBMITTED TO: PROF. ALKA JINDAL, COMPUTER SCIENCE DEPARTMENT
How big is BIG DATA?
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
The Depth of BIG DATA
● Walmart handles more than 1 million customer transactions every hour.
● Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data.
● 230+ million tweets are created every day.
● More than 5 billion people are calling, texting, tweeting and browsing on
mobile phones worldwide.
● YouTube users upload 48 hours of new video every minute of the day.
● Amazon handles 15 million customer clickstream records per day to recommend products.
● 294 billion emails are sent every day. Email services analyse this data to detect spam.
● Modern cars have close to 100 sensors that monitor fuel level, tire pressure, etc., so each vehicle generates a large amount of sensor data.
BIG DATA Characteristics
Volume
● The name ‘Big Data’ itself refers to a size that is enormous.
● Volume means the sheer amount of data.
● The size of data plays a crucial role in determining its value: only when the volume of data is very large is it actually considered ‘Big Data’.
● Hence, while dealing with Big Data it is necessary to consider the characteristic ‘Volume’.
Variety
● Structured: stored and processed in a fixed format; easy to process.
● Semi-Structured: has some organizational properties like tags and markers.
● Unstructured: unknown form; can’t be analysed unless transformed.
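As a small illustration of the three forms, the Python sketch below holds one record of each kind; the records themselves are invented for the example.

```python
import json

# Structured: fixed schema, e.g. a row in a relational table
structured_row = ("C-1001", "North", 49.99)          # (order_id, region, amount)

# Semi-structured: organizational markers (keys/tags) but a flexible schema
semi_structured = json.loads('{"order_id": "C-1001", "tags": ["online", "promo"]}')

# Unstructured: free text with no predefined form; must be transformed before analysis
unstructured = "Loved the product, but delivery took far too long!"

print(structured_row[2], semi_structured["tags"], len(unstructured.split()))
```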
Velocity
● Velocity refers to the high speed at which data accumulates.
● In Big Data, data flows in at high velocity from sources like machines, networks, social media, mobile phones, etc.
● There is a massive and continuous flow of data. Velocity determines how fast data is generated and how fast it must be processed to meet demand.
● Sampling the data can help in dealing with velocity, as the sketch below illustrates.
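A common way to cope with velocity is to sample the stream. Below is a minimal Python sketch of reservoir sampling, which keeps a fixed-size uniform sample no matter how fast records arrive; the stream here is simulated.

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = random.randint(0, i)   # replace an existing item with decreasing probability
            if j < k:
                sample[j] = item
    return sample

# Simulated high-velocity stream of one million sensor readings
readings = (random.gauss(50, 5) for _ in range(1_000_000))
print(reservoir_sample(readings, 10))
```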
Veracity
● Veracity refers to inconsistencies and uncertainty in data: the data that is available can be messy, and its quality and accuracy are difficult to control.
● Big Data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources.
Value
● Data in itself is of no use or importance; it needs to be converted into something valuable by extracting information from it.
● Value denotes the added value Big Data brings to companies. Many companies have recently established their own data platforms, filled their data pools, and invested heavily in infrastructure.
Accessing BIG DATA
● DATA MINING: Data mining is the process of discovering insights within a database.
● DATA CLEANING: Data sets come in all shapes and sizes. Before one can even think about how the data will be stored, it needs to be in an acceptable format.
● DATA STORAGE: The major difficulty with Big Data is managing how it will be stored.
● DATA ANALYSIS: Once all the data has been collected, it needs to be analysed to look for interesting patterns and trends.
● DATA VISUALIZATION: This is the part that takes all the work done prior and outputs a visualisation that ideally anyone can understand.
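A toy pass over the cleaning, analysis and visualization steps could look like the pandas sketch below; the input file, column names and chart are all hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Data cleaning: load a hypothetical transactions file and drop bad rows
df = pd.read_csv("transactions.csv")                 # hypothetical input file
df = df.dropna(subset=["customer_id", "amount"])
df = df[df["amount"] > 0]

# Data analysis: look for a simple pattern - total spend per customer
spend = df.groupby("customer_id")["amount"].sum().sort_values(ascending=False)

# Data visualization: a chart anyone can read
spend.head(10).plot(kind="bar", title="Top 10 customers by spend")
plt.tight_layout()
plt.savefig("top_customers.png")
```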
BIG DATA Tools
BIG DATA Applications
BIG DATA Challenges
THANK YOU!
ANY QUESTIONS?
What happened that
made data
“BIG”
A presentation on BIG DATA for beginners
BIG DATA
Submitted by: Abhinav Chadha, 16104122, [email protected]
Submitted to: Prof. Alka Jindal, Computer Science Department
What actually is
BIG DATA?
BIG DATA refers to huge volumes of data that cannot be stored and processed using traditional database and software techniques.
How huge does data need to be?
100 MB? It’s all relative!
Why study BIG
DATA?
● 48 hours of video is uploaded to YouTube every minute.
● 90% of all the data in the world was created in the past two years.
● 322 PB worth of image files are sent every hour over WhatsApp.
How BIG DATA works?
1. Integrate: During integration, you need to bring in the data, process it, and make sure it’s formatted and available in a form that your business analysts can get started with.
2. Manage: You can store your data in any form you want and bring your desired processing requirements and necessary process engines to those data sets on an on-demand basis.
3. Analyze: Build data models with machine learning and artificial intelligence. Put your data to work.
BIG DATA a concept!
BIG DATA ANALYTICS
● Used by companies to facilitate their growth and development.
● Involves applying various data mining algorithms to a given set of data.
● Aids them in better decision making.
Why is analysis of BIG DATA important?
How BIG DATA works?
Three Stages in
BIG DATA Analytics
Descriptive
Analysis
This analysis performs an in-depth examination of historical data to reveal details such as the underlying reasons for failures.
Descriptive analytics answers the question:
‘What happened in the business?’
Predictive
Analysis
In predictive analytics, machine learning and statistical tools are used to predict the future. Predictive analytics analyses previous reports to answer the question:
‘What could happen?’
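As a toy illustration of that idea (not any real business model), the sketch below fits a simple linear regression to invented monthly sales figures and projects the next month.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up historical reports: sales for months 1..6
months = np.array([[1], [2], [3], [4], [5], [6]])
sales = np.array([110, 118, 131, 140, 152, 161])

model = LinearRegression().fit(months, sales)   # learn the trend from past reports
print("Forecast for month 7:", model.predict([[7]])[0])
```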
Prescriptive
Analysis
This analysis helps minimize or maximize outcomes, for example lowering the manufacturing cost or reframing marketing policies.
Prescriptive analytics answers the question:
‘What should we do?’
It’s not the end,
It's the beginning!
Feel free to ask
questions
Technologies and Applications of Big Data
Submitted by: Soumy Jain
[email protected]
Big Data Technology can be defined as a software utility that is designed to analyse, process and extract information from extremely complex and large data sets which traditional data processing software could never deal with.
Big Data Technology is mainly classified into two types:
Operational Big Data Technologies
Analytical Big Data Technologies
Firstly, Operational Big Data is all about the normal day-to-day data that we generate. This could be online transactions, social media, or the data from a particular organisation. One can even consider this to be a kind of raw data which is used to feed the Analytical Big Data Technologies.
Big Data Technologies
Big Data Technologies in Data Storage.
Hadoop
The Hadoop framework was designed to store and process data in a distributed data processing environment with commodity hardware, using a simple programming model. It can store and analyse data present in different machines with high speed and at low cost.
Developed by: Apache Software Foundation, 10 December 2011.
Written in: JAVA
Current stable version: Hadoop 3.11
MongoDB
NoSQL document databases like MongoDB offer a direct alternative to the rigid schemas of relational databases. This allows MongoDB to offer flexibility while handling a wide variety of data types, at large volumes and across distributed architectures.
Developed by: MongoDB, 11 February 2009.
Written in: C++, Go, JavaScript, Python
Current stable version: MongoDB 4.0.10
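A minimal sketch of that flexibility using the pymongo driver; it assumes a MongoDB server running on localhost, and the database, collection and documents are made up.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
tweets = client["bigdata_demo"]["tweets"]        # hypothetical database and collection

# Documents in the same collection do not need identical fields (flexible schema)
tweets.insert_one({"user": "alice", "text": "big data!", "likes": 3})
tweets.insert_one({"user": "bob", "text": "hello", "hashtags": ["#bigdata"]})

# Query by field; no table schema is required
for doc in tweets.find({"user": "alice"}):
    print(doc["text"])
```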
Rainstor
RainStor is a software company that developed a database management system of the same name, designed to manage and analyse Big Data for large enterprises. It uses deduplication techniques to organize the process of storing large amounts of data for reference.
Developed by: RainStor Software company in the year 2004.
Works like: SQL
Current stable version: RainStor 5.5
Hunk
Hunk lets you access data in remote Hadoop clusters through virtual indexes and lets you use the Splunk Search Processing Language to analyse your data. With Hunk, you can report on and visualize huge amounts of data from your Hadoop and NoSQL data sources.
Developed by: Splunk INC in the year 2013.
Written in: JAVA
Current stable version: Splunk Hunk 6.2
Components and ecosystem of Big Data Technologies
Business intelligence
Cloud computing
Databases
Techniques for analyzing Data
A/B testing (a minimal sketch follows this list)
Machine learning
Natural language processing
Visualization
Charts
Graphs
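Of the techniques listed above, A/B testing is the simplest to sketch. The snippet below compares the conversion rates of two page variants with a two-proportion z-test; the visitor and conversion counts are invented.

```python
from math import sqrt
from statistics import NormalDist

# Invented results: (conversions, visitors) for variants A and B
conv_a, n_a = 200, 5000
conv_b, n_b = 260, 5000

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided test

print(f"lift = {p_b - p_a:.3%}, z = {z:.2f}, p = {p_value:.4f}")
```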
Multidimensional big data can also be represented as data cubes or, mathematically, tensors.
Array Database Systems have set out to provide storage and high-level query support on this data type.
Additional technologies being applied to big data include tensor-based computation such as multilinear subspace learning, massively parallel-processing (MPP) databases, search-based applications, data mining, distributed file systems, distributed cache (e.g., burst buffer and Memcached), distributed databases, and cloud and HPC-based infrastructure (applications, storage and computing resources).
Although many approaches and technologies have been developed, it still remains difficult to carry out machine learning with big data.
Applications
Government
The use and adoption of big data within governmental processes allows efficiencies in terms of cost, productivity, and innovation.
CRVS (Civil Registration and Vital Statistics) collects all certificate statuses from birth to death and is a source of big data for governments.
Manufacturing
Big data provides an infrastructure for transparency in the manufacturing industry, that is, the ability to unravel uncertainties such as inconsistent component performance and availability.
Predictive manufacturing, as an applicable approach toward near-zero downtime, requires a vast amount of data and advanced prediction tools for a systematic process of turning data into useful information.
Healthcare
Big data analytics has helped healthcare improve by providing personalized medicine and prescriptive analytics, clinical risk intervention and predictive analytics, waste and care variability reduction, automated external and internal reporting of patient data, standardized medical terms, and patient registries and fragmented point solutions.
Human inspection at the big data scale is impossible, and there is a desperate need for intelligent tools for accuracy and believability control and handling of information.
Education
Private bootcamps have developed programs to meet big data demand, such as The Data Incubator and General Assembly.
Media
Publishing environments are increasingly tailoring messages (advertisements) and content to appeal to consumers, drawing on information that has been gleaned through various data-mining activities.
• Targeting of consumers (for advertising by marketers)
• Data capture
• Data journalism: publishers and journalists use big data tools to provide unique insights and infographics.
Internet of Things (IoT)
Big data and the IoT work in conjunction. Data extracted from IoT devices provides a mapping of device interconnectivity. Such mappings have been used by the media industry and governments to more accurately target their audience and increase media efficiency. The IoT is also increasingly adopted as a means of gathering sensory data, and this sensory data has been used in medical, manufacturing and transportation contexts.
How is Big Data stored and
processed?
Aniket Tiwari
16104093
[email protected]
How is Big Data stored and processed?
There are two approaches for storing and processing
Big Data:-
• Traditional approach
• Modern approach
Traditional Approach
• The data that is being generated is given as an input
to the ETL System.
• An ETL system would then extract this data and transform it.
• Now the end users can generate reports and perform
analytics, by querying this data.
• But as the data grows, it becomes a very challenging
task to manage and process this data.
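A toy version of that extract-transform-load-and-query loop, using only the Python standard library; the file, table and column names are made up.

```python
import csv
import sqlite3

# Extract: read raw sales records from a hypothetical CSV export
with open("sales_raw.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: normalise fields and drop malformed records
clean = [(r["order_id"], r["region"].strip().upper(), float(r["amount"]))
         for r in rows if r.get("amount")]

# Load: put the cleaned data into a relational store for analysts
db = sqlite3.connect("warehouse.db")
db.execute("CREATE TABLE IF NOT EXISTS sales (order_id TEXT, region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)
db.commit()

# Report: end users query the warehouse
for region, total in db.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(region, total)
```

As the slide notes, this single-machine loop stops scaling once the data outgrows one database server, which motivates the modern approach below.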
Drawbacks of Traditional Approach
• It is an expensive system
• No scalability
• It is time consuming
Modern Approach
• Hadoop is used to store and process a huge volume of data efficiently.
• Hadoop has two components – HDFS and
MapReduce.
• HDFS takes care of storing and managing the data
within the Hadoop Cluster.
• MapReduce takes care of processing and computing the data that is present within the HDFS.
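As a minimal sketch of the MapReduce idea, here is the classic word count (an illustration, not anything from these slides) written in the Hadoop Streaming style, where the mapper and reducer are small scripts that read stdin and write stdout.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map: emit (word, 1) for every word in the input split."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1

def reducer(pairs):
    """Reduce: sum the counts for each word (pairs are grouped by key)."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Locally this chains map and reduce directly; on a cluster Hadoop runs them
    # on different nodes and handles the shuffle/sort in between.
    for word, total in reducer(mapper(sys.stdin)):
        print(f"{word}\t{total}")
```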
Hadoop Cluster
• A Hadoop cluster comprises a Master Node, Slave Nodes and a Secondary Node.
How does Hadoop work?
• HDFS divides data into chunks
• Each part of data is stored into a separate
node
• Each part is replicated on other nodes to increase availability and decrease latency (see the toy sketch below)
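The toy sketch below mimics those two ideas by splitting a byte string into fixed-size blocks and assigning each block to several pretend nodes; the block size and replication factor are illustrative and do not reflect real HDFS defaults.

```python
import itertools

def place_blocks(data: bytes, block_size: int, nodes: list, replication: int):
    """Split data into blocks and replicate each block on `replication` nodes."""
    placement = {}
    node_cycle = itertools.cycle(range(len(nodes)))
    for block_id, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        # pick `replication` nodes round-robin (distinct while replication <= len(nodes))
        chosen = [nodes[next(node_cycle)] for _ in range(replication)]
        placement[block_id] = (len(block), chosen)
    return placement

nodes = ["node1", "node2", "node3", "node4"]
for block, (size, where) in place_blocks(b"x" * 1000, 256, nodes, replication=3).items():
    print(f"block {block}: {size} bytes on {where}")
```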
Features provided by Hadoop
• Cost effective
• Cluster of Nodes
• Parallel processing
• Distributed data
• Automatic Fail Over Management
• Data locality optimization
• Heterogeneous Cluster
• Scalability
APACHE PIG
Vandit Goel
16104034
[email protected]
INTRODUCTION
TO PIG
Introduction
● Apache Pig is a platform for data analysis. It is an alternative to
MapReduce Programming.
● Pig was developed as a research project at Yahoo in 2006.
Why Pig?
● Writing mappers and reducers by hand takes a long time.
● Pig introduces Pig Latin, a scripting language that lets you use SQL-like
syntax to define your map and reduce steps.
● Highly extensible with user-defined functions (UDFs).
Pig Architecture
Pig Architecture
1. Parser
At first, all the Pig Scripts are handled by the Parser. Parser basically checks
the syntax of the script, does type checking, and other miscellaneous checks.
Afterwards, Parser’s output will be a DAG (directed acyclic graph) that
represents the Pig Latin statements as well as logical operators.
The logical operators of the script are represented as the nodes and the data
flows are represented as edges in DAG (the logical plan).
Pig Architecture
2. Optimizer
Afterwards, the logical plan (DAG) is passed to the logical optimizer. It carries
out the logical optimizations.
3. Compiler
Then compiler compiles the optimized logical plan into a series of MapReduce
jobs.
Pig Architecture
4. Execution engine
Eventually, all the MapReduce jobs are submitted to Hadoop in a sorted order.
Ultimately, it produces the desired results while these MapReduce jobs are
executed on Hadoop.
Working of Pig
How Twitter used Apache Pig to analyse their large data set?
Twitter had both semi-structured data like Twitter Apache logs, Twitter search
logs, Twitter MySQL query logs, application logs and structured data like
tweets, users, block notifications, phones, favorites, saved searches, re-tweets,
authentications, SMS usage, user followings, etc. which can be easily
processed by Apache Pig.
Twitter dumps all its archived data on HDFS. It has two tables i.e. user data and
tweets data. User data contains information about the users like username,
followers, followings, number of tweets etc. While Tweet data contains tweet,
its owner, number of re-tweets, number of likes etc. Now, Twitter uses this data to analyse their customers’ behaviour and improve their experience.
How Twitter used Apache Pig to analyse their large data set?
The step-by-step solution of this problem is as follows.
STEP 1 – First of all, Twitter imports the Twitter tables (i.e. the user table and the tweet table) into the HDFS.
STEP 2 – Then Apache Pig loads (LOAD) the tables into the Apache Pig framework.
STEP 3 – Then it joins and groups the tweet table and user table using the GROUP command.
How Twitter used Apache Pig to analyse their large data set?
STEP 4 – Then the tweets are counted according to the users using the COUNT command, so that the total number of tweets per user can be easily calculated.
STEP 5 – At last, the result is joined with the user table to extract the user name along with the produced result.
STEP 6 – Finally, this result is stored back in the HDFS.
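The actual Pig Latin script is not reproduced on these slides, so as an illustrative stand-in the pandas sketch below runs the same load-group-count-join-store flow on two hypothetical tables.

```python
import pandas as pd

# STEP 1-2: load the two (hypothetical) tables exported from HDFS
users = pd.read_csv("users.csv")     # columns: user_id, username, followers
tweets = pd.read_csv("tweets.csv")   # columns: tweet_id, user_id, text, retweets

# STEP 3-4: group the tweets by user and count them
tweet_counts = (tweets.groupby("user_id")
                      .size()
                      .reset_index(name="tweet_count"))

# STEP 5: join the counts back onto the user table to attach user names
result = users.merge(tweet_counts, on="user_id", how="left").fillna({"tweet_count": 0})

# STEP 6: store the result back out (on a cluster this would go to HDFS)
result.to_csv("tweets_per_user.csv", index=False)
```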
How Twitter used Apache Pig to analyse their large data set?
Properties of Pig?
● Pig processes data in parallel on the Hadoop cluster.
● It provides a language called “Pig Latin” to express data flows.
● Pig Latin contains operators for many of the traditional data operations such as LOAD, STORE, FILTER, FOREACH, GROUP BY, DISTINCT, etc.
● It allows users to develop their own functions (user defined functions) for
reading, processing and writing data.