NAAC Accredited "A"
D. E. Society’s
Kirti College of Arts, Science And Commerce
Department Of Computer Science & IT
(2018-2019)
A PROJECT SYNOPSIS ON
“RAINFALL IN INDIA ANALYSIS”
Exam Seat No: 30541
SUBMITTED
TO
UNIVERSITY OF MUMBAI
Guided By: Prof. Aniruddha Phadke
Developed By: Y Patil
Certificate
This is to certify that the project synopsis "RAINFALL IN INDIA ANALYSIS"
has been successfully completed by Examination Seat No. 30541 under the
guidance of Prof. Aniruddha Phadke, as per the syllabus and in partial
fulfillment of the requirements of [Link] II Sem III in Computer Science from
the University of Mumbai.
It is also to certify that this is the original work of the candidate during the
academic year 2018-2019.
Date:
Place: MUMBAI
Project Guide Head of Department External Examiner
Index
Sr. No  Topic
1       Introduction
2       Rainfall Analysis (Related Work)
3       Objective
4       Methodology
5       References
Introduction:
Climate of India
The Climate of India comprises a wide range of weather conditions across a vast
geographic scale and varied topography, making generalisations difficult.
The country's meteorological department follows the international standard of four
climatological seasons with some local adjustments: winter (December, January and
February), summer (March, April and May), a monsoon rainy season (June to
September), and a post-monsoon period (October to November).
A product of southeast trade winds originating from a high-pressure mass centred over
the southern Indian Ocean, the monsoonal torrents supply over 80% of India's annual
rainfall.
Rainfall Analysis:
New monthly, seasonal and annual rainfall time series for the 36 meteorological
subdivisions of India were constructed using monthly rainfall data for the
period 1901–2015 from a fixed network of 1,476 rain gauge stations.
Related Work:
BIG DATA AND HADOOP
BIG DATA
Big Data is a term used to describe huge data sets (several gigabytes, terabytes or
petabytes of data). The data is so large and complex that it is difficult to process
using traditional data processing applications. Big data requires a new set of tools,
applications and frameworks to process and manage it.
Evolution of Data/Big Data
Data has always been around and there has been a need to store, process and manage
data since the beginning of modern human civilization. However, the amount of data
captured, stored, processed and managed depends on various factors including the necessity
felt by humans for certain information, available tools and technologies needed for making
decisions based on the data analysis and so on.
In today’s world, due to advancements in technology, a huge amount of data (several
terabytes or petabytes) is constantly being captured [4]. Natural curiosity about
truly important things, like whether more teenagers than millennials like Justin
Bieber, demands processing Twitter data, which is huge.
Characteristics of Big Data
VOLUME
Volume refers to the size of the data that the user is working with. Due to
advancements in technology, the amount of data being generated is growing rapidly.
Data is spread across different places, in different formats, in volumes ranging from
gigabytes to terabytes to petabytes. Data is not only generated by humans but by
machines too. Nowadays the data generated by machines is surpassing the data
generated by humans; weather data is a good example.
VARIETY
Variety refers to the different formats in which data is generated. Apart from
structured data like spreadsheets and traditional flat files, a large amount of
unstructured data is being generated in the form of weblogs, sensor data, social
media, etc. Enterprises make use of both structured and unstructured data for
analysis, thereby making better business decisions to stay competitive.
VELOCITY
Velocity refers to the speed at which data is generated. Different applications
in different fields have different requirements, so we see data being generated at
different speeds based on those requirements.
HADOOP
Hadoop is an open source framework capable of processing large data sets in a
distributed fashion across clusters of machines using a simplified programming model.
Hadoop provides a reliable way to store, process and analyze data.
Hadoop Architecture
Hadoop works in a master-slave fashion and has two core components:
HDFS (Hadoop Distributed File System) and MapReduce.
Hadoop Components
HDFS (HADOOP DISTRIBUTED FILE SYSTEM)
HDFS offers reliable and distributed storage. It replicates the data across multiple
nodes on cloud or commodity hardware. Unlike a regular file system, when data is
pushed into HDFS it is internally split into multiple data blocks (a configurable
parameter with a default size of 64 MB). All blocks that make up a file are of the
same size, except the last block, which may be smaller depending on the size of the
incoming file.
HDFS also replicates the data across various data nodes (the replication rate is
configurable), ensuring fault tolerance and reliability. It also ensures that the
replication factor is maintained, so that if a node goes down, the data can be
recovered from the replicas on other nodes.
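The block arithmetic above can be sketched in a few lines of Python (an illustrative sketch, not part of HDFS itself; the 64 MB block size and a replication factor of 3 are common defaults, and both are configurable in a real cluster):

```python
BLOCK_SIZE_MB = 64   # HDFS default block size (configurable)
REPLICATION = 3      # a common default replication factor (configurable)

def split_into_blocks(file_size_mb):
    """Return the sizes of the blocks a file of the given size is split into."""
    full_blocks = file_size_mb // BLOCK_SIZE_MB
    blocks = [BLOCK_SIZE_MB] * full_blocks
    remainder = file_size_mb % BLOCK_SIZE_MB
    if remainder:
        blocks.append(remainder)  # the last block may be smaller than 64 MB
    return blocks

def raw_storage_mb(file_size_mb):
    """Total cluster storage consumed once every block is replicated."""
    return sum(split_into_blocks(file_size_mb)) * REPLICATION

# A 200 MB file becomes three 64 MB blocks plus one 8 MB block,
# and occupies 600 MB of raw storage at replication factor 3.
blocks = split_into_blocks(200)
```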
HDFS is capable of storing large amounts of data which can be structured or unstructured.
The computers present in the cluster can be present in any location and there is no
physical location dependency. HDFS works in a master/slave fashion.
NameNode: NameNode is the master component and holds information about all
other nodes in the Hadoop cluster, the files present, and their locations in the cluster.
There is only one NameNode per cluster.
DataNode: DataNode is a slave node and holds the user data in the form of data blocks.
There can be many DataNodes in a Hadoop cluster.
MAPREDUCE
MapReduce offers a framework/analysis system which performs complex
computations on large datasets in a parallelized fashion. This system breaks down the
complex computations into multiple smaller tasks and assigns those to individual slave nodes
and takes care of the co-ordination and consolidation of the results [4]. These tasks run
independently on various nodes across the cluster. There are primarily two types of tasks:
Map tasks and Reduce tasks.
As in HDFS, MapReduce (computation part) also works in master/slave fashion.
JobTracker: Keeps track of the tasks assigned and co-ordinates the exchange of
information and the results with the slave nodes. Its responsibility also includes
rescheduling of failed tasks and monitoring the overall progress of the job. There is
only one JobTracker per cluster.
TaskTracker: Acts as a slave and is responsible for running the tasks assigned by the
JobTracker and returning the results to the JobTracker. There can be multiple
TaskTracker nodes in a cluster.
Figure 2.3. Data processing using MapReduce framework.
FileInputFormat: This is the input file/data that needs to be processed.
Split: Hadoop splits the incoming data into several blocks.
RecordReader: RecordReader reads the data line by line and converts it into
key/value pairs that are passed as input to the Mapper.
Mapper: Mapper contains the logic to process input data. The Map function
transforms the input records to intermediate records.
Combiner: This is an optional step, often used to improve performance by
reducing the amount of data transferred across the network.
Shuffle: Output of all the mappers is collected, shuffled and sorted, to be sent to the
Reducer.
Reducer: Reducer applies logic to aggregate the data and passes the result to a
FileOutputFormat class.
FileOutputFormat: A pre-defined class provided by the MapReduce framework
through which the final output is written to HDFS.
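The stages above can be simulated in miniature. The following is a hypothetical Python sketch of the map → shuffle → reduce flow (Hadoop itself would run these as Java map and reduce tasks; the rainfall records here are made up for illustration), totalling rainfall per state:

```python
from collections import defaultdict

# RecordReader: each input line becomes one record
lines = ["Kerala 300", "Kerala 250", "Goa 120", "Goa 80"]

def mapper(line):
    """Map: transform an input record into an intermediate key/value pair."""
    state, rainfall_mm = line.split()
    yield state, int(rainfall_mm)

# Shuffle: collect every mapper's output and group it by key
groups = defaultdict(list)
for line in lines:
    for key, value in mapper(line):
        groups[key].append(value)

def reducer(key, values):
    """Reduce: aggregate all values observed for one key."""
    return key, sum(values)

totals = dict(reducer(k, v) for k, v in groups.items())
# totals == {"Kerala": 550, "Goa": 200}
```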
Hadoop Characteristics
Hadoop provides a reliable shared storage system (HDFS) and data analysis
system (MapReduce).
Cost effective, as it can work with commodity hardware and doesn’t need
expensive hardware.
Flexible and can process both structured and unstructured data sets.
Optimized for large and very large data sets. It takes far less data processing
time than traditional database management systems because of parallel processing.
Highly scalable: a Hadoop cluster can contain hundreds or thousands
of servers.
Provides a very reliable system as data is replicated across multiple nodes
(replication factor is configurable).
Hive
Hive is a data warehouse infrastructure which is built on top of the Hadoop
distributed system and it provides tools to enable easy ETL to join, aggregate and filter
different data sets. It also allows programmers to build custom MapReduce functionalities.
Hive provides an SQL like query interface called HiveQL which internally does MapReduce
operations. Hive is extremely useful when processing large amounts of data (terabytes). Hive
is easier to use as it abstracts the complexity of Hadoop. Lots of companies support Hive, a
simple reason being to encourage SQL based queries on top of Hadoop.
HIVE ARCHITECTURE
Figure 2.4. Hive Architecture block diagram.
When a user logs in to the Hive terminal through a CLI (command line interface) or a
web graphical user interface, the connection reaches the Hive drivers, either directly
or through the Thrift server. The queries written by users are received by the drivers
and sent to Hadoop, where Hadoop fetches the data and divides the work using the
NameNode, DataNode, JobTracker and TaskTracker.
HIVE COMPONENTS
– Thrift server
This component is optional. It allows a remote client to submit requests to Hive and
retrieve results; a variety of programming languages can be used to do so.
– Driver
Driver is a very important component that takes all requests from the CLI (command
line interface), the web interface or the Thrift server, and performs the compilation,
optimization and execution of the queries.
– Meta Store
This component stores all the structural information of the various tables and
partitions in the warehouse, including column and column-type information, the
serializers and deserializers necessary to read and write data, and the corresponding
HDFS files where the data is stored.
– HDFS
All data is stored in HDFS; a detailed explanation of HDFS is given in the HDFS
section above. Hive uses HDFS as its storage layer.
Disadvantages of Hive
- It’s not designed for online transaction processing.
- There is a built-in latency for every job.
- When Hive compiles a query into a set of MapReduce jobs it has to co-ordinate
and launch the jobs on the cluster.
Pig
Pig is a high-level data flow system that provides a simple language, popularly known
as Pig Latin, for manipulating and querying data. Pig was developed at Yahoo in 2006
to provide an ad-hoc way of creating and executing MapReduce jobs on huge data sets.
Pig has relational database features, is built on top of Hadoop, and makes it easier
to clean and analyze big data sets without having to write vanilla MapReduce jobs in
Hadoop. The Pig tool itself converts all high-level operations into MapReduce jobs.
It follows a multi-query approach and helps cut down the number of times the data is
scanned. Performance of Pig is on par with that of raw MapReduce. The structure of
Pig programs is amenable to substantial parallelization, which enables them to handle
very large data sets. Pig is a natural fit for ETL tasks, as it can handle
unstructured data.
PIG ARCHITECTURE
Figure 2.5. Pig Architecture block diagram.
PIG LATIN COMPILER
The Pig Latin compiler converts Pig Latin code into executable code in the form of
MapReduce jobs. The resulting sequence of MapReduce programs enables Pig programs to
do data processing and analysis in parallel.
BENEFITS OF PIG
Learning curve is not steep.
Decreased development time compared with vanilla MapReduce jobs, due to
reduced complexity and maintenance needs.
Helps with faster prototyping of algorithms thanks to the ease of the Pig Latin
language.
Effective for unstructured data.
It's procedural, providing better expressiveness in the transformation of data at
every step.
DISADVANTAGES OF PIG
It is not very mature; even though it has been around for quite some time, it is
still in development.
Doesn't clearly distinguish the type of error: it just reports an execution error
when something goes wrong, without specifying whether it is a syntax, runtime or
type error.
Support: Google and Stack Overflow searches don't generally lead to good solutions
for problems.
Pig is typically not used for complex business logic involving encryption of data;
the Java cryptography API is picked over Pig in such cases.
Objective:
Analysis of the rainfall scenario at the national level:
- State- and year-wise distribution of rainfall.
- Month-wise distribution of rainfall.
- Areas/states with maximum and minimum rainfall in India.
Analysis of the rainfall scenario at the state level.
Methodology:
Flume: Flume is a distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of streaming data into
the Hadoop Distributed File System (HDFS).
MapReduce: MapReduce is a processing technique and a programming model for
distributed computing based on Java. The MapReduce algorithm contains two important
tasks, namely Map and Reduce. Map takes a set of data and converts it into another
set of data, where individual elements are broken down into tuples (key/value pairs).
The Reduce task takes the output from a map as its input and combines those data
tuples into a smaller set of tuples. As the name MapReduce implies, the reduce task
is always performed after the map job.
The major advantage of MapReduce is that it is easy to scale data processing over
multiple computing nodes. Under the MapReduce model, the data processing
primitives are called mappers and reducers. Decomposing a data processing
application into mappers and reducers is sometimes nontrivial. But, once we write
an application in the MapReduce form, scaling the application to run over
hundreds, thousands, or even tens of thousands of machines in a cluster is merely a
configuration change. This simple scalability is what has attracted many
programmers to use the MapReduce model.
MapReduce Dataflow
The frozen part of the MapReduce framework is a large distributed sort. The hot
spots, which the application defines, are:
An input reader
A Map function
A partition function
A compare function
A Reduce function
An output writer
Input reader
The input reader divides the input into appropriately sized 'splits' (in practice,
typically 64 MB to 128 MB), and the framework assigns one split to each Map function.
The input reader reads data from stable storage (typically a distributed file system)
and generates key/value pairs. A common example reads a directory full of text files
and returns each line as a record.
Map function
The Map function takes a series of key/value pairs, processes each, and generates
zero or more output key/value pairs. The input and output types of the map can be
(and often are) different from each other.
If the application is doing a word count, the map function breaks the line into
words and outputs a key/value pair for each word, with the word as the key and the
number of occurrences of that word in the line as the value.
Partition function
Each Map function output is allocated to a particular reducer by the
application's partition function for sharding purposes. The partition function is
given the key and the number of reducers and returns the index of the
desired reducer.
A typical default is to hash the key and use the hash value modulo the number
of reducers. It is important to pick a partition function that gives an approximately
uniform distribution of data per shard for load-balancing purposes, otherwise the
MapReduce operation can be held up waiting for slow reducers to finish (i.e. the
reducers assigned the larger shares of the non-uniformly partitioned data).
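The default hash-modulo scheme described above can be sketched as follows (a hypothetical Python illustration using CRC32 as the hash; Hadoop's actual default is the Java HashPartitioner, which applies the same hash-then-modulo idea):

```python
import zlib

NUM_REDUCERS = 4

def partition(key, num_reducers=NUM_REDUCERS):
    """Hash the key and return the index of the reducer it is routed to."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Every occurrence of a key is routed to the same reducer...
assert partition("Kerala") == partition("Kerala")
# ...and every index is a valid reducer number.
assert all(0 <= partition(s) < NUM_REDUCERS for s in ["Goa", "Kerala", "Assam"])
```

CRC32 is used here only because it is deterministic and built in; any hash with a roughly uniform output distribution serves the load-balancing goal described above.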
Between the map and reduce stages, the data are shuffled (parallel-sorted /
exchanged between nodes) in order to move the data from the map node that
produced them to the shard in which they will be reduced. The shuffle can
sometimes take longer than the computation time depending on network
bandwidth, CPU speeds, data produced and time taken by map and reduce
computations.
Comparison function
The input for each Reduce is pulled from the machine where the Map ran and
sorted using the application's comparison function.
Reduce function
The framework calls the application's Reduce function once for each unique key in
the sorted order. The Reduce can iterate through the values that are associated with
that key and produce zero or more outputs.
In the word count example, the Reduce function takes the input values, sums them,
and generates a single output of the word and the final sum.
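Putting the Map and Reduce functions of the word count example together (a minimal Python sketch of the logic only; a real Hadoop job would implement these as Java Mapper and Reducer classes):

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: sum all counts observed for one word."""
    return word, sum(counts)

# Group the intermediate pairs by key, as the shuffle stage would
groups = defaultdict(list)
for line in ["to be or not to be"]:
    for word, one in map_fn(line):
        groups[word].append(one)

word_counts = dict(reduce_fn(w, c) for w, c in groups.items())
# word_counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```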
Techniques:
We will be using the Hadoop framework to build the project. The Hadoop framework
consists of many tools, such as HDFS, MapReduce, Pig, Sqoop, Flume and others.
HDFS is used to store the data; MapReduce is a data processing framework. Pig,
along the same lines, is a processing framework, but its code is written in Pig
Latin, whereas MapReduce code is written in Java. Our data will be copied into
HDFS and processed using MapReduce as well as Pig.
Hardware and Software
Hardware:
- 8 GB RAM
- 2.5 GHz clock speed
- 10 GB HDD
- i5 Processor
- Hardware with virtualization support
Software:
Linux
Hadoop Distributed File System
MapReduce
Pig
Cloudera
Windows
VM Workstation
MS Excel
Notepad++
References:
[Link]er_India
[Link]term_rainfall_trends_in_India
[Link]mperature_and_Rainfall_in_India
[Link]_Analysis_of_long-term_rainfall_trends_in_India.pdf
[Link]
[Link]
[Link]
[Link]
[Link]