MapReduce Program - Weather Data Analysis For Analyzing Hot And Cold Days

In this article, we demonstrate how a MapReduce program can process large-scale weather datasets to identify temperature extremes. By harnessing Hadoop's parallel processing capabilities, the program efficiently pinpoints hot and cold days, an essential step for climate trend analysis, anomaly detection and building reliable forecasting systems.

Problem Statement

Analyze semi-structured weather data collected by sensors globally. We will focus on temperature values (maximum and minimum) and identify hot days (temperature > 30°C) and cold days (temperature < 15°C) using MapReduce.
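The classification rule above can be sketched in plain Java, independent of Hadoop. This is only an illustration of the two thresholds from the problem statement; the class and method names (DayClassifier, classify) are hypothetical and do not appear in the program later in the article. Note that a single day can qualify as both hot and cold, because the two checks use different readings.

```java
import java.util.ArrayList;
import java.util.List;

class DayClassifier {
    // Thresholds taken from the problem statement (degrees Celsius).
    static final float HOT_THRESHOLD_C = 30.0f;
    static final float COLD_THRESHOLD_C = 15.0f;

    // Returns the labels that apply to one day's readings (possibly none, possibly both).
    static List<String> classify(float maxTempC, float minTempC) {
        List<String> labels = new ArrayList<>();
        if (maxTempC > HOT_THRESHOLD_C) {
            labels.add("Hot Day");
        }
        if (minTempC < COLD_THRESHOLD_C) {
            labels.add("Cold Day");
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(classify(32.5f, 18.0f)); // [Hot Day]
        System.out.println(classify(10.0f, -5.0f)); // [Cold Day]
        System.out.println(classify(31.0f, 12.0f)); // [Hot Day, Cold Day]
        System.out.println(classify(25.0f, 16.0f)); // []
    }
}
```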

Dataset Overview

We used weather data from the National Centers for Environmental Information (NCEI), available in line-based ASCII text format. Each file contains fields such as Date, Latitude, Longitude, Max Temp and Min Temp.

FileName: CRND0103-2020-AK_Fairbanks_11_NE.txt. Download the file from here.

Step-by-Step Implementation

This section walks you through the implementation of the MapReduce program to extract hot and cold days from large-scale weather data using Hadoop.

Step 1: Understand Data Format

Below is an example of our dataset, where columns 6 and 7 show the maximum and minimum temperature, respectively.

[Figure: minimum and maximum temperature fields in the dataset]
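Because the records are fixed-width, fields can be read by character position rather than by splitting on a delimiter. The sketch below uses the same zero-based offsets as the mapper in Step 3. The sample record is hypothetical: its values are invented, and the column widths (5, 8, 6, 7, 7, 7, 7 characters, separated by single spaces) are an assumption chosen to be consistent with those offsets. The class name WeatherLineParser is also made up for this illustration.

```java
class WeatherLineParser {
    // Zero-based offsets copied from the mapper in this article.
    static String date(String line) {
        return line.substring(6, 14);
    }

    static float maxTemp(String line) {
        return Float.parseFloat(line.substring(39, 45).trim());
    }

    static float minTemp(String line) {
        return Float.parseFloat(line.substring(47, 53).trim());
    }

    // Builds a hypothetical fixed-width record; all values are invented for illustration.
    static String sampleLine() {
        return String.format("%5s %8s %6s %7s %7s %7s %7s",
                "12345", "20200101", "2.514", "-147.51", "64.97", "-14.7", "-20.8");
    }

    public static void main(String[] args) {
        String line = sampleLine();
        // Prints: 20200101 max=-14.7 min=-20.8
        System.out.println(date(line) + " max=" + maxTemp(line) + " min=" + minTemp(line));
    }
}
```

Reading by position keeps working when a field contains leading spaces, which is why each temperature substring is trimmed before parsing.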

Step 2: Set Up Java Project

Create a project in Eclipse with the following steps:

Open Eclipse, then select File -> New -> Java Project. Name it MyProject, select "Use an execution environment", choose JavaSE-1.8, then click Next -> Finish.

In this project, create a Java class named MyMaxMin, then click Finish.

Step 3: Java Source Code

Copy the source code below into the MyMaxMin class:

JAVA

// Required imports for Hadoop MapReduce
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.conf.Configuration;

// Main class
public class MyMaxMin {

    // Mapper class: Extracts max and min temperature from each line
    public static class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, Text> {

        // Sentinel value used in dataset to represent missing temperature
        public static final int MISSING = 9999;

        // Map method called for each line in the input file
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            String line = value.toString(); // Convert line to string

            if (line.length() != 0) {  // Skip empty lines
                // Extract date in yyyyMMdd format (zero-based indices 6 through 13)
                String date = line.substring(6, 14);

                // Extract and trim max and min temperatures
                float temp_Max = Float.parseFloat(line.substring(39, 45).trim());
                float temp_Min = Float.parseFloat(line.substring(47, 53).trim());

                // If max temperature is valid and > 30°C, consider it a hot day
                if (temp_Max != MISSING && temp_Max > 30.0) {
                    context.write(new Text("Hot Day: " + date), new Text(String.valueOf(temp_Max)));
                }

                // If min temperature is valid and < 15°C, consider it a cold day
                if (temp_Min != MISSING && temp_Min < 15.0) {
                    context.write(new Text("Cold Day: " + date), new Text(String.valueOf(temp_Min)));
                }
            }
        }
    }

    // Reducer class: Simply passes through the (key, value) pairs from mapper
    public static class MaxTemperatureReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Write each value to the output (usually only one value per key in this case)
            for (Text val : values) {
                context.write(key, val);
            }
        }
    }

    // Driver method: Configures and starts the MapReduce job
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // Create Hadoop job configuration
        Job job = Job.getInstance(conf, "Weather Analysis");  // Initialize job with name

        job.setJarByClass(MyMaxMin.class);  // Set main class
        job.setMapperClass(MaxTemperatureMapper.class);  // Set mapper class
        job.setReducerClass(MaxTemperatureReducer.class);  // Set reducer class

        // Set output types for mapper
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // Set input/output formats
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Set input and output file paths from command-line arguments
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit job and exit based on completion status
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
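To preview what the pass-through reducer produces without a cluster, the shuffle and reduce phases can be approximated in plain Java. This is a rough sketch, not Hadoop code: it assumes a single reducer (Hadoop's default), uses a TreeMap to stand in for the sorted grouping that the shuffle performs, and feeds in hypothetical mapper output with invented readings. The class name LocalShuffleSketch is made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalShuffleSketch {
    // Groups (key, value) pairs by key in sorted key order, roughly as the shuffle phase does.
    static Map<String, List<String>> shuffle(List<String[]> mapperOutput) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : mapperOutput) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // Mirrors the pass-through reducer: one tab-separated output line per value.
    static List<String> reduce(Map<String, List<String>> grouped) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            for (String value : entry.getValue()) {
                lines.add(entry.getKey() + "\t" + value);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output; the readings are invented for illustration.
        List<String[]> emitted = new ArrayList<>();
        emitted.add(new String[] {"Hot Day: 20200704", "31.2"});
        emitted.add(new String[] {"Cold Day: 20200101", "-20.8"});
        emitted.add(new String[] {"Cold Day: 20200102", "-18.4"});
        reduce(shuffle(emitted)).forEach(System.out::println);
    }
}
```

Because keys are sorted before they reach the reducer, all "Cold Day" lines appear before the "Hot Day" lines in the result, each group ordered by date.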

Step 4: Add External JARs

To ensure imported packages work correctly, you need to add external JAR files to your project. Download the Hadoop Common and Hadoop MapReduce Core JAR files that match your installed Hadoop version.

Check your Hadoop version with the command below:

hadoop version

Now, to add external jars to MyProject:

Right-click on MyProject -> Build Path -> Configure Build Path, select Add External JARs, add the JAR files from their download location, then click Apply and Close.

[Figure: adding external JAR files to the project]

Step 5: Export Project as JAR

Now export the project as a JAR file.

Right-click on MyProject and choose Export -> Java -> JAR file, then click Next. Choose your export destination, then click Next.

Set the Main Class to MyMaxMin by clicking Browse, then click Finish -> OK.

Step 6: Start Hadoop Services

Start HDFS and YARN daemons:

start-dfs.sh
start-yarn.sh

Step 7: Move Dataset to HDFS

Command:

hdfs dfs -put /path/to/CRND0103-2020-AK_Fairbanks_11_NE.txt /

To verify:

hdfs dfs -ls /

Step 8: Run the MapReduce Job

Now run your JAR file with the command below. The results are written to the MyOutput directory in HDFS, which must not already exist when the job starts.

Syntax:

hadoop jar /path/to/Project.jar /input_file_in_HDFS /output_directory

Example:

hadoop jar /home/user/Documents/Project.jar /CRND0103-2020-AK_Fairbanks_11_NE.txt /MyOutput

Step 9: View Output

After the MapReduce job completes, you can check the final results through the Hadoop web interface.

Visit the NameNode web UI (port 50070 is the Hadoop 2.x default; Hadoop 3.x uses port 9870 by default):

http://localhost:50070/

Then navigate to: Utilities -> Browse the file system -> /MyOutput -> part-r-00000.

Download the result file.

Step 10: Interpret Output

Each line in the output shows:

Label: Hot Day or Cold Day
Date: yyyyMMdd format (e.g., 20200101 = Jan 1, 2020)
Temperature reading in °C (the maximum for a hot day, the minimum for a cold day)
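For downstream analysis, each result line can be split back into these three parts. The sketch below assumes the default TextOutputFormat layout, in which key and value are separated by a tab, and the "Label: yyyyMMdd" key format written by the mapper. The class names OutputLineParser and DayRecord are hypothetical, and the sample line is invented for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class OutputLineParser {
    // Holds the three parts of one result line.
    static final class DayRecord {
        final String label;
        final LocalDate date;
        final float temperature;

        DayRecord(String label, LocalDate date, float temperature) {
            this.label = label;
            this.date = date;
            this.temperature = temperature;
        }
    }

    // Parses a line such as "Cold Day: 20200101<TAB>-20.8".
    static DayRecord parse(String outputLine) {
        String[] keyAndValue = outputLine.split("\t", 2);      // key, then value
        String[] labelAndDate = keyAndValue[0].split(": ", 2); // label, then yyyyMMdd
        LocalDate date = LocalDate.parse(labelAndDate[1], DateTimeFormatter.BASIC_ISO_DATE);
        return new DayRecord(labelAndDate[0], date, Float.parseFloat(keyAndValue[1]));
    }

    public static void main(String[] args) {
        DayRecord record = parse("Cold Day: 20200101\t-20.8");
        // Prints: Cold Day on 2020-01-01: -20.8 C
        System.out.println(record.label + " on " + record.date + ": " + record.temperature + " C");
    }
}
```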