Lab Assignment - Dsc650


FACULTY OF COMPUTER AND MATHEMATICAL SCIENCES

BACHELOR OF INFORMATION TECHNOLOGY (HONS.)

LAB ASSIGNMENT 1

DSC650 DATA TECHNOLOGY AND FUTURE EMERGENCE


PREPARED BY:

NO  NAME                                      CLASS    STUDENT ID
2   NUR IZZAH INSYIRAH BINTI SAIFUL BAHARI    CS2406A  2022782509
3   ALIA MAISARAH BINTI ABDUL RAHMAN          CS2406A  2022987895
5   MOHAMMAD ARIEF HAKIMI BIN MOHD AZDI       CS2406A  2022786381

PREPARED FOR:
WAN SAIFUL’AZZAM BIN WAN ISMAIL
TABLE OF CONTENTS

1.0 TASK 1
2.0 TASK 2
REFERENCES
1.0 TASK 1

Refer to Module L4A Running Hadoop with MR.docx and carry out the following instructions
(a to e). All outcomes (screenshots) must be shown in the answer sheet together with a
description of the steps.

a. Select an article—any article.

Title: A Critical Analysis of the Article "You Can't Blame the Education System Once
You've Reached University" by Yeung (2015)

Abstract:
The purpose of this analysis is to assess the key arguments presented in the article and to
evaluate the author's perspective on the relationship between the education system and
university-level education. Yeung's article delves into the notion that responsibility for one's
academic success should shift from the education system to the individual upon entering
university. This paper aims to explore the author's main points, assess the validity of the
arguments, and offer insights into the broader implications of such a perspective.
b. Convert the file into a .CSV/.txt format.

Figure 1.1 Process of Converting the PDF File to .txt Format

To convert the document to a .CSV or .txt format using Adobe Acrobat Reader, open the
PDF file, navigate to the "File" menu, select "Save As," choose the desired location, select
either Comma Separated Values (.csv) or Text (Plain) (.txt) as the format, enter a filename,
and click "Save."

c. Perform a word count on the chosen article.

In this lab assignment, we run a word count on the "article.txt" file to find the frequency
of each word in the text. Using the Hadoop MapReduce programming model, a Mapper function
tokenizes the text and emits key-value pairs (word, 1), and a Reducer function then
aggregates these counts for each word. The process involves loading the text file into HDFS,
executing the MapReduce job, and reviewing the results. The analysis shows which terms are
most prevalent in the article and serves as a practical application of distributed data
processing principles. The resulting counts also indicate the article's thematic focus and
prominent vocabulary.
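
The Mapper and Reducer roles described above can be sketched in plain Python. This is only a
local illustration, not the job that was run in the lab: the function names (mapper, reducer,
word_count), the lowercasing, and the regular-expression tokenizer are assumptions made for
the example, and an in-memory sort stands in for Hadoop's shuffle-and-sort phase.

```python
import re
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Tokenize each line and emit one (word, 1) pair per token.

    Assumption: words are lowercased and consist of letters and apostrophes.
    """
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield word, 1


def reducer(sorted_pairs):
    """Sum the counts of each word; the input must already be sorted by word."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, then sort (standing in for shuffle/sort), then reduce."""
    intermediate = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(intermediate))


if __name__ == "__main__":
    # Hypothetical sample text, not taken from article.txt.
    sample = ["The education system and the university", "the student"]
    print(word_count(sample))
```

On a cluster the same two roles would run as separate map and reduce tasks, with the
framework performing the sort between them.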

Figure 1.2 Word Count Output


d. Identify the most significant words in your selected article.

It appears that the word "the" is the most frequent in the analyzed article, occurring 28 times.

e. Choose the top 10 most important words based on their word count.

The top 10 most important words are determined from the word counts produced by the Hadoop
MapReduce job.

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.
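
One way to rank the words is to post-process the word count output, as in the minimal sketch
below. It assumes each output line has the form word, a tab, then the count. The helper names
(parse_counts, top_n) and the sample lines are hypothetical and are not results from the
article.

```python
def parse_counts(lines):
    """Parse 'word<TAB>count' lines into a dict of word -> count."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, _, count = line.rpartition("\t")
        # Add rather than overwrite, in case a word appears in more than one file.
        counts[word] = counts.get(word, 0) + int(count)
    return counts


def top_n(counts, n=10):
    """Return the n highest counts; ties are broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Made-up placeholder lines, only to show the expected input format.
    sample = ["alpha\t3", "beta\t7", "gamma\t3", "delta\t1"]
    for word, count in top_n(parse_counts(sample), n=3):
        print(word, count)
```

Because the job used two reducers, the lines of both output files would need to be passed to
parse_counts together before ranking.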

2.0 TASK 2
a. How many mappers and how many reducers have been used?

Figure 2.1 Map Task and Reduce Task

● Launched map tasks = 1: This line indicates that there was only one map task (or
mapper) launched in the MapReduce job. A map task is responsible for processing a
portion of the input data.
● Launched reduce tasks = 2: This line indicates that there were two reduce tasks (or
reducers) launched in the MapReduce job. Reduce tasks take the intermediate output
generated by the map tasks, perform further processing, and produce the final output.
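
The two counters can also be read from a saved copy of the job output instead of by eye. The
sketch below assumes the log contains lines such as "Launched map tasks=1", with or without
spaces around the equals sign. The function name launched_tasks is hypothetical.

```python
import re

# Matches "Launched map tasks=1" and "Launched reduce tasks = 2".
COUNTER_PATTERN = re.compile(r"Launched (map|reduce) tasks\s*=\s*(\d+)")


def launched_tasks(log_text):
    """Return a dict such as {'map': 1, 'reduce': 2} for the counters found."""
    return {kind: int(value) for kind, value in COUNTER_PATTERN.findall(log_text)}


if __name__ == "__main__":
    log = "Launched map tasks=1\nLaunched reduce tasks=2\n"
    print(launched_tasks(log))
```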

b. Illustrate the process of MapReduce in performing task 1 using a MapReduce
diagram. Ensure that the number of mappers and reducers is consistent with the
original process.

Figure 2.2 Illustration of MapReduce

1. The "Input Data" (article.txt in HDFS) is divided into input splits, and each split is
processed by a separate map task. Because the job launched only one map task, the whole
file forms a single split handled by Mapper 1.
2. Mapper 1 tokenizes its input and emits intermediate key-value pairs of the form
(word, 1).
3. The intermediate pairs are partitioned by key so that every occurrence of a given word
is assigned to the same reducer. The logs show no second map task, so no further map
stage takes place.
4. Each partition is shuffled to its reducer and sorted by key, which groups the values for
each word together as the "Intermediate Data."
5. The "Intermediate Data" is then processed by two reduce tasks (Reducer 1 and
Reducer 2), as indicated by the logs. Each reducer sums the counts for its words and
produces a part of the final output.
6. The output from Reducer 1 and Reducer 2 together forms the "Final Output Data."
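
The flow with one mapper and two reducers can be simulated locally as a rough sketch. All
function names here are hypothetical, and a CRC32 hash is used as a deterministic stand-in for
the partitioning step; it is not the hash that Hadoop itself applies, so the split of words
between the two reducers will differ from the real job.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # matches "Launched reduce tasks = 2" in the job log


def map_words(text):
    """Single mapper: emit (word, 1) for every whitespace-separated token."""
    return [(word.lower(), 1) for word in text.split()]


def partition(word, num_reducers=NUM_REDUCERS):
    """Pick a reducer index; every occurrence of a word gets the same index."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers=NUM_REDUCERS):
    """Route each pair to its reducer and group the values by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for word, count in pairs:
        partitions[partition(word, num_reducers)][word].append(count)
    return partitions


def reduce_partition(grouped):
    """One reducer: sum the values of each key, in sorted key order."""
    return {word: sum(values) for word, values in sorted(grouped.items())}


def run_job(text):
    """Return one output dict per reducer, like one output file per reduce task."""
    return [reduce_partition(p) for p in shuffle(map_words(text))]


if __name__ == "__main__":
    # Hypothetical input text, not taken from article.txt.
    for index, output in enumerate(run_job("the cat and the hat")):
        print("Reducer", index + 1, output)
```

Each word appears in exactly one reducer's output, which is why the two parts can simply be
combined to obtain the full word count.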

REFERENCES

Yeung, L. (2015, August 31). You can't blame the education system once you've reached
university. South China Morning Post. Retrieved from
http://www.scmp.com/lifestyle/families/article/1853409/you-cant-blame-education-system-once-youve-reached-university
