Lab Assignment - Dsc650

Uploaded by

ALIA MAISARAH ABDUL RAHMAN

FACULTY OF COMPUTER AND MATHEMATICAL SCIENCES

BACHELOR OF INFORMATION TECHNOLOGY (HONS.)

LAB ASSIGNMENT 1

DSC650 DATA TECHNOLOGY AND FUTURE EMERGENCE

PREPARED BY:

NO  NAME                                     CLASS    STUDENT ID
2   NUR IZZAH INSYIRAH BINTI SAIFUL BAHARI   CS2406A  2022782509
3   ALIA MAISARAH BINTI ABDUL RAHMAN         CS2406A  2022987895
5   MOHAMMAD ARIEF HAKIMI BIN MOHD AZDI      CS2406A  2022786381

PREPARED FOR:
WAN SAIFUL’AZZAM BIN WAN ISMAIL
TABLE OF CONTENTS

1.0 TASK 1
2.0 TASK 2
REFERENCES
1.0 TASK 1

Refer to Module L4A, Running Hadoop with MR.docx, and carry out the following instructions (a to e).
All outcomes (screenshots) must be shown in the answer sheet together with a description of
the steps.

a. Select an article—any article.

Title: A Critical Analysis of the Article "You Can't Blame the Education System Once
You've Reached University" by Yeung (2015)

Abstract:
The purpose of this analysis is to assess the key arguments presented in the article and to
evaluate the author's perspective on the relationship between the education system and
university-level education. Yeung's article delves into the notion that responsibility for one's
academic success should shift from the education system to the individual upon entering
university. This paper aims to explore the author's main points, assess the validity of the
arguments, and offer insights into the broader implications of such a perspective.
b. Convert the file into a .CSV/.txt format.

Figure 1.1 Process of Converting the PDF File to .txt Format

To convert the document to a .CSV or .txt format using Adobe Acrobat Reader, open the
PDF file, navigate to the "File" menu, select "Save As," choose the desired location, select
either Comma Separated Values (.csv) or Text (Plain) (.txt) as the format, enter a filename,
and click "Save."

c. Perform a word count on the chosen article.

In this lab assignment, we run a word count on the "article.txt" file to find the frequency
of each word in the text. Using the Hadoop MapReduce programming model, a Mapper function
tokenizes the text and emits key-value pairs of the form (word, 1), and a Reducer function
sums these counts for each word. The process involves loading the text file into HDFS,
executing the MapReduce job, and reviewing the results. This analysis shows which terms are
most prevalent and serves as a practical application of distributed data processing
principles. It also gives insight into the article's thematic focus and prominent vocabulary.
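The Mapper and Reducer roles described above can be sketched in a few lines of Python. This is a single-process illustration only, not the Hadoop job used in the lab: the function names `mapper`, `reducer`, and `word_count` are hypothetical, and the lowercasing and the regular-expression tokenizer are assumptions made for this sketch.

```python
import re
from collections import defaultdict


def mapper(line):
    """Tokenize one line and emit a (word, 1) pair for every token."""
    # Assumption: words are lowercased and split on non-letter characters.
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def reducer(word, counts):
    """Sum all the counts that were emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, group-by-key (the shuffle), and reduce in one process."""
    grouped = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(word, counts) for word, counts in grouped.items())


# Made-up sample input, not taken from the article.
sample = ["The system and the student", "the university"]
counts = word_count(sample)
print(counts)
```

In a real Hadoop job the grouping step is performed by the framework between the map and reduce phases; here a dictionary stands in for it.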

Figure 1.2 Word Count Output

d. Identify the most significant words in your selected article.

It appears that the word "the" is the most frequent in the analyzed article, occurring 28 times.

e. Choose the top 10 most important words based on their word count.

The top 10 most important words are determined by ranking the word counts produced by the
Hadoop MapReduce job in descending order.

10.
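Selecting the top words from a word-count result can be sketched as follows. The helper `top_n` and its optional `stop_words` filter are hypothetical additions for illustration, and the sample counts are made up except for "the" occurring 28 times, which is the figure reported above.

```python
from collections import Counter


def top_n(counts, n=10, stop_words=frozenset()):
    """Return the n (word, count) pairs with the highest counts."""
    # Optionally drop very common function words before ranking.
    filtered = {w: c for w, c in counts.items() if w not in stop_words}
    return Counter(filtered).most_common(n)


# Hypothetical counts; only "the": 28 comes from the report.
example_counts = {"the": 28, "of": 5, "education": 4, "university": 3}

for word, count in top_n(example_counts, n=3):
    print(word, count)
```

Because raw frequency favours words such as "the", passing a stop-word set (for example `top_n(example_counts, stop_words={"the", "of"})`) may give a list that better reflects the article's topic.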

2.0 TASK 2
a. How many mappers and how many reducers have been used?

Figure 2.1 Map Task and Reduce Task

● Launched map tasks = 1: This line indicates that there was only one map task (or
mapper) launched in the MapReduce job. A map task is responsible for processing a
portion of the input data.
● Launched reduce tasks = 2: This line indicates that there were two reduce tasks (or
reducers) launched in the MapReduce job. Reduce tasks take the intermediate output
generated by the map tasks, perform further processing, and produce the final output.

b. Illustrate the process of MapReduce in performing task 1 using a MapReduce diagram.
Ensure that the number of mappers and reducers is consistent with the original process.

Figure 2.2 Illustration of MapReduce

1. The "Input Data" is divided into input splits, and each split is processed by a
separate map task. In this job there is only one split, so there is only one map task
(Mapper 1).
2. Mapper 1 processes its input data and emits intermediate key-value pairs of the form
(word, 1).
3. The intermediate pairs are partitioned by key between the two reducers, then shuffled
and sorted so that all values for the same key are grouped together, creating the
"Intermediate Data."
4. The "Intermediate Data" is then processed by two reduce tasks (Reducer 1 and
Reducer 2), as indicated by the logs. Each reducer sums the counts for its keys and
produces a part of the final output.
5. The output from Reducer 1 and Reducer 2 together forms the "Final Output Data."
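The flow with one mapper and two reducers can be simulated in a single Python process, as a rough sketch. The names `partition` and `run_job` are hypothetical, the sample sentence is made up, and `zlib.crc32` is only a stand-in for the hash function Hadoop uses when assigning keys to reducers.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # matches "Launched reduce tasks = 2" in the report


def partition(word, num_reducers=NUM_REDUCERS):
    """Pick a reducer for a key (crc32 is a stand-in for Hadoop's hash)."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def run_job(text):
    # Map phase: a single mapper emits (word, 1) for every token.
    pairs = [(word, 1) for word in text.lower().split()]

    # Shuffle phase: route each pair to one reducer and group values by key.
    buckets = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for word, one in pairs:
        buckets[partition(word)][word].append(one)

    # Reduce phase: each reducer sums the values of its keys in sorted order.
    return [{w: sum(v) for w, v in sorted(b.items())} for b in buckets]


parts = run_job("the system and the student and the university")
for i, part in enumerate(parts):
    print(f"reducer {i}: {part}")
```

Each dictionary in `parts` plays the role of one reducer's output file. Every word lands in exactly one of them, because the partition step depends only on the key.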

REFERENCES

Yeung, L. (2015, August 31). You can't blame the education system once you've reached
university. South China Morning Post. Retrieved from
http://www.scmp.com/lifestyle/families/article/1853409/you-cant-blame-education-system-once-youve-reached-university
