
Academic year: 2022-2023

Class: 2nd year N2TR    Duration: 1 h

Subject: BIG DATA    Number of pages: 5

SUPERVISED TEST
PREPARATION FOR THE CERTIFICATION: BIG DATA

Exercise 1 (MCQ: give your answers on your answer sheet) (15 pts)

Part 1 (general knowledge of Big Data) (5.25 pts)


1. Which statement is true about storage of the output of REDUCE task?
A It is stored in HDFS using the number of copies specified by replication factor.
B It is stored in HDFS, but only one copy on the local machine.
C It is stored on the local disk.
D It is stored in memory.
2. In the master/slave architecture, what is considered the slave?
A DataNode
B FsImage
C NameSpace
D EditLog

3. Which MapReduce task is responsible for reading a portion of input data and producing
<key, value> pairs?
A Shuffle
B Reduce
C Map
D Combiner

4. Hadoop is designed for which type of work?


A low latency
B random access
C processing many small files
D batch-oriented parallel processing

5. What is the primary benefit of using Hive with Hadoop?


A It supports materialized views.
B It provides support for transactions.
C Queries perform much faster than with MapReduce.
D Hadoop data can be accessed through SQL statements.


6. Which two commands are used to copy a data file from the local file system to HDFS?
(Choose two.)
(Please select ALL that apply)
A hadoop fs -copyFromLocal test_file test_file
B hadoop fs -put test_file test_file
C hadoop fs -cp test_file test_file
D hadoop fs -get test_file test_file
E hadoop fs -sync test_file test_file

7. Which Big Data technology delivers real-time analytic processing on data in motion?
A Data Warehouse
B Hadoop
C Platform computing
D Stream computing

Part 2 (9.75 pts)
1. Which type of Big Data analysis involves the processing of extremely large volumes of
constantly moving data that is impractical to store?
A. MapReduce
B. Federated Discovery and Navigation
C. Stream Computing
D. Text Analysis

2. Which description identifies the real value of Big Data and Analytics?
A. enabling customers to efficiently index and access large volumes of data
B. providing solutions to help customers manage and grow large database systems
C. using modern technology to efficiently store the massive amounts of data generated
by social networks
D. gaining new insight through the capabilities of the world's interconnected
intelligence

3. What is one of the two technologies that Hadoop uses as its foundation?
A. Apache
B. MapReduce
C. Jaql
D. HBase

4. Which command should be used to list the contents of the root directory in HDFS?
A. hadoop fs list
B. hdfs list /
C. hdfs root
D. hadoop fs -ls /


5. What is the process in MapReduce that moves all data from one key to the same worker
node?
(Select an answer)
A. Map
B. Reduce
C. Shuffle
D. Split

6. Which statement is true about Hadoop Distributed File System (HDFS)?


A. Data can be processed over long distances without a decrease in performance.
B. Data is designed for random access read/write.
C. Data is accessed through MapReduce.
D. Data can be created, updated and deleted.

7. What is one of the two driving principles of MapReduce?


A. increase storage capacity through advanced compression algorithms
B. spread data across a cluster of computers
C. provide a platform for highly efficient transaction processing
D. provide structure to unstructured or semi-structured data

8. What was Hadoop named after?


A. Creator Doug Cutting's favorite circus act
B. Cutting's high school rock band
C. The toy elephant of Cutting's son
D. A sound Cutting's laptop made during Hadoop's development
9. All of the following accurately describe Hadoop, EXCEPT:
A. Open source
B. Real-time
C. Java-based
D. Distributed computing approach
10. True or false? Hadoop can be used to create distributed clusters, based on commodity
servers, that provide low-cost processing and storage for unstructured data, log files and
other forms of big data.
A. True
B. False

11. True or false? MapReduce can best be described as a programming model used to develop
Hadoop-based applications that can process massive amounts of unstructured data.
A. True
B. False
12. What is one of the four characteristics of Big Data?
A. volatility
B. value
C. volume
D. verifiability


13. Which Big Data function improves the decision-making capabilities of organizations by
enabling the organizations to interpret and evaluate structured and unstructured data in
search of valuable business information?
A. distributed file system
B. analytics
C. stream computing
D. data warehousing

Exercise 2 (5 pts)
StereoPrix is a large retail company that wants to compute statistics on its sales over
the last twelve months. It owns a database stored on an HDFS system. The data is stored
in text files. Each line of a file corresponds to the sale of one product and contains
information such as:
- the date and time of the sale
- the name of the store where the product was sold
- the selling price
- the name of the product
- the category of the product (e.g. fruits and vegetables, household appliances, toys, ...)

Question
1. Write the map and reduce programs that return the highest sale amount for each
store.

Give your solutions in pseudo-code. The functions to specify have the following
prototypes:
- the map function
Map(<KeyType> key, <ValueType> value)
{
// your map pseudo-code here
}
- the reduce function
Reduce(<KeyType> key, list of <ValueType> values)
{
// your reduce pseudo-code here
}
It is entirely acceptable to consider that a key or value type is not a basic type
(an integer, a real number or a character string) but a structured type aggregating
several fields.
Specify in each case:
- what a map does
- what a reduce does
- what the different keys/values passed before and after each map and reduce phase
represent
