
Academic year: 2022-2023

Class: 2nd year N2TR    Duration: 1 h

Subject: BIG DATA    Number of pages: 5

SUPERVISED TEST
PREPARATION FOR THE CERTIFICATION: BIG DATA

Exercise 1 (MCQ: give your answers on your answer sheet) (15 pts)

Part 1 (general knowledge of Big Data) (5.25 pts)


1. Which statement is true about storage of the output of REDUCE task?
A It is stored in HDFS using the number of copies specified by replication factor.
B It is stored in HDFS, but only one copy on the local machine.
C It is stored on the local disk.
D It is stored in memory.
2. In the master/slave architecture, what is considered the slave?
A DataNode
B FsImage
C NameSpace
D EditLog

3. Which MapReduce task is responsible for reading a portion of input data and producing
<key, value> pairs?
A Shuffle
B Reduce
C Map
D Combiner

4. Hadoop is designed for which type of work?


A low latency
B random access
C processing many small files
D batch-oriented parallel processing

5. What is the primary benefit of using Hive with Hadoop?


A It supports materialized views.
B It provides support for transactions.
C Queries perform much faster than with MapReduce.
D Hadoop data can be accessed through SQL statements.


6. Which two commands are used to copy a data file from the local file system to HDFS?
(Choose two.)
(Please select ALL that apply)
A hadoop fs -copyFromLocal test_file test_file
B hadoop fs -put test_file test_file
C hadoop fs -cp test_file test_file
D hadoop fs -get test_file test_file
E hadoop fs -sync test_file test_file

7. Which Big Data technology delivers real-time analytic processing on data in motion?
A Data Warehouse
B Hadoop
C Platform computing
D Stream computing

Part 2 (9.75 pts)
1. Which type of Big Data analysis involves the processing of extremely large volumes of
constantly moving data that is impractical to store?
A. MapReduce
B. Federated Discovery and Navigation
C. Stream Computing
D. Text Analysis

2. Which description identifies the real value of Big Data and Analytics?
A. enabling customers to efficiently index and access large volumes of data
B. providing solutions to help customers manage and grow large database systems
C. using modern technology to efficiently store the massive amounts of data generated
by social networks
D. gaining new insight through the capabilities of the world's interconnected
intelligence

3. What is one of the two technologies that Hadoop uses as its foundation?
A. Apache
B. MapReduce
C. Jaql
D. HBase

4. Which command should be used to list the contents of the root directory in HDFS?
A. hadoop fs list
B. hdfs list /
C. hdfs root
D. hadoop fs -ls /


5. What is the process in MapReduce that moves all data from one key to the same worker
node?
(Select an answer)
A. Map
B. Reduce
C. Shuffle
D. Split

6. Which statement is true about Hadoop Distributed File System (HDFS)?


A. Data can be processed over long distances without a decrease in performance.
B. Data is designed for random access read/write.
C. Data is accessed through MapReduce.
D. Data can be created, updated and deleted.

7. What is one of the two driving principles of MapReduce?


A. increase storage capacity through advanced compression algorithms
B. spread data across a cluster of computers
C. provide a platform for highly efficient transaction processing
D. provide structure to unstructured or semi-structured data

8. What was Hadoop named after?


A. Creator Doug Cutting's favorite circus act
B. Cutting's high school rock band
C. The toy elephant of Cutting's son
D. A sound Cutting's laptop made during Hadoop's development
9. All of the following accurately describe Hadoop, EXCEPT:
A. Open source
B. Real-time
C. Java-based
D. Distributed computing approach
10. True or false? Hadoop can be used to create distributed clusters, based on commodity
servers, that provide low-cost processing and storage for unstructured data, log files and
other forms of big data.
A. True
B. False

11. True or false? MapReduce can best be described as a programming model used to develop
Hadoop-based applications that can process massive amounts of unstructured data.
A. True
B. False
12. What is one of the four characteristics of Big Data?
A. volatility
B. value
C. volume
D. verifiability


13. Which Big Data function improves the decision-making capabilities of organizations by
enabling the organizations to interpret and evaluate structured and unstructured data in
search of valuable business information?
A. distributed file system
B. analytics
C. stream computing
D. data warehousing

Exercise 2 (5 pts)
StereoPrix is a large retail company that wants to compute statistics on its sales over
the last twelve months. It owns a database stored on an HDFS system. The data is stored
in text files. Each line of a file corresponds to the sale of one product and contains
information such as:
- the date and time of the sale
- the name of the store where the product was sold
- the selling price
- the name of the product
- the category of the product (e.g. fruits and vegetables, household appliances, toys, ...)

Question
1. Write the map and reduce programs that return the highest sale amount for each
store.

Give your solutions in pseudo-code. The functions to specify have the following
prototypes:
- the map function
Map(<KeyType> key, <ValueType> value)
{
// your map pseudo-code here
}
- the reduce function
Reduce(<KeyType> key, list of <ValueType> values)
{
// your reduce pseudo-code here
}
It is entirely acceptable to consider that a key or value type is not a basic type
(an integer, a real number or a character string) but a structured type aggregating
several fields.
Specify in each case:
- what a map does
- what a reduce does
- what the different keys/values passed before and after each map and reduce phase
represent
