
BIG DATA ANALYTICS LAB

Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology
in
Computer Science & Engineering

(Bikaner Technical University, Bikaner)

SESSION (2023-2024)

SUBMITTED TO:                          SUBMITTED BY:
Mr. Sunil Kumar Khinchi                Name: Prachi
                                       20EEACS301

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING,
ENGINEERING COLLEGE AJMER


1. Implement the following Data structures in Java
i) Linked Lists ii) Stacks iii) Queues iv) Set v) Map

DESCRIPTION:
The java.util package contains all the classes and interfaces of the Collection framework.

Methods of Collection interface


There are many methods declared in the Collection interface. They are as follows:

1. public boolean add(Object element) - inserts an element into this collection.
2. public boolean addAll(Collection c) - inserts the elements of the specified collection into the invoking collection.
3. public boolean remove(Object element) - deletes an element from this collection.
4. public boolean removeAll(Collection c) - deletes all the elements of the specified collection from the invoking collection.
5. public boolean retainAll(Collection c) - deletes all the elements of the invoking collection except those in the specified collection.
6. public int size() - returns the total number of elements in the collection.
7. public void clear() - removes all elements from the collection.
8. public boolean contains(Object element) - searches for an element in the collection.
9. public boolean containsAll(Collection c) - searches for the specified collection within this collection.
10. public Iterator iterator() - returns an iterator over the collection.
11. public Object[] toArray() - converts the collection into an array.
12. public boolean isEmpty() - checks whether the collection is empty.

SKELETON OF JAVA.UTIL.COLLECTION INTERFACE

public interface Collection<E> extends Iterable<E> {
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);
    boolean add(E e);
    boolean remove(Object o);
    boolean containsAll(Collection<?> c);
    boolean addAll(Collection<? extends E> c);
    boolean removeAll(Collection<?> c);
    boolean retainAll(Collection<?> c);
    void clear();
    boolean equals(Object o);
    int hashCode();
}
ALGORITHM for All Collection Data Structures:-

Steps of Creation of a Collection

1. Create an object of generic type E, T, K or V.
2. Create a Model class or Plain Old Java Object (POJO) of that type.
3. Generate setters and getters.
4. Create a Collection object of type Set, List, Map or Queue.
5. Add objects to the collection:
boolean add(E e)
6. Add a collection to the collection:
boolean addAll(Collection c)
7. Remove or retain data from the collection:
removeAll(Collection c), retainAll(Collection c)
8. Iterate over the objects using Enumeration, Iterator or ListIterator:
Iterator iterator(), ListIterator listIterator()
9. Display the objects from the collection (a minimal Java sketch of these steps follows below).
10. END
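
SAMPLE PROGRAM (illustrative sketch):
The following is a minimal Java sketch of the steps above. It assumes a simplified Employee POJO
with only an id and a name; the remaining fields of employee.txt (designation, department,
salaries, rating) would be added to the POJO in the same way.

import java.util.*;

class Employee {
    private String id;
    private String name;

    Employee(String id, String name) { this.id = id; this.name = name; }

    public String getId() { return id; }
    public String getName() { return name; }
    public String toString() { return id + "," + name; }
}

public class CollectionDemo {
    public static void main(String[] args) {
        // i) Linked List of Employee objects
        List<Employee> list = new LinkedList<>();
        list.add(new Employee("e100", "james"));
        list.add(new Employee("e101", "jack"));

        // ii) Stack (LIFO) of Employee objects
        Deque<Employee> stack = new ArrayDeque<>();
        stack.push(new Employee("e102", "jane"));
        stack.push(new Employee("e104", "john"));

        // iii) Queue (FIFO) of Employee objects
        Queue<Employee> queue = new LinkedList<>();
        queue.offer(new Employee("e105", "peter"));
        queue.offer(new Employee("e106", "david"));

        // iv) Set of employee ids (duplicates are silently ignored)
        Set<String> ids = new HashSet<>();
        for (Employee e : list) ids.add(e.getId());

        // v) Map from employee id to Employee object
        Map<String, Employee> byId = new HashMap<>();
        for (Employee e : list) byId.put(e.getId(), e);

        // Iterate and display the objects stored in the collections
        Iterator<Employee> it = list.iterator();
        while (it.hasNext()) System.out.println(it.next());
        System.out.println("Stack top: " + stack.peek());
        System.out.println("Queue head: " + queue.peek());
        System.out.println("Ids: " + ids);
        System.out.println("Map: " + byId);
    }
}

In the full exercise, each line of employee.txt is split on commas and loaded into these
collections before being displayed.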

SAMPLE INPUT:
Sample Employee Data Set:
(employee.txt)
e100,james,asst.prof,cse,8000,16000,4000,8.7
e101,jack,asst.prof,cse,8350,17000,4500,9.2
e102,jane,assoc.prof,cse,15000,30000,8000,7.8
e104,john,prof,cse,30000,60000,15000,8.8
e105,peter,assoc.prof,cse,16500,33000,8600,6.9
e106,david,assoc.prof,cse,18000,36000,9500,8.3
e107,daniel,asst.prof,cse,9400,19000,5000,7.9
e108,ramu,assoc.prof,cse,17000,34000,9000,6.8
e109,rani,asst.prof,cse,10000,21500,4800,6.4
e110,murthy,prof,cse,35000,71500,15000,9.3

EXPECTED OUTPUT:-
Prints the information of each employee with all of its attributes.
2. Perform setting up and Installing Hadoop in its three operating modes: Standalone,
Pseudo distributed, Fully distributed.

DESCRIPTION:
Hadoop is written in Java, so you will need to have Java installed on your machine,
version 6 or later. Sun's JDK is the one most widely used with Hadoop, although others
have been reported to work. Hadoop runs on Unix and on Windows. Linux is the only
supported production platform, but other flavors of Unix (including Mac OS X) can be
used to run Hadoop for development. Windows is only supported as a development
platform, and additionally requires Cygwin to run. During the Cygwin installation
process, you should include the openssh package if you plan to run Hadoop in pseudo-
distributed mode.

ALGORITHM

STEPS INVOLVED IN INSTALLING HADOOP IN STANDALONE MODE:-

1. Command for installing ssh is "sudo apt-get install ssh".
2. Command for key generation is ssh-keygen -t rsa -P "".
3. Store the key into authorized_keys by using the command cat $HOME/.ssh/id_rsa.pub >>
$HOME/.ssh/authorized_keys
4. Extract the Java archive by using the command tar xvfz jdk-8u60-linux-i586.tar.gz
5. Extract Eclipse by using the command tar xvfz eclipse-jee-mars-R-linux-gtk.tar.gz
6. Extract Hadoop by using the command tar xvfz hadoop-2.7.1.tar.gz
7. Move Java to /usr/lib/jvm/ and Eclipse to /opt/. Configure the Java path in the
eclipse.ini file.
8. Export the Java path and the Hadoop path in ~/.bashrc.
9. Check whether the installation is successful by checking the Java version and the
Hadoop version.
10. Check whether the Hadoop instance in standalone mode works correctly by running the
built-in wordcount example from the Hadoop examples jar.
11. If the word count is displayed correctly in the part-r-00000 file, standalone mode has
been installed successfully.

ALGORITHM
STEPS INVOLVED IN INSTALLING HADOOP IN PSEUDO-DISTRIBUTED MODE:-

1. In order to install pseudo-distributed mode we need to configure the Hadoop
configuration files residing in the directory /home/lendi/hadoop-2.7.1/etc/hadoop.
2. First configure the hadoop-env.sh file by changing the Java path.
3. Configure core-site.xml, which contains a property tag holding a name and a value:
name fs.defaultFS and value hdfs://localhost:9000.
4. Configure hdfs-site.xml.
5. Configure yarn-site.xml.
6. Configure mapred-site.xml; before configuring it, copy mapred-site.xml.template to
mapred-site.xml.
7. Now format the name node by using the command hdfs namenode -format.
8. Run the commands start-dfs.sh and start-yarn.sh, which start the daemons:
NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.
9. Run jps, which lists all running daemons. Create a directory in HDFS by using the
command hdfs dfs -mkdir /csedir, enter some data into lendi.txt using the command
nano lendi.txt, copy it from the local directory to HDFS using the command
hdfs dfs -copyFromLocal lendi.txt /csedir/, and run the sample wordcount jar to check
whether pseudo-distributed mode is working or not (a small Java connectivity check is
also sketched below).
10. Display the contents of the output file by using the command hdfs dfs -cat /newdir/part-r-00000.
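
Besides running the sample wordcount jar, a quick programmatic check of the pseudo-distributed
setup is to connect to HDFS from Java and list the root directory. This is a minimal sketch,
assuming the fs.defaultFS value configured above (hdfs://localhost:9000) and the Hadoop client
jars on the classpath; the class name HdfsCheck is only illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same value as the fs.defaultFS property configured in core-site.xml above
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        // List the HDFS root directory; this fails if the daemons are not running
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}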

FULLY DISTRIBUTED MODE INSTALLATION:

ALGORITHM

1. Stop all single-node clusters:
$ stop-all.sh
2. Decide on one node as the NameNode (Master) and the remaining nodes as DataNodes (Slaves).
3. Copy the public key to all hosts to get passwordless SSH access:
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub lendi@l5sys24
4. Configure all configuration files to name the Master and Slave nodes:
$ cd $HADOOP_HOME/etc/hadoop
$ nano core-site.xml
$ nano hdfs-site.xml
5. Add the hostnames to the slaves file and save it:
$ nano slaves
6. Configure yarn-site.xml:
$ nano yarn-site.xml
7. On the Master node, run:
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
8. Format the NameNode (done with the hdfs namenode -format command above).
9. Daemons start on the Master and Slave nodes (verify with jps).
10. END

INPUT
ubuntu@localhost> jps
OUTPUT:
NameNode, DataNode, SecondaryNameNode,
NodeManager, ResourceManager

3. Implement the following file management tasks in Hadoop:


● Adding files and directories
● Retrieving files
● Deleting files
Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them
into HDFS using one of the above command line utilities.

DESCRIPTION:

HDFS is a scalable distributed filesystem designed to scale to petabytes of data while


running on top of the underlying filesystem of the operating system. HDFS keeps track of where
the data resides in a network by associating the name of its rack (or network switch) with the
dataset. This allows Hadoop to efficiently schedule tasks to those nodes that contain data, or
which are nearest to it, optimizing bandwidth utilization. Hadoop provides a set of command line
utilities that work similarly to the Linux file commands, and serve as your primary interface with
HDFS. We're going to have a look into HDFS by interacting with it from the command line. We
will take a look at the most common file management tasks in Hadoop, which include:

● Adding files and directories to HDFS


● Retrieving files from HDFS to local filesystem
● Deleting files from HDFS

ALGORITHM:-

SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM HDFS

Step-1
Adding Files and Directories to HDFS

Before you can run Hadoop programs on data stored in HDFS, you'll need to put the data into
HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of
/user/$USER, where $USER is your login user name. This directory isn't automatically created
for you, though, so let's create it with the mkdir command. For the purpose of illustration, we
use chuck. You should substitute your user name in the example commands.

hadoop fs -mkdir /user/chuck
hadoop fs -put example.txt
hadoop fs -put example.txt /user/chuck

Step-2
Retrieving Files from HDFS
The Hadoop command get copies files from HDFS back to the local filesystem, while cat prints
their contents to standard output. To retrieve and view example.txt, we can run:
hadoop fs -get example.txt .
hadoop fs -cat example.txt

Step-3
Deleting Files from HDFS
hadoop fs -rm example.txt
● Command for creating a directory in HDFS is "hdfs dfs -mkdir /lendicse".
● Adding a directory is done through the command "hdfs dfs -put lendi_english /".

Step-4
Copying Data from the Local File System to HDFS
Command for copying from a local directory is "hdfs dfs -copyFromLocal
/home/lendi/Desktop/shakes/glossary /lendicse/"
● View the file by using the command "hdfs dfs -cat /lendi_english/glossary"
● Command for listing items in Hadoop is "hdfs dfs -ls hdfs://localhost:9000/".
● Command for deleting files is "hdfs dfs -rm -r /kartheek".
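
The same add / retrieve / delete tasks can also be performed programmatically through the HDFS
Java API. Below is a minimal sketch, assuming the pseudo-distributed setup from the previous
experiment (fs.defaultFS = hdfs://localhost:9000) and an existing local file example.txt; the
paths and the class name HdfsFileTasks are illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import java.io.InputStream;

public class HdfsFileTasks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");   // assumed pseudo-distributed setup
        FileSystem fs = FileSystem.get(conf);

        // Adding files and directories
        fs.mkdirs(new Path("/user/chuck"));
        fs.copyFromLocalFile(new Path("example.txt"), new Path("/user/chuck/example.txt"));

        // Retrieving a file: copy it back to the local filesystem, then print its contents
        fs.copyToLocalFile(new Path("/user/chuck/example.txt"), new Path("example_copy.txt"));
        try (InputStream in = fs.open(new Path("/user/chuck/example.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        // Deleting a file (the second argument controls recursive deletion)
        fs.delete(new Path("/user/chuck/example.txt"), false);

        fs.close();
    }
}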

SAMPLE INPUT:
Input can be any data of structured, unstructured or semi-structured type.

EXPECTED OUTPUT:

4. Run a basic Word Count Map Reduce program to understand Map Reduce
Paradigm.
DESCRIPTION:--
MapReduce is the heart of Hadoop. It is this programming paradigm that allows for
massive scalability across hundreds or thousands of servers in a Hadoop cluster.
The MapReduce concept is fairly simple to understand for those who are familiar with clustered
scale-out data processing solutions. The term MapReduce actually refers to two separate and
distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data
and converts it into another set of data, where individual elements are broken down into tuples
(key/value pairs). The reduce job takes the output from a map as input and combines those data
tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce
job is always performed after the map job.

ALGORITHM

MAP REDUCE PROGRAM


WordCount is a simple program which counts the number of occurrences of each word in a given
text input data set. WordCount fits very well with the MapReduce programming model making it
a great example to understand the Hadoop Map/Reduce programming style. Our implementation
consists of three main parts:
1. Mapper
2. Reducer
3. Driver

Step-1. Write a Mapper


A Mapper overrides the "map" function from the class
"org.apache.hadoop.mapreduce.Mapper", which provides <key, value> pairs as the input. A
Mapper implementation may output
<key, value> pairs using the provided Context.
Input value of the WordCount Map task will be a line of text from the input data file and the key
would be the line number <line_number, line_of_text> . Map task outputs <word, one> for each
word in the line of text.
Pseudo-code
void Map (key, value){
for each word x in value:
output.collect(x, 1);
}
Step-2. Write a Reducer
A Reducer collects the intermediate <key,value> output from multiple map tasks and assembles a
single result. Here, the WordCount program sums up the occurrences of each word and emits
pairs of the form <word, occurrence>.
Pseudo-code
void Reduce (keyword, <list of value>){
for each x in <list of value>:
sum+=x;
final_output.collect(keyword, sum);
}
Step-3. Write Driver
The Driver program configures and runs the MapReduce job. We use the main program to
perform basic configurations such as:
● Job Name : name of this Job
● Executable (Jar) Class: the main executable class. For here, WordCount.
● Mapper Class: class which overrides the "map" function. For here, Map.
● Reducer: class which overrides the "reduce" function. For here, Reduce.
● Output Key: type of output key. For here, Text.
● Output Value: type of output value. For here, IntWritable.
● File Input Path
● File Output Path
A complete runnable WordCount combining these three parts is sketched below.
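
SAMPLE PROGRAM (illustrative sketch):
A minimal version of the complete program, using the org.apache.hadoop.mapreduce API
(Hadoop 2.x); it follows the standard WordCount example shipped with Hadoop, so the class and
method names match the parts described above.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits <word, 1> for each word in the input line
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word and emits <word, occurrence>
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configures job name, mapper, reducer, key/value types and I/O paths
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

To run it, package the class into a jar and submit it with
hadoop jar wordcount.jar WordCount <HDFS input path> <HDFS output path>;
the word counts appear in the part-r-00000 file under the output directory.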

INPUT:-
Set of Data Related Shakespeare Comedies, Glossary, Poems

OUTPUT:-
5. Write a Map Reduce program that mines weather data. Weather sensors collecting data
every hour at many locations across the globe gather a large volume of log data, which is a
good candidate for analysis with MapReduce, since it is semi structured and record-
oriented.

DESCRIPTION:
Climate change has been attracting a lot of attention for a long time, and its adverse
effects are being felt in every part of the earth. There are many examples of this, such as
rising sea levels, reduced rainfall and increased humidity. The proposed system overcomes
some of the issues encountered with other techniques. In this project we use the concepts of
Big Data and Hadoop. With the proposed architecture we are able to process offline data
stored by the National Climatic Data Centre (NCDC). Through this we are able to find the
maximum and minimum temperature of a year, and to predict the future weather
forecast. Finally, we plot a graph of the obtained MAX and MIN temperatures for each month
of the particular year to visualize the temperature. Based on the previous years' data, the
weather of the coming year is predicted.

ALGORITHM:-
MAPREDUCE PROGRAM

Like WordCount, the weather-mining program fits very well with the MapReduce programming
model, making it a good example of the Hadoop Map/Reduce programming style. Our
implementation consists of three main parts:

1. Mapper
2. Reducer
3. Main program

Step-1. Write a Mapper

A Mapper overrides the "map" function from the class "org.apache.hadoop.mapreduce.Mapper",
which provides <key, value> pairs as the input. A Mapper implementation may output
<key, value> pairs using the provided Context.

Input value of the Map task will be a line of text from the weather data file and the key
would be the line number <line_number, line_of_text>. As shown in the pseudo-code below, the
Map task outputs <max_temp, one> and <min_temp, one> for the temperature readings in the
line of text.

Pseudo-code
void Map (key, value){
for each max_temp x in value:
output.collect(x, 1);
}
void Map (key, value){
for each min_temp x in value:
output.collect(x, 1);
}

Step-2 Write a Reducer

A Reducer collects the intermediate <key,value> output from multiple map tasks and assembles a
single result. Here, the program sums up the occurrences of each temperature value and emits
pairs of the form <temperature, occurrence>.

Pseudo-code

void Reduce (max_temp, <list of value>){


for each x in <list of value>:
sum+=x;
final_output.collect(max_temp, sum);
}
void Reduce (min_temp, <list of value>){
for each x in <list of value>:
sum+=x;
final_output.collect(min_temp, sum);
}

3. Write Driver

The Driver program configures and runs the MapReduce job. We use the main program to
perform basic configurations such as:

● Job Name : name of this Job
● Executable (Jar) Class: the main executable class (here, the driver class of the weather-mining job).
● Mapper Class: class which overrides the "map" function. For here, Map.
● Reducer: class which overrides the "reduce" function. For here, Reduce.
● Output Key: type of output key. For here, Text.
● Output Value: type of output value. For here, IntWritable.
● File Input Path
● File Output Path
A compact runnable sketch of these parts is given below.
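
SAMPLE PROGRAM (illustrative sketch):
A compact sketch of a max-temperature job built from these parts. The real NCDC records use a
fixed-width format; here we assume a simplified input in which every line is year,temperature
(for example 1950,22), so the parsing line is illustrative only. Rather than counting temperature
occurrences as in the pseudo-code, this variant keys records by year and lets the reducer keep the
maximum, which matches the goal stated in the description; a min-temperature job is identical
except that the reducer uses Math.min.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    // Mapper: parses "year,temperature" and emits <year, temperature>
    public static class TempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");   // assumed simplified record layout
            if (fields.length == 2) {
                context.write(new Text(fields[0].trim()),
                              new IntWritable(Integer.parseInt(fields[1].trim())));
            }
        }
    }

    // Reducer: keeps the maximum temperature seen for each year
    public static class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            for (IntWritable val : values) max = Math.max(max, val.get());
            context.write(key, new IntWritable(max));
        }
    }

    // Driver: configured the same way as the WordCount driver
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "max temperature");
        job.setJarByClass(MaxTemperature.class);
        job.setMapperClass(TempMapper.class);
        job.setReducerClass(MaxTempReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}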
INPUT:-
Set of Weather Data over the years

OUTPUT:-

6. Implement Matrix Multiplication with Hadoop Map Reduce.


DESCRIPTION:
We can represent a matrix as a relation (table) in an RDBMS where each cell of the
matrix is represented as a record (i, j, value). It is important to understand that this is a very
inefficient representation if the matrix is dense. Say we have 5 rows and 6 columns; then we need
to store only 30 values, but with the above relation we are storing 30 row ids, 30 column ids and
30 values, in other words tripling the data. So a natural question arises: why do we need to store
the matrix in this format? In practice most matrices are sparse. In a sparse matrix not all cells
hold values, so we do not have to store those cells at all, which makes this format very efficient
for storing such matrices.

MapReduce Logic

The logic is to send the calculation of each output cell of the result matrix to a reducer. In
matrix multiplication, the first output cell (0,0) is the multiplication and summation of the
elements of row 0 of matrix A with the elements of column 0 of matrix B. To compute the value
of output cell (0,0) of the resultant matrix in a separate reducer, we use (0,0) as the output key
of the map phase, and the value must carry the values from row 0 of matrix A and column 0 of
matrix B. So in this algorithm the output of the map phase is a <key,value> pair, where the key
represents the output cell location (0,0), (0,1), etc., and the value is the list of all values the
reducer needs for the computation. For example, to calculate the value at output cell (0,0), we
collect the values from row 0 of matrix A and column 0 of matrix B in the map phase and pass
(0,0) as the key, so a single reducer can do the calculation. A short Java sketch of this strategy
is given below.
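
A short Java sketch of this one-reducer-per-output-cell strategy (simpler than the block strategy
detailed in the algorithm below). It assumes each input line has the form A,i,k,value or
B,k,j,value and that the dimensions I (rows of A) and J (columns of B) are passed through the job
configuration as matrix.I and matrix.J; these names and the input layout are illustrative only.
The driver is set up exactly like the WordCount driver of Experiment 4, with Text as both the
output key and value class.

import java.io.IOException;
import java.util.HashMap;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixMultiply {

    // Mapper: each A(i,k) is needed by every output cell in row i, and each B(k,j)
    // by every output cell in column j, so both are replicated to those cell keys.
    public static class CellMapper extends Mapper<LongWritable, Text, Text, Text> {
        private int I, J;   // row count of A/C and column count of B/C

        protected void setup(Context context) {
            // Dimensions are assumed to be supplied through the job configuration
            I = context.getConfiguration().getInt("matrix.I", 0);
            J = context.getConfiguration().getInt("matrix.J", 0);
        }

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] t = value.toString().split(",");        // "A,i,k,v" or "B,k,j,v"
            if (t[0].equals("A")) {
                for (int j = 0; j < J; j++)
                    context.write(new Text(t[1] + "," + j), new Text("A," + t[2] + "," + t[3]));
            } else {
                for (int i = 0; i < I; i++)
                    context.write(new Text(i + "," + t[2]), new Text("B," + t[1] + "," + t[3]));
            }
        }
    }

    // Reducer: the key is one output cell (i,j); the values carry A(i,k) and B(k,j)
    // tagged with k, so terms with matching k are multiplied and summed.
    public static class CellReducer extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            HashMap<Integer, Double> a = new HashMap<>();
            HashMap<Integer, Double> b = new HashMap<>();
            for (Text v : values) {
                String[] t = v.toString().split(",");
                if (t[0].equals("A")) a.put(Integer.parseInt(t[1]), Double.parseDouble(t[2]));
                else                  b.put(Integer.parseInt(t[1]), Double.parseDouble(t[2]));
            }
            double sum = 0;
            for (Integer k : a.keySet())
                if (b.containsKey(k)) sum += a.get(k) * b.get(k);
            if (sum != 0) context.write(key, new Text(Double.toString(sum)));
        }
    }
}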

ALGORITHM
We assume that the input files for A and B are streams of (key,value) pairs in sparse
matrix format, where each key is a pair of indices (i,j) and each value is the corresponding matrix
element value. The output files for matrix C=A*B are in the same format.
We have the following input parameters:

The path of the input file or directory for matrix A.


The path of the input file or directory for matrix B.
The path of the directory for the output files for matrix C.
strategy = 1, 2, 3 or 4.

1. R = the number of reducers.


2. I = the number of rows in A and C.
3. K = the number of columns in A and rows in B.
4. J = the number of columns in B and C.
5. IB = the number of rows per A block and C block.
6. KB = the number of columns per A block and rows per B block.
7. JB = the number of columns per B block and C block.
In the pseudo-code for the individual strategies below, we have intentionally avoided
factoring common code for the purposes of clarity.

Note that in all the strategies the memory footprint of both the mappers and the
reducers is flat at scale.

Note that the strategies all work reasonably well with both dense and sparse matrices. For sparse
matrices we do not emit zero elements. That said, the simple pseudo-code for multiplying the
individual blocks shown here is certainly not optimal for sparse matrices. As a learning exercise,
our focus here is on mastering the MapReduce complexities, not on optimizing the sequential
matrix multiplication algorithm for the individual blocks.

Steps
1. setup ()
2. var NIB = (I-1)/IB+1
3. var NKB = (K-1)/KB+1
4. var NJB = (J-1)/JB+1
5. map (key, value)
6. if from matrix A with key=(i,k) and value=a(i,k)
7. for 0 <= jb < NJB
8. emit (i/IB, k/KB, jb, 0), (i mod IB, k mod KB, a(i,k))
9. if from matrix B with key=(k,j) and value=b(k,j)
10. for 0 <= ib < NIB
emit (ib, k/KB, j/JB, 1), (k mod KB, j mod JB, b(k,j))
Intermediate keys (ib, kb, jb, m) sort in increasing order first by ib, then by kb, then by
jb, then by m. Note that m = 0 for A data and m = 1 for B data.

The partitioner maps intermediate key (ib, kb, jb, m) to a reducer r as follows:

11. r = ((ib*JB + jb)*KB + kb) mod R


12. These definitions for the sorting order and partitioner guarantee that each reducer
R[ib,kb,jb] receives the data it needs for blocks A[ib,kb] and B[kb,jb], with the data for
the A block immediately preceding the data for the B block.
13. var A = new matrix of dimension IBxKB
14. var B = new matrix of dimension KBxJB
15. var sib = -1
16. var skb = -1

Reduce (key, valueList)

17. if key is (ib, kb, jb, 0)


18. // Save the A block.
19. sib = ib
20. skb = kb
21. Zero matrix A
22. for each value = (i, k, v) in valueList A(i,k) = v
23. if key is (ib, kb, jb, 1)
24. if ib != sib or kb != skb return // A[ib,kb] must be zero!
25. // Build the B block.
26. Zero matrix B
27. for each value = (k, j, v) in valueList B(k,j) = v
28. // Multiply the blocks and emit the result.
29. ibase = ib*IB
30. jbase = jb*JB
31. for 0 <= i < row dimension of A
32. for 0 <= j < column dimension of B
33. sum = 0
34. for 0 <= k < column dimension of A = row dimension of B
        sum += A(i,k)*B(k,j)
35. if sum != 0 emit (ibase+i, jbase+j), sum

INPUT:-

Sets of data over different clusters, taken as the rows and columns of the two input matrices.
OUTPUT:-
