Big data analytics lab-JD

The Big Data Analytics Lab Manual outlines various experiments related to Hadoop, including installation, file management tasks, matrix multiplication, and word count using MapReduce. Each experiment includes aims, descriptions, algorithms, and commands necessary for execution. The manual serves as a comprehensive guide for students to understand and implement big data concepts using Hadoop.


RENGANAYAGI VARATHARAJ COLLEGE OF ENGINEERING

SALVARPATTI, SIVAKASI – 626 128

DEPARTMENT OF COMPUTER SCIENCE ENGINEERING

BIG DATA ANALYTICS


LAB MANUAL
(CCS334)

SUBJECT HANDLED BY :
MS.JAIDHARNI AP/CSE
BIG DATA ANALYTICS LAB
(CCS334)

List of Experiments

1. Downloading and installing Hadoop; Understanding different Hadoop modes.


Startup scripts. Configuration files.

2. Hadoop implementation of file management tasks, such as adding files and directories,
retrieving files and deleting files.

3. Implementation of Matrix Multiplication with Hadoop MapReduce.

4. Run a basic Word Count MapReduce program to understand the MapReduce paradigm.

5. Installation of Hive along with practice examples.

6. Installation of HBase, installing Thrift along with practice examples.

7. Practice importing and exporting data from various databases.


EXPNO:1
1. Install Apache Hadoop
Date:

AIM:-
i) Perform setting up and installing Hadoop in its three operating modes:

 Standalone

 Pseudo Distributed

 Fully Distributed

DESCRIPTION:
Hadoop is written in Java, so you will need to have Java installed on your machine, version 6 or later. Sun's JDK is the one most widely used with Hadoop, although others have been reported to work.

Hadoop runs on Unix and on Windows. Linux is the only supported production platform, but other flavors of Unix (including Mac OS X) can be used to run Hadoop for development. Windows is only supported as a development platform, and additionally requires Cygwin to run. During the Cygwin installation process, you should include the openssh package if you plan to run Hadoop in pseudo-distributed mode.

ALGORITHM
STEPS INVOLVED IN INSTALLING HADOOP IN STANDALONE MODE:-

1. Command for installing ssh is “sudo apt-get install ssh”.

2. Command for key generation is ssh-keygen -t rsa -P "".

3. Store the key into rsa.pub by using the command cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

4. Extract Java by using the command tar xvfz jdk-8u60-linux-i586.tar.gz

5. Extract Eclipse by using the command tar xvfz eclipse-jee-mars-R-linux-gtk.tar.gz

6. Extract Hadoop by using the command tar xvfz hadoop-2.7.1.tar.gz

7. Move Java to /usr/lib/jvm/ and Eclipse to /opt/. Configure the Java path in the eclipse.ini file.

8. Export the Java path and Hadoop path in ~/.bashrc

9. Check whether the installation is successful by checking the java version and hadoop version.

10. Check whether the Hadoop instance in standalone mode works correctly by running the built-in example jar for wordcount.

11. If the word count is displayed correctly in the part-r-00000 file, standalone mode is installed successfully.
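As an illustration of steps 8-11, the lines appended to ~/.bashrc and the standalone wordcount check might look like the following sketch; the exact install paths and the input/output directory names are assumptions, so adjust them to your system.

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60
export HADOOP_HOME=/home/lendi/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

$ java -version
$ hadoop version
$ mkdir ~/input && cp $HADOOP_HOME/etc/hadoop/*.xml ~/input
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount ~/input ~/output
$ cat ~/output/part-r-00000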

ALGORITHM
STEPS INVOLVED IN INSTALLING HADOOP IN PSEUDO DISTRIBUTED MODE:-

1. In order to install pseudo-distributed mode we need to configure the Hadoop configuration files residing in the directory /home/lendi/hadoop-2.7.1/etc/hadoop.

2. First configure the hadoop-env.sh file by changing the Java path.

3. Configure core-site.xml, which contains a property tag with a name and a value. Set the name to fs.defaultFS and the value to hdfs://localhost:9000.

4. Configure hdfs-site.xml.

5. Configure yarn-site.xml.

6. Configure mapred-site.xml; before configuring it, copy mapred-site.xml.template to mapred-site.xml.

7. Now format the name node by using the command hdfs namenode -format.

8. Type the commands start-dfs.sh and start-yarn.sh to start the daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.

9. Run jps, which lists all running daemons. Create a directory in HDFS by using the command hdfs dfs -mkdir /csedir, enter some data into lendi.txt using the command nano lendi.txt, copy it from the local directory to HDFS using the command hdfs dfs -copyFromLocal lendi.txt /csedir/, and run the sample wordcount jar to check whether pseudo-distributed mode is working or not.

10. Display the contents of the output file by using the command hdfs dfs -cat /newdir/part-r-00000.
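For reference, the property from step 3 goes between the <configuration> tags of core-site.xml roughly as shown below; the dfs.replication entry for hdfs-site.xml is a typical single-node setting and is included here only as an assumption.

core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>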

FULLY DISTRIBUTED MODE INSTALLATION:


ALGORITHM

1. Stop all single node clusters

$stop-all.sh
2. Decide one as NameNode (Master) and remaining as DataNodes(Slaves).

3. Copy public key to all three hosts to get a password less SSH access

$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub lendi@l5sys24

4. Configure all Configuration files, to name Master and Slave Nodes.

$cd $HADOOP_HOME/etc/hadoop

$nano core-site.xml

$ nano hdfs-site.xml

5. Add hostnames to file slaves and save it.

$ nano slaves

6. Configure $ nano yarn-site.xml

7. Do in Master Node

$ hdfs namenode -format

$ start-dfs.sh

$start-yarn.sh

8. Format NameNode

9. Daemons Starting in Master and Slave Nodes

10. END
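A minimal sketch of steps 4 and 5, assuming the master host is named l5sys24 (as in the ssh-copy-id example above) and using placeholder slave hostnames:

core-site.xml (same on every node, pointing at the master):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://l5sys24:9000</value>
  </property>
</configuration>

slaves (on the master; one DataNode hostname per line, placeholders here):
l5sys25
l5sys26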

INPUT
ubuntu@localhost> jps

OUTPUT:
NameNode, DataNode, SecondaryNameNode,

NodeManager, ResourceManager

Result:

We have installed Hadoop in standalone, pseudo-distributed and fully distributed modes and verified the
installation by running the provided example program.
EXPNO:2
Hadoop Implementation of file management tasks
Date:

AIM:-
Implement the following file management tasks in Hadoop:
 Adding files and directories
 Retrieving files
 Deleting Files
DESCRIPTION:-
HDFS is a scalable distributed filesystem designed to scale to petabytes of data while
running on top of the underlying filesystem of the operating system. HDFS keeps track of where
the data resides in a network by associating the name of its rack (or network switch) with the
dataset. This allows Hadoop to efficiently schedule tasks to those nodes that contain data, or
which are nearest to it, optimizing bandwidth utilization. Hadoop provides a set of command line
utilities that work similarly to the Linux file commands, and serve as your primary interface with
HDFS. We're going to have a look into HDFS by interacting with it from the command line. We
will take a look at the most common file management tasks in Hadoop, which include:
 Adding files and directories to HDFS
 Retrieving files from HDFS to local filesystem
 Deleting files from HDFS
ALGORITHM:-
SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM HDFS
Step-1
Adding Files and Directories to HDFS
Before you can run Hadoop programs on data stored in HDFS, you'll need to put the data into
HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory of
/user/$USER, where $USER is your login user name. This directory isn't automatically created
for you, though, so let's create it with the mkdir command. For the purpose of illustration, we
use chuck. You should substitute your user name in the example commands.

hadoop fs -mkdir /user/chuck


hadoop fs -put example.txt
hadoop fs -put example.txt /user/chuck

Step-2

Retrieving Files from HDFS


The Hadoop command get copies files from HDFS back to the local filesystem (for example, hadoop fs -get example.txt),
while cat prints a file's contents to the terminal. To view example.txt, we can run the following command:
hadoop fs -cat example.txt

Step-3

Deleting Files from HDFS


hadoop fs -rm example.txt
 Command for creating a directory in HDFS is “hdfs dfs -mkdir /lendicse”.
 Adding a directory is done through the command “hdfs dfs -put lendi_english /”.

Step-4

Copying Data from NFS to HDFS


The command for copying from the local directory is “hdfs dfs -copyFromLocal
/home/lendi/Desktop/shakes/glossary /lendicse/”

 View the file by using the command “hdfs dfs -cat /lendicse/glossary”
 Command for listing of items in Hadoop is “hdfs dfs -ls hdfs://localhost:9000/”.
 Command for deleting files is “hdfs dfs -rm -r /kartheek”
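
Putting the commands above together, a typical session might look like the following sketch; the file and directory names are only examples.

$ hdfs dfs -mkdir /lendicse
$ hdfs dfs -copyFromLocal /home/lendi/Desktop/shakes/glossary /lendicse/
$ hdfs dfs -ls hdfs://localhost:9000/lendicse
$ hdfs dfs -cat /lendicse/glossary
$ hdfs dfs -get /lendicse/glossary /home/lendi/Desktop/glossary_copy
$ hdfs dfs -rm -r /lendicse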

SAMPLE INPUT:
Input as any data format of type structured, Unstructured or Semi Structured

EXPECTED OUTPUT:

Result:
Thus the program was implemented and executed successfully.
EXPNO:3
Implementation of Matrix Multiplication with Hadoop MapReduce
Date:

AIM:-
Write a Map Reduce Program that implements Matrix Multiplication.

DESCRIPTION:
We can represent a matrix as a relation (table) in an RDBMS where each cell in the matrix is
represented as a record (i, j, value). It is important to understand that this relation is very inefficient
if the matrix is dense. Say we have 5 rows and 6 columns; then we need to store only 30 values. But in
the relation above we are storing 30 row ids, 30 column ids and 30 values, in other words tripling the
data. So a natural question arises: why do we need to store it in this format? In practice most matrices
are sparse. In sparse matrices not all cells have values, so we do not have to store those cells in the
database. This turns out to be a very efficient way of storing such matrices.

MapReduce Logic:
The logic is to send the calculation part of each output cell of the result matrix to a separate reducer.
In matrix multiplication, the first cell of the output (0,0) is the multiplication and summation of the
elements from row 0 of matrix A and the elements from column 0 of matrix B. To compute the value of
output cell (0,0) of the resultant matrix in a separate reducer, we need to use (0,0) as the output key of
the map phase, and the value should carry the values from row 0 of matrix A and column 0 of matrix B.
So in this algorithm the output from the map phase should be a <key, value> pair, where the key
represents the output cell location (0,0), (0,1), etc. and the value is the list of all values required for
the reducer to do the computation. For example, to calculate the value at output cell (0,0) we collect the
values from row 0 of matrix A and column 0 of matrix B in the map phase and pass (0,0) as the key, so
that a single reducer can do the calculation. A sketch of this idea in Java is shown below.
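
Below is a minimal Java sketch of this one-reducer-per-output-cell idea, kept separate from the block-based algorithm that follows. It assumes each input line has the form M,i,j,value (where M is "A" or "B") and that the dimensions I, K and J are passed through the job Configuration; the class name, property names and input format are illustrative assumptions, not a fixed part of this experiment.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MatrixMultiply {

  // Mapper: route every A(i,k) to all output cells (i, 0..J-1)
  // and every B(k,j) to all output cells (0..I-1, j).
  public static class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context con)
        throws IOException, InterruptedException {
      Configuration c = con.getConfiguration();
      int I = c.getInt("I", 0), J = c.getInt("J", 0);
      String[] t = value.toString().split(",");        // e.g. "A,0,1,5.0"
      if (t[0].equals("A")) {
        for (int j = 0; j < J; j++)
          con.write(new Text(t[1] + "," + j), new Text("A," + t[2] + "," + t[3]));
      } else {
        for (int i = 0; i < I; i++)
          con.write(new Text(i + "," + t[2]), new Text("B," + t[1] + "," + t[3]));
      }
    }
  }

  // Reducer: for output cell (i,j), collect row i of A and column j of B,
  // indexed by k, and emit their dot product.
  public static class MatrixReducer extends Reducer<Text, Text, Text, DoubleWritable> {
    public void reduce(Text key, Iterable<Text> values, Context con)
        throws IOException, InterruptedException {
      int K = con.getConfiguration().getInt("K", 0);
      double[] a = new double[K], b = new double[K];
      for (Text v : values) {
        String[] t = v.toString().split(",");
        int k = Integer.parseInt(t[1]);
        if (t[0].equals("A")) a[k] = Double.parseDouble(t[2]);
        else b[k] = Double.parseDouble(t[2]);
      }
      double sum = 0;
      for (int k = 0; k < K; k++) sum += a[k] * b[k];
      con.write(key, new DoubleWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration c = new Configuration();
    c.setInt("I", 2); c.setInt("K", 2); c.setInt("J", 2);   // example dimensions
    Job j = Job.getInstance(c, "matrixmultiply");
    j.setJarByClass(MatrixMultiply.class);
    j.setMapperClass(MatrixMapper.class);
    j.setReducerClass(MatrixReducer.class);
    j.setMapOutputKeyClass(Text.class);
    j.setMapOutputValueClass(Text.class);
    j.setOutputKeyClass(Text.class);
    j.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(j, new Path(args[0]));
    FileOutputFormat.setOutputPath(j, new Path(args[1]));
    System.exit(j.waitForCompletion(true) ? 0 : 1);
  }
}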

ALGORITHM
We assume that the input files for A and B are streams of (key,value) pairs in sparse
matrix format, where each key is a pair of indices (i,j) and each value is the corresponding matrix
element value. The output files for matrix C=A*B are in the same format.

We have the following input parameters:


The path of the input file or directory for matrix A.
The path of the input file or directory for matrix B.
The path of the directory for the output files for matrix C.
strategy = 1, 2, 3 or 4.
R = the number of reducers.
I = the number of rows in A and C.
K = the number of columns in A and rows in B.
J = the number of columns in B and C.
IB = the number of rows per A block and C block.
KB = the number of columns per A block and rows per B block.
JB = the number of columns per B block and C block.
In the pseudo-code for the individual strategies below, we have intentionally avoided
factoring common code for the purposes of clarity.
Note that in all the strategies the memory footprint of both the mappers and the reducers is flat at
scale.
Note that the strategies all work reasonably well with both dense and sparse matrices. For sparse
matrices we do not emit zero elements. That said, the simple pseudo-code for multiplying the
individual blocks shown here is certainly not optimal for sparse matrices. As a learning exercise,
our focus here is on mastering the MapReduce complexities, not on optimizing the sequential
matrix multiplication algorithm for the individual blocks.

Steps
1. setup ()
2. var NIB = (I-1)/IB+1
3. var NKB = (K-1)/KB+1
4. var NJB = (J-1)/JB+1
5. map (key, value)
6. if from matrix A with key=(i,k) and value=a(i,k)
7. for 0 <= jb < NJB
8. emit (i/IB, k/KB, jb, 0), (i mod IB, k mod KB, a(i,k))
9. if from matrix B with key=(k,j) and value=b(k,j)
10. for 0 <= ib < NIB
emit (ib, k/KB, j/JB, 1), (k mod KB, j mod JB, b(k,j))
Intermediate keys (ib, kb, jb, m) sort in increasing order first by ib, then by kb, then by jb,
then by m. Note that m = 0 for A data and m = 1 for B data.
The partitioner maps intermediate key (ib, kb, jb, m) to a reducer r as follows:
11. r = ((ib*JB + jb)*KB + kb) mod R
12. These definitions for the sorting order and partitioner guarantee that each reducer
R[ib,kb,jb] receives the data it needs for blocks A[ib,kb] and B[kb,jb], with the data for
the A block immediately preceding the data for the B block.
13. var A = new matrix of dimension IBxKB
14. var B = new matrix of dimension KBxJB
15. var sib = -1
16. var skb = -1

Reduce (key, valueList)


17. if key is (ib, kb, jb, 0)
18. // Save the A block.
19. sib = ib
20. skb = kb
21. Zero matrix A
22. for each value = (i, k, v) in valueList A(i,k) = v
23. if key is (ib, kb, jb, 1)
24. if ib != sib or kb != skb return // A[ib,kb] must be zero!
25. // Build the B block.
26. Zero matrix B
27. for each value = (k, j, v) in valueList B(k,j) = v
28. // Multiply the blocks and emit the result.
29. ibase = ib*IB
30. jbase = jb*JB
31. for 0 <= i < row dimension of A
32. for 0 <= j < column dimension of B
33. sum = 0
34. for 0 <= k < column dimension of A = row dimension of B
a. sum += A(i,k)*B(k,j)
35. if sum != 0 emit (ibase+i, jbase+j), sum
INPUT:-
Set of Data sets over different Clusters are taken as Rows and Columns

OUTPUT

Result:
Thus the program was implemented and executed successfully.
EXPNO:4
Word count Map Reduce program
Date:

AIM: To develop a MapReduce program to calculate the frequency of a given word in a given file.

Map Function – It takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key-value pairs).

Example – (Map function in Word Count)

Input

Set of data
Bus,Car,bus,car,train, car,bus,car,train,bus,TRAIN,BUS,buS, caR,CAR,car,BUS,TRAIN
Output

Convert into another set of data
(Key,Value)
(Bus,1),(Car,1), (bus,1),(car,1),(train,1),(car,1), (bus,1),(car,1), (train,1),(bus,1),
(TRAIN,1),(BUS,1),(buS,1),(caR,1),(CAR,1),(car,1), (BUS,1), (TRAIN,1)
Reduce Function – Takes the output from Map as an input and combines those data tuples into a
smaller set of tuples.
Example – (Reduce function in Word Count)
Input Set of Tuples
(output of Map function)
(Bus,1),(Car,1),(bus,1),(car,1),(train,1),(car,1),(bus,1),(car,1),(train,1),(bus,1), (TRAIN,1),
(BUS,1),
(buS,1),(caR,1),(CAR,1),(car,1),(BUS,1), (TRAIN,1)

Output: Converts into a smaller set of tuples

(BUS,7),(CAR,7),(TRAIN,4)
Work Flow of Program

The workflow of MapReduce consists of 5 steps:

1. Splitting – The splitting parameter can be anything, e.g. splitting by space,
comma, semicolon, or even by a new line (‘\n’).
2. Mapping – as explained above.
3. Intermediate splitting – the entire process runs in parallel on different clusters. In order
to group them in the “Reduce Phase”, the similar KEY data should be on the same cluster.
4. Reduce – it is nothing but mostly a group-by phase.
5. Combining – The last phase where all the data (individual result sets from each
cluster) is combined together to form a result.

Now Let's See the Word Count Program in Java

Make sure that Hadoop is installed on your system with the Java JDK.

Steps to follow:

Step 1. Open Eclipse > File > New > Java Project > (Name it – MRProgramsDemo) > Finish
Step 2. Right Click > New > Package (Name it – PackageDemo) > Finish
Step 3. Right Click on Package > New > Class (Name it – WordCount)
Step 4. Add the following Reference Libraries –

Right Click on Project > Build Path > Add External Archives

 /usr/lib/hadoop-0.20/hadoop-core.jar
 /usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar

Step 5. Type the following program:

package PackageDemo;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static void main(String[] args) throws Exception {
    Configuration c = new Configuration();
    String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
    Path input = new Path(files[0]);
    Path output = new Path(files[1]);
    Job j = new Job(c, "wordcount");
    j.setJarByClass(WordCount.class);
    j.setMapperClass(MapForWordCount.class);
    j.setReducerClass(ReduceForWordCount.class);
    j.setOutputKeyClass(Text.class);
    j.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(j, input);
    FileOutputFormat.setOutputPath(j, output);
    System.exit(j.waitForCompletion(true) ? 0 : 1);
  }

  public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context con)
        throws IOException, InterruptedException {
      String line = value.toString();
      String[] words = line.split(",");
      for (String word : words) {
        Text outputKey = new Text(word.toUpperCase().trim());
        IntWritable outputValue = new IntWritable(1);
        con.write(outputKey, outputValue);
      }
    }
  }

  public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text word, Iterable<IntWritable> values, Context con)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      con.write(word, new IntWritable(sum));
    }
  }
}

Make Jar File
Right Click on Project > Export > Select export destination as Jar File > Next > Finish
To move this into Hadoop directly, open the terminal and enter the following commands:

[training@localhost ~]$ hadoop fs -put wordcountFile wordCountFile

Run Jar file
(hadoop jar jarfilename.jar packageName.ClassName PathToInputTextFile PathToOutputDirectory)

[training@localhost ~]$ hadoop jar MRProgramsDemo.jar PackageDemo.WordCount wordCountFile MRDir1

Result: Open Result

[training@localhost ~]$ hadoop fs -ls MRDir1
Found 3 items
-rw-r--r--   1 training supergroup          0 2016-02-23 03:36 /user/training/MRDir1/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-02-23 03:36 /user/training/MRDir1/_logs
-rw-r--r--   1 training supergroup         20 2016-02-23 03:36 /user/training/MRDir1/part-r-00000
[training@localhost ~]$ hadoop fs -cat MRDir1/part-r-00000
BUS 7
CAR 7
TRAIN 4

Result:
Thus the program was implemented and executed successfully.

EXPNO:5
Date:
Installation of Hive

Downloading Hive
We use hive-0.14.0 in this tutorial. You can download it by visiting the following link
http://apache.petsads.us/hive/hive-0.14.0/. Let us assume it gets downloaded onto the /Downloads directory.
Here, we download Hive archive named “apache-hive-0.14.0-bin.tar.gz” for this tutorial. The following
command is used to verify the download:

$ cd Downloads
$ ls

On successful download, you get to see the following response:

apache-hive-0.14.0-bin.tar.gz

Installing Hive
The following steps are required for installing Hive on your system. Let us assume the Hive archive is
downloaded onto the /Downloads directory.

Extracting and verifying Hive Archive

The following command is used to verify the download and extract the hive archive:

$ tar zxvf apache-hive-0.14.0-bin.tar.gz


$ ls

On successful download, you get to see the following response:

apache-hive-0.14.0-bin apache-hive-0.14.0-bin.tar.gz

Copying files to /usr/local/hive directory

We need to copy the files as the super user (“su -”). The following commands are used to copy the files from
the extracted directory to the /usr/local/hive directory.

$ su -
passwd:

# cd /home/user/Download
# mv apache-hive-0.14.0-bin /usr/local/hive
# exit

Setting up environment for Hive

You can set up the Hive environment by appending the following lines to ~/.bashrc file:

export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/Hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.

The following command is used to execute ~/.bashrc file.

$ source ~/.bashrc

Configuring Hive
To configure Hive with Hadoop, you need to edit the hive-env.sh file, which is placed in the
$HIVE_HOME/conf directory. The following commands redirect to Hive config folder and copy the
template file:

$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh

Edit the hive-env.sh file by appending the following line:

export HADOOP_HOME=/usr/local/hadoop

Hive installation is completed successfully. Now you require an external database server to configure
Metastore. We use Apache Derby database.
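
Since the experiment also calls for practice examples, a short HiveQL session like the following sketch can be used to verify the installation; the table name, columns and sample data file are assumptions for illustration.

$ $HIVE_HOME/bin/hive

hive> CREATE TABLE employee (id INT, name STRING, salary FLOAT)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH '/home/user/employee.txt' INTO TABLE employee;
hive> SELECT * FROM employee;
hive> SELECT COUNT(*) FROM employee;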

Result:
Thus the program was implemented and executed successfully.
EXPNO:6
Installation of HBase
Date:

INSTALLATION OF HBASE, INSTALLING THRIFT ALONG WITH PRACTICE EXAMPLES.

Installing HBase

We can install HBase in any of the three modes: Standalone mode, Pseudo Distributed mode, and Fully
Distributed mode.

Installing HBase in Standalone Mode

Download the latest stable version of HBase from http://www.interior-dsgn.com/apache/hbase/stable/ using the
“wget” command, and extract it using the tar “zxvf” command. See the following command.

$ cd /usr/local/
$ wget http://www.interior-dsgn.com/apache/hbase/stable/hbase-0.98.8-hadoop2-bin.tar.gz
$ tar -zxvf hbase-0.98.8-hadoop2-bin.tar.gz

Shift to super user mode and move the HBase folder to /usr/local as shown below.

$ su
$ password: enter your password here
mv hbase-0.98.8-hadoop2/* /usr/local/Hbase/

Configuring HBase in Standalone Mode

Before proceeding with HBase, you have to edit the following files and configure HBase.

hbase-env.sh

Set the java Home for HBase and open hbase-env.sh file from the conf folder. Edit JAVA_HOME
environment variable and change the existing path to your current JAVA_HOME variable as shown below.

cd /usr/local/Hbase/conf
gedit hbase-env.sh

This will open the env.sh file of HBase. Now replace the existing JAVA_HOME value with your current
value as shown below.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0

hbase-site.xml

This is the main configuration file of HBase. Set the data directory to an appropriate location by opening the
HBase home folder in /usr/local/HBase. Inside the conf folder, you will find several files, open the hbase-
site.xml file as shown below.

#cd /usr/local/HBase/
#cd conf
# gedit hbase-site.xml

Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags. Within them, set
the HBase directory under the property key with the name “hbase.rootdir” as shown below.

<configuration>
//Here you have to set the path where you want HBase to store its files.
<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/HBase/HFiles</value>
</property>

//Here you have to set the path where you want HBase to store its built in zookeeper
files.
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
</configuration>

With this, the HBase installation and configuration part is successfully complete. We can start HBase by using
start-hbase.sh script provided in the bin folder of HBase. For that, open HBase Home Folder and run HBase
start script as shown below.

$cd /usr/local/HBase/bin
$./start-hbase.sh

If everything goes well, when you try to run HBase start script, it will prompt you a message saying that
HBase has started.

starting master, logging to /usr/local/HBase/bin/../logs/hbase-tpmaster-localhost.localdomain.out

Installing HBase in Pseudo-Distributed Mode

Let us now check how HBase is installed in pseudo-distributed mode.

Configuring HBase

Before proceeding with HBase, configure Hadoop and HDFS on your local system or on a remote system and
make sure they are running. Stop HBase if it is running.

hbase-site.xml

Edit hbase-site.xml file to add the following properties.

<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
This property specifies the mode in which HBase should run. In the same file, change hbase.rootdir from the
local file system to your HDFS instance address, using the hdfs:// URI syntax. We are running HDFS on the
localhost at port 8030.

<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8030/hbase</value>
</property>

Starting HBase

After configuration is over, browse to HBase home folder and start HBase using the following command.

$cd /usr/local/HBase
$bin/start-hbase.sh

Note: Before starting HBase, make sure Hadoop is running.

Checking the HBase Directory in HDFS

HBase creates its directory in HDFS. To see the created directory, browse to Hadoop bin and type the
following command.

$ ./bin/hadoop fs -ls /hbase

If everything goes well, it will give you the following output.

Found 7 items
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs

Starting and Stopping a Master


Using “local-master-backup.sh” you can start up to 10 backup master servers. Open the home folder of HBase
and execute the following command to start a backup master.

$ ./bin/local-master-backup.sh 2 4

To kill a backup master, you need its process id, which will be stored in a file named “/tmp/hbase-USER-X-
master.pid”. You can kill the backup master using the following command.

$ cat /tmp/hbase-user-1-master.pid |xargs kill -9

Starting and Stopping RegionServers


You can run multiple region servers from a single system using the following command.
$ ./bin/local-regionservers.sh start 2 3

To stop a region server, use the following command.

$ ./bin/local-regionservers.sh stop 3

Starting HBaseShell
After installing HBase successfully, you can start the HBase shell. Given below is the sequence of steps to be
followed to start the HBase shell. Open the terminal and log in as the super user.

Start Hadoop File System

Browse through Hadoop home sbin folder and start Hadoop file system as shown below.

$cd $HADOOP_HOME/sbin
$start-all.sh

Start HBase

Browse through the HBase root directory bin folder and start HBase.

$cd /usr/local/HBase
$./bin/start-hbase.sh

Start HBase Master Server

This will be the same directory. Start it as shown below.

$ ./bin/local-master-backup.sh start 2    (the number signifies the specific server)

Start Region

Start the region server as shown below.

$ ./bin/local-regionservers.sh start 3

Start HBase Shell

You can start HBase shell using the following command.

$cd bin
$./hbase shell

This will give you the HBase Shell Prompt as shown below.

2014-12-09 14:24:27,526 INFO [main] Configuration.deprecation:


hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri
Nov 14 18:26:29 PST 2014

hbase(main):001:0>
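
As a quick practice example at this prompt, the standard shell commands below create a table, insert a cell, read it back and then remove the table; the table name 'test' and column family 'cf' are only illustrative.

hbase(main):002:0> create 'test', 'cf'
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):004:0> scan 'test'
hbase(main):005:0> get 'test', 'row1'
hbase(main):006:0> disable 'test'
hbase(main):007:0> drop 'test'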

HBase Web Interface

To access the web interface of HBase, type the following url in the browser.

http://localhost:60010

This interface lists your currently running Region servers, backup masters and HBase tables.

HBase Region servers and Backup Masters

HBase Tables

Setting Java Environment


We can also communicate with HBase using Java libraries, but before accessing HBase using Java API you
need to set classpath for those libraries.

Setting the Classpath

Before proceeding with programming, set the classpath to HBase libraries in .bashrc file. Open .bashrc in any
of the editors as shown below.

$ gedit ~/.bashrc

Set classpath for HBase libraries (lib folder in HBase) in it as shown below.

export CLASSPATH=$CLASSPATH:/home/hadoop/hbase/lib/*

This is to prevent the “class not found” exception while accessing HBase using the Java API.
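
Once the classpath is set, a minimal Java sketch along the following lines can talk to HBase. It is written against the 0.98-era client API used in this manual, and the table 'test' and column family 'cf' are assumed to exist (for instance, created from the shell as shown earlier).

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientDemo {
  public static void main(String[] args) throws IOException {
    // Reads hbase-site.xml from the classpath to locate the cluster.
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test");

    // Write one cell: row "row1", column family "cf", qualifier "a".
    Put p = new Put(Bytes.toBytes("row1"));
    p.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
    table.put(p);

    // Read the same cell back and print it.
    Result r = table.get(new Get(Bytes.toBytes("row1")));
    byte[] v = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("a"));
    System.out.println("cf:a = " + Bytes.toString(v));

    table.close();
  }
}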

Result:
Thus the program was implemented and executed successfully.
EXPNO:7
Importing and exporting data from various databases
Date:

SQL Server is very popular in Relational Database and it is used across many software industries. In MS SQL
Server, two sorts of databases are available.

 System databases
 User Databases

In this experiment, we will learn to export and import SQL Server databases using Microsoft SQL Server.
Exporting and importing serve as a backup plan for developers.

Step 1: Open “Microsoft SQL Server”. Click on “File”, “New” and select “Database Engine Query”.

Step 2: Create a Database

Query :

CREATE DATABASE college;

Output:
Step 3: Select the newly created database “college”.

Query :

USE college;

Output:

Step 4: Creating a Table

Query :

CREATE TABLE students(
    id INT NOT NULL PRIMARY KEY,
    name VARCHAR(300) NOT NULL,
    reg_no INT NOT NULL,
    semester INT NOT NULL );

Output:
Step 5: Insert the Records

Query :

INSERT INTO students VALUES
(1,'priya',31,3),(2,'keerthi',12,1),
(3,'rahul',29,2),(4,'reyansh',38,3),
(5,'lasya',47,2);

Output:
Exporting SQL Server Database:

After creating a database in “Microsoft SQL Server”, Let’s see how exporting takes place.

Step 1: Open the Object Explorer, Right-click on the Database that you want to export and click the “task”
option and select “Export Data-Tier Application”.
Step 2: Click Next and, by browsing, select the destination folder in which you have to save the database file.
The filename should be the same as the database name (here “college”). Click “Next” and “Finish”. You
will get a dialogue box showing the result of exporting.

Importing SQL Server Database :

Step 1: Right click on the Database folder and select “Import Data-Tier Application” and click “Next”.
Step 2: Select the file which you have exported and change the name of the database.

Here we changed the database name from “college” to “college_info”. Click “Next” and a dialogue box
appears showing the result of importing.

Result:
Thus the program was implemented and executed successfully.
