Apache Hive Installation With Derby Database And Beeline
Last Updated: 29 Jun, 2021
Apache Hive is a data warehousing and ETL (Extract, Transform, Load) tool built on top of Hadoop that lets you manage and query large datasets in a relational, table-oriented way. It is written in Java, is maintained by the Apache Software Foundation, and was designed so that people who are not comfortable writing Java (or MapReduce) code can still work with data on Hadoop. Hive uses HiveQL, a query language whose syntax is very similar to SQL, and it exposes client interfaces for languages such as C++, Java, and Python. With Hive and its SQL-like queries we can handle and query petabytes of data.
Apache Derby is an open-source relational database that ships with Hive and is used as its default metastore database. From an industry perspective, Derby is generally used only for testing; for production deployments, the metastore is usually backed by an external database such as MySQL.
Prerequisite: Hadoop should be pre-installed.
Step 1: Download Apache Hive version 3.1.2 from the official Apache Hive downloads page.
Step 2: Place the downloaded tar file at your desired location (in our case, the /home/dikshant home directory).
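If you prefer the command line, the tarball can also be downloaded directly into the home directory with wget (the URL below follows the standard Apache release archive layout; verify it against the downloads page before use):
cd /home/dikshant
wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz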

Step 3: Now extract the tar file with the help of the command shown below.
tar -xvzf apache-hive-3.1.2-bin.tar.gz
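A quick listing confirms that the archive was extracted correctly; the new directory should contain the standard Hive folders such as bin, conf, and lib:
ls apache-hive-3.1.2-bin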


Step 4: Now we have to add the Hive path to the .bashrc file. To open it, use the command below.
sudo gedit ~/.bashrc

Hive path (adjust the path and Hive version name to match your setup):
export HIVE_HOME="/home/dikshant/apache-hive-3.1.2-bin"
export PATH=$PATH:$HIVE_HOME/bin
Add these two lines at the end of the .bashrc file and save it (don't forget to save, press CTRL + S).
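After saving, reload the shell configuration (or open a new terminal) so the new variables take effect, and verify that HIVE_HOME points to the right place:
source ~/.bashrc
echo $HIVE_HOME   # should print /home/dikshant/apache-hive-3.1.2-bin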

Step 5: Now add the properties below to the core-site.xml file, which can be found in the /home/{user-name}/hadoop/etc/hadoop directory. (For simplicity, we have renamed the hadoop-3.1.2 folder to just hadoop.)
# to change the directory
cd /home/dikshant/hadoop/etc/hadoop/
# to list the directory content
ls
# to open and edit core-site.xml
sudo gedit core-site.xml


Properties (add them inside the existing <configuration> element and do not remove the previously added Hadoop properties):
<property>
  <name>hadoop.proxyuser.dikshant.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.dikshant.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.server.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.server.groups</name>
  <value>*</value>
</property>
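Changes to core-site.xml only take effect after the Hadoop daemons are restarted. A minimal sketch, assuming the Hadoop sbin scripts (start-dfs.sh and friends) are on your PATH:
# restart HDFS and YARN so the new proxy-user properties are picked up
stop-yarn.sh
stop-dfs.sh
start-dfs.sh
start-yarn.sh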

Step 6: Now create a directory named /tmp in HDFS with the help of the command below.
hdfs dfs -mkdir /tmp

Step 7: Use the commands below to create the /user, /user/hive, and /user/hive/warehouse directories, which Hive will use to store its tables and other data.
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/hive
hdfs dfs -mkdir /user/hive/warehouse
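Equivalently, the nested directories can be created with a single command by passing the -p flag, which creates any missing parent directories:
# creates /user, /user/hive and /user/hive/warehouse in one go
hdfs dfs -mkdir -p /user/hive/warehouse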

Now check whether the directories were created successfully with the help of the command below.
hdfs dfs -ls -R /   # the -R switch makes -ls list the HDFS root recursively

Step 8: Now give read, write, and execute permissions to all users on the created directories with the help of the commands below.
hdfs dfs -chmod ugo+rwx /tmp
hdfs dfs -chmod ugo+rwx /user/hive/warehouse
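To confirm the change, list the directories again; both /tmp and /user/hive/warehouse should now show rwx permissions for the owner, group, and other users (drwxrwxrwx):
# verify the permission bits
hdfs dfs -ls /
hdfs dfs -ls /user/hive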

Step 9: Go to the apache-hive-3.1.2-bin/conf directory and rename hive-default.xml.template to hive-site.xml. In this file, go to line no. 3215 and remove the stray character entity (&#8;) in the property description there, because it causes an "Illegal character entity" error while initializing the Derby database; since it only appears in a description, deleting it is harmless.
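These steps can also be done from the terminal (paths match the layout used in the earlier steps):
cd /home/dikshant/apache-hive-3.1.2-bin/conf
# rename the template to hive-site.xml
mv hive-default.xml.template hive-site.xml
# open the file and delete the stray &#8; character entity on line 3215
gedit hive-site.xml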
Step 10: Now initialize the Derby database, since Hive uses Derby by default for its metastore. Use the command given below (make sure you are in the apache-hive-3.1.2-bin directory).
bin/schematool -dbType derby -initSchema
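If the schema initialization succeeds, Derby creates a metastore_db directory inside the directory you ran the command from, so run Hive from the same directory later (or configure an absolute path for the metastore). A quick check:
# the Derby metastore files live here by default
ls metastore_db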


Step 11: Now launch HiveServer2 using the command below.
hiveserver2
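HiveServer2 runs in the foreground and may take a minute to come up, so leave this terminal open. From another terminal you can optionally check that it is listening on the default port 10000 (ss is part of iproute2 and available on most modern Linux distributions):
# HiveServer2 listens on port 10000 by default
ss -ltn | grep 10000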

Step 12: In a different terminal tab, run the commands below to launch the Beeline command shell.
cd /home/dikshant/apache-hive-3.1.2-bin/bin/
beeline -n dikshant -u jdbc:hive2://localhost:10000 (if you face any problem, try using hadoop instead of your user name)

We have now successfully installed and configured Apache Hive with the Derby database.
Step 13: Let's run the show databases command to check that everything is working.
show databases;
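As a further smoke test, you can create a throwaway database and confirm that a matching directory appears under the warehouse path (demo_db is a hypothetical name used only for illustration):
# run a single statement non-interactively with beeline's -e flag
beeline -n dikshant -u jdbc:hive2://localhost:10000 -e "CREATE DATABASE IF NOT EXISTS demo_db;"
# a demo_db.db directory should now exist under the warehouse
hdfs dfs -ls /user/hive/warehouse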