1. Apache Sqoop Basic Commands
Perform/execute the following set of Apache Sqoop basic commands:
• Connecting a Database Server
• Selecting the Data to Import
• Free-form Query Imports
• Controlling Parallelism
• Controlling Imports
AIM
To execute fundamental Apache Sqoop operations for connecting to a
database, selecting specific data for import, performing free-form query
imports, controlling parallel execution, and managing data import processes
efficiently in a Hadoop environment.
ALGORITHM
Step 1: Connect to the Database Server
1. Open the terminal or command prompt.
2. Use the sqoop import command.
3. Specify the database connection details using --connect, --username,
and --password.
4. Define the table name using --table.
Step 2: Select Specific Data for Import
5. Use the --columns option to specify the required columns.
6. Execute the command to import only the selected columns.
Step 3: Perform Free-form Query Imports
7. Use the --query option to execute a SQL query.
8. Ensure the query contains the WHERE $CONDITIONS token so that Sqoop can split the work for parallel execution (when the query is enclosed in double quotes on the command line, escape it as \$CONDITIONS).
Step 4: Control Parallel Execution (Parallelism)
9. Use the --num-mappers option to define the number of parallel tasks.
10. Execute the command for parallel data import.
Step 5: Control Data Imports
11. Use the --where clause to filter data.
12. Specify the target directory using --target-dir.
13. Use --delete-target-dir to remove existing data before importing.
Source code:
• hostname
• hdfs dfs -ls
• service cloudera-scm-server status
• su
• service cloudera-scm-server status
• mysql -u root -pcloudera
• show databases;
• use retail_db;
• show tables;
• select * from departments;
• hostname -f
Connect to the Database Server
• sqoop list-databases --connect jdbc:mysql://quickstart:3306/ --password cloudera --username root;
• sqoop list-tables --connect jdbc:mysql://quickstart:3306/retail_db --password cloudera --username root;
Select Specific Data for Import
• sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --password cloudera --username root --table departments --columns "department_id,department_name";
• hadoop fs -ls /user/cloudera
• hadoop fs -cat /user/cloudera/departments/part*
Perform Free-form Query Imports
• sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --password cloudera --username root --table departments --target-dir /user/cloudera/dept1;
• hadoop fs -cat /user/cloudera/dept1/part*
• sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --password cloudera --username root --table departments -m 3 --where "department_id>4" --target-dir /user/cloudera/dept2;
• hadoop fs -cat /user/cloudera/dept2/part*
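The two imports above still rely on --table; a true free-form query import, as described in Step 3 of the algorithm, can be sketched as follows (the query, split column, and target directory are illustrative; --split-by is needed because more than one mapper is used, and $CONDITIONS is escaped as \$CONDITIONS inside the double-quoted query):
• sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --password cloudera --username root --query "SELECT department_id, department_name FROM departments WHERE \$CONDITIONS" --split-by department_id --target-dir /user/cloudera/dept_query;
• hadoop fs -cat /user/cloudera/dept_query/part*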
Control Parallel Execution
• sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --password cloudera --username root --table departments --num-mappers 4
Control Data Imports
• sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --password cloudera --username root --table departments --target-dir /user/cloudera/dept1;
• hadoop fs -cat /user/cloudera/dept1/part*
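Re-running the identical import into /user/cloudera/dept1 fails once the directory already exists; the --where and --delete-target-dir options from Step 5 of the algorithm handle row filtering and cleanup. An illustrative variant (the filter value is an assumption):
• sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --password cloudera --username root --table departments --where "department_id>3" --delete-target-dir --target-dir /user/cloudera/dept1;
• hadoop fs -cat /user/cloudera/dept1/part*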
2. Apache Sqoop Basic Commands
Perform/execute the following set of Apache Sqoop basic commands:
• Controlling Mappers
• File Formats
• Large Objects
• Importing Data Into Hive
• Import all tables
• Sqoop Export
AIM
To execute Apache Sqoop commands for controlling mappers, handling file
formats, processing large objects (LOBs), importing data into Hive, importing
all tables from a database, and exporting data from Hadoop to a relational
database.
ALGORITHM
Controlling Mappers
1. Open the terminal and ensure Sqoop is installed and configured.
2. Use the sqoop import command with the --num-mappers option.
3. Define the database connection details (--connect, --username, --
password).
4. Specify the table name.
5. Set the number of mappers (e.g., --num-mappers 2).
6. Execute the command to control parallel execution.
Handling File Formats
7. Use --as-textfile, --as-avrodatafile, or --as-parquetfile to specify the file format.
8. Define the target directory using --target-dir.
9. Execute the command to store data in the specified format.
Handling Large Objects (LOBs)
10. Note that --direct mode (available for supported databases) does not handle BLOB/CLOB columns, so LOB imports use the regular JDBC path.
11. Use the --inline-lob-limit option to define the maximum size of a LOB that is stored inline with the rest of the data (e.g., --inline-lob-limit 10485760 for 10 MB); larger objects are written to separate files.
12. Execute the command to import large objects.
Importing Data Into Hive
13. Use the --hive-import option to load data into Hive.
14. Define the Hive database and table using --hive-database and --hive-table.
15. Execute the command to import data into Hive.
Importing All Tables
16. Use the sqoop import-all-tables command.
17. Specify the database connection details (--connect, --username, --password).
18. Define the target directory using --warehouse-dir.
19. Execute the command to import all tables into Hadoop.
Sqoop Export
20. Use the sqoop export command.
21. Define the database connection details (--connect, --username, --password).
22. Specify the table name and HDFS source directory (--export-dir).
23. Execute the command to transfer data from HDFS to the database.
Source Code:
1. Login to MySQL
a. mysql -u root -pcloudera
b. CREATE DATABASE IF NOT EXISTS retail_db;
   USE retail_db;
   CREATE TABLE customers (customer_id INT PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50), email VARCHAR(100));
   INSERT INTO customers VALUES (1, 'John', 'Doe', '[email protected]');
   INSERT INTO customers VALUES (2, 'Jane', 'Smith', '[email protected]');
c. exit;
2. Import Data from MySQL to HDFS (With Mapper Control)
a. sqoop import --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --table customers --num-mappers 2 --target-dir /user/cloudera/customers_data
3. Import Data in Different File Formats
a. sqoop import --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --table customers --as-avrodatafile --target-dir /user/cloudera/customers_avro
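The algorithm also mentions text and Parquet output; the same import can request those formats by swapping the flag (the target directories below are illustrative placeholders):
b. sqoop import --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --table customers --as-parquetfile --target-dir /user/cloudera/customers_parquet
c. sqoop import --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --table customers --as-textfile --target-dir /user/cloudera/customers_text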
4. Handling Large Objects (BLOBs/CLOBs)
a. sqoop import --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --table customers --split-by customer_id --target-dir /user/cloudera/customers_large
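The command above relies on Sqoop's default LOB handling; the --inline-lob-limit option from the algorithm can be added to control how large an object may be before it is spilled to a separate file. A sketch with an assumed 10 MB limit and an illustrative target directory:
b. sqoop import --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --table customers --split-by customer_id --inline-lob-limit 10485760 --target-dir /user/cloudera/customers_lob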
5. Import Data into Hive
a. sqoop import --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --table customers --hive-import --hive-database retail_hive --hive-table customers
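Note: the Hive import above assumes the retail_hive database already exists. If it does not, it can be created beforehand (a one-off step, assuming the hive CLI is available on the node):
b. hive -e "CREATE DATABASE IF NOT EXISTS retail_hive;"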
6. Import All Tables from MySQL to Hive
a. sqoop import-all-tables --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --hive-import --hive-database retail_hive
7. Export Data from HDFS to MySQL
Create a Table in MySQL to Store Exported Data
CREATE TABLE customers_export (customer_id INT PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50), email VARCHAR(100));
8. Run Sqoop Export
a. sqoop export --connect jdbc:mysql://quickstart.cloudera/retail_db --username root --password cloudera --table customers_export --export-dir /user/cloudera/customers_data --input-fields-terminated-by ','
9. Verify Data
a. hdfs dfs -ls /user/cloudera/
b. hdfs dfs -cat /user/cloudera/customers_data/part-m-00000
c. In the Hive shell (hive):
   USE retail_hive; SHOW TABLES; SELECT * FROM customers LIMIT 5;
d. mysql -u root -pcloudera
USE retail_db;
SELECT * FROM customers_export;
3. Perform a word count job for a given input file using Spark SQL.
Aim
To implement a Word Count program using PySpark SQL to process a text
file, tokenize words, and count their occurrences efficiently.
Algorithm
Step 1: Import necessary libraries from pyspark.sql.
Step 2: Create a SparkSession using
SparkSession.builder.appName("WordCountSQL").getOrCreate().
Step 3: This initializes a Spark environment to process data.
Step 4: Use spark.read.text("sample.txt") to read the text file into a
DataFrame.
Step 5: The DataFrame has a single column named "value", where each row
contains a line from the file.
Step 6: Use split(col("value"), " ") to split each line into words based on
spaces.
Step 7: Use explode() to flatten the list of words, creating a row for each
word.
Step 8: Use .groupBy("word").count() to count the occurrences of each word
in the dataset.
Step 9: Use .show() to display the word counts.
Source Code:
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split, col
# Step 1: Initialize Spark Session
spark = SparkSession.builder.appName("WordCountSQL").getOrCreate()
# Step 2: Load the Text File into DataFrame
df = spark.read.text("sample.txt") # Change to your file path
# Step 3: Process Data using SQL Functions
word_counts = (
    df.select(explode(split(col("value"), " ")).alias("word"))  # Split each line on spaces and flatten into one word per row
    .groupBy("word")
    .count()  # Count occurrences of each word
)
# Step 4: Show Results
word_counts.show()
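One way to run the script, assuming it is saved as wordcount_sql.py (a hypothetical file name) and sample.txt exists at the path the script reads:
spark-submit wordcount_sql.py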
Result: The Word Count program using PySpark SQL was executed successfully; the input text file was read, the words were tokenized, and their occurrences were counted.