160 P16cse5a-P16ite3a 2020052411232116

Copyright
© All Rights Reserved
WELCOME

TOOLS - SQOOP
Sub-topics

• Introduction
• Sqoop - definition
• Architecture of Sqoop
• Working of Sqoop
• Sqoop import
• Sqoop export
• Features of Sqoop
• Advantages of Sqoop
• Disadvantages of Sqoop
INTRODUCTION

• When the Big Data storage and analysis tools of the Hadoop ecosystem, such as MapReduce, Hive, HBase, Cassandra, and Pig, came into the picture, they required a tool to interact with relational database servers for importing and exporting the Big Data residing in them.

• Sqoop occupies a place in the Hadoop ecosystem to provide feasible interaction between relational database servers and Hadoop's HDFS.
SQOOP - DEFINITION

• Sqoop: "SQL to Hadoop and Hadoop to SQL".

• A tool to transfer data between Hadoop and relational databases such as Teradata, MySQL, PostgreSQL, Oracle, and Netezza.

• It is provided by the Apache Software Foundation.
ARCHITECTURE OF SQOOP
WORKING OF SQOOP
SQOOP IMPORT

• The import tool imports individual tables from an RDBMS into HDFS.

• Each row in a table is treated as a record in HDFS.

• All records are stored either as text data in text files or as binary data in Avro data files and SequenceFiles.
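
The import described above could be launched with a command along the following lines. This is a minimal sketch in Python that only assembles the `sqoop import` argument list and does not run it. The JDBC URL, user name, table, and target directory are hypothetical placeholders, and the helper name `build_import_command` is invented for illustration. The flags `--as-textfile`, `--as-avrodatafile`, and `--as-sequencefile` select the storage formats named on this slide.

```python
import shlex

# Sqoop flags that select the HDFS file format for imported records.
FORMAT_FLAGS = {
    "text": "--as-textfile",
    "avro": "--as-avrodatafile",
    "sequence": "--as-sequencefile",
}


def build_import_command(jdbc_url, username, table, target_dir, file_format="text"):
    """Assemble (but do not run) a `sqoop import` command for one table."""
    if file_format not in FORMAT_FLAGS:
        raise ValueError(f"unsupported file format: {file_format}")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,       # JDBC connection string of the RDBMS
        "--username", username,
        "-P",                        # prompt for the password on the console
        "--table", table,            # the single table to import
        "--target-dir", target_dir,  # HDFS directory that receives the records
        FORMAT_FLAGS[file_format],
    ]


# Hypothetical connection details, for illustration only.
command = build_import_command(
    "jdbc:mysql://db.example.com/sales", "etl_user", "orders", "/data/orders", "avro"
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it could later be handed to `subprocess.run` on a machine where Sqoop is installed.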
SQOOP EXPORT

• The export tool exports a set of files from HDFS back to an RDBMS.

• The files given as input to Sqoop contain records, which become rows in the target table.

• The files are read and parsed into a set of records, with fields separated by a user-specified delimiter.
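
As a rough illustration of the export path, the sketch below assembles a `sqoop export` command and shows, in a deliberately simplified form, how one delimited line maps to the column values of a row. The connection details and the helper names `build_export_command` and `parse_record` are hypothetical, and the parser ignores quoting and escape characters, which a real export may need to handle.

```python
def build_export_command(jdbc_url, username, table, export_dir, delimiter=","):
    """Assemble (but do not run) a `sqoop export` command."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                                       # prompt for the password
        "--table", table,                           # target table in the RDBMS
        "--export-dir", export_dir,                 # HDFS directory holding the input files
        "--input-fields-terminated-by", delimiter,  # user-specified field delimiter
    ]


def parse_record(line, delimiter=","):
    """Split one delimited line into column values (no quoting or escaping)."""
    return line.rstrip("\n").split(delimiter)


# Hypothetical values, for illustration only.
export_command = build_export_command(
    "jdbc:mysql://db.example.com/sales", "etl_user", "orders_summary", "/data/summary"
)
row = parse_record("1,Alice,42\n")
print(export_command)
print(row)
```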
FEATURES OF SQOOP

o Full Load.
o Incremental Load.
o Parallel import/export.
o Import results of SQL query.
o Compression.
o Connectors for all major RDBMS databases.
o Kerberos Security Integration.
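
Several of these features correspond to options of `sqoop import`. The sketch below, again only building argument lists in Python, shows how an incremental load and an import of a SQL query result with compression might be expressed. The helper names, JDBC URL, table, and column names are hypothetical. It assumes the documented Sqoop behavior that a free-form `--query` must contain the `$CONDITIONS` placeholder and be paired with `--target-dir`, with `--split-by` naming the column used to divide the work.

```python
def build_incremental_import(jdbc_url, table, target_dir, check_column, last_value):
    """Import only rows whose check column is greater than last_value."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",        # append newly added rows
        "--check-column", check_column,   # column examined to find new rows
        "--last-value", str(last_value),  # highest value seen in the previous run
    ]


def build_query_import(jdbc_url, query, split_by, target_dir, mappers=4, compress=True):
    """Import the result of a free-form SQL query in parallel."""
    if "$CONDITIONS" not in query:
        raise ValueError("a free-form query must include the $CONDITIONS placeholder")
    command = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--query", query,
        "--split-by", split_by,           # column used to divide work among mappers
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),    # degree of parallelism
    ]
    if compress:
        command.append("--compress")      # enable compression of the output
    return command


# Hypothetical values, for illustration only.
incremental = build_incremental_import(
    "jdbc:mysql://db.example.com/sales", "orders", "/data/orders", "id", 1000
)
query_import = build_query_import(
    "jdbc:mysql://db.example.com/sales",
    "SELECT id, total FROM orders WHERE $CONDITIONS",
    "id",
    "/data/order_totals",
)
print(incremental)
print(query_import)
```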
ADVANTAGES OF SQOOP

• Allows the transfer of data to and from a variety of structured data stores such as Postgres, Oracle, and Teradata.

• Sqoop can execute the data transfer in parallel, so execution can be quicker and more cost-effective.

• Helps integrate sequential data from mainframes.
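
To make the parallel transfer more concrete: Sqoop divides an import among map tasks by splitting the value range of a chosen column, and each task reads only its own slice. The following is a simplified sketch of that idea for an integer key, not Sqoop's actual implementation. The function names, the `orders` table, and the `id` column are hypothetical.

```python
def split_ranges(min_value, max_value, num_mappers):
    """Divide the integer keys min_value..max_value into contiguous half-open ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_value < min_value:
        raise ValueError("max_value must not be smaller than min_value")
    total = max_value - min_value + 1  # number of integer keys to cover
    base, remainder = divmod(total, num_mappers)
    ranges = []
    lo = min_value
    for i in range(num_mappers):
        # The first `remainder` ranges take one extra key each.
        size = base + (1 if i < remainder else 0)
        if size == 0:
            break  # more mappers than keys: the remaining mappers get no work
        ranges.append((lo, lo + size))
        lo += size
    return ranges


def mapper_queries(table, column, ranges):
    """Build one bounded SELECT per mapper from the computed ranges."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} < {hi}"
        for lo, hi in ranges
    ]


ranges = split_ranges(0, 999, 4)
for query in mapper_queries("orders", "id", ranges):
    print(query)
```

With keys 0 to 999 and four mappers, each task receives a slice of 250 keys. If the key values are heavily skewed, equal-width ranges like these can leave some tasks with far more rows than others, which is why the choice of splitting column matters.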
DISADVANTAGES OF SQOOP

• It uses a JDBC connection to connect to RDBMS-based data stores, which can be inefficient and slow.

• For performing analysis, it executes various MapReduce jobs, which at times can be time-consuming when there are a lot of joins, if the data is in a denormalized fashion.
THANK YOU
