Hive Interview Questions and Answers
The Hive Metastore is a crucial component of Hive's architecture: it stores metadata about Hive tables, including table names, column information, storage details, and partitioning metadata. This enables efficient query execution, because the metastore supplies the structure information Hive needs to generate optimized query plans without repeatedly interpreting the raw data. Its configuration (embedded, local, or remote) also affects how multiple sessions are managed, and therefore concurrency and performance.
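As a quick illustration, the following HiveQL statements are answered entirely from the metastore; web_logs is a hypothetical, partitioned table.

    -- Show the metadata the metastore tracks for a table: column
    -- names and types, HDFS location, storage format, and SerDe.
    DESCRIBE FORMATTED web_logs;

    -- Partition metadata also lives in the metastore, so this
    -- statement answers without scanning any data files.
    SHOW PARTITIONS web_logs;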
Using Hive's embedded metastore is not advisable in environments with multiple concurrent users because the embedded configuration supports only a single session at a time, which prevents it from handling concurrent queries. This setup stores the metastore in an embedded Derby database on local disk, which cannot serve multiple users simultaneously. For multi-user environments, the recommendation is a standalone "real" database such as MySQL or PostgreSQL in a remote metastore configuration, which supports concurrent queries and improves performance.
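As a sketch of the remote setup, clients are pointed at a shared metastore service through hive-site.xml; the property names below are standard Hive configuration keys, while the host names and database name are hypothetical placeholders.

    <!-- Client side: connect to a shared metastore service
         instead of spinning up an embedded Derby instance. -->
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://metastore-host:9083</value>
    </property>

    <!-- Metastore server side: back the metadata with MySQL so
         many sessions can read and write it concurrently. -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://db-host:3306/metastore</value>
    </property>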
The primary difference between Hive Managed Tables and External Tables lies in their data lifecycle management. When a Managed Table is dropped, Hive deletes both the metadata and the data stored in HDFS. In contrast, dropping an External Table only removes the metadata reference from Hive, leaving the actual data intact on HDFS. This distinction provides flexibility, allowing users to decide whether Hive should manage the data's physical storage as well as its logical schema.
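A minimal HiveQL sketch of the difference; the table names and HDFS path are hypothetical.

    -- Managed table: Hive owns metadata and data, so DROP TABLE
    -- removes the files under the Hive warehouse directory too.
    CREATE TABLE managed_logs (id INT, msg STRING);

    -- External table: Hive tracks only the schema, so DROP TABLE
    -- removes the metadata but leaves /data/logs untouched on HDFS.
    CREATE EXTERNAL TABLE external_logs (id INT, msg STRING)
    LOCATION '/data/logs';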
Apache HCatalog enhances Hive's data usability by acting as a table and data management layer on top of the Hive Metastore. It abstracts away storage formats and locations, presenting a tabular view of data regardless of how it is actually stored (e.g., RCFile, text files, or sequence files). HCatalog lets different Hadoop components such as Pig, Hive, and MapReduce process the same data seamlessly, and offers REST APIs through which external systems can read this metadata. This promotes data interoperability and integration with external and traditional data management systems.
Hive's lack of support for record-level insert, update, and delete operations means it is not suitable for applications that require frequent, granular data modifications or transactional integrity, such as those handled by a traditional RDBMS. Updates must instead be expressed as batch rewrites, for example with CASE statements and built-in functions, which is inefficient for small changes. Hive is therefore better aligned with analytical tasks over large datasets than with real-time transactional processing.
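A common batch workaround is to rewrite the whole table (or an affected partition) with the changed values; the table and values below are hypothetical.

    -- Simulate "UPDATE users SET status = 'inactive' WHERE id = 42"
    -- by rewriting the entire table in one batch pass.
    INSERT OVERWRITE TABLE users
    SELECT id,
           name,
           CASE WHEN id = 42 THEN 'inactive' ELSE status END AS status
    FROM users;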
Hive enhances query performance through partitioning by dividing a table into segments based on the values of one or more columns, allowing Hive to access only the relevant partitions instead of scanning the entire table. This reduces the amount of data processed during queries, similar to how indexes work in traditional databases. Unlike indexes, however, partitions are coarse-grained, so they may not deliver the same improvements for fine-grained queries or small datasets. The trade-off is increased storage overhead and maintenance complexity when large numbers of partitions are created.
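For instance, partitioning a (hypothetical) page-view table by date lets Hive prune every directory except the requested day:

    -- Each distinct dt value becomes its own HDFS subdirectory.
    CREATE TABLE page_views (user_id BIGINT, url STRING)
    PARTITIONED BY (dt STRING);

    -- Filtering on the partition column means Hive reads only
    -- .../dt=2015-01-01/ rather than scanning the whole table.
    SELECT COUNT(*) FROM page_views WHERE dt = '2015-01-01';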
Hive's Object Inspector functionality enhances data processing by providing a uniform way to access and analyze the internal structure of complex data objects in memory. It lets Hive handle diverse data representations efficiently, reading complex objects whether they are instances of Java classes or standard Java objects such as lists and maps. This gives Hive the flexibility to parse and process data stored in the many formats found across the Hadoop ecosystem.
Hive provides several connectivity mechanisms for applications: a Thrift client, a JDBC driver, and an ODBC driver. The Thrift client allows Hive commands to be issued from programming languages such as C++, Java, PHP, Python, and Ruby, making it highly versatile. The JDBC driver is a Type 4 (pure Java) driver, enabling Java applications to communicate with Hive directly. The ODBC driver lets applications that follow the ODBC protocol interface with Hive. Together these options make Hive accessible from a wide variety of systems and allow it to plug into existing application infrastructure.
Hive is most suitable for data warehouse applications where the data is relatively static, fast response times are not critical, and the data changes infrequently. This is because Hive is designed for OLAP (Online Analytical Processing) rather than OLTP (Online Transaction Processing): its architecture is optimized for querying and managing large datasets over distributed storage, with a focus on complex analytical queries rather than the transactional operations that traditional databases handle better.
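A typical Hive workload looks like the following batch aggregation over a large (hypothetical) table, where scanning many rows is expected and acceptable:

    -- OLAP-style query: aggregate a large table in batch.
    SELECT url, COUNT(*) AS hits
    FROM page_views
    GROUP BY url
    ORDER BY hits DESC
    LIMIT 10;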
Hive supports several storage formats, including sequence files, Avro data files, and RCFiles. Sequence files are binary, splittable, compressible, and row-oriented, making them suitable for storing large volumes of data with the benefit of compression. Avro data files, similar to sequence files, add schema evolution and multilingual bindings, offering flexibility when the schema may change over time. RCFiles (Record Columnar Files) are column-oriented and enhance performance by allowing specific columns to be read without processing entire rows. This diversity of storage formats lets Hive optimize for performance and storage efficiency according to the specific needs of the data being stored.
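The format is chosen per table with a STORED AS clause; the tables below are hypothetical examples.

    -- Row-oriented, splittable, compressible binary format.
    CREATE TABLE events_seq (id BIGINT, payload STRING)
    STORED AS SEQUENCEFILE;

    -- Column-oriented format: queries that touch only a few
    -- columns avoid reading the rest of each row.
    CREATE TABLE events_rc (id BIGINT, payload STRING)
    STORED AS RCFILE;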