Hive Introduction
YAHOO! CONFIDENTIAL
What is Hive?
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides:
Data Units
Databases.
Tables.
Partitions.
Buckets (or Clusters).
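The four data units above can be seen together in a single table definition. This is a minimal sketch, not taken from the slides: the database name web_logs, the table name page_views, the column names, and the bucket count are all hypothetical.

```sql
-- Hypothetical database, table, and column names, for illustration only.
CREATE DATABASE IF NOT EXISTS web_logs;

CREATE TABLE IF NOT EXISTS web_logs.page_views (
  user_id  BIGINT,
  page_url STRING
)
PARTITIONED BY (view_date STRING)        -- one partition per distinct view_date value
CLUSTERED BY (user_id) INTO 32 BUCKETS;  -- rows are hashed on user_id into 32 buckets
```

Partition columns are declared separately from the regular columns, while bucketing is applied to a column that is already part of the table.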
Data Types:
Primitive types
Integers: TINYINT, SMALLINT, INT, BIGINT.
Boolean: BOOLEAN.
Floating point numbers: FLOAT, DOUBLE.
String: STRING.
BINARY, TIMESTAMP, DECIMAL.
Complex types
Structs: {a INT; b INT}.
Maps: M['group'].
Arrays: ['a', 'b', 'c'], A[1] returns 'b'.
Union: UNIONTYPE<datatype, datatype>.
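As a rough illustration of how these types appear in practice, the sketch below declares a table that mixes primitive and complex columns and then reads individual elements back. The table name type_demo and all column names are hypothetical.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE IF NOT EXISTS type_demo (
  id     BIGINT,
  active BOOLEAN,
  score  DOUBLE,
  name   STRING,
  point  STRUCT<a:INT, b:INT>,   -- struct with two INT fields
  props  MAP<STRING, STRING>,    -- string keys mapped to string values
  tags   ARRAY<STRING>           -- ordered list of strings
);

-- Struct fields use dot notation, maps are indexed by key,
-- and arrays are zero-based, so tags[1] is the second element.
SELECT point.a, props['group'], tags[1]
FROM type_demo;
```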
Physical Layout
Warehouse directory in HDFS, e.g., /user/hive/warehouse
Tables stored in subdirectories of warehouse
Actual data stored in flat files
Control char-delimited text, or SequenceFiles
With custom SerDe, can use arbitrary format
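The storage choices above map onto clauses of CREATE TABLE. The following is a sketch under assumptions: the table names sample_text and sample_seq and their columns are hypothetical, and the warehouse directory is assumed to be the /user/hive/warehouse example given above.

```sql
-- Hypothetical table names and columns, for illustration only.

-- Control-character-delimited text: fields separated by Ctrl-A ('\001'),
-- which is also Hive's default field delimiter.
CREATE TABLE IF NOT EXISTS sample_text (
  id   INT,
  body STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;

-- The same schema stored as Hadoop SequenceFiles instead of plain text.
CREATE TABLE IF NOT EXISTS sample_seq (
  id   INT,
  body STRING
)
STORED AS SEQUENCEFILE;
```

Under that assumption, and with both tables created in the default database, their data files would be placed in /user/hive/warehouse/sample_text and /user/hive/warehouse/sample_seq respectively.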
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive-sample-out' SELECT * FROM sample;
hive> SELECT freq, COUNT(1) AS f2 FROM shakespeare GROUP BY freq SORT BY f2 DESC LIMIT 10;
hive> EXPLAIN SELECT freq, COUNT(1) AS f2 FROM ...
Built-in Functions
Mathematical: round, floor, ceil, rand, exp...
Collection: size, map_keys, map_values, array_contains.
Type Conversion: cast.
Date: from_unixtime, to_date, year, datediff...
Conditional: if, case, coalesce.
String: length, reverse, upper, trim...
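A few of the functions listed above can be tried in a single query. This is a sketch that assumes a Hive version accepting SELECT without a FROM clause (0.13 or later); on older versions the same expressions would need to be selected from an existing table. The literal values are made up for illustration.

```sql
-- Literal inputs are illustrative only.
SELECT
  round(3.14159, 2),                      -- mathematical: 3.14
  size(array('a', 'b', 'c')),             -- collection: 3
  cast('42' AS INT),                      -- type conversion: 42
  year('2012-06-15'),                     -- date: 2012
  datediff('2012-06-15', '2012-06-01'),   -- date: 14 days
  coalesce(NULL, 'fallback'),             -- conditional: 'fallback'
  if(1 > 0, 'yes', 'no'),                 -- conditional: 'yes'
  upper(trim('  hive  ')),                -- string: 'HIVE'
  reverse('hive');                        -- string: 'evih'
```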
Q&A