Sqoop import OR export

Data sources:
1: Relational databases: Sqoop (SQL to Hadoop)
2: Files: Flume (real-time data collection)

Job scheduling: Oozie

These are important frameworks in the Hadoop ecosystem and need to be monitored (ideally through a unified web UI).

1: Download Sqoop

Official docs: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html

Sqoop depends on Hadoop; under the hood a Sqoop job is implemented as a MapReduce job.

Using Sqoop, with MySQL as the example database:

1: List the databases in MySQL:
sqoop list-databases \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123
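
Tables inside a given database can be listed the same way with the list-tables tool (same connection parameters as above):

sqoop list-tables \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123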

2: import (-m,--num-mappers <n>   Use 'n' map tasks to import in parallel)

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--table emp \
-m 1
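
Without --target-dir, Sqoop writes the output into a directory named after the table under the running user's HDFS home. A quick check (assuming the job ran as user beifeng, so the path below is an assumption):

hdfs dfs -ls /user/beifeng/emp
hdfs dfs -cat /user/beifeng/emp/part-m-00000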

2.1: import (--target-dir <dir>   HDFS plain table destination)
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--table emp \
--target-dir /user/beifeng/sqoop/import_emp \
-m 1

2.2: import
--where <where clause>            WHERE clause to use during import
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--where "city='sec-bad'" \
--target-dir  /user/beifeng/sqoop/import_emp_add \
--table emp_add \
--m 1

2.3: import (specify the import file format)
--as-avrodatafile                Imports data to Avro data files
--as-parquetfile                 Imports data to Parquet files
--as-sequencefile                Imports data to SequenceFiles
--as-textfile                    Imports data as plain text (default)

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_emp2 \
--table emp \
--as-parquetfile \
--m 1

2.4: import (import only specific columns)
--columns <col,col,col...>       Columns to import from table
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_emp4 \
--table emp \
--m 1 \
--columns name,salary

2.5: In real projects the data to be processed usually needs initial cleaning and filtering; use a free-form query import (--query). The query must contain the literal $CONDITIONS token in its WHERE clause, and with more than one mapper --split-by is also required:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--query 'select name,salary from emp where $CONDITIONS and salary < 50000' \
--target-dir  /user/beifeng/sqoop/import_emp5 \
--m 1

2.6: Compression

-z,--compress                    Enable compression
--compression-codec <codec>      Compression codec to use for import
--fields-terminated-by <char>    Sets the field separator character
--delete-target-dir              Delete the import target directory if it already exists

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_emp6 \
--table emp \
--m 1 \
--compress \
--compression-codec org.apache.hadoop.io.compress.DefaultCodec \
--fields-terminated-by '\t' \
--delete-target-dir
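
To confirm that compression took effect, list the output directory; with DefaultCodec the part files should carry a .deflate extension:

hdfs dfs -ls /user/beifeng/sqoop/import_emp6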

2.7: import (incremental import)

Incremental import needs a unique, forward-moving identifier; such tables usually have a column for this, e.g. an insert timestamp createtime.
Conceptually: createtime >= startTime and createtime <= endTime;
(a lastmodified sketch using such a timestamp column follows the log excerpt below)

sqoop:
Incremental import arguments:
--check-column <column>        Source column to check for incremental change
--incremental <import-type>    Define an incremental import of type 'append' or 'lastmodified'
--last-value <value>        Last imported value in the incremental check column

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_emp7 \
--table emp \
--incremental append \
--check-column salary \
--last-value 4 \
--m 1


Log excerpt:
INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`salary`) FROM `emp`
INFO tool.ImportTool: Incremental import based on column `salary`
INFO tool.ImportTool: Lower bound value: 4
INFO tool.ImportTool: Upper bound value: 50000
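
A sketch of the lastmodified mode mentioned above, assuming the table had a createtime timestamp column (it does not exist in this example schema) and a fresh target directory; when re-importing into an existing directory, lastmodified additionally requires --append or --merge-key:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir /user/beifeng/sqoop/import_emp_inc \
--table emp \
--incremental lastmodified \
--check-column createtime \
--last-value '2015-01-01 00:00:00' \
-m 1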


2.8: import (--direct)
--direct bypasses the JDBC path and uses the database's native dump utility (mysqldump for MySQL), which is usually faster.

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_emp7 \
--delete-target-dir \
--table emp \
--direct \
--m 1


3: export
Import the MySQL data into HDFS with compression, then load it into a Hive table (the data must be compressed first, as in 2.6);
Create the table in Hive:
drop table if exists default.hive_emp_zlib;
CREATE TABLE default.hive_emp_zlib (
id string,
name string,
deg string,
salary string,
dept string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath '/user/beifeng/sqoop/import_emp6' into table default.hive_emp_zlib;
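
A quick sanity check in Hive after the load:

select * from default.hive_emp_zlib limit 5;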


##################### Exporting data to an RDBMS #########################
HIVE TABLE

table: query data over JDBC via HiveServer2
hdfs file: export ---> MySQL/Oracle/DB2

Create the my_user table in MySQL and prepare the data:
create table my_user(
id varchar(10),
name varchar(10),
password varchar(10)
);

hdfs dfs -mkdir -p /user/beifeng/sqoop/exp/user
hdfs dfs -put user.txt /user/beifeng/sqoop/exp/user
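
The contents of user.txt are not shown in these notes; a hypothetical sample that matches the my_user columns and Sqoop's default export field delimiter (a comma) could look like:

1,tom,123456
2,jerry,123456
3,lucy,123456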

1:export --> MYSQL
--export-dir <dir>                  HDFS source path for the export
--fields-terminated-by <char>      Sets the field separator character

sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--export-dir  /user/beifeng/sqoop/exp/user \
--table my_user \
--m 1
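
After the export finishes, the rows should be visible on the MySQL side:

select * from my_user;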

##################### MySQL import to Hive #########################

Hive arguments:
--create-hive-table                        Fail if the target hive table exists
--hive-database <database-name>            Sets the database name to use when importing to hive
--hive-delims-replacement <arg>            Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string
--hive-drop-import-delims                  Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
--hive-home <dir>                          Override $HIVE_HOME
--hive-import                              Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite                           Overwrite existing data in the Hive table
--hive-partition-key <partition-key>       Sets the partition key to use when importing to hive
--hive-partition-value <partition-value>   Sets the partition value to use when importing to hive
--hive-table <table-name>                  Sets the table name to use when importing to hive
--map-column-hive <arg>                    Override mapping for specific column to hive types

1: Create the Hive table:

create table default.my_user(
id string,
name string,
password string
)
row format delimited fields terminated by "\t";

2: Run the import:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_my_user \
--delete-target-dir \
--table my_user \
--hive-database default \
--hive-import \
--hive-table my_user \
--fields-terminated-by '\t' \
--m 1
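
Once the job completes, the data should be queryable from Hive:

select * from default.my_user limit 5;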

3: Run the export (export the Hive table's data into MySQL; the target table my_user2 must already exist, see step 4):
--input-fields-terminated-by <char>    Sets the input field separator

sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--export-dir  /user/hive/warehouse/my_user \
--table my_user2 \
--input-fields-terminated-by '\t' \
--m 1
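
The result can be checked without leaving Sqoop by using the eval tool, which runs a single SQL statement against the database:

sqoop eval \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--query "select * from my_user2 limit 10"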

4: Create the my_user2 table in MySQL (this must exist before the export in step 3 is run):

create table my_user2(
id varchar(10),
name varchar(10),
password varchar(10)
);

############################ --options-file ###################################

Create a file named sqoop-import-hdfs.txt:

import
--connect
jdbc:mysql://localhost:3306/sqoop
--username
wql
--password
root123
--target-dir
/user/beifeng/sqoop/import_my_user
--delete-target-dir
--table
my_user
--hive-database
default
--hive-import
--hive-table
my_user
--fields-terminated-by
'\t'
--m
1

sqoop --options-file /home/wql/app/sqoop/testdata/sqoop-import-hdfs.txt


Example:
$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST
is equivalent to:
$ sqoop --options-file /users/homer/work/import.txt --table TEST
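
For that equivalence to hold, /users/homer/work/import.txt would contain the remaining arguments, one per line, which is the layout options files use:

import
--connect
jdbc:mysql://localhost/db
--username
foo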

 
