Creating a job

Create a job named testjob01 that imports data from an RDBMS table into HDFS.
The following command creates a job that imports the emp table from the userdb database into an HDFS directory.

sqoop job --create testjob01 -- import --connect jdbc:mysql://hadoop01:3306/userdb \
--username root \
--password 123456 \
--target-dir /sqoopresult333 \
--table emp --m 1

Note: there must be a space between -- and import.
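
For reference, the general syntax of the job tool, as given in the Sqoop documentation, is:

sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]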

A bug was hit here: the JSON jar is missing from Sqoop's classpath.

[root@hadoop01 sbin]# sqoop job --create jobtest -- import --connect jdbc:mysql://hadoop01:3306/userdb \
> --username root \
> --password 123456 \
> --target-dir /sqoopresult333 \
> --table emp --m 1
19/12/09 08:05:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/12/09 08:05:55 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
	at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
	at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
	at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:399)
	at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.create(HsqldbJobStorage.java:379)
	at org.apache.sqoop.tool.JobTool.createJob(JobTool.java:181)
	at org.apache.sqoop.tool.JobTool.run(JobTool.java:294)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 12 more
[root@hadoop01 sbin]#

Download the jar from the following site and place it in Sqoop's lib directory:

http://www.java2s.com/Code/Jar/j/Downloadjavajsonjar.htm
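
A minimal sketch of installing it, assuming the downloaded file is named java-json.jar and Sqoop lives under /export/servers/sqoop-1.4.7.bin__hadoop-2.6.0 (the path that appears in the warnings later in this post):

# copy the downloaded JSON jar into Sqoop's lib directory, then re-run the --create command above
cp java-json.jar /export/servers/sqoop-1.4.7.bin__hadoop-2.6.0/lib/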

Verifying jobs

The '--list' argument verifies saved jobs. The following command lists the Sqoop jobs that have been saved.

sqoop job --list
[root@hadoop01 profile.d]# sqoop job --list
19/12/09 08:25:41 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Available jobs:
  jobtest
  testjob
[root@hadoop01 profile.d]#
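
As a side note (an assumption about the default setup, not something shown in these logs): when no shared metastore is configured, Sqoop 1.x keeps saved job definitions in a private HSQLDB metastore under the home directory of the user running the command, so it can be inspected with:

ls ~/.sqoop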

Inspecting a job

The '--show' argument inspects or verifies a particular job and its details. The following command and sample output verify the job named testjob01.

sqoop job --show testjob01
[root@hadoop01 lib]# sqoop job --show testjob01
Warning: /export/servers/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /export/servers/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /export/servers/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/12/09 08:30:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
Job: testjob01
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
db.connect.string = jdbc:mysql://hadoop01:3306/userdb
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
split.limit = null
hbase.create.table = false
mainframe.input.dataset.type = p
db.require.password = true
skip.dist.cache = false
hdfs.append.dir = false
db.table = emp
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = root
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/46be1c1ea67765eb1ae0d9ed97252d51
direct.import = false
temporary.dirRoot = _sqoop
hdfs.target.dir = /sqoopresult333
hive.fail.table.exists = false
db.batch = false
[root@hadoop01 lib]#
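
A saved job that is no longer needed can be removed with the job tool's --delete action, for example to drop the jobtest job created while debugging above:

sqoop job --delete jobtest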

Executing a job

The '--exec' option executes a saved job. The following command executes the job named testjob01.

sqoop job --exec testjob01
[root@hadoop01 lib]# sqoop job --exec testjob01
19/12/09 08:33:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
19/12/09 08:33:23 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/12/09 08:33:23 INFO tool.CodeGenTool: Beginning code generation
19/12/09 08:33:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/12/09 08:33:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/12/09 08:33:23 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /export/servers/hadoop-2.6.0-cdh5.14.0
Note: /tmp/sqoop-root/compile/d6c4c284e3a92591a5f929630597abb5/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/12/09 08:33:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/d6c4c284e3a92591a5f929630597abb5/emp.jar
19/12/09 08:33:28 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/12/09 08:33:28 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/12/09 08:33:28 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/12/09 08:33:28 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/12/09 08:33:28 INFO mapreduce.ImportJobBase: Beginning import of emp
19/12/09 08:33:28 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/12/09 08:33:28 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/12/09 08:33:29 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.100.201:8032
19/12/09 08:33:33 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1252)
	at java.lang.Thread.join(Thread.java:1326)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
19/12/09 08:33:33 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException (same stack trace as above)
19/12/09 08:33:34 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException (same stack trace as above)
19/12/09 08:33:35 INFO db.DBInputFormat: Using read commited transaction isolation
19/12/09 08:33:35 INFO mapreduce.JobSubmitter: number of splits:1
19/12/09 08:33:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1575849577977_0001
19/12/09 08:33:36 INFO impl.YarnClientImpl: Submitted application application_1575849577977_0001
19/12/09 08:33:36 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1575849577977_0001/
19/12/09 08:33:36 INFO mapreduce.Job: Running job: job_1575849577977_0001
19/12/09 08:33:52 INFO mapreduce.Job: Job job_1575849577977_0001 running in uber mode : true
19/12/09 08:33:52 INFO mapreduce.Job:  map 100% reduce 0%
19/12/09 08:33:54 INFO mapreduce.Job: Job job_1575849577977_0001 completed successfully
19/12/09 08:33:54 INFO mapreduce.Job: Counters: 32
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=0
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=100
		HDFS: Number of bytes written=174172
		HDFS: Number of read operations=140
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=6
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=0
		Total time spent by all reduces in occupied slots (ms)=0
		TOTAL_LAUNCHED_UBERTASKS=1
		NUM_UBER_SUBMAPS=1
		Total time spent by all map tasks (ms)=715
		Total vcore-milliseconds taken by all map tasks=0
		Total megabyte-milliseconds taken by all map tasks=0
	Map-Reduce Framework
		Map input records=7
		Map output records=7
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=0
		CPU time spent (ms)=530
		Physical memory (bytes) snapshot=318537728
		Virtual memory (bytes) snapshot=3091116032
		Total committed heap usage (bytes)=252182528
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=202
19/12/09 08:33:54 INFO mapreduce.ImportJobBase: Transferred 170.0898 KB in 25.9669 seconds (6.5503 KB/sec)
19/12/09 08:33:54 INFO mapreduce.ImportJobBase: Retrieved 7 records.
[root@hadoop01 lib]# hadoop fs -ls /
Found 10 items
drwxr-xr-x   - root supergroup          0 2019-12-05 21:27 /allowinsert_1
drwxr-xr-x   - root supergroup          0 2019-12-05 21:38 /allowinsert_2
drwxr-xr-x   - root supergroup          0 2019-11-07 16:12 /config
drwxr-xr-x   - root supergroup          0 2019-11-20 15:13 /hivedatas
drwxr-xr-x   - root supergroup          0 2019-12-09 08:33 /sqoopresult333
drwxr-xr-x   - root supergroup          0 2019-11-06 14:21 /system
drwx------   - root supergroup          0 2019-12-05 21:04 /tmp
drwxr-xr-x   - root supergroup          0 2019-12-05 20:57 /updateonly_1
drwxr-xr-x   - root supergroup          0 2019-12-05 21:08 /updateonly_2
drwxr-xr-x   - root supergroup          0 2019-11-20 14:20 /user
[root@hadoop01 lib]# hadoop fs -ls /sqoopresult333
Found 2 items
-rw-r--r--   2 root supergroup          0 2019-12-09 08:33 /sqoopresult333/_SUCCESS
-rw-r--r--   2 root supergroup        202 2019-12-09 08:33 /sqoopresult333/part-m-00000
[root@hadoop01 lib]# hadoop fs -cat /sqoopresult333/part-m-00000
1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP
1206,allen,admin,30000,tp
1207,woon,admin,40000,tp
[root@hadoop01 lib]#
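
One caveat before re-running: MapReduce refuses to write into an existing output directory, so executing the same job a second time fails unless the target directory is removed first (or the job is created with Sqoop's --delete-target-dir option):

hadoop fs -rm -r /sqoopresult333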

Executing a job without a password prompt

When creating a job, use the --password-file option to avoid typing the MySQL password. If --password is used instead, Sqoop prints a warning and the password must be entered by hand every time the job runs. Sqoop requires the password file to be stored on HDFS with permissions set to 400.
Also check that Sqoop's sqoop-site.xml contains the following property
(<sqoop directory>/conf/sqoop-site.xml):

<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
  <description>If true, allow saved passwords in the metastore.</description>
</property>

Create a password file (do not use vi here: editors like vi append a trailing newline, which would become part of the password)

echo -n "123456" > pwmysql.pwd
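
To confirm the file holds exactly the six password bytes and no trailing newline, a quick check (assuming od is available):

od -c pwmysql.pwd
# the output should list only: 1 2 3 4 5 6 — with no \n at the end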

Create the directory on HDFS

hadoop fs -mkdir -p /input/sqoop/pwd/

Upload the password file

hadoop fs -put pwmysql.pwd /input/sqoop/pwd/

Change its permissions to 400

hadoop fs -chmod 400 /input/sqoop/pwd/pwmysql.pwd
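
You can verify the permissions before wiring the file into a job:

hadoop fs -ls /input/sqoop/pwd/pwmysql.pwd
# the mode column should read -r-------- (i.e. 400)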

Test passwordless execution

sqoop job --create testjob02 -- import --connect jdbc:mysql://hadoop01:3306/userdb \
--username root \
--password-file /input/sqoop/pwd/pwmysql.pwd \
--target-dir /sqoopresult333 \
--table emp --m 1

sqoop job --exec testjob02
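
Saved jobs pay off most for incremental imports, because the metastore remembers the last imported value between runs (the incremental.mode = None line in the --show output above reflects that testjob01 is not incremental). A hedged sketch, assuming emp has a monotonically increasing key column named id (the column name is hypothetical here):

sqoop job --create incjob -- import --connect jdbc:mysql://hadoop01:3306/userdb \
--username root \
--password-file /input/sqoop/pwd/pwmysql.pwd \
--target-dir /sqoopresult_inc \
--table emp --m 1 \
--incremental append \
--check-column id \
--last-value 0

Each sqoop job --exec incjob then imports only rows whose id exceeds the stored last value, and updates that value in the metastore afterwards.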
