The previous article covered using Sqoop to incrementally sync data into Hive, and it also linked to my guide on installing and using a pseudo-distributed hadoop + hive + hbase + sqoop + kylin stack and setting up incremental sync. Previous article: Sqoop incremental sync of mysql/oracle data to hive (merge-key/append) test document.
This article shows how to combine that incremental approach with Sqoop's built-in job mechanism, a shell script, and crontab to implement automatic incremental synchronization.

I. Prerequisites

sqoop job --help
usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [<tool-name>] [TOOL-ARGS]]
Job management arguments:
   --create <job-id>            Create a new saved job
   --delete <job-id>            Delete a saved job
   --exec <job-id>              Run a saved job
   --help                       Print usage instructions
   --list                       List saved jobs
   --meta-connect <jdbc-uri>    Specify JDBC connect string for the metastore
   --show <job-id>              Show the parameters for a saved job
   --verbose                    Print more information while working

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
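For orientation, the job lifecycle used throughout this article follows directly from the arguments above (a minimal sketch; inc_job is the job name created in the next section, and <import-args> stands for a full import argument list):

sqoop job --create inc_job -- import <import-args>    # save an import definition under a name
sqoop job --list                                       # list saved jobs
sqoop job --show inc_job                               # print the saved options, including incremental.last.value
sqoop job --exec inc_job                               # run it; the new last-value is saved back afterwards
sqoop job --delete inc_job                             # drop the saved definition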

II. Detailed experiment

Let's first walk through creating and executing a timestamp-based incremental append job, and then look at the merge-key mode.
1. First, create an incremental append job:

[root@hadoop bin]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'
19/03/13 18:12:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/13 18:12:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
	at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
	at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
	at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:399)
	at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.create(HsqldbJobStorage.java:379)
	at org.apache.sqoop.tool.JobTool.createJob(JobTool.java:181)
	at org.apache.sqoop.tool.JobTool.run(JobTool.java:294)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

The creation failed: Sqoop is missing the java-json.jar package. Fine, download the missing jar and upload it to $SQOOP_HOME/lib. Download link:

Click here to download the jar package
Put the downloaded jar under $SQOOP_HOME/lib, then recreate the job.
First, delete the job whose creation just failed:

[root@hadoop ~]# sqoop job --delete inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:40:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Create the job:

[root@hadoop ~]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username "scott" --password "tiger" --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:40:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/13 18:40:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

List the job we just created:

[root@hadoop ~]# sqoop job --list
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:41:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Available jobs:
  inc_job

Check the last_value saved in the job we just created:

[root@hadoop ~]# sqoop job --show inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:45:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password: ------- what you type here is the OS password of the user running this job; when the job is actually executed, this prompt asks for the password of the database user instead
Job: inc_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-20 14:49:19
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
mainframe.input.dataset.type = p
hbase.create.table = false
split.limit = null
null.string = \\N
db.require.password = true
skip.dist.cache = false
hdfs.append.dir = true
db.table = INR_LAS
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = scott
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = ETLTIME
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = true
codegen.input.delimiters.enclose = 0
hive.table.name = INR_LAS
accumulo.batch.size = 10240000
hive.database.name = oracle
hive.drop.delims = false
customtool.options.jsonmap = {}
null.non-string = \\N
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 9
export.new.update = UpdateOnly
incremental.mode = AppendRows
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/1173d716481c4bd8f6cb589b87a382ea
direct.import = false
temporary.dirRoot = _sqoop
hive.fail.table.exists = false
db.batch = false

Next, execute the job manually:

[root@hadoop ~]# sqoop job --exec inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:47:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password: --------- what you type here is the password of the database user; how do we avoid entering it every time? Keep reading
19/03/13 18:47:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/13 18:47:50 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/13 18:47:50 INFO tool.CodeGenTool: Beginning code generation
19/03/13 18:47:51 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 18:47:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/f383a7cc7d1bc4f9665748405ec5dec2/INR_LAS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/13 18:47:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f383a7cc7d1bc4f9665748405ec5dec2/INR_LAS.jar
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:55 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS
19/03/13 18:47:55 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/13 18:47:55 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 14:49:19', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 18:47:55 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 15:36:07.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:55 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS
19/03/13 18:47:55 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/13 18:47:57 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/13 18:48:00 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/13 18:48:00 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 18:48:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552469242276_0016
19/03/13 18:48:01 INFO impl.YarnClientImpl: Submitted application application_1552469242276_0016
19/03/13 18:48:01 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552469242276_0016/
19/03/13 18:48:01 INFO mapreduce.Job: Running job: job_1552469242276_0016
19/03/13 18:48:11 INFO mapreduce.Job: Job job_1552469242276_0016 running in uber mode : false
19/03/13 18:48:11 INFO mapreduce.Job:  map 0% reduce 0%
19/03/13 18:48:18 INFO mapreduce.Job:  map 100% reduce 0%
19/03/13 18:48:18 INFO mapreduce.Job: Job job_1552469242276_0016 completed successfully
19/03/13 18:48:19 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=144628
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=39
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=4454
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=4454
		Total vcore-milliseconds taken by all map tasks=4454
		Total megabyte-milliseconds taken by all map tasks=4560896
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=229
		CPU time spent (ms)=2430
		Physical memory (bytes) snapshot=191975424
		Virtual memory (bytes) snapshot=2143756288
		Total committed heap usage (bytes)=116916224
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=39
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Transferred 39 bytes in 22.3135 seconds (1.7478 bytes/sec)
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Retrieved 1 records.
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS
19/03/13 18:48:19 INFO util.AppendUtils: Creating missing output directory - INR_LAS
19/03/13 18:48:19 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:48:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 18:48:19 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
19/03/13 18:48:19 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
19/03/13 18:48:19 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive
19/03/13 18:48:19 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/13 18:48:19 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 18:48:22 INFO SessionState:
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 18:48:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0/_tmp_space.db
19/03/13 18:48:22 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Updating thread name to e09a2f96-2edd-4747-a65f-4899c2863aa0 main
19/03/13 18:48:22 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO ql.Driver: Compiling command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 10:48:19' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 18:48:25 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:25 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:25 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:25 INFO parse.CalcitePlanner: Starting Semantic Analysis
19/03/13 18:48:25 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27
19/03/13 18:48:25 INFO ql.Driver: Semantic Analysis Completed
19/03/13 18:48:25 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 18:48:25 INFO ql.Driver: Completed compiling command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46); Time taken: 3.251 seconds
19/03/13 18:48:25 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 18:48:25 INFO ql.Driver: Executing command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 10:48:19' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 18:48:25 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=e09a2f96-2edd-4747-a65f-4899c2863aa0, clientType=HIVECLI]
19/03/13 18:48:25 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
19/03/13 18:48:25 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
19/03/13 18:48:26 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:26 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:26 INFO ql.Driver: Completed executing command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46); Time taken: 0.113 seconds
OK
19/03/13 18:48:26 INFO ql.Driver: OK
Time taken: 3.379 seconds
19/03/13 18:48:26 INFO CliDriver: Time taken: 3.379 seconds
19/03/13 18:48:26 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:26 INFO session.SessionState: Resetting thread name to  main
19/03/13 18:48:26 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:26 INFO session.SessionState: Updating thread name to e09a2f96-2edd-4747-a65f-4899c2863aa0 main
19/03/13 18:48:26 INFO ql.Driver: Compiling command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2):
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 18:48:26 INFO ql.Driver: Semantic Analysis Completed
19/03/13 18:48:26 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 18:48:26 INFO ql.Driver: Completed compiling command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2); Time taken: 0.426 seconds
19/03/13 18:48:26 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 18:48:26 INFO ql.Driver: Executing command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2):
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 18:48:26 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/03/13 18:48:26 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table oracle.inr_las
19/03/13 18:48:26 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS
19/03/13 18:48:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:26 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:26 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
19/03/13 18:48:27 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
19/03/13 18:48:27 INFO exec.StatsTask: Executing stats task
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:27 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:27 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:27 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=6, numRows=0, totalSize=518, rawDataSize=0]
19/03/13 18:48:27 INFO ql.Driver: Completed executing command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2); Time taken: 1.225 seconds
OK
19/03/13 18:48:27 INFO ql.Driver: OK
Time taken: 1.653 seconds
19/03/13 18:48:27 INFO CliDriver: Time taken: 1.653 seconds
19/03/13 18:48:27 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:27 INFO session.SessionState: Resetting thread name to  main
19/03/13 18:48:27 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:27 INFO session.SessionState: Deleted directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 on fs with scheme hdfs
19/03/13 18:48:27 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 on fs with scheme file
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.HiveImport: Hive import complete.
19/03/13 18:48:27 INFO hive.HiveImport: Export directory is empty, removing it.
19/03/13 18:48:27 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/13 18:48:27 INFO tool.ImportTool: Updated data for job: inc_job

The experiment above shows that every time the job runs we have to type the database password. To make execution password-free, do the following:
When creating the job, use the --password-file option rather than --password. With --password, executing the job prints a warning and still requires typing the password before the job will run; with --password-file, the job executes without asking for the database password. So let's rewrite the job-creation statement used above.
First drop the original job:

[root@hadoop conf]# sqoop job --delete inc_job

Create the password file.
Note: Sqoop requires the password file to be stored on HDFS, and its permissions must be 400.

[root@hadoop sqoop]# mkdir pwd
[root@hadoop sqoop]# cd pwd
[root@hadoop pwd]# pwd
/hadoop/sqoop/pwd
[root@hadoop pwd]# echo -n "tiger" > scott.pwd
[root@hadoop pwd]# hdfs dfs -put scott.pwd /user/hive/warehouse
[root@hadoop pwd]# hdfs dfs -chmod 400 /user/hive/warehouse/scott.pwd
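Before recreating the job, it is worth a quick sanity check that the file really landed on HDFS with the expected content and permissions (just a sketch; both are standard hdfs dfs subcommands):

hdfs dfs -ls /user/hive/warehouse/scott.pwd
hdfs dfs -cat /user/hive/warehouse/scott.pwd     # should print tiger with no trailing newline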

Recreate the job, this time specifying --password-file instead of --password:

[root@hadoop conf]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username "scott" --password-file /user/hive/warehouse/scott.pwd --table INR_LAS --fields-terminated-by '\t'
--lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'

Verify. First look at the current data in the Oracle source table:

select * from inr_las;
EMPNO       ENAME       JOB     SAL         ETLTIME
1           er      CLERK       800.00  2019/3/20 10:42:27
2           ALLEN   SALESMAN    1600.00 2019/3/20 10:42:27
3           WARD    SALESMAN    1250.00 2019/3/20 10:42:27
4           JONES   MANAGER     2975.00 2019/3/20 10:42:27
5           MARTIN  SALESMAN    1250.00 2019/3/20 10:42:27
6           zhao    DBA         1000.00 2019/3/20 10:52:34
7           yan     BI          100.00  2019/3/20 10:42:27
8           dong    JAVA        5232.00 2019/3/20 15:36:07

Now look at the current Hive table data:

hive> select * from inr_las;
OK
1   er  CLERK   800.0   2019-03-20 10:42:27.0
2   ALLEN   SALESMAN    1600.0  2019-03-20 10:42:27.0
3   WARD    SALESMAN    1250.0  2019-03-20 10:42:27.0
4   JONES   MANAGER 2975.0  2019-03-20 10:42:27.0
5   MARTIN  SALESMAN    1250.0  2019-03-20 10:42:27.0
6   zhao    DBA 1000.0  2019-03-20 10:52:34.0
7   yan BI  100.0   2019-03-20 10:42:27.0
8   dong    JAVA    332.0   2019-03-20 14:49:19.0
8   dong    JAVA    3232.0  2019-03-20 15:13:35.0
8   dong    JAVA    4232.0  2019-03-20 15:29:03.0
8   dong    JAVA    5232.0  2019-03-20 15:36:07.0
8   dong    JAVA    5232.0  2019-03-20 15:36:07.0
8   dong    JAVA    3232.0  2019-03-20 15:13:35.0
Time taken: 0.176 seconds, Fetched: 13 row(s)

The job's increment was set with --last-value '2019-03-20 14:49:19', and on the source side one row (empno=8) satisfies the incremental condition. Now execute the newly created job:

[root@hadoop pwd]# sqoop job --exec inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 19:14:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/13 19:14:32 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/13 19:14:32 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/13 19:14:32 INFO tool.CodeGenTool: Beginning code generation
19/03/13 19:14:33 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 19:14:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/8df9a3027ead0f69733bef4c331c8f15/INR_LAS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/13 19:14:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/8df9a3027ead0f69733bef4c331c8f15/INR_LAS.jar
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS
19/03/13 19:14:38 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/13 19:14:38 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 14:49:19', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 19:14:38 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 15:36:07.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS
19/03/13 19:14:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/13 19:14:38 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/13 19:14:42 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/13 19:14:42 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 19:14:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552469242276_0017
19/03/13 19:14:42 INFO impl.YarnClientImpl: Submitted application application_1552469242276_0017
19/03/13 19:14:43 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552469242276_0017/
19/03/13 19:14:43 INFO mapreduce.Job: Running job: job_1552469242276_0017
19/03/13 19:14:53 INFO mapreduce.Job: Job job_1552469242276_0017 running in uber mode : false
19/03/13 19:14:53 INFO mapreduce.Job:  map 0% reduce 0%
19/03/13 19:15:00 INFO mapreduce.Job:  map 100% reduce 0%
19/03/13 19:15:00 INFO mapreduce.Job: Job job_1552469242276_0017 completed successfully
19/03/13 19:15:00 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=144775
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=39
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=5332
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=5332
		Total vcore-milliseconds taken by all map tasks=5332
		Total megabyte-milliseconds taken by all map tasks=5459968
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=651
		CPU time spent (ms)=2670
		Physical memory (bytes) snapshot=188571648
		Virtual memory (bytes) snapshot=2148745216
		Total committed heap usage (bytes)=119537664
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=39
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Transferred 39 bytes in 22.3081 seconds (1.7482 bytes/sec)
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Retrieved 1 records.
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS
19/03/13 19:15:00 INFO util.AppendUtils: Creating missing output directory - INR_LAS
19/03/13 19:15:01 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:15:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 19:15:01 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
19/03/13 19:15:01 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
19/03/13 19:15:01 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive
19/03/13 19:15:01 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/13 19:15:01 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 19:15:04 INFO SessionState:
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 19:15:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618/_tmp_space.db
19/03/13 19:15:04 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Updating thread name to 7feac288-289d-4d74-8641-553c5ab65618 main
19/03/13 19:15:04 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO ql.Driver: Compiling command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 11:15:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 19:15:09 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:09 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:09 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:09 INFO parse.CalcitePlanner: Starting Semantic Analysis
19/03/13 19:15:09 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27
19/03/13 19:15:09 INFO ql.Driver: Semantic Analysis Completed
19/03/13 19:15:09 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 19:15:09 INFO ql.Driver: Completed compiling command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f); Time taken: 5.309 seconds
19/03/13 19:15:09 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 19:15:09 INFO ql.Driver: Executing command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 11:15:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 19:15:09 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=7feac288-289d-4d74-8641-553c5ab65618, clientType=HIVECLI]
19/03/13 19:15:09 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
19/03/13 19:15:09 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
19/03/13 19:15:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:10 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:10 INFO ql.Driver: Completed executing command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f); Time taken: 0.106 seconds
OK
19/03/13 19:15:10 INFO ql.Driver: OK
Time taken: 5.429 seconds
19/03/13 19:15:10 INFO CliDriver: Time taken: 5.429 seconds
19/03/13 19:15:10 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:10 INFO session.SessionState: Resetting thread name to  main
19/03/13 19:15:10 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:10 INFO session.SessionState: Updating thread name to 7feac288-289d-4d74-8641-553c5ab65618 main
19/03/13 19:15:10 INFO ql.Driver: Compiling command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01):
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 19:15:10 INFO ql.Driver: Semantic Analysis Completed
19/03/13 19:15:10 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 19:15:10 INFO ql.Driver: Completed compiling command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01); Time taken: 0.415 seconds
19/03/13 19:15:10 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 19:15:10 INFO ql.Driver: Executing command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01):
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 19:15:10 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/03/13 19:15:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table oracle.inr_las
19/03/13 19:15:10 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS
19/03/13 19:15:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:10 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:10 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
19/03/13 19:15:11 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
19/03/13 19:15:11 INFO exec.StatsTask: Executing stats task
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:11 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:11 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:11 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=7, numRows=0, totalSize=557, rawDataSize=0]
19/03/13 19:15:11 INFO ql.Driver: Completed executing command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01); Time taken: 1.296 seconds
OK
19/03/13 19:15:11 INFO ql.Driver: OK
Time taken: 1.713 seconds
19/03/13 19:15:11 INFO CliDriver: Time taken: 1.713 seconds
19/03/13 19:15:11 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:11 INFO session.SessionState: Resetting thread name to  main
19/03/13 19:15:11 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:11 INFO session.SessionState: Deleted directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618 on fs with scheme hdfs
19/03/13 19:15:11 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/7feac288-289d-4d74-8641-553c5ab65618 on fs with scheme file
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.HiveImport: Hive import complete.
19/03/13 19:15:11 INFO hive.HiveImport: Export directory is empty, removing it.
19/03/13 19:15:11 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/13 19:15:11 INFO tool.ImportTool: Updated data for job: inc_job

No password prompt is needed any more. Check the Hive table data again:

hive> select * from inr_las;
OK
1   er  CLERK   800.0   2019-03-20 10:42:27.0
2   ALLEN   SALESMAN    1600.0  2019-03-20 10:42:27.0
3   WARD    SALESMAN    1250.0  2019-03-20 10:42:27.0
4   JONES   MANAGER 2975.0  2019-03-20 10:42:27.0
5   MARTIN  SALESMAN    1250.0  2019-03-20 10:42:27.0
6   zhao    DBA 1000.0  2019-03-20 10:52:34.0
7   yan BI  100.0   2019-03-20 10:42:27.0
8   dong    JAVA    332.0   2019-03-20 14:49:19.0
8   dong    JAVA    3232.0  2019-03-20 15:13:35.0
8   dong    JAVA    4232.0  2019-03-20 15:29:03.0
8   dong    JAVA    5232.0  2019-03-20 15:36:07.0
8   dong    JAVA    5232.0  2019-03-20 15:36:07.0
8   dong    JAVA    5232.0  2019-03-20 15:36:07.0
8   dong    JAVA    3232.0  2019-03-20 15:13:35.0
Time taken: 0.161 seconds, Fetched: 14 row(s)

The table now has 14 rows, one more empno=8 row than before, so the incremental append succeeded.
However, my actual requirement is that when rows are updated on the Oracle side the timestamp column changes as well, so I need to identify changed rows by timestamp and apply them as incremental updates in Hive. That is what the merge-key mode is for: it merges increments by timestamp plus primary key, and the job is created much like the example above.
In the example below we wrap the job in a shell script and have crontab run the increment on a schedule.
Note: I am testing incremental imports to feed an incremental Kylin cube and the test data volume is tiny, so the Hive table is created only as an external table, without partitions.
The following walks through, step by step, how to automatically and incrementally import a table into Hive with a sqoop job combined with crontab and a shell script:
Step 1: create the table to be synced on the Oracle side. Here we initialize a new table from the inr_las table used above:

create table inr_job as select a.empno,a.ename,a.job,a.sal,sysdate etltime  from inr_las a ;

Step 2: create the target table in Hive:

hive> use oracle;
OK
Time taken: 1.425 seconds
create table INR_JOB
(
  empno   int,
  ename   string,
  job     string,
  sal     float,
  etltime string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
location '/user/hive/warehouse/exter_inr_job';
Time taken: 2.836 seconds
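If you want to confirm where the table's data will live before loading anything, desc formatted shows the storage location (shown here via hive -e, though the same statement works inside the hive CLI):

hive -e "use oracle; desc formatted inr_job;" | grep -i location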

Step 3: load the data into Hive in full.
First delete the directory specified when the external table was created above: creating the external table creates the directory automatically, and the full import below also tries to create it, so leaving it in place causes a "directory already exists" error:

[root@hadoop hadoop]# hadoop fs -rmr /user/hive/warehouse/exter_inr_job
rmr: DEPRECATED: Please use 'rm -r' instead.
19/03/14 06:01:23 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hive/warehouse/exter_inr_job

Next, run the full import:

sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_JOB -m 1 --target-dir /user/hive/warehouse/exter_inr_job --fields-terminated-by '\t'

When the import finishes, query the Hive data:

hive> select * from inr_job;
OK
1   er  CLERK   800.0   2019-03-22 17:24:42.0
2   ALLEN   SALESMAN    1600.0  2019-03-22 17:24:42.0
3   WARD    SALESMAN    1250.0  2019-03-22 17:24:42.0
4   JONES   MANAGER 2975.0  2019-03-22 17:24:42.0
5   MARTIN  SALESMAN    1250.0  2019-03-22 17:24:42.0
6   zhao    DBA 1000.0  2019-03-22 17:24:42.0
7   yan BI  100.0   2019-03-22 17:24:42.0
8   dong    JAVA    400.0   2019-03-22 17:24:42.0
Time taken: 3.153 seconds, Fetched: 8 row(s)

Step 4: create the incremental sqoop job.
The --password-file /user/hive/warehouse/scott.pwd below is the password file created earlier; see the password-file section above for how it was built.

 sqoop job --create auto_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott  --password-file /user/hive/warehouse/scott.pwd  --table INR_JOB --fields-terminated-by '\t' --lines-terminated-by '\n'  --target-dir /user/hive/warehouse/exter_inr_job -m 1 --check-column ETLTIME --incremental lastmodified --merge-key EMPNO --last-value "2019-03-22 17:24:42"
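Since this job writes to --target-dir rather than doing a --hive-import, after each run you can check what the merge produced by listing the files behind the external table (standard HDFS commands; the path is the one the table was created with):

hdfs dfs -ls /user/hive/warehouse/exter_inr_job
hdfs dfs -cat /user/hive/warehouse/exter_inr_job/part-*     # these rows are what hive's inr_job table exposes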

Check the information of the job we just created:

[root@hadoop hadoop]# sqoop  job --show auto_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/14 06:10:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Job: auto_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-22 17:24:42
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
mainframe.input.dataset.type = p
split.limit = null
hbase.create.table = false
skip.dist.cache = false
hdfs.append.dir = false
db.table = INR_JOB
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = scott
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = ETLTIME
codegen.input.delimiters.record = 0
db.password.file = /user/hive/warehouse/scott.pwd
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 9
export.new.update = UpdateOnly
incremental.mode = DateLastModified
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/be3b358816e17c786d114afb7a4f2a6d
direct.import = false
temporary.dirRoot = _sqoop
hdfs.target.dir = /user/hive/warehouse/exter_inr_job
hive.fail.table.exists = false
merge.key.col = EMPNO
db.batch = false

Step 5: wrap the job in a shell script and add a cron schedule.

[root@hadoop ~]# cd /hadoop/
[root@hadoop hadoop]# vim auto_inr.sh
Add the following content:
#!/bin/bash
log="/hadoop/auto_job_log.log"
echo "======================`date "+%Y-%m-%d %H:%M:%S"` incremental run======================" >> $log
nohup sqoop job --exec auto_job >> $log 2>&1 &
Save and exit, then make it executable:
[root@hadoop hadoop]# chmod +x auto_inr.sh
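If the cron interval is short, a new run can start while the previous import and merge are still in flight. A slightly more defensive variant of the same wrapper (just a sketch, keeping the auto_job name and log path above) serializes runs with flock and records the exit status:

#!/bin/bash
# auto_inr.sh, defensive variant (sketch); same job name and log file as above
log="/hadoop/auto_job_log.log"
lock="/tmp/auto_inr.lock"
{
  # skip this run if the previous one still holds the lock
  flock -n 9 || { echo "`date '+%Y-%m-%d %H:%M:%S'` previous run still in progress, skipping" >> "$log"; exit 0; }
  echo "======================`date '+%Y-%m-%d %H:%M:%S'` incremental run======================" >> "$log"
  sqoop job --exec auto_job >> "$log" 2>&1
  echo "`date '+%Y-%m-%d %H:%M:%S'` sqoop exited with status $?" >> "$log"
} 9>"$lock"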

Let's run it by hand first, but before doing so check the job's last_value again:

[root@hadoop hadoop]# sqoop job --show auto_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/25 17:50:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Job: auto_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-22 17:24:42
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl

It shows 2019-03-22 17:24:42. Now check the current time:

[root@hadoop hadoop]# date
Mon Mar 25 17:54:54 CST 2019

Next, execute the script manually:

[root@hadoop hadoop]# ./auto_inr.sh

Then check the log it redirects to:

[root@hadoop hadoop]# cat auto_job_log.log
======================2019-03-25 17:55:46 incremental run======================
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/25 17:55:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/25 17:55:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/25 17:55:50 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/25 17:55:50 INFO tool.CodeGenTool: Beginning code generation
19/03/25 17:55:51 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_JOB t WHERE 1=0
19/03/25 17:55:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/6f5f7577c1f664b94d5c83b578fd3dac/INR_JOB.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/25 17:55:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/6f5f7577c1f664b94d5c83b578fd3dac/INR_JOB.jar
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_JOB t WHERE 1=0
19/03/25 17:55:54 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/25 17:55:54 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-22 17:24:42', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/25 17:55:54 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-25 17:55:54.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO mapreduce.ImportJobBase: Beginning import of INR_JOB
19/03/25 17:55:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/25 17:55:54 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/25 17:55:57 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/25 17:55:57 INFO mapreduce.JobSubmitter: number of splits:1
19/03/25 17:55:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553503985304_0009
19/03/25 17:55:58 INFO impl.YarnClientImpl: Submitted application application_1553503985304_0009
19/03/25 17:55:58 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1553503985304_0009/
19/03/25 17:55:58 INFO mapreduce.Job: Running job: job_1553503985304_0009
19/03/25 17:56:07 INFO mapreduce.Job: Job job_1553503985304_0009 running in uber mode : false
19/03/25 17:56:07 INFO mapreduce.Job:  map 0% reduce 0%
19/03/25 17:56:15 INFO mapreduce.Job:  map 100% reduce 0%
19/03/25 17:56:15 INFO mapreduce.Job: Job job_1553503985304_0009 completed successfully
19/03/25 17:56:15 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=144775
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=323
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=5270
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=5270
		Total vcore-milliseconds taken by all map tasks=5270
		Total megabyte-milliseconds taken by all map tasks=5396480
	Map-Reduce Framework
		Map input records=8
		Map output records=8
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=73
		CPU time spent (ms)=3000
		Physical memory (bytes) snapshot=205058048
		Virtual memory (bytes) snapshot=2135244800
		Total committed heap usage (bytes)=109576192
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=323
19/03/25 17:56:15 INFO mapreduce.ImportJobBase: Transferred 323 bytes in 20.9155 seconds (15.4431 bytes/sec)
19/03/25 17:56:15 INFO mapreduce.ImportJobBase: Retrieved 8 records.
19/03/25 17:56:15 INFO tool.ImportTool: Final destination exists, will run merge job.
19/03/25 17:56:15 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
19/03/25 17:56:15 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/25 17:56:18 INFO input.FileInputFormat: Total input paths to process : 2
19/03/25 17:56:18 INFO mapreduce.JobSubmitter: number of splits:2
19/03/25 17:56:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553503985304_0010
19/03/25 17:56:19 INFO impl.YarnClientImpl: Submitted application application_1553503985304_0010
19/03/25 17:56:19 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1553503985304_0010/
19/03/25 17:56:19 INFO mapreduce.Job: Running job: job_1553503985304_0010
19/03/25 17:56:29 INFO mapreduce.Job: Job job_1553503985304_0010 running in uber mode : false
19/03/25 17:56:29 INFO mapreduce.Job:  map 0% reduce 0%
19/03/25 17:56:39 INFO mapreduce.Job:  map 100% reduce 0%
19/03/25 17:56:50 INFO mapreduce.Job:  map 100% reduce 100%
19/03/25 17:56:50 INFO mapreduce.Job: Job job_1553503985304_0010 completed successfully
19/03/25 17:56:50 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=1090
		FILE: Number of bytes written=436771
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=942
		HDFS: Number of bytes written=323
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=14667
		Total time spent by all reduces in occupied slots (ms)=7258
		Total time spent by all map tasks (ms)=14667
		Total time spent by all reduce tasks (ms)=7258
		Total vcore-milliseconds taken by all map tasks=14667
		Total vcore-milliseconds taken by all reduce tasks=7258
		Total megabyte-milliseconds taken by all map tasks=15019008
		Total megabyte-milliseconds taken by all reduce tasks=7432192
	Map-Reduce Framework
		Map input records=16
		Map output records=16
		Map output bytes=1052
		Map output materialized bytes=1096
		Input split bytes=296
		Combine input records=0
		Combine output records=0
		Reduce input groups=8
		Reduce shuffle bytes=1096
		Reduce input records=16
		Reduce output records=8
		Spilled Records=32
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=230
		CPU time spent (ms)=5420
		Physical memory (bytes) snapshot=684474368
		Virtual memory (bytes) snapshot=6394597376
		Total committed heap usage (bytes)=511705088
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=646
	File Output Format Counters
		Bytes Written=323
19/03/25 17:56:50 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/25 17:56:51 INFO tool.ImportTool: Updated data for job: auto_job

Note this line in the log:

19/03/25 17:55:54 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-25 17:55:54.0', 'YYYY-MM-DD HH24:MI:SS.FF')

It shows that the upper bound is the current time. Now check the job's last_value again:

hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-25 17:55:54.0
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl

This matches the timestamp in the log. If the job definition needs to change, delete the job and recreate it with the new settings, manually supplying either the Upper bound value from the log or the last_value recorded before the deletion as the recreated job's --last-value.
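A convenient way to capture that value before dropping a job is to grep it out of the --show output used earlier (a sketch):

sqoop job --show auto_job 2>/dev/null | grep 'incremental.last.value'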
Manual invocation works fine, so only the last step remains: crontab scheduling. Run crontab -e and add the following line (one incremental run every two minutes):

*/2 * * * *  /hadoop/auto_inr.sh
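Keep in mind that cron starts with a very sparse environment, so if SQOOP_HOME, HADOOP_HOME, or the JDK are only configured in login profiles, the script may fail under cron even though it works interactively. A common workaround (assuming your variables are exported from /etc/profile) is to source the profile in the crontab entry or at the top of the script:

*/2 * * * *  . /etc/profile; /hadoop/auto_inr.sh >/dev/null 2>&1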

If a table is very large and the first load only initializes the most recent slice of data into the Hive table, will the merge-key incremental mode break when historical rows that were never loaded get updated later? See the next test article; the link will be added here once it is written:
https://blog.csdn.net/qq_28356739/article/details/88803284
