The previous article covered using Sqoop to incrementally sync data into Hive, and it also linked to my guide on installing and using a pseudo-distributed hadoop + hive + hbase + sqoop + kylin stack and setting up incremental sync. Previous article: Sqoop incremental sync of mysql/oracle data to hive (merge-key/append) test document.
This article shows how to combine that incremental approach with Sqoop's built-in job mechanism, a shell script, and crontab to implement automatic incremental synchronization.

I. Prerequisites

sqoop job --help
usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [<tool-name>] [TOOL-ARGS]]
Job management arguments:
   --create <job-id>            Create a new saved job
   --delete <job-id>            Delete a saved job
   --exec <job-id>              Run a saved job
   --help                       Print usage instructions
   --list                       List saved jobs
   --meta-connect <jdbc-uri>    Specify JDBC connect string for the metastore
   --show <job-id>              Show the parameters for a saved job
   --verbose                    Print more information while working

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
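For orientation, the job lifecycle used throughout this article follows directly from the arguments above (a minimal sketch; inc_job is the job name created in the next section, and <import-args> stands for a full import argument list):

sqoop job --create inc_job -- import <import-args>    # save an import definition under a name
sqoop job --list                                       # list saved jobs
sqoop job --show inc_job                               # print the saved options, including incremental.last.value
sqoop job --exec inc_job                               # run it; the new last-value is saved back afterwards
sqoop job --delete inc_job                             # drop the saved definition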

II. Detailed experiment

Let's first walk through creating and executing a timestamp-based incremental append job, and then look at the merge-key mode.
1. First, create an incremental append job:

[root@hadoop bin]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'
19/03/13 18:12:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/13 18:12:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
	at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
	at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
	at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:399)
	at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.create(HsqldbJobStorage.java:379)
	at org.apache.sqoop.tool.JobTool.createJob(JobTool.java:181)
	at org.apache.sqoop.tool.JobTool.run(JobTool.java:294)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

The creation failed: Sqoop is missing the java-json.jar package. Fine, download the missing jar and upload it to $SQOOP_HOME/lib. Download link:

Click here to download the jar package
Put the downloaded jar under $SQOOP_HOME/lib, then recreate the job.
First, delete the job whose creation just failed:

[root@hadoop ~]# sqoop job --delete inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:40:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Create the job:

[root@hadoop ~]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username "scott" --password "tiger" --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:40:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/13 18:40:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

List the job we just created:

[root@hadoop ~]# sqoop job --list
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:41:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Available jobs:
  inc_job

Check the last_value saved in the job we just created:

[root@hadoop ~]# sqoop job --show inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:45:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password: ------- what you type here is the OS password of the user running this job; when the job is actually executed, this prompt asks for the password of the database user instead
Job: inc_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-20 14:49:19
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
mainframe.input.dataset.type = p
hbase.create.table = false
split.limit = null
null.string = \\N
db.require.password = true
skip.dist.cache = false
hdfs.append.dir = true
db.table = INR_LAS
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = scott
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = ETLTIME
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = true
codegen.input.delimiters.enclose = 0
hive.table.name = INR_LAS
accumulo.batch.size = 10240000
hive.database.name = oracle
hive.drop.delims = false
customtool.options.jsonmap = {}
null.non-string = \\N
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 9
export.new.update = UpdateOnly
incremental.mode = AppendRows
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/1173d716481c4bd8f6cb589b87a382ea
direct.import = false
temporary.dirRoot = _sqoop
hive.fail.table.exists = false
db.batch = false

Next, execute the job manually:

[root@hadoop ~]# sqoop job --exec inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:47:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password: --------- what you type here is the password of the database user; how do we avoid entering it every time? Keep reading
19/03/13 18:47:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/13 18:47:50 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/13 18:47:50 INFO tool.CodeGenTool: Beginning code generation
19/03/13 18:47:51 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 18:47:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/f383a7cc7d1bc4f9665748405ec5dec2/INR_LAS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/13 18:47:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f383a7cc7d1bc4f9665748405ec5dec2/INR_LAS.jar
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:55 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS
19/03/13 18:47:55 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/13 18:47:55 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 14:49:19', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 18:47:55 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 15:36:07.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:55 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS
19/03/13 18:47:55 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/13 18:47:57 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/13 18:48:00 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/13 18:48:00 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 18:48:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552469242276_0016
19/03/13 18:48:01 INFO impl.YarnClientImpl: Submitted application application_1552469242276_0016
19/03/13 18:48:01 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552469242276_0016/
19/03/13 18:48:01 INFO mapreduce.Job: Running job: job_1552469242276_0016
19/03/13 18:48:11 INFO mapreduce.Job: Job job_1552469242276_0016 running in uber mode : false
19/03/13 18:48:11 INFO mapreduce.Job:  map 0% reduce 0%
19/03/13 18:48:18 INFO mapreduce.Job:  map 100% reduce 0%
19/03/13 18:48:18 INFO mapreduce.Job: Job job_1552469242276_0016 completed successfully
19/03/13 18:48:19 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=144628
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=39
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=4454
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=4454
		Total vcore-milliseconds taken by all map tasks=4454
		Total megabyte-milliseconds taken by all map tasks=4560896
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=229
		CPU time spent (ms)=2430
		Physical memory (bytes) snapshot=191975424
		Virtual memory (bytes) snapshot=2143756288
		Total committed heap usage (bytes)=116916224
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=39
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Transferred 39 bytes in 22.3135 seconds (1.7478 bytes/sec)
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Retrieved 1 records.
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS
19/03/13 18:48:19 INFO util.AppendUtils: Creating missing output directory - INR_LAS
19/03/13 18:48:19 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:48:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 18:48:19 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
19/03/13 18:48:19 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
19/03/13 18:48:19 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive
19/03/13 18:48:19 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/13 18:48:19 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 18:48:22 INFO SessionState:
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 18:48:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0/_tmp_space.db
19/03/13 18:48:22 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Updating thread name to e09a2f96-2edd-4747-a65f-4899c2863aa0 main
19/03/13 18:48:22 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO ql.Driver: Compiling command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 10:48:19' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 18:48:25 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:25 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:25 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:25 INFO parse.CalcitePlanner: Starting Semantic Analysis
19/03/13 18:48:25 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27
19/03/13 18:48:25 INFO ql.Driver: Semantic Analysis Completed
19/03/13 18:48:25 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 18:48:25 INFO ql.Driver: Completed compiling command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46); Time taken: 3.251 seconds
19/03/13 18:48:25 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 18:48:25 INFO ql.Driver: Executing command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 10:48:19' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 18:48:25 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=e09a2f96-2edd-4747-a65f-4899c2863aa0, clientType=HIVECLI]
19/03/13 18:48:25 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
19/03/13 18:48:25 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
19/03/13 18:48:26 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:26 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:26 INFO ql.Driver: Completed executing command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46); Time taken: 0.113 seconds
OK
19/03/13 18:48:26 INFO ql.Driver: OK
Time taken: 3.379 seconds
19/03/13 18:48:26 INFO CliDriver: Time taken: 3.379 seconds
19/03/13 18:48:26 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:26 INFO session.SessionState: Resetting thread name to  main
19/03/13 18:48:26 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:26 INFO session.SessionState: Updating thread name to e09a2f96-2edd-4747-a65f-4899c2863aa0 main
19/03/13 18:48:26 INFO ql.Driver: Compiling command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2):
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 18:48:26 INFO ql.Driver: Semantic Analysis Completed
19/03/13 18:48:26 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 18:48:26 INFO ql.Driver: Completed compiling command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2); Time taken: 0.426 seconds
19/03/13 18:48:26 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 18:48:26 INFO ql.Driver: Executing command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2):
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 18:48:26 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/03/13 18:48:26 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table oracle.inr_las
19/03/13 18:48:26 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS
19/03/13 18:48:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:26 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:26 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
19/03/13 18:48:27 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
19/03/13 18:48:27 INFO exec.StatsTask: Executing stats task
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:27 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:27 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:27 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=6, numRows=0, totalSize=518, rawDataSize=0]
19/03/13 18:48:27 INFO ql.Driver: Completed executing command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2); Time taken: 1.225 seconds
OK
19/03/13 18:48:27 INFO ql.Driver: OK
Time taken: 1.653 seconds
19/03/13 18:48:27 INFO CliDriver: Time taken: 1.653 seconds
19/03/13 18:48:27 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:27 INFO session.SessionState: Resetting thread name to  main
19/03/13 18:48:27 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:27 INFO session.SessionState: Deleted directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 on fs with scheme hdfs
19/03/13 18:48:27 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 on fs with scheme file
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.HiveImport: Hive import complete.
19/03/13 18:48:27 INFO hive.HiveImport: Export directory is empty, removing it.
19/03/13 18:48:27 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/13 18:48:27 INFO tool.ImportTool: Updated data for job: inc_job

The experiment above shows that every time the job runs we have to type the database password. To make execution password-free, do the following:
When creating the job, use the --password-file option rather than --password. With --password, executing the job prints a warning and still requires typing the password before the job will run; with --password-file, the job executes without asking for the database password. So let's rewrite the job-creation statement used above.
First drop the original job:

[root@hadoop conf]# sqoop job --delete inc_job

Create the password file.
Note: Sqoop requires the password file to be stored on HDFS, and its permissions must be 400.

[root@hadoop sqoop]# mkdir pwd
[root@hadoop sqoop]# cd pwd
[root@hadoop pwd]# pwd
/hadoop/sqoop/pwd
[root@hadoop pwd]# echo -n "tiger" > scott.pwd
[root@hadoop pwd]# hdfs dfs -put scott.pwd /user/hive/warehouse
[root@hadoop pwd]# hdfs dfs -chmod 400 /user/hive/warehouse/scott.pwd
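Before recreating the job, it is worth a quick sanity check that the file really landed on HDFS with the expected content and permissions (just a sketch; both are standard hdfs dfs subcommands):

hdfs dfs -ls /user/hive/warehouse/scott.pwd
hdfs dfs -cat /user/hive/warehouse/scott.pwd     # should print tiger with no trailing newline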

Recreate the job, this time specifying --password-file instead of --password:

[root@hadoop conf]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username "scott" --password-file /user/hive/warehouse/scott.pwd --table INR_LAS --fields-terminated-by '\t'
--lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'

Verify. First look at the current data in the Oracle source table:

select * from inr_las;
EMPNO       ENAME       JOB     SAL         ETLTIME
1           er      CLERK       800.00  2019/3/20 10:42:27
2           ALLEN   SALESMAN    1600.00 2019/3/20 10:42:27
3           WARD    SALESMAN    1250.00 2019/3/20 10:42:27
4           JONES   MANAGER     2975.00 2019/3/20 10:42:27
5           MARTIN  SALESMAN    1250.00 2019/3/20 10:42:27
6           zhao    DBA         1000.00 2019/3/20 10:52:34
7           yan     BI          100.00  2019/3/20 10:42:27
8           dong    JAVA        5232.00 2019/3/20 15:36:07

Now look at the current Hive table data:

hive> select * from inr_las;
OK
1   er  CLERK   800.0   2019-03-20 10:42:27.0
2   ALLEN   SALESMAN    1600.0  2019-03-20 10:42:27.0
3   WARD    SALESMAN    1250.0  2019-03-20 10:42:27.0
4   JONES   MANAGER 2975.0  2019-03-20 10:42:27.0
5   MARTIN  SALESMAN    1250.0  2019-03-20 10:42:27.0
6   zhao    DBA 1000.0  2019-03-20 10:52:34.0
7   yan BI  100.0   2019-03-20 10:42:27.0
8   dong    JAVA    332.0   2019-03-20 14:49:19.0
8   dong    JAVA    3232.0  2019-03-20 15:13:35.0
8   dong    JAVA    4232.0  2019-03-20 15:29:03.0
8   dong    JAVA    5232.0  2019-03-20 15:36:07.0
8   dong    JAVA    5232.0  2019-03-20 15:36:07.0
8   dong    JAVA    3232.0  2019-03-20 15:13:35.0
Time taken: 0.176 seconds, Fetched: 13 row(s)

The job's increment was set with --last-value '2019-03-20 14:49:19', and on the source side one row (empno=8) satisfies the incremental condition. Now execute the newly created job:

[root@hadoop pwd]# sqoop job --exec inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 19:14:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/13 19:14:32 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/13 19:14:32 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/13 19:14:32 INFO tool.CodeGenTool: Beginning code generation
19/03/13 19:14:33 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 19:14:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/8df9a3027ead0f69733bef4c331c8f15/INR_LAS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/13 19:14:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/8df9a3027ead0f69733bef4c331c8f15/INR_LAS.jar
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS
19/03/13 19:14:38 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/13 19:14:38 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 14:49:19', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 19:14:38 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 15:36:07.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS
19/03/13 19:14:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/13 19:14:38 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/13 19:14:42 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/13 19:14:42 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 19:14:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552469242276_0017
19/03/13 19:14:42 INFO impl.YarnClientImpl: Submitted application application_1552469242276_0017
19/03/13 19:14:43 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552469242276_0017/
19/03/13 19:14:43 INFO mapreduce.Job: Running job: job_1552469242276_0017
19/03/13 19:14:53 INFO mapreduce.Job: Job job_1552469242276_0017 running in uber mode : false
19/03/13 19:14:53 INFO mapreduce.Job:  map 0% reduce 0%
19/03/13 19:15:00 INFO mapreduce.Job:  map 100% reduce 0%
19/03/13 19:15:00 INFO mapreduce.Job: Job job_1552469242276_0017 completed successfully
19/03/13 19:15:00 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=144775
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=39
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=5332
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=5332
		Total vcore-milliseconds taken by all map tasks=5332
		Total megabyte-milliseconds taken by all map tasks=5459968
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=651
		CPU time spent (ms)=2670
		Physical memory (bytes) snapshot=188571648
		Virtual memory (bytes) snapshot=2148745216
		Total committed heap usage (bytes)=119537664
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=39
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Transferred 39 bytes in 22.3081 seconds (1.7482 bytes/sec)
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Retrieved 1 records.
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS
19/03/13 19:15:00 INFO util.AppendUtils: Creating missing output directory - INR_LAS
19/03/13 19:15:01 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:15:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 19:15:01 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
19/03/13 19:15:01 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
19/03/13 19:15:01 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive
19/03/13 19:15:01 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/13 19:15:01 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 19:15:04 INFO SessionState:
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 19:15:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618/_tmp_space.db
19/03/13 19:15:04 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Updating thread name to 7feac288-289d-4d74-8641-553c5ab65618 main
19/03/13 19:15:04 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO ql.Driver: Compiling command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 11:15:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 19:15:09 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:09 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:09 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:09 INFO parse.CalcitePlanner: Starting Semantic Analysis
19/03/13 19:15:09 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27
19/03/13 19:15:09 INFO ql.Driver: Semantic Analysis Completed
19/03/13 19:15:09 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 19:15:09 INFO ql.Driver: Completed compiling command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f); Time taken: 5.309 seconds
19/03/13 19:15:09 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 19:15:09 INFO ql.Driver: Executing command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 11:15:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 19:15:09 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=7feac288-289d-4d74-8641-553c5ab65618, clientType=HIVECLI]
19/03/13 19:15:09 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
19/03/13 19:15:09 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
19/03/13 19:15:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:10 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:10 INFO ql.Driver: Completed executing command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f); Time taken: 0.106 seconds
OK
19/03/13 19:15:10 INFO ql.Driver: OK
Time taken: 5.429 seconds
19/03/13 19:15:10 INFO CliDriver: Time taken: 5.429 seconds
19/03/13 19:15:10 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:10 INFO session.SessionState: Resetting thread name to  main
19/03/13 19:15:10 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:10 INFO session.SessionState: Updating thread name to 7feac288-289d-4d74-8641-553c5ab65618 main
19/03/13 19:15:10 INFO ql.Driver: Compiling command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01):
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 19:15:10 INFO ql.Driver: Semantic Analysis Completed
19/03/13 19:15:10 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 19:15:10 INFO ql.Driver: Completed compiling command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01); Time taken: 0.415 seconds
19/03/13 19:15:10 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 19:15:10 INFO ql.Driver: Executing command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01):
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 19:15:10 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/03/13 19:15:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table oracle.inr_las
19/03/13 19:15:10 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS
19/03/13 19:15:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:10 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:10 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
19/03/13 19:15:11 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
19/03/13 19:15:11 INFO exec.StatsTask: Executing stats task
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:11 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:11 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:11 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=7, numRows=0, totalSize=557, rawDataSize=0]
19/03/13 19:15:11 INFO ql.Driver: Completed executing command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01); Time taken: 1.296 seconds
OK
19/03/13 19:15:11 INFO ql.Driver: OK
Time taken: 1.713 seconds
19/03/13 19:15:11 INFO CliDriver: Time taken: 1.713 seconds
19/03/13 19:15:11 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:11 INFO session.SessionState: Resetting thread name to  main
19/03/13 19:15:11 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:11 INFO session.SessionState: Deleted directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618 on fs with scheme hdfs
19/03/13 19:15:11 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/7feac288-289d-4d74-8641-553c5ab65618 on fs with scheme file
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.HiveImport: Hive import complete.
19/03/13 19:15:11 INFO hive.HiveImport: Export directory is empty, removing it.
19/03/13 19:15:11 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/13 19:15:11 INFO tool.ImportTool: Updated data for job: inc_job

No password prompt is needed any more. Check the Hive table data again:

hive> select * from inr_las;
OK
1   er  CLERK   800.0   2019-03-20 10:42:27.0
2   ALLEN   SALESMAN    1600.0  2019-03-20 10:42:27.0
3   WARD    SALESMAN    1250.0  2019-03-20 10:42:27.0
4   JONES   MANAGER 2975.0  2019-03-20 10:42:27.0
5   MARTIN  SALESMAN    1250.0  2019-03-20 10:42:27.0
6   zhao    DBA 1000.0  2019-03-20 10:52:34.0
7   yan BI  100.0   2019-03-20 10:42:27.0
8   dong    JAVA    332.0   2019-03-20 14:49:19.0
8   dong    JAVA    3232.0  2019-03-20 15:13:35.0
8   dong    JAVA    4232.0  2019-03-20 15:29:03.0
8   dong    JAVA    5232.0  2019-03-20 15:36:07.0
8   dong    JAVA    5232.0  2019-03-20 15:36:07.0
8   dong    JAVA    5232.0  2019-03-20 15:36:07.0
8   dong    JAVA    3232.0  2019-03-20 15:13:35.0
Time taken: 0.161 seconds, Fetched: 14 row(s)

The table now has 14 rows, one more empno=8 row than before, so the incremental append succeeded.
However, my actual requirement is that when rows are updated on the Oracle side the timestamp column changes as well, so I need to identify changed rows by timestamp and apply them as incremental updates in Hive. That is what the merge-key mode is for: it merges increments by timestamp plus primary key, and the job is created much like the example above.
In the example below we wrap the job in a shell script and have crontab run the increment on a schedule.
Note: I am testing incremental imports to feed an incremental Kylin cube and the test data volume is tiny, so the Hive table is created only as an external table, without partitions.
The following walks through, step by step, how to automatically and incrementally import a table into Hive with a sqoop job combined with crontab and a shell script:
Step 1: create the table to be synced on the Oracle side. Here we initialize a new table from the inr_las table used above:

create table inr_job as select a.empno,a.ename,a.job,a.sal,sysdate etltime  from inr_las a ;

Step 2: create the target table in Hive:

hive> use oracle;
OK
Time taken: 1.425 seconds
create table INR_JOB
(
  empno   int,
  ename   string,
  job     string,
  sal     float,
  etltime string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
location '/user/hive/warehouse/exter_inr_job';
Time taken: 2.836 seconds
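If you want to confirm where the table's data will live before loading anything, desc formatted shows the storage location (shown here via hive -e, though the same statement works inside the hive CLI):

hive -e "use oracle; desc formatted inr_job;" | grep -i location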

Step 3: load the data into Hive in full.
First delete the directory specified when the external table was created above: creating the external table creates the directory automatically, and the full import below also tries to create it, so leaving it in place causes a "directory already exists" error:

[root@hadoop hadoop]# hadoop fs -rmr /user/hive/warehouse/exter_inr_job
rmr: DEPRECATED: Please use 'rm -r' instead.
19/03/14 06:01:23 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hive/warehouse/exter_inr_job

Next, run the full import:

sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_JOB -m 1 --target-dir /user/hive/warehouse/exter_inr_job --fields-terminated-by '\t'

When the import finishes, query the Hive data:

hive> select * from inr_job;
OK
1   er  CLERK   800.0   2019-03-22 17:24:42.0
2   ALLEN   SALESMAN    1600.0  2019-03-22 17:24:42.0
3   WARD    SALESMAN    1250.0  2019-03-22 17:24:42.0
4   JONES   MANAGER 2975.0  2019-03-22 17:24:42.0
5   MARTIN  SALESMAN    1250.0  2019-03-22 17:24:42.0
6   zhao    DBA 1000.0  2019-03-22 17:24:42.0
7   yan BI  100.0   2019-03-22 17:24:42.0
8   dong    JAVA    400.0   2019-03-22 17:24:42.0
Time taken: 3.153 seconds, Fetched: 8 row(s)

Step 4: create the incremental sqoop job.
The --password-file /user/hive/warehouse/scott.pwd below is the password file created earlier; see the password-file section above for how it was built.

 sqoop job --create auto_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott  --password-file /user/hive/warehouse/scott.pwd  --table INR_JOB --fields-terminated-by '\t' --lines-terminated-by '\n'  --target-dir /user/hive/warehouse/exter_inr_job -m 1 --check-column ETLTIME --incremental lastmodified --merge-key EMPNO --last-value "2019-03-22 17:24:42"
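Since this job writes to --target-dir rather than doing a --hive-import, after each run you can check what the merge produced by listing the files behind the external table (standard HDFS commands; the path is the one the table was created with):

hdfs dfs -ls /user/hive/warehouse/exter_inr_job
hdfs dfs -cat /user/hive/warehouse/exter_inr_job/part-*     # these rows are what hive's inr_job table exposes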

Check the information of the job we just created:

[root@hadoop hadoop]# sqoop  job --show auto_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/14 06:10:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Job: auto_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-22 17:24:42
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
mainframe.input.dataset.type = p
split.limit = null
hbase.create.table = false
skip.dist.cache = false
hdfs.append.dir = false
db.table = INR_JOB
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = scott
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = ETLTIME
codegen.input.delimiters.record = 0
db.password.file = /user/hive/warehouse/scott.pwd
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 9
export.new.update = UpdateOnly
incremental.mode = DateLastModified
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/be3b358816e17c786d114afb7a4f2a6d
direct.import = false
temporary.dirRoot = _sqoop
hdfs.target.dir = /user/hive/warehouse/exter_inr_job
hive.fail.table.exists = false
merge.key.col = EMPNO
db.batch = false

Step 5: wrap the job in a shell script and add a cron schedule.

[root@hadoop ~]# cd /hadoop/
[root@hadoop hadoop]# vim auto_inr.sh
Add the following content:
#!/bin/bash
log="/hadoop/auto_job_log.log"
echo "======================`date "+%Y-%m-%d %H:%M:%S"` incremental run======================" >> $log
nohup sqoop job --exec auto_job >> $log 2>&1 &
Save and exit, then make it executable:
[root@hadoop hadoop]# chmod +x auto_inr.sh
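If the cron interval is short, a new run can start while the previous import and merge are still in flight. A slightly more defensive variant of the same wrapper (just a sketch, keeping the auto_job name and log path above) serializes runs with flock and records the exit status:

#!/bin/bash
# auto_inr.sh, defensive variant (sketch); same job name and log file as above
log="/hadoop/auto_job_log.log"
lock="/tmp/auto_inr.lock"
{
  # skip this run if the previous one still holds the lock
  flock -n 9 || { echo "`date '+%Y-%m-%d %H:%M:%S'` previous run still in progress, skipping" >> "$log"; exit 0; }
  echo "======================`date '+%Y-%m-%d %H:%M:%S'` incremental run======================" >> "$log"
  sqoop job --exec auto_job >> "$log" 2>&1
  echo "`date '+%Y-%m-%d %H:%M:%S'` sqoop exited with status $?" >> "$log"
} 9>"$lock"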

Let's run it by hand first, but before doing so check the job's last_value again:

[root@hadoop hadoop]# sqoop job --show auto_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/25 17:50:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Job: auto_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-22 17:24:42
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl

It shows 2019-03-22 17:24:42. Now check the current time:

[root@hadoop hadoop]# date
Mon Mar 25 17:54:54 CST 2019

Next, execute the script manually:

[root@hadoop hadoop]# ./auto_inr.sh

Then check the log it redirects to:

[root@hadoop hadoop]# cat auto_job_log.log
======================2019-03-25 17:55:46 incremental run======================
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/25 17:55:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/25 17:55:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/25 17:55:50 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/25 17:55:50 INFO tool.CodeGenTool: Beginning code generation
19/03/25 17:55:51 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_JOB t WHERE 1=0
19/03/25 17:55:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/6f5f7577c1f664b94d5c83b578fd3dac/INR_JOB.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/25 17:55:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/6f5f7577c1f664b94d5c83b578fd3dac/INR_JOB.jar
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_JOB t WHERE 1=0
19/03/25 17:55:54 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/25 17:55:54 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-22 17:24:42', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/25 17:55:54 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-25 17:55:54.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO mapreduce.ImportJobBase: Beginning import of INR_JOB
19/03/25 17:55:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/25 17:55:54 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/25 17:55:57 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/25 17:55:57 INFO mapreduce.JobSubmitter: number of splits:1
19/03/25 17:55:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553503985304_0009
19/03/25 17:55:58 INFO impl.YarnClientImpl: Submitted application application_1553503985304_0009
19/03/25 17:55:58 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1553503985304_0009/
19/03/25 17:55:58 INFO mapreduce.Job: Running job: job_1553503985304_0009
19/03/25 17:56:07 INFO mapreduce.Job: Job job_1553503985304_0009 running in uber mode : false
19/03/25 17:56:07 INFO mapreduce.Job:  map 0% reduce 0%
19/03/25 17:56:15 INFO mapreduce.Job:  map 100% reduce 0%
19/03/25 17:56:15 INFO mapreduce.Job: Job job_1553503985304_0009 completed successfully
19/03/25 17:56:15 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=144775
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=323
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=5270
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=5270
		Total vcore-milliseconds taken by all map tasks=5270
		Total megabyte-milliseconds taken by all map tasks=5396480
	Map-Reduce Framework
		Map input records=8
		Map output records=8
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=73
		CPU time spent (ms)=3000
		Physical memory (bytes) snapshot=205058048
		Virtual memory (bytes) snapshot=2135244800
		Total committed heap usage (bytes)=109576192
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=323
19/03/25 17:56:15 INFO mapreduce.ImportJobBase: Transferred 323 bytes in 20.9155 seconds (15.4431 bytes/sec)
19/03/25 17:56:15 INFO mapreduce.ImportJobBase: Retrieved 8 records.
19/03/25 17:56:15 INFO tool.ImportTool: Final destination exists, will run merge job.
19/03/25 17:56:15 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
19/03/25 17:56:15 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/25 17:56:18 INFO input.FileInputFormat: Total input paths to process : 2
19/03/25 17:56:18 INFO mapreduce.JobSubmitter: number of splits:2
19/03/25 17:56:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553503985304_0010
19/03/25 17:56:19 INFO impl.YarnClientImpl: Submitted application application_1553503985304_0010
19/03/25 17:56:19 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1553503985304_0010/
19/03/25 17:56:19 INFO mapreduce.Job: Running job: job_1553503985304_0010
19/03/25 17:56:29 INFO mapreduce.Job: Job job_1553503985304_0010 running in uber mode : false
19/03/25 17:56:29 INFO mapreduce.Job:  map 0% reduce 0%
19/03/25 17:56:39 INFO mapreduce.Job:  map 100% reduce 0%
19/03/25 17:56:50 INFO mapreduce.Job:  map 100% reduce 100%
19/03/25 17:56:50 INFO mapreduce.Job: Job job_1553503985304_0010 completed successfully
19/03/25 17:56:50 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=1090
		FILE: Number of bytes written=436771
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=942
		HDFS: Number of bytes written=323
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=14667
		Total time spent by all reduces in occupied slots (ms)=7258
		Total time spent by all map tasks (ms)=14667
		Total time spent by all reduce tasks (ms)=7258
		Total vcore-milliseconds taken by all map tasks=14667
		Total vcore-milliseconds taken by all reduce tasks=7258
		Total megabyte-milliseconds taken by all map tasks=15019008
		Total megabyte-milliseconds taken by all reduce tasks=7432192
	Map-Reduce Framework
		Map input records=16
		Map output records=16
		Map output bytes=1052
		Map output materialized bytes=1096
		Input split bytes=296
		Combine input records=0
		Combine output records=0
		Reduce input groups=8
		Reduce shuffle bytes=1096
		Reduce input records=16
		Reduce output records=8
		Spilled Records=32
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=230
		CPU time spent (ms)=5420
		Physical memory (bytes) snapshot=684474368
		Virtual memory (bytes) snapshot=6394597376
		Total committed heap usage (bytes)=511705088
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=646
	File Output Format Counters
		Bytes Written=323
19/03/25 17:56:50 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/25 17:56:51 INFO tool.ImportTool: Updated data for job: auto_job

Note this line in the log:

19/03/25 17:55:54 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-25 17:55:54.0', 'YYYY-MM-DD HH24:MI:SS.FF')

It shows that the upper bound is the current time. Now check the job's last_value again:

hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-25 17:55:54.0
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl

This matches the timestamp in the log. If the job definition needs to change, delete the job and recreate it with the new settings, manually supplying either the Upper bound value from the log or the last_value recorded before the deletion as the recreated job's --last-value.
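A convenient way to capture that value before dropping a job is to grep it out of the --show output used earlier (a sketch):

sqoop job --show auto_job 2>/dev/null | grep 'incremental.last.value'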
Manual invocation works fine, so only the last step remains: crontab scheduling. Run crontab -e and add the following line (one incremental run every two minutes):

*/2 * * * *  /hadoop/auto_inr.sh
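Keep in mind that cron starts with a very sparse environment, so if SQOOP_HOME, HADOOP_HOME, or the JDK are only configured in login profiles, the script may fail under cron even though it works interactively. A common workaround (assuming your variables are exported from /etc/profile) is to source the profile in the crontab entry or at the top of the script:

*/2 * * * *  . /etc/profile; /hadoop/auto_inr.sh >/dev/null 2>&1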

If a table is very large and the first load only initializes the most recent slice of data into the Hive table, will the merge-key incremental mode break when historical rows that were never loaded get updated later? See the next test article; the link will be added here once it is written:
https://blog.csdn.net/qq_28356739/article/details/88803284
