Part 1: How to make mapjoin take effect

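  • set hive.auto.convert.join = true;  -- whether to enable automatic mapjoin conversion
  • set hive.mapjoin.smalltable.filesize = 25000000;   -- size threshold (in bytes) below which a table counts as the small table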

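Set both together. Pick the filesize value according to what your cluster can handle; the default, as far as I remember, is about 25 MB (25000000 bytes). You can ignore the odd syntax shown in many older posts, such as /*+ mapjoin(A)*/. It is outdated, and unless your Hive version is very old you will not need it.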
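As a rough sketch of how the two settings are used together in a session: the table names `big_fact` and `small_dim` are hypothetical and the threshold value is only an illustration, not a recommendation. Running `set <key>;` without a value prints the current setting, which is a handy way to see what the cluster already uses before overriding it.

```sql
-- Print the values currently in effect (no "= value" means "show me").
set hive.auto.convert.join;
set hive.mapjoin.smalltable.filesize;

-- Session-level overrides; 100000000 bytes (about 100 MB) is an illustrative value.
set hive.auto.convert.join = true;
set hive.mapjoin.smalltable.filesize = 100000000;

-- Hypothetical tables: big_fact is large, small_dim is well under the threshold.
-- With the settings above, Hive may convert this join to a mapjoin on its own.
SELECT f.uid, d.nick
FROM big_fact f
JOIN small_dim d
  ON f.uid = d.uid;
```

These are session settings, so they apply only to the current connection and leave the cluster defaults alone.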

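There is one more: set hive.ignore.mapjoin.hint=true; (when this is true, Hive ignores mapjoin hints and relies on automatic conversion). I do not think you need to touch it. The cluster has its own parameter settings, and the ops team chose them deliberately, so even if you change it, it will not necessarily take effect. Stick to the conventional approach and adjust hive.mapjoin.smalltable.filesize as appropriate. The setting is meant for small tables, but "small" is relative: if you join a 500 GB table with a 50 GB "small" table, loading the 50 GB table into memory is not necessarily a good idea. My personal suggestion is to consider mapjoin for tables under 1 GB; for anything much larger it is not worth it.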
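For comparison only, here is a sketch of the legacy hint style mentioned earlier. The hint is honored only when `hive.ignore.mapjoin.hint` is false; the table names `big_fact` and `small_dim` and the alias `d` are hypothetical.

```sql
-- Hints are ignored while this is true, so it has to be switched off first.
set hive.ignore.mapjoin.hint = false;

-- The hint names the alias of the table to load into memory (d here).
SELECT /*+ MAPJOIN(d) */ f.uid, d.nick
FROM big_fact f
JOIN small_dim d
  ON f.uid = d.uid;
```

With automatic conversion enabled, the same query without the hint is normally enough, which is why the hint form is rarely needed.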

Part 2: How do we confirm that Hive's mapjoin actually took effect?

  • Case with only an inner join

Check the job log, which is the most direct evidence:

2021-12-10 12:05:41  Starting to launch local task to process map join;  maximum memory = 954728448
2021-12-10 12:05:44 Processing rows:    200000  Hashtable size: 199999  Memory usage:   135058920   percentage: 0.141
2021-12-10 12:05:44 Dump the side-table into file: file:/tmp/hive_2021-12-10_11-47-34_913_2061727660300134431-1/-local-10007/HashTable-Stage-13/MapJoin-mapfile10--.hashtable
2021-12-10 12:05:44 Uploaded 1 File to: file:/tmp/hive_2021-12-10_11-47-34_913_2061727660300134431-1/-local-10007/HashTable-Stage-13/MapJoin-mapfile10--.hashtable (3517 bytes)
2021-12-10 12:05:44 Dump the side-table into file: file:/tmp/hive_2021-12-10_11-47-34_913_2061727660300134431-1/-local-10007/HashTable-Stage-13/MapJoin-mapfile12--.hashtable
2021-12-10 12:05:44 Uploaded 1 File to: file:/tmp/hive_2021-12-10_11-47-34_913_2061727660300134431-1/-local-10007/HashTable-Stage-13/MapJoin-mapfile12--.hashtable (8683158 bytes)
2021-12-10 12:05:44 End of local task; Time Taken: 3.034 sec.
Execution completed successfully

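Key points:

  1. Starting to launch local task to process map join; this line says it plainly.
  2. Uploaded 1 File to: file:/tmp/hive_2021-12-10_11-47-34_913_2061727660300134431-1/-local-10007/HashTable-Stage-13/MapJoin-mapfile10--.hashtable; the side table is written out as a hashtable file.
  3. End of local task.
  4. In short, Hive launches a local task that builds the small table into a hashtable.

One addition:

I found that a left join will also go through mapjoin when the conditions are met.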

STAGE DEPENDENCIES:
  Stage-9 is a root stage , consists of Stage-11, Stage-1
  Stage-11 has a backup stage: Stage-1
  Stage-8 depends on stages: Stage-11
  Stage-7 depends on stages: Stage-1, Stage-8 , consists of Stage-10, Stage-2
  Stage-10 has a backup stage: Stage-2
  Stage-6 depends on stages: Stage-10
  Stage-3 depends on stages: Stage-2, Stage-6
  Stage-2
  Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-9
    Conditional Operator

  Stage: Stage-11
    Map Reduce Local Work
      Alias -> Map Local Tables:
        t_2:temp_sjs_interact_cf_top10_t1 --21.3m
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        t_2:temp_sjs_interact_cf_top10_t1
          TableScan
            alias: temp_sjs_interact_cf_top10_t1
            Filter Operator
              predicate: sjs_r is not null (type: boolean)
              Select Operator
                expressions: uid (type: string), inter_type (type: string), sjs_r (type: string), level_cf (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3
                HashTable Sink Operator
                  condition expressions:
                    0 {_col0} {_col1} {_col2} {_col3}
                    1 {_col2}
                  keys:
                    0 _col0 (type: string), _col2 (type: string), _col1 (type: string)
                    1 _col0 (type: string), _col3 (type: string), _col1 (type: string)

  Stage: Stage-8
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: temp_sjs_interact_cf_top10_t2
            Select Operator
              expressions: uid (type: string), inter_type (type: string), level_cf (type: string), cnt (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3
              Map Join Operator
                condition map:
                     Left Outer Join0 to 1
                condition expressions:
                  0 {_col0} {_col1} {_col2} {_col3}
                  1 {_col2}
                keys:
                  0 _col0 (type: string), _col2 (type: string), _col1 (type: string)
                  1 _col0 (type: string), _col3 (type: string), _col1 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col6
                File Output Operator
                  compressed: false
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-10
    Map Reduce Local Work
      Alias -> Map Local Tables:
        t_3:ods_user_base_info
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        t_3:ods_user_base_info
          TableScan
            alias: ods_user_base_info
            Select Operator
              expressions: uid (type: string), nick (type: string)
              outputColumnNames: _col0, _col1
              HashTable Sink Operator
                condition expressions:
                  0 {_col6} {_col0} {_col1} {_col2} {_col3}
                  1 {_col1}
                keys:
                  0 _col0 (type: string)
                  1 _col0 (type: string)

  Stage: Stage-6
    Map Reduce
      Map Operator Tree:
          TableScan
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              condition expressions:
                0 {_col6} {_col0} {_col1} {_col2} {_col3}
                1 {_col1}
              keys:
                0 _col0 (type: string)
                1 _col0 (type: string)
              outputColumnNames: _col2, _col4, _col5, _col6, _col7, _col9
              Select Operator
                expressions: _col4 (type: string), _col9 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col2 (type: string)
                outputColumnNames: _col4, _col9, _col5, _col6, _col7, _col2
                Group By Operator
                  keys: _col4 (type: string), _col9 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col2 (type: string)
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                  File Output Operator
                    compressed: false
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string)
              sort order: ++++++
              Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string)
              Statistics: Num rows: 1598946560 Data size: 319789301760 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: string), KEY._col5 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
          Statistics: Num rows: 799473280 Data size: 159894650880 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
            Statistics: Num rows: 799473280 Data size: 159894650880 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 799473280 Data size: 159894650880 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 30685 Data size: 12274456 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col6 (type: string), _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string)
          TableScan
            alias: ods_user_base_info
            Statistics: Num rows: 1453587694 Data size: 290717538880 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: uid (type: string), nick (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 1453587694 Data size: 290717538880 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 1453587694 Data size: 290717538880 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col1 (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
          condition expressions:
            0 {VALUE._col2} {VALUE._col4} {VALUE._col5} {VALUE._col6} {VALUE._col7}
            1 {VALUE._col1}
          outputColumnNames: _col2, _col4, _col5, _col6, _col7, _col9
          Statistics: Num rows: 1598946560 Data size: 319789301760 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col4 (type: string), _col9 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col2 (type: string)
            outputColumnNames: _col4, _col9, _col5, _col6, _col7, _col2
            Statistics: Num rows: 1598946560 Data size: 319789301760 Basic stats: COMPLETE Column stats: NONE
            Group By Operator
              keys: _col4 (type: string), _col9 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col2 (type: string)
              mode: hash
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
              Statistics: Num rows: 1598946560 Data size: 319789301760 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: temp_sjs_interact_cf_top10_t2
            Statistics: Num rows: 5 Data size: 2336 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: uid (type: string), inter_type (type: string), level_cf (type: string), cnt (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3
              Statistics: Num rows: 5 Data size: 2336 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string), _col2 (type: string), _col1 (type: string)
                sort order: +++
                Map-reduce partition columns: _col0 (type: string), _col2 (type: string), _col1 (type: string)
                Statistics: Num rows: 5 Data size: 2336 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string)
          TableScan
            alias: temp_sjs_interact_cf_top10_t1
            Statistics: Num rows: 55792 Data size: 22317192 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: sjs_r is not null (type: boolean)
              Statistics: Num rows: 27896 Data size: 11158596 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: uid (type: string), inter_type (type: string), sjs_r (type: string), level_cf (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 27896 Data size: 11158596 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string), _col3 (type: string), _col1 (type: string)
                  sort order: +++
                  Map-reduce partition columns: _col0 (type: string), _col3 (type: string), _col1 (type: string)
                  Statistics: Num rows: 27896 Data size: 11158596 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col2 (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
          condition expressions:
            0 {VALUE._col0} {VALUE._col1} {VALUE._col2} {VALUE._col3}
            1 {VALUE._col2}
          outputColumnNames: _col0, _col1, _col2, _col3, _col6
          Statistics: Num rows: 30685 Data size: 12274456 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1

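The execution plan makes the point clearly: Stage-8 and Stage-6 each contain a Map Join Operator whose condition map is Left Outer Join, while Stage-1 and Stage-2 are the ordinary reduce-side joins kept as backup stages.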
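A plan like this can be obtained by prefixing the query with EXPLAIN. The sketch below is not the author's original query, which the post does not show; the tables `big_fact` and `small_dim` are hypothetical. It assumes the small table sits on the right side of the LEFT JOIN, since the left (preserved) side has to be streamed and only the other side can be loaded into the hashtable.

```sql
set hive.auto.convert.join = true;

-- EXPLAIN prints the stage plan without running the query.
EXPLAIN
SELECT f.uid, d.nick
FROM big_fact f
LEFT JOIN small_dim d   -- hypothetical small table, on the right side
  ON f.uid = d.uid;
```

In the output, look for "Map Reduce Local Work", "HashTable Sink Operator", and "Map Join Operator"; if only a plain "Join Operator" under a "Reduce Operator Tree" appears, the join was not converted.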
