21.1 Problem Description

  • Hive MapReduce jobs (MapReduce is abbreviated as MR below) fail to run. The log is as follows:
0: jdbc:hive2://localhost:10000> select count(*) from student;
…command(queryId=hive_20170902081616_d676f921-c62c-4fac-84b9-272663a2fca0); Time taken: 10.029 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
0: jdbc:hive2://localhost:10000>
  • Plain MR jobs fail to run as well. The log is as follows:
[root@ip-172-31-6-148 hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar pi 5 5
...
Diagnostics: Exception from container-launch.
Container id: container_1504338960864_0005_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
17/09/02 08:19:36 INFO mapreduce.Job: Counters: 0
Job Finished in 8.452 seconds
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-6-148:8020/user/root/QuasiMonteCarlo_1504340365604_1994724640/out/reduce-out
    at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
    at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1820)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1844)
    at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
    at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
[root@ip-172-31-6-148 hadoop-mapreduce]#
  • The job's logs cannot be viewed on the JobHistory page.

21.2 Analysis and Resolution

21.2.1 Problem Analysis

  • The YARN ResourceManager log shows that the Container cannot be launched. The exception is as follows:
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
…
Container id: container_1504341269835_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
  • The NodeManager log on the node shows the following exception:
2017-09-02 08:37:35,317 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1504341269835_0001_01_000001 and exit code: 1
ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-09-02 08:37:35,326 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2017-09-02 08:37:35,326 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1504341
  • The JobHistory service log shows the following:
2017-09-02 08:40:31,676 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
2017-09-02 08:40:32,880 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:PROXY) via mapred (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /user/root/.staging/job_1504341269835_0001/job_1504341269835_0001.summary
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2037)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2007)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1920)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
2017-09-02 08:40:32,882 WARN org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService: Could not process job files
java.io.FileNotFoundException: File does not exist: /user/root/.staging/job_1504341269835_0001/job_1504341269835_0001.summary
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2037)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2007)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1920)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

  • The HDFS NameNode log shows the following exceptions:

2017-09-02 08:37:29,445 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /user/root/.staging/job_1504341269835_0001/job.xml is closed by DFSClient_NONMAPREDUCE_478129775_1
2017-09-02 08:37:29,451 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.10.118:50010 is added to blk_1073744484_3660 size 106954
2017-09-02 08:37:35,265 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:35,265 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.31.5.190:46293 Call#5 Retry#0: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:40,188 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:40,188 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.31.10.118:49343 Call#5 Retry#0: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:41,200 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/fail/root_appattempt_1504341269835_0001_000002 is closed by DFSClient_NONMAPREDUCE_-860670620_215
2017-09-02 08:37:41,276 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073744476_3652 172.31.10.118:50010 172.31.9.33:
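  • Analysis conclusions:

    • The ResourceManager log does not reveal the cause.
    • The NodeManager log does not reveal the cause.
    • The JobHistory logs cannot be viewed because an MR job first creates its temporary log files under the submitting user's directory (/user/<user>/<job>; here /user/root/.staging/job_1504341269835_0001) and then moves them to the /user/history directory.
    • The HDFS NameNode log shows that the temporary log files produced by the job cannot be written to the /user/history directory: user root is denied EXECUTE access on it.
    • Root cause: the HDFS /user/history directory is too restrictive (mapred:supergroup, drwxrwx---), so the YARN job's logs cannot be recorded.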
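
Since the decisive evidence is buried in a noisy NameNode log, a small filter can surface it. The following is a minimal sketch, not part of the original troubleshooting session: the function name `permission_denials` is hypothetical, and the pattern assumes the "Permission denied: user=..., access=..., inode=..." message format seen in the log excerpt above. It reads log text on stdin and prints each distinct denial as user, access type, inode, owner, group, and mode.

```shell
#!/bin/sh
# Sketch: list distinct HDFS permission denials found in NameNode log text.
# Assumes the AccessControlException message format shown in the excerpt above.
# Output columns: user access inode owner group mode
permission_denials() {
  grep -o 'Permission denied: user=[^,]*, access=[^,]*, inode="[^"]*":[^:]*:[^:]*:[^ ]*' |
    sed -E 's/^Permission denied: user=([^,]*), access=([^,]*), inode="([^"]*)":([^:]*):([^:]*):(.*)$/\1 \2 \3 \4 \5 \6/' |
    sort -u
}

# Sample input: one NameNode log line quoted in the analysis above.
permission_denials <<'EOF'
2017-09-02 08:37:40,188 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.31.10.118:49343 Call#5 Retry#0: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
EOF
```

On a real cluster the same function could be fed the NameNode log file instead of the here-document; the log file's location depends on the installation and is not given in this article.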

21.2.2 Solution

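  • Change the permissions and the owner of the /user/history directory:
sudo -u hdfs hdfs dfs -chmod 777 /user/history
sudo -u hdfs hdfs dfs -chown mapred:hadoop /user/history
  • Before the change, /user/history was owned by mapred:supergroup with mode drwxrwx--- (as reported in the NameNode log above).
  • After the change, data is written normally and MR jobs run normally.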
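
To see why the original mode blocked root and why the change unblocks it, the owner/group/other selection can be sketched in a few lines of shell. This is an illustration under stated assumptions, not HDFS code: the function `check_access` and its arguments are hypothetical, it assumes root belongs only to the group root (the article does not say), and it ignores ACLs and the HDFS superuser (the account that runs the NameNode, which bypasses permission checks; the Linux root user is not automatically that account).

```shell
#!/bin/sh
# Sketch of owner/group/other permission selection for a mode string.
# check_access MODE OWNER GROUP USER USER_GROUPS ACCESS
#   MODE         mode string as printed in the NameNode log, e.g. drwxrwx---
#   USER_GROUPS  space-separated groups of USER (supplied by the caller)
#   ACCESS       READ, WRITE or EXECUTE
check_access() {
  mode=$1; owner=$2; group=$3; user=$4; user_groups=$5; access=$6
  # Pick the triplet that applies; positions are 1-based for cut.
  if [ "$user" = "$owner" ]; then
    start=2                       # owner triplet: characters 2-4
  else
    case " $user_groups " in
      *" $group "*) start=5 ;;    # group triplet: characters 5-7
      *)            start=8 ;;    # other triplet: characters 8-10
    esac
  fi
  case $access in
    READ)    offset=0 ;;
    WRITE)   offset=1 ;;
    EXECUTE) offset=2 ;;
    *) echo "unknown access: $access" >&2; return 2 ;;
  esac
  bit=$(printf '%s' "$mode" | cut -c "$((start + offset))")
  case $bit in
    [rwxt]) echo allowed ;;
    *)      echo denied ;;
  esac
}

# Before the fix: root is not mapred and (by assumption) not in supergroup,
# so the "other" triplet --- applies.
check_access 'drwxrwx---' mapred supergroup root 'root' EXECUTE
# After chmod 777 and chown mapred:hadoop: the "other" triplet is rwx.
check_access 'drwxrwxrwx' mapred hadoop root 'root' EXECUTE
```

On a live cluster, `hdfs dfs -ls -d /user/history` shows the directory's actual mode, owner, and group, which can be passed to the function as its first three arguments.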
