Spark SQL Exception in task 0.0 in stage 0.0 (TID 0) org.apache.hadoop.hdfs.BlockMissingException
2. ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block. Like the previous case, this is caused by the client being unable to reach the HDFS DataNodes.
21/06/19 19:30:56 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-2074914242-172.23.8.102-1618472170982:blk_1073744030_3207 file=/user/hive/warehouse/hdu.db/city_info/city_info.txt
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:976)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:632)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:874)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:926)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
    at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:208)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:308)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:239)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Cause:
The NameNode stores only the file system metadata (directories and file names), and it is reachable from the local machine over the public network, so operations such as creating directories succeed. But when an upload actually needs to write data to a DataNode, things break: the NameNode and the DataNodes communicate over the cluster's internal LAN, so the NameNode hands the client the DataNode's private IP address, which the local machine cannot reach.
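One way to check that the block really exists and is only unreachable (rather than genuinely lost) is to ask HDFS where its replicas live. Run the standard fsck command from a node that can reach the cluster; if the file is reported healthy, the failure is purely a connectivity problem:

hdfs fsck /user/hive/warehouse/hdu.db/city_info/city_info.txt -files -blocks -locations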
Solution:
Since the NameNode cannot return a public IP, configure the HDFS client to connect to DataNodes by hostname instead. Once those hostnames are mapped to public addresses on the local machine (see the sketch below), the DataNodes become reachable and the problem is solved.
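A minimal sketch of such a mapping in the local hosts file, assuming hypothetical DataNode hostnames (hadoop102, hadoop103) and placeholder public IPs; substitute your cluster's actual hostnames and addresses:

# /etc/hosts on the local client (C:\Windows\System32\drivers\etc\hosts on Windows)
# hypothetical values for illustration only
203.0.113.102  hadoop102
203.0.113.103  hadoop103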
Because settings made in code take the highest precedence, the simplest fix is to set the configuration directly in code.
Add the following configuration entries:
config("dfs.client.use.datanode.hostname", "true")
config("dfs.replication", "2")
Add them like this:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf().setMaster("local[*]").setAppName("sparkSQL")
val spark = SparkSession.builder()
  .enableHiveSupport()
  .config(sparkConf)
  // make the HDFS client connect to DataNodes by hostname instead of private IP
  .config("dfs.client.use.datanode.hostname", "true")
  .config("dfs.replication", "2")
  .getOrCreate()
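A quick way to verify the fix is to read the data that originally failed. The query below assumes, based on the HDFS path in the stack trace, that the file backs the city_info table in the hdu Hive database; adjust the name if yours differs:

// hypothetical verification query; table name inferred from the HDFS path above
spark.sql("SELECT * FROM hdu.city_info").show(5)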