Recently I have been studying Spark for a project and produced some technical notes along the way; I am posting them here as a record. I did not bother to translate them, so they are in English.

There are some areas that the official documentation does not explain very clearly, especially regarding certain details. Below are some additional explanations based on the experiments I ran and the source code I read over the past few days.

(The official document link is http://spark.apache.org/docs/latest/job-scheduling.html)

  1. There are two schedulers in the current Spark implementation: FIFO, which is the default and the original scheduling mode, and FAIR, which is enabled by setting spark.scheduler.mode to FAIR.
  2. Both the FIFO and the FAIR scheduler support the basic functionality of running multiple jobs in parallel, provided the jobs are submitted from separate threads (within a single thread, jobs are executed sequentially). A minimal multi-threaded submission sketch is given after the scheduler source code below.
  3. With the FIFO scheduler, jobs submitted earlier have higher priority and a better chance of getting resources than jobs submitted later. This does not mean the first job always runs first: a later job can start before an earlier one if the cluster still has unoccupied resources. The worst case of the FIFO scheduler is that when the jobs at the head of the queue are large, the later jobs may suffer significant delays.
  4. The FAIR scheduler corresponds to the Hadoop Fair Scheduler and is an enhancement over FIFO. In FIFO mode only one factor, the priority, is considered when ordering the SchedulableQueue, while in FAIR mode several factors are taken into account, including minShare, runningTasks and weight (see the source code below if interested). Again, jobs do not always follow the ordering produced by FairSchedulingAlgorithm strictly, but as a whole, in my observation through concurrent JMeter tests, the FAIR scheduler with tuned parameters greatly reduces the delays of small jobs that were delayed significantly under FIFO.

private[spark] class FIFOSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val priority1 = s1.priority
    val priority2 = s2.priority
    var res = math.signum(priority1 - priority2)
    if (res == 0) {
      val stageId1 = s1.stageId
      val stageId2 = s2.stageId
      res = math.signum(stageId1 - stageId2)
    }
    if (res < 0) {
      true
    } else {
      false
    }
  }
}

private[spark] class FairSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val minShare1 = s1.minShare
    val minShare2 = s2.minShare
    val runningTasks1 = s1.runningTasks
    val runningTasks2 = s2.runningTasks
    val s1Needy = runningTasks1 < minShare1
    val s2Needy = runningTasks2 < minShare2
    val minShareRatio1 = runningTasks1.toDouble / math.max(minShare1, 1.0).toDouble
    val minShareRatio2 = runningTasks2.toDouble / math.max(minShare2, 1.0).toDouble
    val taskToWeightRatio1 = runningTasks1.toDouble / s1.weight.toDouble
    val taskToWeightRatio2 = runningTasks2.toDouble / s2.weight.toDouble
    var compare: Int = 0

    if (s1Needy && !s2Needy) {
      return true
    } else if (!s1Needy && s2Needy) {
      return false
    } else if (s1Needy && s2Needy) {
      compare = minShareRatio1.compareTo(minShareRatio2)
    } else {
      compare = taskToWeightRatio1.compareTo(taskToWeightRatio2)
    }

    if (compare < 0) {
      true
    } else if (compare > 0) {
      false
    } else {
      s1.name < s2.name
    }
  }
}
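Item 2 above notes that jobs only run in parallel when they are submitted from separate threads. The sketch below is not from the original post; the application name, the local master and the dummy count() jobs are purely illustrative, but it shows the basic pattern of submitting jobs from several threads against a single SparkContext:

import org.apache.spark.{SparkConf, SparkContext}

object ParallelJobsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("parallel-jobs-sketch")   // illustrative app name
      .setMaster("local[4]")                // illustrative local master
      .set("spark.scheduler.mode", "FAIR")  // or "FIFO" (the default)
    val sc = new SparkContext(conf)

    // Each action (count() here) triggers one job; jobs submitted from
    // different threads are eligible to run concurrently.
    val threads = (1 to 3).map { i =>
      new Thread(new Runnable {
        override def run(): Unit = {
          val n = sc.parallelize(1 to 1000000).filter(_ % i == 0).count()
          println(s"job $i counted $n elements")
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    sc.stop()
  }
}

In FIFO mode the three jobs are still largely served in submission order; in FAIR mode their tasks share the executors in a round-robin fashion.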

  5. The pools in FIFO and FAIR schedulers
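The original post stops at this heading. For reference, the official documentation wires pools up through an XML allocation file (fairscheduler.xml, referenced by spark.scheduler.allocation.file) plus a thread-local property. The sketch below is only an illustration; the pool name "production" and the file path are made up:

import org.apache.spark.{SparkConf, SparkContext}

object FairPoolSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("fair-pool-sketch")   // illustrative app name
      .setMaster("local[4]")            // illustrative local master
      .set("spark.scheduler.mode", "FAIR")
      .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml") // illustrative path
    val sc = new SparkContext(conf)

    // The pool assignment is a thread-local property: every job submitted
    // from this thread afterwards goes into the "production" pool.
    sc.setLocalProperty("spark.scheduler.pool", "production")
    sc.parallelize(1 to 1000).count()

    // Setting the property back to null returns this thread to the default pool.
    sc.setLocalProperty("spark.scheduler.pool", null)
    sc.stop()
  }
}

If a job references a pool that is not declared in the allocation file, Spark creates it on the fly with default settings (FIFO scheduling inside the pool, weight 1, minShare 0).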

Reposted from: https://www.cnblogs.com/taxuexunmei/p/4991250.html
