[甘道夫] Spark 1.3.0 Running Spark on YARN: Highlights from the Official Documentation

Reposting is welcome; please credit the source.
| Property Name | Default | Meaning |
|---|---|---|
| `spark.yarn.am.memory` | 512m | Amount of memory to use for the YARN Application Master in client mode, in the same format as JVM memory strings (e.g. `512m`, `2g`). In cluster mode, use `spark.driver.memory` instead. |
| `spark.driver.cores` | 1 | Number of cores used by the driver in YARN cluster mode. Since the driver runs in the same JVM as the YARN Application Master in cluster mode, this also controls the cores used by the YARN AM. In client mode, use `spark.yarn.am.cores` to control the number of cores used by the YARN AM instead. |
| `spark.yarn.am.cores` | 1 | Number of cores to use for the YARN Application Master in client mode. In cluster mode, use `spark.driver.cores` instead. |
| `spark.yarn.am.waitTime` | 100000 | In yarn-cluster mode, time in milliseconds for the application master to wait for the SparkContext to be initialized. In yarn-client mode, time for the application master to wait for the driver to connect to it. |
| `spark.yarn.submit.file.replication` | The default HDFS replication (usually 3) | HDFS replication level for the files uploaded into HDFS for the application. These include things like the Spark jar, the app jar, and any distributed cache files/archives. |
| `spark.yarn.preserve.staging.files` | false | Set to true to preserve the staged files (Spark jar, app jar, distributed cache files) at the end of the job rather than deleting them. |
| `spark.yarn.scheduler.heartbeat.interval-ms` | 5000 | The interval in ms at which the Spark application master heartbeats into the YARN ResourceManager. |
| `spark.yarn.max.executor.failures` | numExecutors * 2, with minimum of 3 | The maximum number of executor failures before failing the application. |
| `spark.yarn.historyServer.address` | (none) | The address of the Spark history server (e.g. host.com:18080). The address should not contain a scheme (http://). It defaults to unset, since the history server is an optional service. This address is given to the YARN ResourceManager when the Spark application finishes, to link the application from the ResourceManager UI to the Spark history server UI. |
| `spark.yarn.dist.archives` | (none) | Comma-separated list of archives to be extracted into the working directory of each executor. |
| `spark.yarn.dist.files` | (none) | Comma-separated list of files to be placed in the working directory of each executor. |
| `spark.executor.instances` | 2 | The number of executors. Note that this property is incompatible with `spark.dynamicAllocation.enabled`. |
| `spark.yarn.executor.memoryOverhead` | executorMemory * 0.07, with minimum of 384 | The amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, and other native overheads. It tends to grow with the executor size (typically 6-10%). |
| `spark.yarn.driver.memoryOverhead` | driverMemory * 0.07, with minimum of 384 | The amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode. This is memory that accounts for things like VM overheads, interned strings, and other native overheads. It tends to grow with the container size (typically 6-10%). |
| `spark.yarn.am.memoryOverhead` | AM memory * 0.07, with minimum of 384 | Same as `spark.yarn.driver.memoryOverhead`, but for the Application Master in client mode. |
| `spark.yarn.queue` | default | The name of the YARN queue to which the application is submitted. |
| `spark.yarn.jar` | (none) | The location of the Spark jar file, in case overriding the default location is desired. By default, Spark on YARN uses a Spark jar installed locally, but the Spark jar can also be placed in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't need to be distributed each time an application runs. To point to a jar on HDFS, for example, set this configuration to "hdfs:///some/path". |
| `spark.yarn.access.namenodes` | (none) | A list of secure HDFS namenodes your Spark application is going to access. For example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`. The Spark application must have access to the namenodes listed, and Kerberos must be properly configured to be able to access them (either in the same realm or in a trusted realm). Spark acquires security tokens for each of the namenodes so that the Spark application can access those remote HDFS clusters. |
| `spark.yarn.appMasterEnv.[EnvironmentVariableName]` | (none) | Add the environment variable specified by EnvironmentVariableName to the Application Master process launched on YARN. The user can specify multiple of these to set multiple environment variables. In yarn-cluster mode this controls the environment of the Spark driver; in yarn-client mode it only controls the environment of the executor launcher. |
| `spark.yarn.containerLauncherMaxThreads` | 25 | The maximum number of threads to use in the application master for launching executor containers. |
| `spark.yarn.am.extraJavaOptions` | (none) | A string of extra JVM options to pass to the YARN Application Master in client mode. In cluster mode, use `spark.driver.extraJavaOptions` instead. |
| `spark.yarn.maxAppAttempts` | yarn.resourcemanager.am.max-attempts in YARN | The maximum number of attempts that will be made to submit the application. It should be no larger than the global maximum number of attempts in the YARN configuration. |
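The three memoryOverhead defaults in the table share one formula: 7% of the corresponding container memory, floored at 384 MB. A minimal sanity-check sketch of that arithmetic (plain integer math; Spark's internal rounding may differ slightly):

```shell
# Default off-heap overhead per the table: max(mem_mb * 0.07, 384), in MB.
# Uses integer arithmetic only; this mirrors the documented formula,
# not Spark's exact implementation.
default_overhead() {
  local mem_mb=$1
  local overhead=$(( mem_mb * 7 / 100 ))
  if [ "$overhead" -lt 384 ]; then
    overhead=384
  fi
  echo "$overhead"
}

default_overhead 4096    # 4 GB executor  -> 384 (286 is below the 384 floor)
default_overhead 16384   # 16 GB executor -> 1146
```

Small containers always pay the 384 MB floor, which is why undersized executors lose a disproportionate share of memory to overhead.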
- yarn-client mode: the driver runs in the client process, and the Application Master (part of the YARN architecture, acting as the scheduler for each application running on YARN) is used only to request resources from YARN. The client can see the program's printed output in its console.
- yarn-cluster mode: the Spark driver runs inside the Application Master process, so the client console does not show the program's printed output.
```shell
$ ./bin/spark-submit --class my.main.Class \
    --master yarn-cluster \
    --jars my-other-jar.jar,my-other-other-jar.jar \
    my-main-jar.jar \
    app_arg1 app_arg2
```
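Any of the properties in the table above can also be set at submit time with `--conf`. A sketch of a yarn-client submission (the class name, jar names, queue, and memory values here are illustrative assumptions, not values from the original document):

```shell
# Submit in yarn-client mode, overriding a few spark.yarn.* properties
# from the table via --conf. All names and sizes are example values.
./bin/spark-submit \
  --class my.main.Class \
  --master yarn-client \
  --conf spark.yarn.am.memory=1g \
  --conf spark.yarn.am.cores=2 \
  --conf spark.yarn.queue=default \
  --conf spark.yarn.executor.memoryOverhead=512 \
  my-main-jar.jar app_arg1 app_arg2
```

Note that in yarn-client mode the AM-specific properties (`spark.yarn.am.memory`, `spark.yarn.am.cores`) apply, whereas in yarn-cluster mode the driver properties (`spark.driver.memory`, `spark.driver.cores`) take their place.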