The three Spark run modes, and how yarn-client and yarn-cluster differ in the spark-submit command

This article walks through Spark's deployment modes, including local mode, Standalone-Client, Standalone-Cluster, Yarn-client, Yarn-cluster, and Mesos. It gives the spark-submit command for each mode and explains the different Master URL formats and their meanings.


This article targets Spark 2.3.1.

standalone: offline mode

It is split into two sub-modes: standalone-client and standalone-cluster.

 

yarn: online (production) mode

It in turn splits into yarn-client (the debugging mode: the driver runs on the submitting machine, so its logs show up directly in your console) and yarn-cluster (the driver runs inside the ApplicationMaster on the cluster).

--master yarn and --master yarn-client have the same effect [2]: client is the default deploy mode for --master yarn, and the old yarn-client master URL has been deprecated since Spark 2.0.

 

mesos: online mode (officially recommended)

#----------------------------------------------------

Local mode (local[8], i.e. 8 worker threads; pseudo-distributed on a single machine):

spark-submit --class WordCountLocal --master local[8] /home/appleyuchi/IdeaProjects/scala-learn/target/scalalearn-1.0-SNAPSHOT-jar-with-dependencies.jar
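
All of the commands in this article submit a class called WordCountLocal from the scala-learn jar, but its source is never shown in the post. The snippet below is only a minimal sketch of what such a word-count class might look like; reading the input path from args(0) and printing the result with collect/println are assumptions, not the author's actual code.

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reconstruction of the WordCountLocal class named by --class.
// The args(0) input path and the driver-side println output are assumptions.
object WordCountLocal {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the master comes from spark-submit's --master flag.
    val conf = new SparkConf().setAppName("WordCountLocal")
    val sc = new SparkContext(conf)

    val counts = sc.textFile(args(0))      // read the input text file
      .flatMap(_.split("\\s+"))            // split each line into words
      .map(word => (word, 1))              // pair every word with a count of 1
      .reduceByKey(_ + _)                  // sum the counts per word

    counts.collect().foreach(println)      // print the result on the driver
    sc.stop()
  }
}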


 

Standalone-Client mode:

spark-submit --class WordCountLocal --master spark://master:7077 /home/appleyuchi/IdeaProjects/scala-learn/target/scalalearn-1.0-SNAPSHOT-jar-with-dependencies.jar

 

Standalone-Client mode (submitting a Python project file; --class only applies to Java/Scala applications, so it is dropped here):

spark-submit --master spark://master:7077 xxx.py

 

Standalone-Cluster mode (--deploy-mode cluster launches the driver on a worker node; --supervise restarts the driver automatically if it fails):

spark-submit --class WordCountLocal --master spark://master:7077 --deploy-mode cluster  --supervise  /home/appleyuchi/IdeaProjects/scala-learn/target/scalalearn-1.0-SNAPSHOT-jar-with-dependencies.jar

 

 

Yarn-client mode:

spark-submit --class WordCountLocal --master yarn --deploy-mode client  /home/appleyuchi/IdeaProjects/scala-learn/target/scalalearn-1.0-SNAPSHOT-jar-with-dependencies.jar

In yarn-client mode, the trailing --deploy-mode client is optional, because client is the default deploy mode when --master yarn is used.

 

Yarn-cluster mode:

spark-submit --class WordCountLocal --master yarn --deploy-mode cluster  /home/appleyuchi/IdeaProjects/scala-learn/target/scalalearn-1.0-SNAPSHOT-jar-with-dependencies.jar

 

yarn means the job is scheduled by Hadoop's resource manager (YARN), while standalone means it is scheduled by Spark's own built-in cluster manager.
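
Whichever resource manager ends up scheduling the job, the application can verify at runtime which master URL it connected to and which deploy mode it is running in. This is only an illustrative sketch (the ShowMode object name is made up, not something from the article):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper that reports which master and deploy mode the job received.
object ShowMode {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ShowMode"))
    println(s"master      = ${sc.master}")      // e.g. local[8], spark://master:7077, yarn
    println(s"deploy mode = ${sc.deployMode}")  // "client" or "cluster"
    sc.stop()
  }
}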

 

[4] Master URLs

The master URL passed to Spark can be in one of the following formats:

Master URL and its meaning:

local
    Run Spark locally with one worker thread (i.e. no parallelism at all).

local[K]
    Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).

local[K,F]
    Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).

local[*]
    Run Spark locally with as many worker threads as logical cores on your machine.

local[*,F]
    Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures.

spark://HOST:PORT
    Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.

spark://HOST1:PORT1,HOST2:PORT2
    Connect to the given Spark standalone cluster with standby masters with ZooKeeper. The list must have all the master hosts in the high-availability cluster set up with ZooKeeper. The port must be whichever each master is configured to use, which is 7077 by default.

mesos://HOST:PORT
    Connect to the given Mesos cluster. The port must be whichever one your Mesos master is configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... To submit with --deploy-mode cluster, the HOST:PORT should be configured to connect to the MesosClusterDispatcher.

yarn
    Connect to a YARN cluster in client or cluster mode depending on the value of --deploy-mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.

k8s://HOST:PORT
    Connect to a Kubernetes cluster in cluster mode. Client mode is currently unsupported and will be supported in future releases. The HOST and PORT refer to the Kubernetes API Server. It connects using TLS by default. In order to force it to use an unsecured connection, you can use k8s://http://HOST:PORT.
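
As a side note, a master URL from the table can also be hard-coded in the application instead of being passed via --master; a value set on the SparkConf/SparkSession builder takes precedence over the spark-submit flag. A minimal sketch follows (the local[*] value and the HardCodedMaster name are only illustrative):

import org.apache.spark.sql.SparkSession

// Hypothetical example of setting the master URL in code rather than on the command line.
// local[*] is only an illustrative value; any URL from the table above can be used.
object HardCodedMaster {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HardCodedMaster")
      .master("local[*]")   // overrides --master, since SparkConf settings take highest precedence
      .getOrCreate()

    println(spark.sparkContext.master)
    spark.stop()
  }
}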

 

References:

[1] The difference between the yarn-cluster and yarn-client submit modes

[2] 4.5.1 Yarn-Client mode: example deployment and run-through

[3] Running Spark on YARN

[4] Submitting Applications

[5] The two ways of submitting jobs in Standalone mode
