Spark (3.3.0) Cluster Deployment
Component Versions
Component | Version | Download |
---|---|---|
JDK | 1.8 | Download JDK |
Hadoop | 3.3.3 | Download Hadoop |
Spark | 3.3.0 | Download Spark |
**Machine Environment**
IP | Hostname | Password |
---|---|---|
192.168.222.101 | master | password |
192.168.222.102 | slave1 | password |
192.168.222.103 | slave2 | password |
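Every node must be able to resolve the others' hostnames. A minimal /etc/hosts sketch, assuming the addresses above and no conflicting entries (apply on all three nodes):
[root@master ~]#
cat >> /etc/hosts <<'EOF'
192.168.222.101 master
192.168.222.102 slave1
192.168.222.103 slave2
EOF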
1. Prerequisites
The following are assumed to be installed and configured on all three nodes before deploying Spark (a quick verification sketch follows the list):
- firewalld (disabled)
- SSH (passwordless login between nodes)
- NTP (clock synchronization)
- JDK
- MySQL
- Hadoop
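A hedged sanity check for these prerequisites (service names assume a systemd-based distribution such as CentOS):
[root@master ~]#
systemctl is-active firewalld   # expect "inactive"
ssh slave1 date && ssh slave2 date   # no password prompt; clocks roughly in sync
java -version   # 1.8.x
hdfs dfsadmin -report   # HDFS is up and all DataNodes report in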
2. Spark Cluster Deployment
2.1 Extract and rename
[root@master ~]#
tar -xzvf /opt/spark-3.3.0-bin-hadoop3.tgz -C /usr/local/src/
mv /usr/local/src/spark-3.3.0-bin-hadoop3 /usr/local/src/spark
2.2 Configure environment variables
[root@master ~]#
vim /root/.bash_profile
export SPARK_HOME=/usr/local/src/spark
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin
source /root/.bash_profile
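A quick check that the Spark binaries are now on the PATH:
[root@master ~]#
spark-submit --version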
2.3 Configure spark-env.sh
[root@master ~]#
cp /usr/local/src/spark/conf/spark-env.sh.template /usr/local/src/spark/conf/spark-env.sh
vim /usr/local/src/spark/conf/spark-env.sh
# Java installation path
export JAVA_HOME=/usr/local/src/java
# hostname (or IP) of the master node; SPARK_MASTER_IP is deprecated, use SPARK_MASTER_HOST
export SPARK_MASTER_HOST=master
# memory available to each worker
export SPARK_WORKER_MEMORY=4G
# CPU cores available to each worker
export SPARK_WORKER_CORES=2
# Scala installation path
export SCALA_HOME=/usr/local/scala
# scratch space for shuffle and spill files
export SPARK_LOCAL_DIRS=/usr/local/src/spark
# Hadoop configuration directory (enables HDFS and YARN integration)
export HADOOP_CONF_DIR=/usr/local/src/hadoop/etc/hadoop
2.4 Configure workers
[root@master ~]#
cp /usr/local/src/spark/conf/workers.template /usr/local/src/spark/conf/workers
vim /usr/local/src/spark/conf/workers
master
slave1
slave2
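Note that master appears in the list, so it will run a Worker process alongside the Master; drop that line if the master node should not execute tasks.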
2.5 Add the MySQL driver
The JDBC driver lets Spark SQL reach the MySQL-backed Hive metastore used in section 3.
[root@master ~]#
cp /opt/mysql-connector-java-5.1.47.jar /usr/local/src/spark/jars/
2.6 Distribute files
[root@master ~]#
rsync -av /usr/local/src/spark slave1:/usr/local/src/
rsync -av /usr/local/src/spark slave2:/usr/local/src/
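The environment variables from step 2.2 are needed on the slaves as well; a minimal sketch, assuming the same /root/.bash_profile layout on every node:
[root@master ~]#
rsync -av /root/.bash_profile slave1:/root/
rsync -av /root/.bash_profile slave2:/root/
With everything in place, bring up the standalone cluster from the master and confirm the daemons with jps (expect Master plus Worker on master, a Worker on each slave):
[root@master ~]#
/usr/local/src/spark/sbin/start-all.sh
jps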
2.7 Test the Spark cluster
[root@master ~]#
spark-submit --master yarn --class org.apache.spark.examples.SparkPi /usr/local/src/spark/examples/jars/spark-examples_2.12-3.3.0.jar
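A successful run prints a line like "Pi is roughly 3.14..." in the driver output. Note that --master yarn submits the job to YARN; to exercise the standalone cluster configured above instead, point the submission at the master's default port (7077):
[root@master ~]#
spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi /usr/local/src/spark/examples/jars/spark-examples_2.12-3.3.0.jar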
3. Spark on Hive
Copy the Hive and Hadoop client configurations into Spark's conf directory so that Spark SQL uses the existing Hive metastore:
[root@master ~]#
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf
cp $HADOOP_HOME/etc/hadoop/core-site.xml $SPARK_HOME/conf
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $SPARK_HOME/conf
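A quick check that the integration works (assuming the Hive metastore service is running):
[root@master ~]#
spark-sql -e "show databases;"
If the Hive databases are listed, Spark is reading the shared metastore.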