Modes for running Spark on K8S
1. Standalone: start a long-running Spark cluster on K8S; all jobs are submitted to that cluster with spark-submit.
2. Kubernetes Native: spark-submit submits directly to the K8S API Server; once resources are granted, Pods are started as the Driver and Executors to run the job. See http://spark.apache.org/docs/2.4.6/running-on-kubernetes.html
3. Spark Operator: install the Spark Operator, define spark-app.yaml, then run kubectl apply -f spark-app.yaml. This declarative API and invocation style is the typical way of using K8S. See https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
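As a rough illustration of the Spark Operator mode, the sketch below writes a minimal SparkApplication manifest from a shell function. The function name write_spark_app is hypothetical, and the application name, namespace, image, jar path, and resource sizes are assumptions chosen to match the examples in these notes; adjust them to your cluster.

```shell
#!/bin/sh
# Sketch: write a minimal SparkApplication manifest for the Spark Operator.
# Every value in the manifest (name, namespace, image, jar, resources) is an
# illustrative assumption, not something the operator requires.
write_spark_app() {
  cat > "$1" <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: spark:3.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.2.0.jar
  sparkVersion: "3.2.0"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 512m
EOF
}

# Usage, once the operator is installed in the cluster:
#   write_spark_app spark-app.yaml
#   kubectl apply -f spark-app.yaml
#   kubectl get sparkapplications
```

The manifest carries the same information as the spark-submit flags (image, main class, executor count, service account), which is what makes this mode declarative.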
Download the Spark package
wget https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop2.7.tgz
tar -zxvf spark-3.2.0-bin-hadoop2.7.tgz
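The download CDN generally hosts only current releases, with older ones moved to archive.apache.org, so the wget above may fail for 3.2.0. A minimal sketch of a fallback download follows; the function names spark_tarball and fetch_spark are hypothetical helpers, and the archive URL layout is assumed to mirror the CDN layout.

```shell
#!/bin/sh
# Sketch: fetch the Spark tarball, falling back to the Apache archive.
SPARK_VERSION="3.2.0"
HADOOP_PROFILE="hadoop2.7"

# Print the tarball file name for the configured version.
spark_tarball() {
  echo "spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}.tgz"
}

# Try the CDN first, then the archive; stop at the first successful download.
fetch_spark() {
  file="$(spark_tarball)"
  for base in https://dlcdn.apache.org/spark https://archive.apache.org/dist/spark; do
    if wget -c "${base}/spark-${SPARK_VERSION}/${file}"; then
      return 0
    fi
  done
  return 1
}

# Usage:
#   fetch_spark && tar -zxvf "$(spark_tarball)"
```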
Build the images:
There are three images in total: one shared by Java/Scala, a Python image, and an R image.
./bin/docker-image-tool.sh -t 3.2 build
./bin/docker-image-tool.sh -t 3.2 -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
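The two commands above build the Java/Scala and Python images only. The sketch below adds the R image via the -R option and pushes everything to a registry with -r, so that cluster nodes other than the build host can pull the images. The registry name registry.example.com/spark and the helper names image_ref and build_and_push are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: build all three images and push them to a registry.
REPO="${REPO:-registry.example.com/spark}"   # hypothetical registry
TAG="${TAG:-3.2}"

# Full reference of an image the tool produces: spark, spark-py, or spark-r.
image_ref() {
  echo "${REPO}/$1:${TAG}"
}

# Run from the unpacked Spark directory; stops at the first failing step.
build_and_push() {
  ./bin/docker-image-tool.sh -r "$REPO" -t "$TAG" build &&
  ./bin/docker-image-tool.sh -r "$REPO" -t "$TAG" \
    -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build &&
  ./bin/docker-image-tool.sh -r "$REPO" -t "$TAG" \
    -R kubernetes/dockerfiles/spark/bindings/R/Dockerfile build &&
  ./bin/docker-image-tool.sh -r "$REPO" -t "$TAG" push
}

# Usage:
#   build_and_push
#   then pass "$(image_ref spark-py)" as spark.kubernetes.container.image
```

When -r is used, the value of spark.kubernetes.container.image in spark-submit has to include the registry prefix, for example the output of image_ref spark.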
Submit jobs
Run as root@anneng-730xd
cd /ceph_disk1/software/spark-3.2.0-bin-hadoop2.7
./bin/spark-submit \
--master k8s://https://192.168.1.2:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=spark:3.2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark/examples/jars/spark-examples_2.12-3.2.0.jar
The path after local:// here is a path inside the container.
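The submit command above refers to a service account named spark, which has to exist before the driver can create executor Pods. A minimal sketch of setting it up is shown here, following the RBAC example in the Spark on Kubernetes documentation. The default namespace, the binding name spark-role, and the helper name setup_spark_rbac are assumptions; the KUBECTL variable is only there so the commands can be previewed with KUBECTL=echo.

```shell
#!/bin/sh
# Sketch: create the service account used by the driver and grant it the
# built-in "edit" role so it can create and delete executor Pods.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-default}"
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-spark}"

setup_spark_rbac() {
  $KUBECTL create serviceaccount "$SERVICE_ACCOUNT" --namespace="$NAMESPACE" &&
  $KUBECTL create clusterrolebinding spark-role \
    --clusterrole=edit \
    --serviceaccount="${NAMESPACE}:${SERVICE_ACCOUNT}" \
    --namespace="$NAMESPACE"
}

# Usage:
#   KUBECTL=echo setup_spark_rbac   # preview the commands
#   setup_spark_rbac                # run them against the cluster
```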
bin/spark-submit \
--master k8s://https://192.168.1.2:6443 \
--deploy-mode cluster \
--name spark-test \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.container.image=spark-py:3.2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark/examples/src/main/python/wordcount.py \
/opt/spark/examples/src/main/python/wordcount.py
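After submitting in cluster mode, the job's progress can be followed through its driver Pod. The sketch below lists driver Pods by the spark-role=driver label that Spark sets on them and streams a driver's log. The helper names list_drivers and driver_logs, the default namespace, and the Pod name in the usage comment are assumptions.

```shell
#!/bin/sh
# Sketch: inspect Spark jobs submitted in cluster mode.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-default}"

# List driver Pods; Spark labels them with spark-role=driver.
list_drivers() {
  $KUBECTL get pods --namespace "$NAMESPACE" -l spark-role=driver
}

# Stream the log of one driver Pod, given its name as the first argument.
driver_logs() {
  $KUBECTL logs --namespace "$NAMESPACE" -f "$1"
}

# Usage:
#   list_drivers
#   driver_logs <driver-pod-name>   # name taken from the list_drivers output
```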