Mirror of https://github.com/newnius/Dockerfiles.git (synced 2025-06-07 16:41:55 +00:00)
# Spark on YARN

Create a Spark cluster in swarm mode.

`--hostname` requires Docker 1.13 or higher.
```bash
docker service create \
	--name spark-master \
	--hostname spark-master \
	--detach true \
	--network swarm-net \
	--replicas 1 \
	--endpoint-mode dnsrr \
	newnius/spark:2.2.1

docker service create \
	--name spark-slave1 \
	--hostname spark-slave1 \
	--detach true \
	--network swarm-net \
	--replicas 1 \
	--endpoint-mode dnsrr \
	newnius/spark:2.2.1

docker service create \
	--name spark-slave2 \
	--hostname spark-slave2 \
	--detach true \
	--network swarm-net \
	--replicas 1 \
	--endpoint-mode dnsrr \
	newnius/spark:2.2.1
```
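The three service definitions above differ only in their name, so they can also be created with a short shell loop; a sketch, assuming the `swarm-net` overlay network already exists:

```bash
# Create the master and both slaves; only --name/--hostname differ.
for node in spark-master spark-slave1 spark-slave2; do
	docker service create \
		--name "$node" \
		--hostname "$node" \
		--detach true \
		--network swarm-net \
		--replicas 1 \
		--endpoint-mode dnsrr \
		newnius/spark:2.2.1
done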
## Init && Test

On the first deploy, format HDFS.

Stop HDFS (in master):

```bash
sbin/stop-dfs.sh
```

Format HDFS (in master):

```bash
bin/hadoop namenode -format
```

Start HDFS (in master):

```bash
sbin/start-dfs.sh
```
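After restarting, it is worth verifying that HDFS is healthy before submitting jobs. A sketch, run inside the master container (`dfsadmin -report` is the standard health check; expecting 2 live datanodes assumes the two slaves created above):

```bash
# Inside spark-master: the report should list 2 live datanodes
bin/hadoop dfsadmin -report

# A trivial filesystem operation should also succeed on a fresh HDFS
bin/hadoop fs -ls /
```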
Run Hello World (`--master yarn-cluster` is deprecated since Spark 2.0; use `--master yarn` with `--deploy-mode cluster` instead):

```bash
spark-submit \
	--master yarn \
	--deploy-mode cluster \
	--class org.apache.spark.examples.JavaSparkPi \
	./examples/jars/spark-examples_2.11-2.2.1.jar 100
```
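In cluster deploy mode the driver runs inside YARN, so the Pi estimate is not printed to the submitting console. If log aggregation is enabled, it can be pulled from the application logs (a sketch; substitute the application ID reported by spark-submit or shown in the YARN UI):

```bash
# The driver output lives in the YARN container logs, not the local console.
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX | grep "Pi is roughly"
```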
## UI

- YARN: `spark-master:8088`
- HDFS: `spark-master:50070`

These addresses resolve only inside the `swarm-net` overlay network, so a proxy is needed to reach the web UIs from outside, e.g. newnius/docker-proxy.
## Customized config

```bash
docker service create \
	--name spark-master \
	--hostname spark-master \
	--detach true \
	--network swarm-net \
	--replicas 1 \
	--mount type=bind,source=/mnt/data/spark/hdfs/master,target=/tmp/hadoop-root \
	--mount type=bind,source=/mnt/data/spark/logs/master,target=/usr/local/hadoop/logs \
	--mount type=bind,source=/mnt/data/spark/config/hadoop,target=/mnt/config/hadoop \
	--mount type=bind,source=/mnt/data/spark/config/spark,target=/mnt/config/spark \
	--endpoint-mode dnsrr \
	newnius/spark:2.2.1
```
You don't need to put all the files in the config directories; add only the files you want to replace.
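Note that with `--mount type=bind`, Docker refuses to start a task whose source directory is missing on the node, so the host paths used above must exist beforehand; a sketch for the node that will run spark-master:

```bash
# Pre-create the bind-mount sources; 'docker service create' tasks
# fail if a bind source path does not exist on the node.
mkdir -p /mnt/data/spark/hdfs/master \
         /mnt/data/spark/logs/master \
         /mnt/data/spark/config/hadoop \
         /mnt/data/spark/config/spark
```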