# Deploy Spark On Yarn
## Client
```bash
docker service create \
    --name spark-client \
    --hostname spark-client \
    --network swarm-net \
    --replicas 1 \
    --detach true \
    --mount type=bind,source=/etc/localtime,target=/etc/localtime \
    newnius/spark:2.2.1-yarn
```
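The service runs as a Swarm task, so the actual container carries a task-suffixed name. Assuming you are on the node where the task was scheduled, a quick way to open a shell inside the client container is a name-filtered lookup (a sketch, not the only way):

```bash
# locate the running client container on this node and open an interactive shell
docker exec -it $(docker ps -q -f name=spark-client) bash
```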
## Validate installation
### spark-submit PI
```bash
spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class org.apache.spark.examples.JavaSparkPi \
    ./examples/jars/spark-examples*.jar 100
```
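In cluster deploy mode the driver runs inside YARN, so the Pi result is not printed to the submitting terminal. Assuming log aggregation is enabled on the cluster, the result can be pulled from the application logs; the application id below is only a placeholder:

```bash
# find the id of the finished Pi application
yarn application -list -appStates FINISHED
# fetch its aggregated logs and look for the result line printed by JavaSparkPi
yarn logs -applicationId application_1234567890000_0001 | grep "Pi is roughly"
```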
### spark-shell HDFS wordcount
Run `spark-shell --master yarn` to enter the shell.
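The word count below reads from `/user/root/input` on HDFS. Assuming that path does not exist yet and that the Hadoop client tools are available in the container, some sample input can be uploaded first (the file name is just an example):

```bash
# create the input directory and upload a sample text file to HDFS
hdfs dfs -mkdir -p /user/root/input
hdfs dfs -put README.md /user/root/input/
```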
```scala
// read the input files from HDFS
val lines = sc.textFile("hdfs://hadoop-master:8020/user/root/input")
// split each line into words
val words = lines.flatMap(_.split("\\s+"))
// count the occurrences of each word
val wc = words.map(word => (word, 1)).reduceByKey(_ + _)
wc.collect()
// total number of words
val cnt = words.map(word => 1).reduce(_ + _)
```
## Browse the web UI
In Spark on YARN mode, Spark jobs show up in the YARN web UI (the ResourceManager UI, port 8088 by default) rather than in a standalone Spark master UI.
## Custom configuration
To persist data or override configuration files, refer to the following command. Files placed under `/config/hadoop` replace the corresponding default configuration files inside the container; you do not have to provide all of them, only the ones you want to change.
```bash
docker service create \
    --name spark-client \
    --hostname spark-client \
    --network swarm-net \
    --replicas 1 \
    --detach true \
    --mount type=bind,source=/etc/localtime,target=/etc/localtime \
    --mount type=bind,source=/data/hadoop/config,target=/config/hadoop \
    newnius/spark:2.2.1-yarn
```
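As an example, assuming you only need to change the YARN settings, it is enough to place a single modified file in the host directory that is bind-mounted above; the source file name here is hypothetical:

```bash
# only the files you want to override need to be present in /data/hadoop/config
mkdir -p /data/hadoop/config
cp my-yarn-site.xml /data/hadoop/config/yarn-site.xml
```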