
Deploy Spark On Yarn

Client

docker service create \
	--name spark-client \
	--hostname spark-client \
	--network swarm-net \
	--replicas 1 \
	--detach true \
	--mount type=bind,source=/etc/localtime,target=/etc/localtime \
	newnius/spark:2.2.1-yarn

Validate installation

spark-submit PI

spark-submit \
	--master yarn \
	--deploy-mode cluster \
	--class org.apache.spark.examples.JavaSparkPi \
	./examples/jars/spark-examples*.jar 100

spark-shell HDFS wordcount

Run spark-shell --master yarn to enter an interactive shell.

val lines = sc.textFile("hdfs://hadoop-master:8020/user/root/input")

val words = lines.flatMap(_.split("\\s+"))

val wc = words.map(word => (word, 1)).reduceByKey(_ + _)

wc.collect()

val cnt = words.map(word => 1).reduce(_ + _)
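The same transformations can be sketched locally without a cluster. This is a plain Scala illustration of what the RDD operations above compute, using an in-memory Seq and groupBy in place of reduceByKey (the input lines are made up for the example):

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical input standing in for the HDFS files under /user/root/input
    val lines = Seq("hello world", "hello spark")

    // Split each line on whitespace, as in the spark-shell example
    val words = lines.flatMap(_.split("\\s+"))

    // Pair each word with 1, then sum the counts per word
    // (groupBy + sum plays the role of reduceByKey(_ + _))
    val wc = words.map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

    println(wc.toSeq.sortBy(_._1).mkString(", "))

    // Total word count, mirroring words.map(word => 1).reduce(_ + _)
    val cnt = words.map(_ => 1).sum
    println(cnt)
  }
}
```

On this toy input the result is hello -> 2, spark -> 1, world -> 1, with a total of 4 words.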

Browse the web UI

In Spark on YARN mode, Spark jobs appear in the YARN web UI (by default on port 8088 of the ResourceManager) rather than in a standalone Spark master UI.

Custom configuration

To persist data or modify the conf files, refer to the following script.

Files placed under /config/hadoop replace the corresponding default configuration files inside the container; you only need to provide the files you want to override.

docker service create \
	--name spark-client \
	--hostname spark-client \
	--network swarm-net \
	--replicas 1 \
	--detach true \
	--mount type=bind,source=/etc/localtime,target=/etc/localtime \
	--mount type=bind,source=/data/hadoop/config,target=/config/hadoop \
	newnius/spark:2.2.1-yarn
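As a minimal sketch of such an override (assuming the hadoop-master:8020 namenode address used elsewhere in this README), a single core-site.xml placed at /data/hadoop/config/core-site.xml on the host would replace only that file inside the container:

```xml
<?xml version="1.0"?>
<!-- Hypothetical override: only the files you provide are replaced. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:8020</value>
  </property>
</configuration>
```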