# Deploy Spark on YARN

## Client

```bash
docker service create \
	--name spark-client \
	--hostname spark-client \
	--network swarm-net \
	--replicas 1 \
	--detach=true \
	--mount type=bind,source=/etc/localtime,target=/etc/localtime \
	newnius/spark:2.2.1-yarn
```

## Validate installation

#### spark-submit Pi

```bash
spark-submit \
	--master yarn \
	--deploy-mode cluster \
	--class org.apache.spark.examples.JavaSparkPi \
	./examples/jars/spark-examples*.jar 100
```
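
In `cluster` deploy mode the driver runs inside a YARN container, so the `Pi is roughly ...` line printed by the example does not appear in the submitting terminal. The sketch below pulls the application ID out of the `spark-submit` output so the result can be looked up in the YARN logs. `extract_app_id` and `submit.log` are hypothetical names, and `yarn logs` only returns output if log aggregation is enabled on the cluster, which is an assumption here.

```shell
#!/usr/bin/env bash
# Sketch: read spark-submit output on stdin and print the first YARN
# application ID found in it. extract_app_id is a hypothetical helper name.
extract_app_id() {
	# YARN application IDs look like application_<cluster timestamp>_<sequence>.
	grep -o 'application_[0-9]\{1,\}_[0-9]\{1,\}' | head -n 1
}

# Usage (inside the spark-client container):
#   spark-submit ... 2>&1 | tee submit.log
#   app_id=$(extract_app_id < submit.log)
#   yarn logs -applicationId "$app_id" | grep 'Pi is roughly'
```

If `yarn logs` reports nothing, the driver's stdout can usually still be reached through the application's entry in the YARN web UI.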

#### spark-shell HDFS wordcount

Run `spark-shell --master yarn` to start the Spark shell, then enter the following:

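```scala
// Read every file under the input directory from HDFS
val lines = sc.textFile("hdfs://hadoop-master:8020/user/root/input")

// Split each line into whitespace-separated words
val words = lines.flatMap(_.split("\\s+"))

// Count the occurrences of each word
val wc = words.map(word => (word, 1)).reduceByKey(_ + _)

// Bring the (word, count) pairs back to the driver
wc.collect()

// Total number of words
val cnt = words.map(word => 1).reduce(_ + _)
```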
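
The word count above reads from `/user/root/input`, so that directory has to contain at least one text file. A minimal sketch for seeding it follows; `make_sample_input` and `upload_input` are hypothetical helper names, and it assumes the Hadoop client in the container resolves `/user/root/input` to the same namenode (`hadoop-master:8020`).

```shell
#!/usr/bin/env bash
# Sketch: create a small text file and upload it as word-count input.
# Both function names are hypothetical helpers, not part of Hadoop or Spark.

make_sample_input() {
	# $1: local file to write (three lines, six words in total)
	printf '%s\n' 'hello spark' 'hello yarn' 'hello hdfs' > "$1"
}

upload_input() {
	# $1: local file, $2: HDFS directory (defaults to the path used above)
	hdfs_dir="${2:-/user/root/input}"
	hdfs dfs -mkdir -p "$hdfs_dir"
	hdfs dfs -put -f "$1" "$hdfs_dir/"
}

# Usage, inside a container with the Hadoop client configured:
#   make_sample_input /tmp/sample.txt
#   upload_input /tmp/sample.txt
```

With only this sample file in the input directory, `wc.collect()` should include `(hello,3)` and `cnt` should be 6.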

## Browse the web UI

In Spark on YARN mode, Spark jobs show up as applications in the YARN web UI.
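
The same application list can also be fetched from the command line through the ResourceManager REST API. The sketch below only builds the URL; `yarn_apps_url` is a hypothetical helper, 8088 is the default ResourceManager web port, and using `hadoop-master` as the ResourceManager host is an assumption about this cluster.

```shell
#!/usr/bin/env bash
# Sketch: build the ResourceManager REST URL that lists applications.
# yarn_apps_url is a hypothetical helper name.
yarn_apps_url() {
	# $1: ResourceManager host
	# $2: web port (optional, defaults to 8088)
	# $3: application state filter, e.g. RUNNING (optional)
	url="http://$1:${2:-8088}/ws/v1/cluster/apps"
	if [ -n "${3:-}" ]; then
		url="$url?states=$3"
	fi
	printf '%s\n' "$url"
}

# Usage, from any container attached to the same network:
#   curl -s "$(yarn_apps_url hadoop-master 8088 RUNNING)"
```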

## Custom configuration

To persist data or modify the conf files, refer to the following script.

Files placed under `/config/hadoop` replace the corresponding default conf files. You only need to provide the files you want to override, not the full set.


```bash
docker service create \
	--name spark-client \
	--hostname spark-client \
	--network swarm-net \
	--replicas 1 \
	--detach=true \
	--mount type=bind,source=/etc/localtime,target=/etc/localtime \
	--mount type=bind,source=/data/hadoop/config,target=/config/hadoop \
	newnius/spark:2.2.1-yarn
```
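
A bind mount requires the source directory to exist on the swarm node where the task is scheduled, so `/data/hadoop/config` has to be prepared before the service is created. The sketch below writes a single overriding file; `write_yarn_site` is a hypothetical helper, and the choice of `yarn-site.xml` with `yarn.resourcemanager.hostname` is only an illustration of an override, not a setting the image requires.

```shell
#!/usr/bin/env bash
# Sketch: write one overriding Hadoop conf file into the bind-mount source.
# write_yarn_site is a hypothetical helper name.
write_yarn_site() {
	# $1: config directory on the host
	# $2: ResourceManager hostname (an assumed value for illustration)
	mkdir -p "$1"
	cat > "$1/yarn-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>$2</value>
  </property>
</configuration>
EOF
}

# Usage, on the node that will run the spark-client task:
#   write_yarn_site /data/hadoop/config hadoop-master
```

Only the files present in the directory are replaced, so a directory holding just `yarn-site.xml` leaves the other default conf files untouched.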