# Deploy Spark On Yarn
## Client
```bash
docker service create \
--name spark-client \
--hostname spark-client \
--network swarm-net \
--replicas 1 \
--detach=true \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
newnius/spark:2.3.1-yarn
```
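The validation commands in the next section are run inside the client container. A minimal sketch for locating it, assuming you are on the swarm node where the `spark-client` task was scheduled; the helper name `client_container` is hypothetical:

```bash
# Hypothetical helper: print the ID of the running container whose name
# contains "spark-client". Swarm names task containers after the service,
# so the substring filter should match; this only sees containers on the
# current node.
client_container() {
  docker ps --quiet --filter "name=spark-client"
}
```

With the helper defined, `docker exec -it "$(client_container)" bash` should open a shell in the container.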
## Validate installation
#### spark-submit PI
```bash
spark-submit \
--master yarn \
--deploy-mode cluster \
--class org.apache.spark.examples.JavaSparkPi \
./examples/jars/spark-examples*.jar 100
```
#### spark-shell HDFS wordcount
Run `spark-shell --master yarn` to start the shell.
```scala
val lines = sc.textFile("hdfs://hadoop-master:8020/user/root/input")
val words = lines.flatMap(_.split("\\s+"))
val wc = words.map(word => (word, 1)).reduceByKey(_ + _)
wc.collect()
val cnt = words.map(word => 1).reduce(_ + _)
```
## Browse the web UI
In Spark on YARN mode, Spark jobs appear in the YARN web UI.
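
A minimal sketch for locating the YARN pages, assuming the ResourceManager runs on `hadoop-master` (the hostname used in the HDFS URL above) and listens on the default web port 8088. Both values are assumptions about your cluster, and the function names are hypothetical:

```bash
# Assumptions: this README does not state the ResourceManager host or port.
# Override RM_HOST / RM_PORT to match your cluster.
RM_HOST="${RM_HOST:-hadoop-master}"
RM_PORT="${RM_PORT:-8088}"

# URL of the application list page in the ResourceManager web UI.
yarn_ui_url() {
  echo "http://${RM_HOST}:${RM_PORT}/cluster/apps"
}

# URL of the ResourceManager REST endpoint that lists applications as JSON.
yarn_apps_api_url() {
  echo "http://${RM_HOST}:${RM_PORT}/ws/v1/cluster/apps"
}

# Fetch the application list; needs network access to the ResourceManager.
list_yarn_apps() {
  curl -s "$(yarn_apps_api_url)"
}

echo "Web UI:   $(yarn_ui_url)"
echo "REST API: $(yarn_apps_api_url)"
```

Calling `list_yarn_apps` from a container on the same overlay network should return the submitted Spark applications, provided the host and port assumptions hold.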
## Custom configuration
To persist data or modify the conf files, adapt the following command.
Put the conf files you want to replace under `/config/hadoop`; you don't have to provide all of them.
```bash
docker service create \
--name spark-client \
--hostname spark-client \
--network swarm-net \
--replicas 1 \
--detach=true \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
--mount type=bind,source=/data/hadoop/config,target=/config/hadoop \
newnius/spark:2.3.1-yarn
```
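
The source directory of a bind mount has to exist on the node that runs the task, so it helps to prepare it before creating the service. A minimal sketch, assuming the host path from the `--mount` option above; the helper name `stage_conf` and the example file `core-site.xml` are hypothetical:

```bash
# Hypothetical helper: copy only the conf files you want to override into
# the directory that is bind-mounted at /config/hadoop.
# Usage: stage_conf <host-config-dir> <file>...
stage_conf() {
  local conf_dir="$1"
  shift
  # Create the bind-mount source if it does not exist yet.
  mkdir -p "$conf_dir"
  local f
  for f in "$@"; do
    cp "$f" "$conf_dir/"
  done
}
```

For example, `stage_conf /data/hadoop/config ./core-site.xml` would stage a single override and leave the remaining conf files at the image defaults.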