# Deploy Spark On Yarn
## Client
```bash
docker service create \
--name spark-client \
--hostname spark-client \
--network swarm-net \
--replicas 1 \
--detach true \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
newnius/spark:2.2.1-yarn
```
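Once the service is created, you can confirm it has a running task and open a shell inside the container (standard Docker CLI commands; the `docker ps` filter is one way to locate the container, run on the node hosting the task):

```shell
# Verify the service has a running replica
docker service ps spark-client

# Open an interactive shell inside the container
docker exec -it $(docker ps -q -f name=spark-client) bash
```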
## Validate installation
#### spark-submit PI
```bash
spark-submit \
--master yarn \
--deploy-mode cluster \
--class org.apache.spark.examples.JavaSparkPi \
./examples/jars/spark-examples*.jar 100
```
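In cluster mode the driver runs inside YARN, so the Pi result is not printed to your terminal. One way to retrieve it, assuming YARN log aggregation is enabled, is to fetch the driver logs with the application ID reported by `spark-submit`:

```shell
# Look up the application ID if you missed it in the submit output
yarn application -list -appStates FINISHED

# Fetch the aggregated logs; the driver's stdout contains the result
yarn logs -applicationId application_XXXXXXXXXX_0001 | grep -i "pi is roughly"
```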
#### spark-shell HDFS wordcount
Run `spark-shell --master yarn` to start the shell.
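The word count below assumes some text files already exist under `hdfs://hadoop-master:8020/user/root/input`. If not, you can upload one first (`somefile.txt` is a placeholder for any local text file):

```shell
# Create the input directory and upload a sample text file
hdfs dfs -mkdir -p /user/root/input
hdfs dfs -put somefile.txt /user/root/input/
```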
```scala
val lines = sc.textFile("hdfs://hadoop-master:8020/user/root/input")
val words = lines.flatMap(_.split("\\s+"))
val wc = words.map(word => (word, 1)).reduceByKey(_ + _)
wc.collect()
val cnt = words.map(word => 1).reduce(_ + _)
```
## Browse the web UI
In Spark on YARN mode, Spark jobs appear in the YARN web UI (the ResourceManager UI, served on port 8088 by default).
## Custom configuration
To persist data or override the default configuration, mount a host directory into the container as shown below.
Any conf files placed under `/config/hadoop` replace their counterparts in the container; you only need to include the files you want to change.
```bash
docker service create \
--name spark-client \
--hostname spark-client \
--network swarm-net \
--replicas 1 \
--detach true \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
--mount type=bind,source=/data/hadoop/config,target=/config/hadoop \
newnius/spark:2.2.1-yarn
```
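For example, to override a single Hadoop configuration file, prepare the host directory before creating the service (a minimal sketch; the host path matches the bind mount above, and `core-site.xml` stands in for whichever file you customized):

```shell
# Only files placed here will replace their counterparts in the container
mkdir -p /data/hadoop/config
cp core-site.xml /data/hadoop/config/   # your customized conf file
```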