# Deploy a Spark cluster in standalone mode
## Master
```bash
docker service create \
--name spark-master \
--hostname spark-master \
--network swarm-net \
--replicas 1 \
--detach true \
--endpoint-mode dnsrr \
newnius/spark:2.2.1 master
```
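
Before attaching slaves, it is worth checking that the master actually came up. A quick sanity check, assuming the `docker` CLI is available on a swarm manager node:

```shell
# List the service's tasks; the CURRENT STATE column should read "Running".
docker service ps spark-master

# Tail the master's logs and look for the standalone master's
# "I have been elected leader!" line before adding slaves.
docker service logs spark-master
```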
## Slaves
```bash
docker service create \
--name spark-slave \
--network swarm-net \
--replicas 5 \
--detach true \
--endpoint-mode dnsrr \
newnius/spark:2.2.1 slave spark://spark-master:7077
```
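
`--replicas` is not fixed at creation time; the cluster can be resized later without recreating the service. For example, to grow from 5 to 8 workers (the target number here is arbitrary):

```shell
# Scale the slave service; new replicas join the network and
# register with spark://spark-master:7077 automatically.
docker service scale spark-slave=8
```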
## Validate installation
#### spark-submit PI
```bash
spark-submit \
--master spark://spark-master:7077 \
--deploy-mode cluster \
--class org.apache.spark.examples.JavaSparkPi \
./examples/jars/spark-examples_2.11-2.2.1.jar 100
```
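
In cluster deploy mode the driver runs on one of the workers, so the submitting terminal returns before the job finishes. One way to confirm the application completed, assuming you are inside a container on `swarm-net`, is the standalone master's JSON endpoint:

```shell
# The standalone master mirrors its web UI as JSON at /json;
# finished applications (e.g. JavaSparkPi) appear under "completedapps".
curl http://spark-master:8080/json
```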
#### spark-shell HDFS wordcount
Run `spark-shell --master spark://spark-master:7077` to start an interactive shell.
```scala
val lines = sc.textFile("hdfs://hadoop-master:8020/user/root/input")
val words = lines.flatMap(_.split("\\s+"))
val wc = words.map(word => (word, 1)).reduceByKey(_ + _)
wc.collect()
val cnt = words.map(word => 1).reduce(_ + _)
```
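
To sanity-check the numbers Spark returns on a small sample, the same word count can be reproduced locally with coreutils (the sample text below is made up for illustration; substitute a copy of your HDFS input):

```shell
# A tiny stand-in for the HDFS input.
printf 'to be or not to be\n' > /tmp/wc-sample.txt

# Per-word counts, analogous to wc.collect() above.
tr -s '[:space:]' '\n' < /tmp/wc-sample.txt | sort | uniq -c

# Total number of words, analogous to the final reduce(_ + _).
tr -s '[:space:]' '\n' < /tmp/wc-sample.txt | wc -l
```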
## Browse the web UI
You could publish the ports in the scripts above, but I'd rather not, since all the slave replicas would try to occupy the same host ports.
To access the web UI, deploy another (SOCKS5) proxy to route the traffic into the overlay network.
If you don't have one, try [newnius/docker-proxy](https://hub.docker.com/r/newnius/docker-proxy/); it is rather easy to use.
Visit [spark-master:8080](http://spark-master:8080) to view the cluster.
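
The key point is that the proxy must join `swarm-net`, since only containers on that network can resolve names like `spark-master`. A sketch with a generic SOCKS5 server image (the image name and port below are assumptions, not part of this setup; adjust for whichever image you pick):

```shell
# Any SOCKS5 server attached to swarm-net can resolve the service names.
# Image and port are placeholders; check your image's documentation.
docker service create \
  --name socks5-proxy \
  --network swarm-net \
  --publish 1080:1080 \
  serjs/go-socks5-proxy
# Then point the browser's SOCKS5 proxy at <any-swarm-node>:1080
# and open http://spark-master:8080
```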