# Deploy a Spark Cluster in Standalone Mode
## Create a Spark cluster in swarm mode
`--hostname` requires Docker 1.13 or higher.
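The commands below also assume an overlay network named `swarm-net` already exists. If it doesn't, a minimal sketch to create one (run on a swarm manager; the name is simply the one used throughout this page):

```bash
# Create the overlay network the Spark services attach to.
docker network create --driver overlay swarm-net
```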
## Master
```bash
docker service create \
--name spark-master \
--hostname spark-master \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
newnius/spark:2.2.1 master
```
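Once the service converges you can check that the master came up; for example (standard Docker CLI, though `docker service logs` needs Docker 17.06 or higher):

```bash
# The single spark-master replica should show as Running.
docker service ps spark-master
# The master logs the spark:// URL and port 7077 on startup.
docker service logs spark-master
```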
## Slaves
```bash
docker service create \
--name spark-slave \
--network swarm-net \
--replicas 5 \
--detach=true \
--endpoint-mode dnsrr \
newnius/spark:2.2.1 slave spark://spark-master:7077
```
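Because all slaves are replicas of one service, resizing the cluster is just a matter of scaling that service:

```bash
# Grow from 5 to 8 workers; new replicas register with the master automatically.
docker service scale spark-slave=8
```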
## Init && Test
#### spark-submit PI
```bash
spark-submit \
--master spark://spark-master:7077 \
--deploy-mode cluster \
--class org.apache.spark.examples.JavaSparkPi \
./examples/jars/spark-examples_2.11-2.2.1.jar 100
```
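In cluster mode the result only appears in the driver's log on whichever worker ran it. If you just want to see the output in your terminal, a client-mode variant of the same example (client is the default deploy mode):

```bash
# Run the driver locally so the "Pi is roughly ..." line prints to stdout.
spark-submit \
--master spark://spark-master:7077 \
--class org.apache.spark.examples.SparkPi \
./examples/jars/spark-examples_2.11-2.2.1.jar 100
```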
#### spark-shell HDFS wordcount
Run `spark-shell --master spark://spark-master:7077` to enter an interactive shell.
```shell
val lines = sc.textFile("hdfs://hadoop-master:8020/user/root/input")
val words = lines.flatMap(_.split("\\s+"))
val wc = words.map(word => (word, 1)).reduceByKey(_ + _)
wc.collect()
val cnt = words.map(word => 1).reduce(_ + _)
```
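The word count above assumes text files already exist under `hdfs://hadoop-master:8020/user/root/input` on a separate HDFS cluster. If that path is empty, a sketch to load some sample text first, assuming an `hadoop` client that can reach `hadoop-master` (the file name here is just a placeholder):

```bash
# Create the input directory and upload a local text file into it.
hadoop fs -mkdir -p hdfs://hadoop-master:8020/user/root/input
hadoop fs -put README.md hdfs://hadoop-master:8020/user/root/input/
```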
## Browse the web UI
You could expose the ports in the create commands, but I'd rather not, since every slave would try to occupy the same ports.
To access the web UI, deploy a (SOCKS5) proxy inside the network to route the traffic.
If you don't have one, try [newnius/docker-proxy](https://hub.docker.com/r/newnius/docker-proxy/); it is rather easy to use.
Visit [spark-master:8080](http://spark-master:8080) to view the cluster.
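With the proxy in place you can also check connectivity from the command line; for example, assuming it exposes SOCKS5 on `localhost:1080` (adjust to however you deployed it):

```bash
# Resolve spark-master through the proxy so swarm-net DNS names work.
curl --socks5-hostname localhost:1080 http://spark-master:8080
```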