Dockerfiles/spark/1.6.0/README.md
2017-04-08 22:27:04 +08:00

# Based on sequenceiq/spark
## Create a Spark cluster in swarm mode
Setting a container hostname with `docker service create --hostname` requires Docker 1.13 or higher.
```bash
docker service create \
--name spark-master \
--network swarm-net \
--replicas 1 \
--endpoint-mode dnsrr \
newnius/spark
```
```bash
docker service create \
--name spark-slave1 \
--network swarm-net \
--replicas 1 \
--endpoint-mode dnsrr \
newnius/spark
```
```bash
docker service create \
--name spark-slave2 \
--network swarm-net \
--replicas 1 \
--endpoint-mode dnsrr \
newnius/spark
```
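The commands above assume that the overlay network `swarm-net` already exists. Below is a minimal sketch that creates the network if it is missing and then creates the three services in a loop. The helper names `run` and `create_spark_services` and the `DRY_RUN` switch are hypothetical. They exist only so you can preview the commands before anything changes.

```bash
#!/usr/bin/env bash
# Sketch: create the swarm-net overlay network (if missing) and the three services.
# Assumptions: run on a swarm manager. DRY_RUN=1 prints the create commands
# instead of running them (the read-only `docker network inspect` still runs).
set -euo pipefail

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_spark_services() {
  # Create the overlay network only if it does not exist yet.
  if ! docker network inspect swarm-net >/dev/null 2>&1; then
    run docker network create --driver overlay swarm-net
  fi

  local name
  for name in spark-master spark-slave1 spark-slave2; do
    run docker service create \
      --name "$name" \
      --network swarm-net \
      --replicas 1 \
      --endpoint-mode dnsrr \
      newnius/spark
  done
}
```

To use it, source the file, preview the commands with `DRY_RUN=1 create_spark_services`, and then run `create_spark_services`.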
## Init and test
On the first deployment, HDFS must be formatted before use. Run the steps below in order.
### Stop the cluster (on the master)
From the Hadoop installation directory:

`sbin/stop-yarn.sh`

`sbin/stop-dfs.sh`

`../spark/sbin/stop-all.sh`
### Remove previous data (on all nodes)
Clear all data in `/tmp` on every node. When Hadoop runs as root, its default `hadoop.tmp.dir` is `/tmp/hadoop-root`.
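Clearing each node by hand is easy to get wrong. Below is a minimal sketch that wipes the Hadoop data directory on every node over SSH from inside the master container. It rests on several assumptions:

- Passwordless SSH works between the containers, which `start-dfs.sh` already relies on to reach the slaves.
- The data lives in the default `/tmp/hadoop-root`, which is narrower than "all of `/tmp`".
- The node list matches the services you created.

The names `run`, `NODES` and `clear_hadoop_data` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: wipe the Hadoop data directory on every node before re-formatting HDFS.
# Assumptions: the cluster is already stopped, this runs inside the master
# container, SSH to each node needs no password, and the data lives in the
# default /tmp/hadoop-root. DRY_RUN=1 prints the commands instead of running them.
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical node list; keep it in sync with the services you created.
NODES=(spark-master spark-slave1 spark-slave2)
DATA_DIR="${DATA_DIR:-/tmp/hadoop-root}"

clear_hadoop_data() {
  local node
  for node in "${NODES[@]}"; do
    # DATA_DIR is expanded locally, then single-quoted for the remote shell.
    run ssh "$node" "rm -rf '$DATA_DIR'"
  done
}
```

Run `DRY_RUN=1 clear_hadoop_data` first to confirm which hosts and path will be removed.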
### Format HDFS (on the master)
```bash
bin/hdfs namenode -format
```
### Start the cluster (on the master)
`sbin/start-dfs.sh`

`sbin/start-yarn.sh`

`../spark/sbin/start-all.sh`
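Before opening the web UIs, you can use the standard Hadoop command-line tools to check that the workers registered and that HDFS accepts writes. Below is a minimal sketch. It assumes it runs on the master from the same directory as the `sbin/` scripts. The names `run` and `check_cluster`, the `DRY_RUN` switch and the `/tmp/smoke` test path are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: smoke-test HDFS and YARN after start-up.
# Assumption: run on the master from the same directory as the sbin/ scripts.
# DRY_RUN=1 prints the commands instead of running them.
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

check_cluster() {
  # DataNodes the NameNode reports as live/dead, plus HDFS capacity.
  run bin/hdfs dfsadmin -report
  # NodeManagers registered with the ResourceManager.
  run bin/yarn node -list
  # Round-trip a small file through HDFS (hypothetical test path).
  run bin/hdfs dfs -mkdir -p /tmp/smoke
  run bin/hdfs dfs -put -f /etc/hostname /tmp/smoke/
  run bin/hdfs dfs -cat /tmp/smoke/hostname
}
```

Compare the live DataNodes and NodeManagers in the output against the nodes you started.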
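### Monitor the cluster in a browser
YARN ResourceManager: `http://spark-master:8088`

HDFS NameNode: `http://spark-master:50070`

Spark master: `http://spark-master:8080`

_These hostnames resolve only inside the `swarm-net` overlay network, and the services above publish no ports. You therefore need a proxy to reach the UIs from outside, e.g. newnius/docker-proxy._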
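Below is a minimal sketch for checking the three UIs without a browser. It probes each URL with `curl`. It assumes it runs from a container attached to `swarm-net` (for example the master) and that `curl` is installed there. The names `UIS` and `check_uis` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report which web UIs answer over HTTP.
# Assumptions: run from a container attached to swarm-net that has curl installed.
set -euo pipefail

# "name url" pairs for the UIs listed above.
UIS=(
  "YARN http://spark-master:8088"
  "HDFS http://spark-master:50070"
  "SPARK http://spark-master:8080"
)

check_uis() {
  local entry name url
  for entry in "${UIS[@]}"; do
    read -r name url <<<"$entry"
    # --fail: non-zero exit on HTTP >= 400; --max-time: give up after 5 s.
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      printf '%s\tup\t%s\n' "$name" "$url"
    else
      printf '%s\tDOWN\t%s\n' "$name" "$url"
    fi
  done
}
```

Redirects, such as the ResourceManager sending `/` to `/cluster`, count as "up", because `--fail` only rejects HTTP error statuses.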
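## Customized configuration
Bind-mount host directories to keep HDFS data and logs on the host and to supply your own Hadoop and Spark configuration files: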
```bash
docker service create \
--name spark-master \
--network swarm-net \
--replicas 1 \
--mount type=bind,source=/mnt/data/spark/hdfs/master,target=/tmp/hadoop-root \
--mount type=bind,source=/mnt/data/spark/logs/master,target=/usr/local/hadoop/logs \
--mount type=bind,source=/mnt/data/spark/config/hadoop,target=/mnt/config/hadoop \
--mount type=bind,source=/mnt/data/spark/config/spark,target=/mnt/config/spark \
--mount type=bind,source=/mnt/data/spark/config/spark-yarn-remote-client,target=/mnt/config/spark-yarn-remote-client \
--endpoint-mode dnsrr \
newnius/spark
```
You don't need to put every configuration file in these directories; add only the files you want to change.
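A service bind mount fails if its source directory does not exist on the node that runs the task, so create the host directories first. In a multi-node swarm the master task may be scheduled on any node. Either create the directories on every candidate node or pin the service to one node with `--constraint`. Below is a minimal sketch that creates the paths used in the command above. The `BASE` variable and the function name `prepare_master_dirs` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: create the bind-mount sources used by the customized spark-master service.
# Assumption: run on the swarm node that will host the master task.
set -euo pipefail

# Hypothetical override point; defaults to the paths used in the command above.
BASE="${BASE:-/mnt/data/spark}"

prepare_master_dirs() {
  local dir
  for dir in hdfs/master logs/master config/hadoop config/spark config/spark-yarn-remote-client; do
    # -p: create parents and succeed if the directory already exists.
    mkdir -p "$BASE/$dir"
  done
}
```

After `prepare_master_dirs`, copy in only the files you want to override. For example, place an edited `core-site.xml` in `/mnt/data/spark/config/hadoop/`.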