update spark, add 2.3.1

Newnius 2018-08-07 20:36:19 +08:00
parent 922ddd57d1
commit 8750cbd8c2
3 changed files with 105 additions and 0 deletions

spark/2.3.1/Dockerfile Normal file

@@ -0,0 +1,30 @@
FROM alpine:3.8
MAINTAINER Newnius <newnius.cn@gmail.com>
USER root
# Prerequisites
RUN apk add --no-cache openssh openssl openjdk8-jre rsync bash procps coreutils
ENV JAVA_HOME /usr/lib/jvm/java-1.8-openjdk
ENV PATH $PATH:$JAVA_HOME/bin
ENV SPARK_VER 2.3.1
# Download the Spark release pre-built for Hadoop 2.7 and unpack it under /usr/local
RUN wget https://archive.apache.org/dist/spark/spark-$SPARK_VER/spark-$SPARK_VER-bin-hadoop2.7.tgz && \
tar -xvf spark-$SPARK_VER-bin-hadoop2.7.tgz -C /usr/local && \
rm spark-$SPARK_VER-bin-hadoop2.7.tgz
RUN ln -s /usr/local/spark-$SPARK_VER-bin-hadoop2.7 /usr/local/spark
ENV SPARK_HOME /usr/local/spark
ENV PATH $PATH:$SPARK_HOME/bin
# bootstrap.sh starts the master or slave daemon and keeps the container alive
ADD bootstrap.sh /etc/bootstrap.sh
WORKDIR /usr/local/spark
ENTRYPOINT ["/etc/bootstrap.sh"]
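
If you prefer to build the image yourself instead of pulling it from Docker Hub, a minimal sketch (assuming the build is run from the repository root, so the `spark/2.3.1/` directory containing this Dockerfile and `bootstrap.sh` is the build context):

```bash
# Build the image locally and tag it to match the commands in the README
docker build -t newnius/spark:2.3.1 spark/2.3.1/
```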

spark/2.3.1/README.md Normal file

@@ -0,0 +1,64 @@
# Deploy Spark Cluster in standalone mode
## Master
```bash
docker service create \
--name spark-master \
--hostname spark-master \
--network swarm-net \
--replicas 1 \
--detach true \
--endpoint-mode dnsrr \
newnius/spark:2.3.1 master
```
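To confirm the master service started correctly, the standard swarm inspection commands can be used (a small sketch; the service name matches the command above):

```bash
# Check task placement and the master's startup logs
docker service ps spark-master
docker service logs spark-master
```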
## Slaves
```bash
docker service create \
--name spark-slave \
--network swarm-net \
--replicas 5 \
--detach true \
--endpoint-mode dnsrr \
newnius/spark:2.3.1 slave spark://spark-master:7077
```
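The number of slaves can be changed later without recreating the service, as sketched below:

```bash
# Scale the slave service up (or down) at runtime
docker service scale spark-slave=8
```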
## Validate installation
#### spark-submit PI
```bash
spark-submit \
--master spark://spark-master:7077 \
--deploy-mode cluster \
--class org.apache.spark.examples.JavaSparkPi \
./examples/jars/spark-examples_2.11-2.3.1.jar 100
```
#### spark-shell HDFS wordcount
Run `spark-shell --master spark://spark-master:7077` to open an interactive shell.
```shell
val lines = sc.textFile("hdfs://hadoop-master:8020/user/root/input")
val words = lines.flatMap(_.split("\\s+"))
val wc = words.map(word => (word, 1)).reduceByKey(_ + _)
wc.collect()
val cnt = words.map(word => 1).reduce(_ + _)
```
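The word count above assumes input already exists at `hdfs://hadoop-master:8020/user/root/input`. A hedged sketch of preparing it, assuming a Hadoop client configured against the `hadoop-master` cluster:

```bash
# Upload a sample file to the HDFS directory read by the spark-shell example
hdfs dfs -mkdir -p /user/root/input
hdfs dfs -put README.md /user/root/input/
```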
## Browse the web UI
You can expose the ports in the scripts, but I'd rather not, since the slaves would all try to occupy the same ports.
To access the web UI, deploy another (socks5) proxy to route the traffic.
If you don't have one, try [newnius/docker-proxy](https://hub.docker.com/r/newnius/docker-proxy/); it is rather easy to use.
Visit [spark-master:8080](http://spark-master:8080) to view the cluster.
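
If you would rather publish the master web UI directly instead of routing through a proxy, a hedged variant of the master command is shown below; note that publishing a port through the swarm ingress requires the default (vip) endpoint mode, so `--endpoint-mode dnsrr` is dropped here:

```bash
docker service create \
  --name spark-master \
  --hostname spark-master \
  --network swarm-net \
  --replicas 1 \
  --detach true \
  --publish 8080:8080 \
  newnius/spark:2.3.1 master
```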

spark/2.3.1/bootstrap.sh Executable file

@@ -0,0 +1,11 @@
#!/bin/bash
# Start the requested Spark daemon; the role is given as the first argument
if [[ $1 == "master" ]]; then
	/usr/local/spark/sbin/start-master.sh
fi
if [[ $1 == "slave" ]]; then
	# $2 is the master URL, e.g. spark://spark-master:7077
	/usr/local/spark/sbin/start-slave.sh "$2"
fi
# The daemons run in the background, so keep the container alive
while true; do sleep 1000; done
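
Since `bootstrap.sh` is the image ENTRYPOINT, the first container argument selects the role. A minimal single-host sketch outside swarm mode, assuming the image is available locally and using a hypothetical `spark-net` bridge network for name resolution:

```bash
# Create a user-defined network so the containers can resolve each other by name
docker network create spark-net
docker run -d --name spark-master --network spark-net newnius/spark:2.3.1 master
docker run -d --name spark-slave --network spark-net newnius/spark:2.3.1 slave spark://spark-master:7077
```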