update spark

This commit is contained in:
2018-08-08 12:16:24 +08:00
parent c7b252fe79
commit 702f667281
16 changed files with 52 additions and 335 deletions

# Deploy Spark On Yarn

## Client
_`--hostname` needs Docker 1.13 or higher._

```bash
docker service create \
--name spark-client \
--hostname spark-client \
--network swarm-net \
--replicas 1 \
--detach true \
newnius/spark:2.2.1-yarn
```
## Validate installation

#### spark-submit PI
```bash
spark-submit \
--master yarn \
--deploy-mode cluster \
--class org.apache.spark.examples.JavaSparkPi \
./examples/jars/spark-examples*.jar 100
```
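The SparkPi example estimates π by Monte Carlo sampling: it throws random points at the unit square and counts how many land inside the quarter circle. A minimal local sketch of the same computation in plain Scala, with no Spark required (the object name and seed here are illustrative, not part of the Spark example):

```scala
import scala.util.Random

object LocalPi {
  // Estimate Pi by sampling `n` random points in the unit square and
  // counting how many fall inside the quarter circle x^2 + y^2 <= 1.
  def estimate(n: Int, seed: Long = 42L): Double = {
    val rng = new Random(seed)
    val inside = (1 to n).count { _ =>
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      x * x + y * y <= 1.0
    }
    4.0 * inside / n
  }

  def main(args: Array[String]): Unit =
    println(f"Pi is roughly ${estimate(100000)}%.4f")
}
```

In the real job, Spark distributes this sampling across YARN containers; the trailing `100` in the `spark-submit` command above is the number of partitions (slices) the sampling is split into.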
#### spark-shell HDFS wordcount
Run `spark-shell --master yarn` to start an interactive shell.
```shell
val lines = sc.textFile("hdfs://hadoop-master:8020/user/root/input")
val words = lines.flatMap(_.split("\\s+"))
val wc = words.map(word => (word, 1)).reduceByKey(_ + _)
wc.collect()
val cnt = words.map(word => 1).reduce(_ + _)
```
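The RDD operations in the shell session have direct analogues on ordinary Scala collections, which makes the pipeline easy to sanity-check locally before running it against HDFS. A sketch of the same word count on a plain `List` (the sample input lines are made up for illustration):

```scala
object LocalWordCount {
  // Mirrors: lines.flatMap(_.split("\\s+")).map(w => (w, 1)).reduceByKey(_ + _)
  def wordCount(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)                   // local stand-in for reduceByKey
      .map { case (w, ws) => (w, ws.size) }

  // Mirrors: words.map(word => 1).reduce(_ + _) -- the total word count
  def totalWords(lines: List[String]): Int =
    lines.flatMap(_.split("\\s+")).count(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val lines = List("hello spark", "hello yarn")
    println(wordCount(lines))  // counts per word: hello -> 2, spark -> 1, yarn -> 1
    println(totalWords(lines)) // 4
  }
}
```

The difference in the cluster version is only where the data lives: `sc.textFile` produces a distributed RDD, so the same transformations run in parallel across YARN containers.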
## Browse the web UI
In Spark On Yarn mode, Spark jobs appear in the YARN web UI (spark-master:8088); the HDFS web UI is at spark-master:50070.

_A proxy is needed to reach UIs inside the overlay network, e.g. [newnius/docker-proxy](https://hub.docker.com/r/newnius/docker-proxy/)._