# Deploy a Hadoop Cluster with Docker
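All of the services below attach to an overlay network named `swarm-net`. If your swarm doesn't have it yet, a minimal sketch to create one:

```bash
# create an overlay network for the cluster (run on a swarm manager node)
docker network create --driver overlay swarm-net
```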
## Start Master
```bash
docker service create \
--name hadoop-master \
--hostname hadoop-master \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
newnius/hadoop:2.7.4
```
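Before moving on, you can verify the master task is running:

```bash
# show the task state of the master service
docker service ps hadoop-master
```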
## Start Slaves
```bash
docker service create \
--name hadoop-slave1 \
--hostname hadoop-slave1 \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
newnius/hadoop:2.7.4
```
```bash
docker service create \
--name hadoop-slave2 \
--hostname hadoop-slave2 \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
newnius/hadoop:2.7.4
```
```bash
docker service create \
--name hadoop-slave3 \
--hostname hadoop-slave3 \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
newnius/hadoop:2.7.4
```
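The three slave commands differ only in the service name and hostname, so, equivalently, a loop sketch:

```bash
# create hadoop-slave1 through hadoop-slave3 in one pass
for i in 1 2 3; do
  docker service create \
    --name hadoop-slave$i \
    --hostname hadoop-slave$i \
    --network swarm-net \
    --replicas 1 \
    --detach=true \
    --endpoint-mode dnsrr \
    --mount type=bind,source=/etc/localtime,target=/etc/localtime \
    newnius/hadoop:2.7.4
done
```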
## Init for the first time
#### Format HDFS first
Run these commands inside the hadoop-master container; the paths are relative to the Hadoop installation directory.
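Swarm generates the actual container name, so one way to get a shell inside it is to filter `docker ps` by the service name; a sketch (run on the node hosting the task):

```bash
# open a shell in the hadoop-master container
docker exec -it $(docker ps -q -f name=hadoop-master) bash
```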
```bash
# stop HDFS services
sbin/stop-dfs.sh

# format HDFS metadata
bin/hadoop namenode -format

# restart HDFS services
sbin/start-dfs.sh
```
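Once HDFS is back up, a quick sanity check that the NameNode sees all the DataNodes:

```bash
# report cluster capacity and the list of live DataNodes
bin/hadoop dfsadmin -report
```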
## Run a test job
To make sure you have successfully set up the Hadoop cluster, run the following commands and check that the job completes without errors.
```bash
# prepare input data
bin/hadoop dfs -mkdir -p /user/root/input

# copy files to input path
bin/hadoop dfs -put etc/hadoop/* /user/root/input

# submit the job
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar grep input output 'dfs[a-z.]+'
```
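If the job ran well, the matched terms end up in the `output` directory on HDFS:

```bash
# print the job output
bin/hadoop dfs -cat output/*
```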
## Browse the web UI
You can expose the ports in the scripts above, but I'd rather not, since the slaves would occupy the same ports.
To access the web UI, deploy a (socks5) proxy to route the traffic.
If you don't have one, try [newnius/docker-proxy](https://hub.docker.com/r/newnius/docker-proxy/); it is rather easy to use.
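As a rough sketch, the proxy only needs to join the same overlay network and publish a port for your browser's SOCKS settings (the image is the one linked above; port 1080 is an assumption, check the image page for the real one):

```bash
# deploy a socks5 proxy on the cluster network; 1080 is an assumed listening port
docker service create \
  --name proxy \
  --network swarm-net \
  --replicas 1 \
  --detach=true \
  --publish 1080:1080 \
  newnius/docker-proxy
```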
Visit [hadoop-master:8088](http://hadoop-master:8088) for the YARN pages.
Visit [hadoop-master:50070](http://hadoop-master:50070) for the HDFS pages.
## Custom configuration
To persist data or modify the conf files, refer to the following script.
Conf files placed in `/config/hadoop` replace the defaults; you don't have to provide all the files, only the ones you want to override.
```bash
docker service create \
--name hadoop-master \
--hostname hadoop-master \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
--mount type=bind,source=/data/hadoop/config,target=/config/hadoop \
--mount type=bind,source=/data/hadoop/hdfs/master,target=/tmp/hadoop-root \
--mount type=bind,source=/data/hadoop/logs/master,target=/usr/local/hadoop/logs \
newnius/hadoop:2.7.4
```
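The bind-mount source directories must exist on the host before the service starts; a sketch of preparing them to match the paths above:

```bash
# create the host directories backing the bind mounts
mkdir -p /data/hadoop/config /data/hadoop/hdfs/master /data/hadoop/logs/master
```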