# Deploy a Hadoop Cluster with Docker
## Start Master
```bash
docker service create \
--name hadoop-master \
--hostname hadoop-master \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
newnius/hadoop:2.8.1
```
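
Before starting the slaves, it can help to confirm that the master service actually converged. A quick check, assuming you run it on a swarm manager node:

```shell
# Check that the master task is running and peek at its startup logs.
docker service ps hadoop-master
docker service logs --tail 20 hadoop-master
```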

## Start Slaves
```bash
docker service create \
--name hadoop-slave1 \
--hostname hadoop-slave1 \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
newnius/hadoop:2.8.1
```
```bash
docker service create \
--name hadoop-slave2 \
--hostname hadoop-slave2 \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
newnius/hadoop:2.8.1
```
```bash
docker service create \
--name hadoop-slave3 \
--hostname hadoop-slave3 \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
newnius/hadoop:2.8.1
```
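
The three slave services differ only in their index, so a small loop avoids repeating the command. This is just a sketch: `make_slave_cmd` is a helper name of my own, and it only prints each command so you can review it before piping it to `sh` on a manager node.

```shell
# make_slave_cmd N prints the `docker service create` command for slave N.
make_slave_cmd() {
  echo "docker service create" \
    "--name hadoop-slave$1" \
    "--hostname hadoop-slave$1" \
    "--network swarm-net" \
    "--replicas 1" \
    "--detach=true" \
    "--endpoint-mode dnsrr" \
    "--mount type=bind,source=/etc/localtime,target=/etc/localtime" \
    "newnius/hadoop:2.8.1"
}

for i in 1 2 3; do
  make_slave_cmd "$i"        # preview; run with: make_slave_cmd "$i" | sh
done
```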

## Initialize for the first time

#### Format HDFS first
Run these commands on the master node.
```bash
# stop HDFS services
sbin/stop-dfs.sh

# format the HDFS metadata (this erases any existing data in HDFS)
bin/hdfs namenode -format

# restart HDFS services
sbin/start-dfs.sh
```
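
Once the daemons are back up, you can check from inside the master container that the NameNode sees the DataNodes:

```shell
# Should report 3 live datanodes once all slaves have registered.
bin/hdfs dfsadmin -report
```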
## Run a test job
To verify that the Hadoop cluster is set up correctly, run the following commands and check that the job completes successfully.
```bash
# prepare the input directory in HDFS
bin/hdfs dfs -mkdir -p /user/root/input
# copy the Hadoop config files into it as sample input
bin/hdfs dfs -put etc/hadoop/* /user/root/input

# submit the example job
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar grep input output 'dfs[a-z.]+'
```
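
If the job succeeds, the matches end up in the `output` directory in HDFS; you can list and print them:

```shell
# list the result files
bin/hdfs dfs -ls output
# print the matched patterns with their counts
bin/hdfs dfs -cat output/part-r-00000
```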
## Browse the web UI
You could expose the ports in the scripts above, but I'd rather not, since the slaves would all try to occupy the same host ports.

To access the web UIs, deploy a (SOCKS5) proxy inside the overlay network to route the traffic.
If you don't have one, try [newnius/docker-proxy](https://hub.docker.com/r/newnius/docker-proxy/); it is rather easy to use.

Visit [hadoop-master:8088](http://hadoop-master:8088) for the YARN web UI.

Visit [hadoop-master:50070](http://hadoop-master:50070) for the HDFS web UI.
## Custom configuration
To persist data or modify the conf files, refer to the following script.
Any file placed under `/config/hadoop` replaces the default configuration file of the same name; you don't have to provide all of them.
```bash
docker service create \
--name hadoop-master \
--hostname hadoop-master \
--network swarm-net \
--replicas 1 \
--detach=true \
--endpoint-mode dnsrr \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
--mount type=bind,source=/data/hadoop/config,target=/config/hadoop \
--mount type=bind,source=/data/hadoop/hdfs/master,target=/tmp/hadoop-root \
--mount type=bind,source=/data/hadoop/logs/master,target=/usr/local/hadoop/logs \
newnius/hadoop:2.8.1
```
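
Note that the bind-mount source paths must exist on the host before the service starts. A small sketch to prepare them and drop in a single override file; the `core-site.xml` content is only an illustration, and `DATA_ROOT` is my own variable matching the `--mount` sources above:

```shell
# Create the host directories used by the bind mounts above.
DATA_ROOT="${DATA_ROOT:-/data/hadoop}"
mkdir -p "$DATA_ROOT/config" "$DATA_ROOT/hdfs/master" "$DATA_ROOT/logs/master"

# Example: override only core-site.xml; every other default stays as shipped.
cat > "$DATA_ROOT/config/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:8020</value>
  </property>
</configuration>
EOF
```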