Construct the platform for a Big Data project

Configure ZooKeeper

- On each server, configure the Java environment, for instance:
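A minimal sketch; the JAVA_HOME path matches the hadoop-env.sh setting later in this post, so adjust it to the JDK actually installed:

```bash
$ java -version      # verify a JDK is installed
$ vim ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:$JAVA_HOME/bin
$ source ~/.bashrc
```

Then unpack ZooKeeper and prepare its configuration: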
```bash
$ tar -zxvf zookeeper-3.4.6.tar.gz
$ mv zookeeper-3.4.6 zookeeper
$ cd zookeeper/conf
$ cp zoo_sample.cfg zoo.cfg
$ vi zoo.cfg
```
Server role | example-data01(namenode1) | example-data02(namenode2) | example-data03(datanode1) | example-data04(datanode2) |
---|---|---|---|---|
NameNode | YES | YES | NO | NO |
DataNode | NO | NO | YES | YES |
JournalNode | YES | YES | YES | NO |
ZooKeeper | YES | YES | YES | NO |
ZKFC | YES | YES | NO | NO |
Port mappings for the NameNode web UIs (external port => NameNode HTTP address):

- 8080 => example-data01:50070
- 8000 => example-data02:50070
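If no cloud endpoint provides these mappings, one sketch is a local SSH tunnel; the gateway host example-hadoop.cloudapp.net and the admin user are taken from elsewhere in this post and may differ in practice:

```bash
$ ssh -L 8080:example-data01:50070 admin@example-hadoop.cloudapp.net
$ ssh -L 8000:example-data02:50070 admin@example-hadoop.cloudapp.net
```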
Configure Hadoop

- Configure /etc/hosts on every node:
```
# /etc/hosts
*.*.*.* example-data01
*.*.*.* example-data02
*.*.*.* example-data03
*.*.*.* example-data04
```

- Set the ZooKeeper environment in ~/.bashrc:

```bash
$ vim ~/.bashrc
export ZOO_HOME=/home/admin/zookeeper
export ZOO_LOG_DIR=/home/admin/zookeeper/logs
export PATH=$PATH:$ZOO_HOME/bin
$ source ~/.bashrc
```

- Create the data and log directories referenced above:

```bash
$ mkdir -p /home/admin/zookeeper/{data,logs}
```
- $ cp ~/zookeeper/conf/zoo_sample.cfg ~/zookeeper/conf/zoo.cfg
- $ vim zoo.cfg

```
# /home/admin/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/admin/zookeeper/data
clientPort=2181
server.1=example-data01:2888:3888
server.2=example-data02:2888:3888
server.3=example-data03:2888:3888
```

- Create a myid file in the dataDir: $ echo 1 > myid
- Copy the ZooKeeper directory to the other servers and set each node's id:
  - $ scp -r zookeeper/* admin@example-data02:~ , then echo 2 > myid on example-data02
  - $ scp -r zookeeper/* admin@example-data03:~ , then echo 3 > myid on example-data03
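Concretely, the myid file lives in the configured dataDir (paths per the zoo.cfg above):

```bash
$ echo 1 > /home/admin/zookeeper/data/myid   # on example-data01
$ echo 2 > /home/admin/zookeeper/data/myid   # on example-data02
$ echo 3 > /home/admin/zookeeper/data/myid   # on example-data03
```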
- On each node, start ZooKeeper and check its status:
  - $ ./bin/zkServer.sh start
  - $ ./bin/zkServer.sh status
- On example-data01, verify the ensemble with the client:
  - $ ./bin/zkCli.sh -server example-data01:2181
  - or $ ./bin/zkCli.sh -server example-data02:2181
  - exit with "quit"
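A quick smoke test, with an illustrative session (prompt text will vary):

```
$ ./bin/zkServer.sh status    # one node reports "Mode: leader", the others "Mode: follower"
$ ./bin/zkCli.sh -server example-data01:2181
[zk: example-data01:2181(CONNECTED) 0] ls /
[zk: example-data01:2181(CONNECTED) 1] quit
```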
Configure HDFS HA

- example-data01 (namenode1, active), example-data02 (namenode2, standby), example-data03 (datanode1), example-data04 (datanode2)
On example-data01

Step 1:
- Edit ~/.bashrc with the Java and Hadoop environment, then copy ~/.bashrc to example-data02 (namenode) and example-data03 (datanode); a sketch follows.
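A minimal sketch of such a ~/.bashrc, assuming HADOOP_HOME=/home/admin/hadoop (the path used by start-dfs.sh in Step 7) and the JAVA_HOME set in hadoop-env.sh below:

```bash
# java configure
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:$JAVA_HOME/bin
# hadoop configure (paths are assumptions)
export HADOOP_HOME=/home/admin/hadoop
export HADOOP_PID_DIR=$HADOOP_HOME/pids
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```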
- Step 2:

```bash
$ mkdir -p /home/admin/hadoop/{pids,storage}
```
- Step 3:
Configure core-site.xml, hadoop-env.sh, and hdfs-site.xml.
- configure core-site.xml and hdfs-site.xml: see the sketch after this list
- configure hadoop-env.sh:

```bash
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
```

- configure slaves (the DataNode list, per the role table above):

```
example-data03
example-data04
```
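A minimal hedged sketch of an HA core-site.xml and hdfs-site.xml, not the post's original values. The nameservice name mycluster, the config path under /home/admin/hadoop/etc/hadoop, the JournalNode port 8485, and the replication factor are assumptions; nn1/nn2, the 9000 RPC and 50070 HTTP ports, and the JournalNode/ZooKeeper hosts come from the role table and the commands in Steps 5 to 7:

```bash
# sketch only: adjust paths and the nameservice name to the actual cluster
$ cat > /home/admin/hadoop/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
  <property><name>hadoop.tmp.dir</name><value>/home/admin/hadoop/storage/tmp</value></property>
  <property><name>ha.zookeeper.quorum</name>
    <value>example-data01:2181,example-data02:2181,example-data03:2181</value></property>
</configuration>
EOF

$ cat > /home/admin/hadoop/etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>example-data01:9000</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>example-data02:9000</value></property>
  <property><name>dfs.namenode.http-address.mycluster.nn1</name><value>example-data01:50070</value></property>
  <property><name>dfs.namenode.http-address.mycluster.nn2</name><value>example-data02:50070</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://example-data01:8485;example-data02:8485;example-data03:8485/mycluster</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/home/admin/hadoop/storage/journal</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/admin/.ssh/id_rsa</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>
EOF
```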
- Step 4:
scp the hadoop directory to each node (example-data02, example-data03, example-data04), as sketched below.
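A sketch, assuming the same admin user and home-directory layout as the ZooKeeper copy step:

```bash
$ scp -r /home/admin/hadoop admin@example-data02:/home/admin/
$ scp -r /home/admin/hadoop admin@example-data03:/home/admin/
$ scp -r /home/admin/hadoop admin@example-data04:/home/admin/
```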
- Step 5:
Set up the HDFS cluster with ZooKeeper (example).

on namenode1 (example-data01)

```bash
$ hdfs zkfc -formatZK
```
on each zookeeper node

```bash
$ hadoop-daemon.sh start journalnode
# or start all three JournalNodes from one node:
# $ ./sbin/hadoop-daemons.sh --hostnames 'example-data01 example-data02 example-data03' start journalnode
```
on namenode1 (example-data01)

```bash
$ hdfs namenode -format
$ ./sbin/hadoop-daemon.sh start namenode
```

on namenode2 (example-data02)

```bash
$ hdfs namenode -bootstrapStandby
$ ./sbin/hadoop-daemon.sh start namenode
```

on the two namenodes (namenode1, namenode2)

```bash
$ ./sbin/hadoop-daemon.sh start zkfc
```

on all datanodes (datanode1, datanode2)

```bash
$ ./sbin/hadoop-daemon.sh start datanode
```
By now, the following processes should be running on namenode1 (example-data01): QuorumPeerMain, NameNode, DFSZKFailoverController, and JournalNode.
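They can be verified with jps:

```bash
$ jps
# expect QuorumPeerMain, NameNode, DFSZKFailoverController, and JournalNode,
# each prefixed with its pid
```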
Step 6: Testing HDFS

```
# log in on the web UI
example-data01: namenode (active)  => example-data01:9000
website: example-hadoop.cloudapp.net:8080
example-data02: namenode (standby) => example-data02:9000
website: example-hadoop.cloudapp.net:8000
# restart a stopped NameNode with: $ hadoop-daemon.sh start namenode
```
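A quick functional check (illustrative paths; any small file works):

```bash
$ hdfs dfs -mkdir -p /tmp/smoke
$ echo hello | hdfs dfs -put - /tmp/smoke/hello.txt
$ hdfs dfs -cat /tmp/smoke/hello.txt
$ hdfs dfs -rm -r /tmp/smoke
```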
Step 7 (important): execute on namenode1 (example-data01).

Before starting HDFS, format the ZooKeeper failover state:

```bash
$ hdfs zkfc -formatZK
# start HDFS: $ cd /home/admin/hadoop && sbin/start-dfs.sh
# stop HDFS:  $ cd /home/admin/hadoop && sbin/stop-dfs.sh
$ start-dfs.sh
```

Check the status of the two NameNodes:
```bash
$ hdfs haadmin -getServiceState nn1   # --> active
$ hdfs haadmin -getServiceState nn2   # --> standby
```
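To exercise automatic failover, one sketch is to kill the active NameNode and watch ZKFC promote the standby (`<namenode-pid>` is a placeholder):

```bash
# on example-data01
$ jps | grep NameNode        # note the pid
$ kill -9 <namenode-pid>
# nn2 should now report active
$ hdfs haadmin -getServiceState nn2
```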
When stop-dfs.sh fails with an error, set a persistent pid directory:

```bash
$ vi ~/.bashrc
export HADOOP_PID_DIR=/home/admin/hadoop/pids   # the pids directory created in Step 2
```
Check the cluster report:

```bash
$ hdfs dfsadmin -report
```
Spark on YARN reference

Configure two additional files, yarn-site.xml and mapred-site.xml, as sketched below.