HDFS configure with Zookeeper (02)

Constructing the platform for a Big Data project

Configure Zookeeper

  • on each server, configure the Java environment, then unpack ZooKeeper (a quick Java check is sketched after the commands below)
    $ tar -zxvf zookeeper-3.4.6.tar.gz
    $ mv zookeeper-3.4.6 zookeeper
    $ cd zookeeper/conf
    $ cp zoo_sample.cfg zoo.cfg
    $ vi zoo.cfg
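    A minimal sketch for verifying the Java environment before going further, assuming the Oracle JDK 8 path that the later ~/.bashrc uses (/usr/lib/jvm/java-8-oracle); adjust the path if your JDK lives elsewhere.

    # confirm Java 8 is installed and on the PATH
    $ java -version
    # point JAVA_HOME at the JDK (assumed path, same as in ~/.bashrc below)
    $ export JAVA_HOME=/usr/lib/jvm/java-8-oracle
    $ export PATH=$JAVA_HOME/bin:$PATH
    $ echo $JAVA_HOME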
Server role   example-data01 (namenode1)   example-data02 (namenode2)   example-data03 (datanode1)   example-data04 (datanode2)
NameNode      YES                          YES                          NO                           NO
DataNode      NO                           NO                           YES                          YES
JournalNode   YES                          YES                          YES                          NO
ZooKeeper     YES                          YES                          YES                          NO
ZKFC          YES                          YES                          NO                           NO
  • NameNode web UI endpoints (used in Step 6):
    • 8080 => example-data01:50070
    • 8000 => example-data02:50070

Hadoop configure

  • hosts: add every node to /etc/hosts on each server (replace *.*.*.* with the real IP addresses)

    *.*.*.*  example-data01
    *.*.*.* example-data02
    *.*.*.* example-data03
    *.*.*.* example-data04
  • $ vi ~/.bashrc

    # vim ~/.bashrc
    export ZOO_HOME=/home/admin/zookeeper
    export ZOO_LOG_DIR=/home/admin/zookeeper/logs
    export PATH=$PATH:$ZOO_HOME/bin
    # source ~/.bashrc
  • $ mkdir -p /home/admin/zookeeper/{data,logs} ( must match dataDir in zoo.cfg and ZOO_LOG_DIR in ~/.bashrc )

  • $ cp ~/zookeeper/conf/zoo_sample.cfg ~/zookeeper/conf/zoo.cfg
  • $ vim zoo.cfg

    # vim /home/admin/zookeeper/conf/zoo.cfg
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/home/admin/zookeeper/data
    clientPort=2181
    server.1=example-data01:2888:3888
    server.2=example-data02:2888:3888
    server.3=example-data03:2888:3888
  • create the myid file in the dataDir (/home/admin/zookeeper/data) -> echo 1 > myid

  • $ scp -r ~/zookeeper admin@example-data02:~/ -> then on example-data02: echo 2 > myid
  • $ scp -r ~/zookeeper admin@example-data03:~/ -> then on example-data03: echo 3 > myid
  • i.e. copy ~/zookeeper to each remaining ZooKeeper server (example-data02, example-data03); a sketch for setting the remote ids follows below
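    A minimal sketch for setting the per-host ids remotely after the copy, assuming passwordless SSH as the admin user and the paths used above; each myid value must match the corresponding server.N line in zoo.cfg.

    # id 1 on example-data01 was set above
    $ ssh admin@example-data02 'echo 2 > /home/admin/zookeeper/data/myid'
    $ ssh admin@example-data03 'echo 3 > /home/admin/zookeeper/data/myid'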
  • on each ZooKeeper node (example-data01, example-data02, example-data03), start the service and check it
    • ./bin/zkServer.sh start
    • ./bin/zkServer.sh status
  • then, from example-data01, check the ensemble with the client
    • $ ./bin/zkCli.sh -server example-data01:2181
    • or $ ./bin/zkCli.sh -server example-data02:2181
    • exit by using "quit"
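    A quick sketch for checking the whole ensemble from one machine, assuming passwordless SSH as admin and the ZOO_HOME path configured above; one node should report "leader" and the others "follower".

    $ for h in example-data01 example-data02 example-data03; do echo "== $h =="; ssh admin@$h '/home/admin/zookeeper/bin/zkServer.sh status'; done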

Configure Hadoop HDFS

- example-data01 (NameNode, active)  example-data02 (NameNode, standby)  example-data03 (DataNode)  example-data04 (DataNode)

On example-data01

  • Step 1:

    • vim ~/.bashrc (then copy ~/.bashrc to example-data02 (namenode) and example-data03 (datanode); a copy sketch follows after the block below)
# java configure
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

# hadoop configure
export HADOOP_HOME=/home/admin/hadoop
export PATH=$HADOOP_HOME/bin:$PATH

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

#zookeeper configure

export ZOO_HOME=/home/admin/zookeeper
export PATH=$PATH:$ZOO_HOME/bin
export ZOO_LOG_DIR=/home/admin/zookeeper/logs

# native libraries, PID directory, and common/HDFS home
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_PID_DIR=/home/admin/hadoop/pids
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME

export PATH=$HADOOP_HOME/sbin:$PATH
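A minimal sketch for pushing this environment file to the other nodes, assuming passwordless SSH as admin; example-data04 is included here on the assumption that it uses the same Hadoop environment.

    # copy the environment file; it takes effect on the next login (or after running: source ~/.bashrc)
    $ for h in example-data02 example-data03 example-data04; do scp ~/.bashrc admin@$h:~/; done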
  • Step 2:
$ mkdir -p /home/admin/hadoop/{pids,storage}
$ mkdir -p /home/admin/hadoop/storage/{hdfs,tmp,journal}
$ mkdir -p /home/admin/hadoop/storage/hdfs/{name,data}
# on example-data03 & example-data04 (datanode storage):
$ mkdir -p /datadriver/hdfs/data
$ sudo chown admin /datadriver/hdfs/data/
  • Step 3:

configure core-site.xml, hdfs-site.xml, hadoop-env.sh, and slaves

  • configure core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://masters</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/admin/hadoop/storage/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>example-data01:2181,example-data02:2181,example-data03:2181</value>
  </property>
  <property>
    <name>hadoop.native.lib</name>
    <value>true</value>
  </property>
</configuration>
  • configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>masters</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.masters</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/admin/hadoop/storage/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- <value>file:/home/admin/hadoop/storage/hdfs/data</value> -->
    <value>file:///datadriver/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.masters.nn1</name>
    <value>example-data01:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.masters.nn1</name>
    <value>example-data01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.masters.nn2</name>
    <value>example-data02:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.masters.nn2</name>
    <value>example-data02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://example-data01:8485;example-data02:8485;example-data03:8485/masters</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/admin/hadoop/storage/journal</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence(hdfs)
shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/admin/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.masters</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
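The sshfence method above requires that each NameNode can SSH to the other without a password using the key in /home/admin/.ssh/id_rsa. A minimal sketch, assuming the admin account (run on both example-data01 and example-data02); note that sshfence(hdfs) names the hdfs user, so adjust the user to whichever account actually runs the NameNode.

    # create an RSA key pair (skip if ~/.ssh/id_rsa already exists), then authorise it on both NameNodes
    $ ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""
    $ ssh-copy-id admin@example-data01
    $ ssh-copy-id admin@example-data02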
  • configure hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
  • configure slaves
example-data03
example-data04
  • Step 4:

scp the hadoop directory to each of the other nodes (example-data02, example-data03, example-data04); a sketch is shown below
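A minimal sketch, assuming the installation lives in /home/admin/hadoop and passwordless SSH as admin:

    $ for h in example-data02 example-data03 example-data04; do scp -r /home/admin/hadoop admin@$h:/home/admin/; done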

  • Step 5:

Set up the HDFS cluster with ZooKeeper (worked example)

  • on the namenode1 (example-data01)

    $ hdfs zkfc -formatZK
  • on each JournalNode host (example-data01, example-data02, example-data03)

    $ hadoop-daemon.sh start journalnode
    # alternative: $ ./sbin/hadoop-daemons.sh --hostnames 'example-data01 example-data02 example-data03' start journalnode  (starts the JournalNodes on all three hosts from one machine)
  • on the namenode1 (example-data01)

    $ hdfs namenode -format
  • on the namenode1 (example-data01)

    $ ./sbin/hadoop-daemon.sh start namenode
  • on namenode2 (example-data02)

    $ hdfs namenode -bootstrapStandby
  • on namenode2 (example-data02)

    $ ./sbin/hadoop-daemon.sh start namenode
  • on the two namenodes (namenode1, namenode2)

    $ ./sbin/hadoop-daemon.sh start zkfc
  • on all datanodes (datanode1, datanode2)

    $ ./sbin/hadoop-daemon.sh start datanode
  • at this point, jps on namenode1 (example-data01) should show the following processes

    QuorumPeerMain, NameNode, DFSZKFailoverController (the ZKFC), JournalNode
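    A quick sketch for checking every node at once, assuming passwordless SSH as admin and that jps is on the PATH for non-interactive shells; expect DataNode on example-data03/04 and the processes above on the master nodes.

    $ for h in example-data01 example-data02 example-data03 example-data04; do echo "== $h =="; ssh admin@$h 'jps'; done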

  • Step 6: Testing HDFS functionality (a command-line smoke test is sketched after the block below)

    # check the NameNode web UIs
    example-data01: namenode (active) => example-data01:9000
    website: example-hadoop.cloudapp.net:8080
    example-data02: namenode (standby) => example-data02:9000
    website: example-hadoop.cloudapp.net:8000
    # $ hadoop-daemon.sh start namenode
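    A minimal functional smoke test, assuming the client environment from ~/.bashrc; the paths and file names below are only illustrative.

    # write a file into HDFS and read it back
    $ hdfs dfs -mkdir -p /tmp/smoke-test
    $ echo "hello hdfs" > /tmp/hello.txt
    $ hdfs dfs -put /tmp/hello.txt /tmp/smoke-test/
    $ hdfs dfs -ls /tmp/smoke-test
    $ hdfs dfs -cat /tmp/smoke-test/hello.txt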
  • Step 7: Important, execute on namenode1 (example-data01)

    • Before starting HDFS for the first time, format the HA state in ZooKeeper

      $ hdfs zkfc -formatZK
      # start HDFS --> $ cd /home/admin/hadoop && sbin/start-dfs.sh
      # stop HDFS  --> $ cd /home/admin/hadoop && sbin/stop-dfs.sh
      $ start-dfs.sh
    • check the status of the two namenodes (a failover check is sketched after these commands)

      $ hdfs haadmin -getServiceState nn1 --> active
      $ hdfs haadmin -getServiceState nn2 --> standby
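      A simple check of automatic failover, assuming nn1 is currently active on example-data01; only do this in a test environment.

      # on example-data01: find the NameNode pid with jps, then kill it to simulate a crash
      $ jps
      $ kill -9 <NameNode pid>
      # nn2 should now report active
      $ hdfs haadmin -getServiceState nn2
      # restart the killed NameNode; it rejoins as standby
      $ ./sbin/hadoop-daemon.sh start namenode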
    • if stop-dfs.sh fails because it cannot find the daemon pid files, make sure HADOOP_PID_DIR is set consistently in ~/.bashrc

      $ vi ~/.bashrc
      export HADOOP_PID_DIR=/home/admin/hadoop/pids
  • $ hdfs dfsadmin -report ( should list both datanodes )
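    A short way to confirm that both datanodes registered; the grep pattern is just a convenience and may need adjusting to your Hadoop version's report format.

    $ hdfs dfsadmin -report | grep -E 'Live datanodes|Hostname'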

Spark on Yarn Reference

configure two additional files:
yarn-site.xml & mapred-site.xml
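A minimal sketch of that extra step, assuming the same $HADOOP_HOME layout as above; the property names listed are standard YARN/MapReduce settings, but the values (and the ResourceManager host) are only illustrative and not a complete Spark-on-YARN configuration.

    $ cd /home/admin/hadoop/etc/hadoop
    $ cp mapred-site.xml.template mapred-site.xml
    # then edit mapred-site.xml and yarn-site.xml, e.g. with at least:
    #   mapreduce.framework.name      = yarn
    #   yarn.nodemanager.aux-services = mapreduce_shuffle
    #   yarn.resourcemanager.hostname = example-data01   (assumed host)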