Configuring Hadoop on Ubuntu
Recommended number of hosts
1 for the NameNode
1 for the JobTracker & Secondary NameNode
3 for the DataNodes & TaskTrackers
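For the purposes of this guide I will use the following host names:
- namenode.example.com
- jobtracker.example.com
- slave1.example.com, slave2.example.com, slave3.example.com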
Create the file /etc/apt/sources.list.d/cdh3.list and add the repository entry:
deb http://archive.cloudera.com/debian lucid-cdh3u3 contrib
Then run:
sudo apt-get update
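The repository setup above can be scripted. This is a minimal sketch, assuming the standard APT drop-in directory /etc/apt/sources.list.d. The function name write_cdh3_repo is hypothetical, and the target directory is a parameter so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write the CDH3 repository entry into an APT sources directory.
# $1 is the target directory; it defaults to the system location, which needs root.
write_cdh3_repo() {
    sources_dir="${1:-/etc/apt/sources.list.d}"
    echo 'deb http://archive.cloudera.com/debian lucid-cdh3u3 contrib' \
        > "${sources_dir}/cdh3.list"
}

# Usage on a real host (run as root), followed by the index refresh:
#   write_cdh3_repo && apt-get update
```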
You will also need to install sun-java6-jdk, sun-java6-jre and sun-java6-jvm.
On namenode.example.com run:
sudo apt-get install hadoop-namenode
On jobtracker.example.com:
sudo apt-get install hadoop-jobtracker hadoop-secondarynamenode
On slave{1,2,3}.example.com:
sudo apt-get install hadoop-datanode hadoop-tasktracker
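If the same script is pushed to every machine, the role-to-package mapping above could be expressed as a lookup on the host name. This is only a sketch: packages_for_host is a hypothetical helper, the host name patterns assume the example.com names used in this guide, and the package names are copied from the install commands above.

```shell
#!/bin/sh
# Hypothetical helper: print the packages to install for a given host name.
# Package names are taken as-is from the install commands in the text.
packages_for_host() {
    case "$1" in
        namenode.*)   echo "hadoop-namenode" ;;
        jobtracker.*) echo "hadoop-jobtracker hadoop-secondarynamenode" ;;
        slave[0-9]*)  echo "hadoop-datanode hadoop-tasktracker" ;;
        *)            echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Usage on each host (the command substitution is left unquoted on purpose
# so that the package list splits into separate arguments):
#   sudo apt-get install $(packages_for_host "$(hostname -f)")
```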
Configurations for namenode & jobtracker
$ cat /usr/lib/hadoop/conf/core-site.xml
Property Name : fs.default.name
Property Value : hdfs://namenode.example.com/
$ cat /usr/lib/hadoop/conf/hdfs-site.xml
Property Name : dfs.data.dir
Property Value : /grid/g1/hadoop-data/hadoop-${user.name}
Property Name : dfs.name.dir
Property Value : /grid/g1/grid-image1,/grid/g1/grid-image2
$ cat /usr/lib/hadoop/conf/mapred-site.xml
Property Name : mapred.job.tracker
Property Value : jobtracker.example.com:8021
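In the actual files, each name/value pair listed above is a <property> element inside a <configuration> root. The sketch below generates the three files with those values. The helper names (property, configuration, write_master_conf) are hypothetical, and the output directory is a parameter so the result can be inspected in a scratch directory before anything is copied into /usr/lib/hadoop/conf.

```shell
#!/bin/sh
# Hypothetical helper: print one <property> element for a Hadoop *-site.xml file.
property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Hypothetical helper: wrap the property elements read from stdin in a <configuration> root.
configuration() {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    cat
    echo '</configuration>'
}

# Write core-site.xml, hdfs-site.xml and mapred-site.xml into the directory given as $1.
write_master_conf() {
    conf_dir="$1"
    property fs.default.name 'hdfs://namenode.example.com/' \
        | configuration > "${conf_dir}/core-site.xml"
    {
        # Single quotes keep ${user.name} literal; Hadoop expands it, not the shell.
        property dfs.data.dir '/grid/g1/hadoop-data/hadoop-${user.name}'
        # Comma-separated list, written without whitespace after the comma.
        property dfs.name.dir '/grid/g1/grid-image1,/grid/g1/grid-image2'
    } | configuration > "${conf_dir}/hdfs-site.xml"
    property mapred.job.tracker 'jobtracker.example.com:8021' \
        | configuration > "${conf_dir}/mapred-site.xml"
}

# Usage:
#   mkdir -p /tmp/hadoop-conf && write_master_conf /tmp/hadoop-conf
```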
Configurations for datanode and tasktracker
$ cat /usr/lib/hadoop/conf/hdfs-site.xml
Property Name : dfs.data.dir
Property Value : /grid/g1/hadoop-data/hadoop-${user.name} (add any extra disk locations as a comma-separated list)
$ cat /usr/lib/hadoop/conf/mapred-site.xml
Property Name : mapred.job.tracker
Property Value : jobtracker.example.com:8021
Property Name : mapred.tasktracker.map.tasks.maximum
Property Value : $number_of_maps, depending on your host configuration
Property Name : mapred.tasktracker.reduce.tasks.maximum
Property Value : $number_of_reduces, depending on your host configuration
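The slot counts depend on the hardware, so one way to pick starting values is to derive them from the core count. The rule used here (about two thirds of the cores for map slots, the rest for reduce slots) is an assumption made for illustration, not a recommendation from this guide, and slots_for_cores is a hypothetical helper. Note that the reduce-side property is spelled mapred.tasktracker.reduce.tasks.maximum.

```shell
#!/bin/sh
# Hypothetical rule of thumb (an assumption, not from the text): give roughly
# two thirds of the cores to map slots and the remainder to reduce slots,
# with at least one reduce slot. $1 is the number of cores (1 or more).
slots_for_cores() {
    cores="$1"
    maps=$(( (cores * 2 + 2) / 3 ))
    reduces=$(( cores - maps ))
    if [ "$reduces" -lt 1 ]; then
        reduces=1
    fi
    echo "mapred.tasktracker.map.tasks.maximum=${maps}"
    echo "mapred.tasktracker.reduce.tasks.maximum=${reduces}"
}

# Usage on a Linux slave node, counting cores from /proc/cpuinfo:
#   slots_for_cores "$(grep -c '^processor' /proc/cpuinfo)"
```

For an 8-core host this prints 6 map slots and 2 reduce slots. Treat the numbers as a starting point and adjust them to the memory available per task.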
Configuration for the secondary namenode, to be placed in hdfs-site.xml
Property Name : dfs.secondary.http.address
Property Value : jobtracker.example.com:50090
Property Name : dfs.http.address
Property Value : namenode.example.com:50070
Property Name : fs.checkpoint.dir
Property Value : /grid/g1/checkpoint,/grid/g2/checkpoint
Property Name : fs.checkpoint.edits.dir
Property Value : /grid/g1/checkpoint/edits,/grid/g2/checkpoint/edits
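Each entry in these comma-separated lists has to exist as a directory on the secondary namenode host before the daemon starts. The sketch below splits such a list and creates every entry. make_dirs_from_list is a hypothetical helper, and its optional second argument is a prefix added only so the function can be tried under a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: create every directory named in a comma-separated list,
# in the format used for fs.checkpoint.dir and fs.checkpoint.edits.dir.
# $1 is the list; $2 is an optional path prefix (empty on a real host).
make_dirs_from_list() {
    list="$1"
    prefix="${2:-}"
    old_ifs="$IFS"
    IFS=','                      # split the unquoted list on commas only
    for dir in $list; do
        mkdir -p "${prefix}${dir}"
    done
    IFS="$old_ifs"               # restore normal word splitting
}

# Usage on the secondary namenode host (run as root):
#   make_dirs_from_list '/grid/g1/checkpoint/edits,/grid/g2/checkpoint/edits'
```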
General Commands & Steps
On namenode
sudo mkdir -p /grid/g1/hadoop-data/
sudo chown hdfs /grid/g1/hadoop-data
sudo mkdir -p /grid/g1/grid-image1 /grid/g1/grid-image2
sudo chown -R hdfs /grid/g1/grid-image1 /grid/g1/grid-image2
sudo -u hdfs hadoop namenode -format
sudo /etc/init.d/hadoop-namenode start
sudo -u hdfs hadoop fs -mkdir /user/mapred
sudo -u hdfs hadoop fs -mkdir /user/hdfs
sudo -u hdfs hadoop fs -chown hdfs /user/hdfs
sudo -u hdfs hadoop fs -chown mapred /user/mapred
On jobtracker
sudo mkdir -p /grid/g1/checkpoint/edits /grid/g2/checkpoint/edits
sudo chown -R hdfs /grid/g1/checkpoint /grid/g2/checkpoint
sudo /etc/init.d/hadoop-*jobtracker start
sudo /etc/init.d/hadoop*secondarynamenode start
On slave nodes
sudo mkdir -p /grid/g1/hadoop-data/
sudo chown -R hdfs /grid/g1/hadoop-data
sudo /etc/init.d/hadoop*datanode start
sudo /etc/init.d/hadoop-tasktracker start
* Please check the logs to confirm that each host has come up.
You should now be able to check the web UIs:
http://namenode.example.com:50070
http://jobtracker.example.com:50030
http://jobtracker.example.com:50090 for secondary namenode
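The three UIs can also be probed from a shell instead of a browser. This is a rough sketch that assumes curl is installed. check_ui and check_cluster_uis are hypothetical helper names, and a DOWN result only means the HTTP request failed, so the daemon logs remain the place to look for the cause.

```shell
#!/bin/sh
# Hypothetical helper: report whether a web UI answers an HTTP request.
check_ui() {
    # -s silent, -f treat HTTP errors as failure, -o discard the body,
    # --max-time bounds the wait in seconds.
    if curl -sf -o /dev/null --max-time 5 "$1"; then
        echo "UP   $1"
    else
        echo "DOWN $1"
    fi
}

# Probe the three UIs listed above.
check_cluster_uis() {
    for url in \
        http://namenode.example.com:50070 \
        http://jobtracker.example.com:50030 \
        http://jobtracker.example.com:50090
    do
        check_ui "$url"
    done
}

# Usage:
#   check_cluster_uis
```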