Tuesday, January 29, 2013

Configuring a Distributed Hadoop Cluster on Ubuntu 10.04

Configuring Hadoop on Ubuntu

Recommended number of hosts

1 for the NameNode
1 for the JobTracker & Secondary NameNode
3 for the DataNodes & TaskTrackers

For the purpose of this post I will use the following host names:
  • namenode.example.com
  • jobtracker.example.com
  • slave1.example.com, slave2.example.com, slave3.example.com
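
All hosts must be able to resolve one another by these names. If you do not have DNS set up, one option is to add entries to /etc/hosts on every machine; the IP addresses below are placeholders for illustration only:

192.168.1.10  namenode.example.com   namenode
192.168.1.11  jobtracker.example.com jobtracker
192.168.1.21  slave1.example.com     slave1
192.168.1.22  slave2.example.com     slave2
192.168.1.23  slave3.example.com     slave3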
On all the hosts that you have for the cluster, do the following:
 
Create the file /etc/apt/sources.list.d/cdh3.list and add the repo info:

deb http://archive.cloudera.com/debian lucid-cdh3u3 contrib
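
If apt-get update later complains about an unverified repository, import Cloudera's archive key first (the key URL here is an assumption based on the CDH3 repository layout):

curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -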
run

sudo apt-get update

You will also need to install sun-java6-jdk, sun-java6-jre & sun-java6-jvm.
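
On Ubuntu 10.04 the Sun Java 6 packages live in the Canonical partner repository, so you may need to enable it first; a sketch, assuming the stock lucid partner archive:

echo "deb http://archive.canonical.com/ lucid partner" | sudo tee /etc/apt/sources.list.d/partner.list
sudo apt-get update
sudo apt-get install sun-java6-jdk sun-java6-jre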

 on namenode.example.com run

sudo apt-get install hadoop-namenode

on jobtracker.example.com

sudo apt-get install hadoop-jobtracker hadoop-secondarynamenode

on slave{1,2,3}.example.com

sudo apt-get install hadoop-datanode hadoop-tasktracker


Configurations for namenode & jobtracker

$ cat /usr/lib/hadoop/conf/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com/</value>
  </property>
</configuration>


$ cat /usr/lib/hadoop/conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/grid/g1/hadoop-data/hadoop-${user.name}</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/grid/g1/grid-image1,/grid/g1/grid-image2</value>
  </property>
</configuration>



$ cat /usr/lib/hadoop/conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>

Configurations for datanode and tasktracker

$ cat /usr/lib/hadoop/conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <!-- add extra disk locations as a comma-separated list -->
    <value>/grid/g1/hadoop-data/hadoop-${user.name}</value>
  </property>
</configuration>

$ cat /usr/lib/hadoop/conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- set according to your host's CPU and memory -->
    <value>$number_of_maps</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <!-- set according to your host's CPU and memory -->
    <value>$number_of_reduces</value>
  </property>
</configuration>

Config for the secondary namenode, to be in hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>jobtracker.example.com:50090</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>namenode.example.com:50070</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/grid/g1/checkpoint,/grid/g2/checkpoint</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>/grid/g1/checkpoint/edits,/grid/g2/checkpoint/edits</value>
  </property>
</configuration>

General Commands & Steps


On namenode

sudo mkdir -p /grid/g1/hadoop-data; sudo chown hdfs /grid/g1/hadoop-data
sudo mkdir -p /grid/g1/grid-image1 /grid/g1/grid-image2
sudo chown -R hdfs /grid/g1/grid-image1  /grid/g1/grid-image2
sudo -u hdfs hadoop namenode -format
sudo /etc/init.d/hadoop-namenode start

sudo -u hdfs hadoop fs -mkdir /user/mapred
sudo -u hdfs hadoop fs -mkdir /user/hdfs

sudo -u hdfs hadoop fs -chown hdfs /user/hdfs
sudo -u hdfs hadoop fs -chown mapred /user/mapred
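
As a quick sanity check, list the directories to confirm they exist with the right owners (assumes the namenode is up):

sudo -u hdfs hadoop fs -ls /user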

on jobtracker

sudo mkdir -p /grid/g1/checkpoint/edits /grid/g2/checkpoint/edits
sudo chown -R hdfs  /grid/g1/checkpoint/edits /grid/g2/checkpoint/edits
sudo /etc/init.d/hadoop-*jobtracker start
sudo /etc/init.d/hadoop-*secondarynamenode start

on slave nodes

sudo mkdir -p /grid/g1/hadoop-data/
sudo chown hdfs -R /grid/g1/hadoop-data
sudo /etc/init.d/hadoop-*datanode start
sudo /etc/init.d/hadoop-*tasktracker start
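
To verify the daemons are actually running on a slave, jps (which ships with the Sun JDK) should list DataNode and TaskTracker processes; you can also run jps as the hdfs/mapred users:

sudo jps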


* Please check the logs to confirm that the daemons have come up.

You should now be able to check the web UIs:

http://namenode.example.com:50070
http://jobtracker.example.com:50030
http://jobtracker.example.com:50090 for secondary namenode
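
As an end-to-end smoke test, you can run the pi example that ships with Hadoop; the jar path below is assumed from the CDH package layout and may vary by version:

sudo -u hdfs hadoop jar /usr/lib/hadoop/hadoop-examples*.jar pi 2 10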

Wednesday, April 14, 2010

Fwd: Amazing

 

So many famous people in the same painting.

Monday, February 15, 2010

Is language really important?

I have tried my hand at most languages to check if one is better than the others. I rather found all of them to be useful in their own way. It's important for the developer to choose the right language and tools to make the application or program better.

For example, one could (and should) use C/C++ for system-level stuff, such as writing device drivers. But if a developer wants to write a web application in C/C++, that is going to be a nightmare.

Languages like Perl, Ruby, and Python are used mostly in the *NIX (Linux/Unix) world, where developers and system admins use them to automate work that would otherwise take a lot of human labor.

PHP has been in the news for a while now, and it seems to be a nice language for what it was initially developed for. It's a good language for small- to medium-size web apps, but the number of extensions and modules is smaller compared to CPAN for Perl.

My all-time favorite has been Perl, which has the ability to do both system and web programming, and it's highly customizable, which makes it efficient and fast.

LAMP (Linux, Apache, MySQL, and Perl/PHP) has been the best combination on the internet, and a lot of successful websites still serve thousands of pages using it.

So the point is: unless you are really writing something that is going to change the world, don't worry too much about Ruby vs. Perl vs. PHP vs. Python; each can solve your problem in the way you want it to.


http://news.cnet.com/8301-13505_3-10453213-16.html




Tuesday, February 9, 2010

Laptop repair: We Reach Infotech in Koramangala, Bangalore

We Reach Infotech claimed to service all laptops up to chip level and said they could fix my laptop too. I was surprised that such a small place could do such work. But the truth was they were cheats and didn't know what they were doing. They would just reinstall the OS or try to fix my laptop by some trial-and-error method, which never worked.

Vijay Bhasker Reddy, the person who runs the place, can talk to you and make it seem that he was born to fix laptops and can do anything. The truth is they are so careless that they lost some switches from my laptop and also stole the remote that I had given along with the laptop.

#606
1st Floor
80 Feet road, 8th Block
Koramangala
Bangalore -560095



Friday, January 15, 2010

ssh-agent : ssh agent forwarding

First create SSH keys using

ssh-keygen -t dsa
ssh-keygen -t rsa

Assuming they are written to the default location, let's proceed.

Check whether any keys are already loaded in an agent

-bash-2.05b$ ssh-add -l
Could not open a connection to your authentication agent.

Start the ssh-agent

-bash-2.05b$ ssh-agent
SSH_AUTH_SOCK=/tmp/ssh-HYsGOxmf/agent.78451; export SSH_AUTH_SOCK;
SSH_AGENT_PID=78452; export SSH_AGENT_PID;
echo Agent pid 78452;
-bash-2.05b$ SSH_AUTH_SOCK=/tmp/ssh-HYsGOxmf/agent.78451; export SSH_AUTH_SOCK;
-bash-2.05b$ SSH_AGENT_PID=78452; export SSH_AGENT_PID;
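
A shorter way to do the same is to let the shell evaluate the agent's output directly, which starts the agent and exports SSH_AUTH_SOCK and SSH_AGENT_PID in one step:

eval "$(ssh-agent -s)"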

Add your keys

-bash-2.05b$ ssh-add .ssh/id_dsa
Identity added: .ssh/id_dsa (.ssh/id_dsa)
-bash-2.05b$ ssh-add .ssh/id_rsa
Identity added: .ssh/id_rsa (.ssh/id_rsa)
-bash-2.05b$ ssh-add -l
1024 6a:70:08:2b:71:83:31:98:90:8f:99:f8:8d:96:55:0f .ssh/id_dsa (DSA)
2048 7a:01:2e:1f:88:ef:3:b6:48:3c:ee:d:dd:b4:6b:ff .ssh/id_rsa (RSA)
-bash-2.05b$ ssh -A $username@whicheverhost.com
Last login: Fri Oct 9 09:49:47 2009 from somehost.com

[root@somehost.com ~]# ssh username@whicheverhost-two.com
Last login: Fri Oct 9 09:46:18 2009 from x.com
[root@whicheverhost-two ~]#


You should also have your public keys appended to ~/.ssh/authorized_keys on both hosts.
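
On most Linux systems, ssh-copy-id automates appending your public key to the remote authorized_keys file, for example:

ssh-copy-id -i ~/.ssh/id_rsa.pub username@whicheverhost.com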

Friday, July 24, 2009

Apache Hadoop architecture & using Hadoop

Apache Hadoop

Apache Hadoop is an open-source effort to help process large data sets in a short time using multiple nodes.


Want to try Hadoop?

If you just want to try Hadoop and get a feel for it, you should try www.cloudera.com; it has the entire package bundled for you to get the initial hang of it.

Migrating to Hadoop


If you have some jobs that run on your server and you want to port them to the grid: sorry, it is not possible to migrate them directly. You would have to rewrite the code using the MapReduce model.
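
That said, Hadoop Streaming lets existing command-line tools act as the mapper and reducer, which can ease the port. A minimal sketch, assuming a CDH-style path for the streaming jar and an input directory already present in HDFS:

hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming*.jar \
  -input /user/hdfs/input \
  -output /user/hdfs/output \
  -mapper /bin/cat \
  -reducer /usr/bin/wc

Here cat and wc are reused unmodified as the map and reduce steps; any script that reads stdin and writes stdout will work the same way.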