Wednesday, June 19, 2013
CDH3 Hadoop Installation on Ubuntu 12.04 LTS
I have been struggling a bit to install Hadoop in pseudo-distributed mode because of a few small pitfalls.
At a high level, most of them come down to Linux permissions. If you look at the Cloudera installation guide you will notice that every directory is expected to have pre-defined users and groups set on it.
So I decided to quickly write up this post so that you can have a local cluster up and running within minutes.
First, install Oracle Java 6 and set the JAVA_HOME environment variable appropriately.
After this, edit ~/.bashrc and set JAVA_HOME at the end of the file.
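At the time, the Oracle JDK 6 on 12.04 was usually installed from a third-party PPA; a rough sketch (the PPA name and the install path are assumptions, adjust them to whatever you actually use):

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java6-installer

# at the end of ~/.bashrc (path depends on where your JDK lands)
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
export PATH=$PATH:$JAVA_HOME/bin

Reload the file with source ~/.bashrc and confirm the value with echo $JAVA_HOME.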
Now that JAVA_HOME is set, the next step is to add the Cloudera apt repository so that all the required CDH3 components can be pulled in.
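A sketch of the repository setup, using the Lucid CDH3 packages discussed a little further below (double-check the URL and suite name against Cloudera's archive):

$ sudo tee /etc/apt/sources.list.d/cloudera.list <<EOF
deb http://archive.cloudera.com/debian lucid-cdh3 contrib
deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib
EOF
$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
$ sudo apt-get update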
First and foremost we need to install SSH, as all the components talk to each other over SSH connections.
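For a single-node setup it is enough to install the OpenSSH server and allow passwordless logins to localhost; a minimal sketch:

$ sudo apt-get install openssh-server openssh-client
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost   # should drop you into a shell without asking for a password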
Next, download the CDH3 components via apt (see the package list below).
Cloudera does not officially support Ubuntu Precise Pangolin (12.04 LTS), but I installed the Lucid packages and everything works just fine.
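With the Lucid repository in place, the daemons can be pulled in through apt; the package names below follow the CDH3 hadoop-0.20 naming:

$ sudo apt-get install hadoop-0.20 hadoop-0.20-namenode hadoop-0.20-datanode \
    hadoop-0.20-secondarynamenode hadoop-0.20-jobtracker hadoop-0.20-tasktracker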
Next, we point the installation at a pseudo-distributed configuration.
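CDH3 ships a ready-made pseudo-distributed configuration package, which switches the active configuration through the alternatives system; roughly:

$ sudo apt-get install hadoop-0.20-conf-pseudo
$ update-alternatives --display hadoop-0.20-conf   # conf.pseudo should be the current choice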
Even though JAVA_HOME is set as an environment variable, it also needs to be set in /etc/hadoop-0.20/conf/hadoop-env.sh.
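The relevant line in hadoop-env.sh looks like this (use whatever JAVA_HOME actually is on your box):

# /etc/hadoop-0.20/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-oracle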
All the components read their configuration from the /etc/hadoop-0.20/conf directory.
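For reference, the pseudo-distributed setup boils down to a handful of properties along these lines; the conf-pseudo package writes them for you, and the exact ports may differ in your copy:

<!-- core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:8021</value>
</property>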
That's it, you are good to go!
Next, format the namenode with the following command.
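In CDH3 the HDFS daemons run as the hdfs user, so the format needs to be run as that user:

$ sudo -u hdfs hadoop namenode -format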
However, I faced several issues when formatting the namenode, probably because the /var/log/hadoop-0.20* directories were not set up properly. On startup, the namenode needs its directories laid out in a particular fashion, with pre-defined permissions on every single directory.
If you are seeing namenode errors like "Could not create a directory in /var/log/hadoop-0.20*" or "could not replace the directory", run the following fix-up commands and you should be good to go.
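The exact fix depends on which directory ended up with the wrong owner, but commands along these lines (assuming the hdfs/mapred users and the hadoop group created by the CDH3 packages) restore the expected layout:

$ sudo chown -R root:hadoop /var/log/hadoop-0.20
$ sudo chmod -R 775 /var/log/hadoop-0.20
# if the namenode storage directory itself is the problem, hand it to the hdfs user;
# point this at whatever dfs.name.dir resolves to in your hdfs-site.xml
$ sudo chown -R hdfs:hadoop /var/lib/hadoop-0.20/cache

Then re-run the namenode format.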
With this in place you can start all the daemons right away.
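The packages install one init script per daemon, so a single loop starts everything (note the reader comment at the bottom: the loop must call start, not stop):

$ for svc in /etc/init.d/hadoop-0.20-*; do sudo $svc start; done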
Verify that all the components are up and running.
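A quick check is jps (run through sudo so the JVMs owned by the hdfs and mapred users show up) plus a trivial HDFS command:

$ sudo jps
# expect NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker
$ hadoop fs -ls /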
Okay, now it's time to start using the local cluster! :)
Comment: the command to start all Hadoop processes should use $service start instead of $service stop in the start.sh file.