Wednesday, June 19, 2013

CDH3 Hadoop Installation on Ubuntu 12.04 LTS

I had been struggling for a while to get Hadoop running in pseudo-distributed mode because of a few very small pitfalls, most of which, at a high level, come down to Linux permissions. A look at the Cloudera installation guide shows that every directory has a pre-defined owner and group. So I decided to quickly write up this post so that you can have your local cluster up and running within minutes.

First, install Oracle Java 6 and set the JAVA_HOME environment variable appropriately. Edit ~/.bashrc and export JAVA_HOME at the end of the file.

Now that JAVA_HOME is set, the next step is to add the Cloudera apt repository so that all the required CDH3 components can be pulled in. First and foremost, install SSH, since all the components talk to each other over SSH connections. Then download the CDH3 components via apt. Cloudera does not officially support Ubuntu Precise Pangolin (Ubuntu 12.04 LTS), but I installed the Lucid packages and things work just fine.

Next, point the installation at a pseudo-distributed configuration. Although JAVA_HOME is already set as an environment variable, it also needs to be set in /etc/hadoop-0.20/conf/hadoop-env.sh, because all the components read their configuration values from the /etc/hadoop-0.20/conf directory. That's it, you are good to go!

Next, format the namenode. I faced several issues while formatting it, most likely because the /var/log/hadoop-0.20* directories were not set up properly. On startup, the namenode needs its directories laid out in a particular fashion, with pre-defined permissions on every single directory. If you hit errors such as "Could not create a directory in /var/log/hadoop-0.20*" or "could not replace the directory", run the fix-up commands sketched below and you will be good to go.

With this you are in a position to start all the daemons right away. Finally, verify that all the components are up and running. Okay, now it's time to start using the local cluster! :)
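A minimal sketch of the Java step, assuming the WebUpd8 PPA for Oracle Java 6 (the PPA name and the /usr/lib/jvm/java-6-oracle path are assumptions; any Oracle/Sun JDK 6 install works as long as JAVA_HOME points at it):

    # Install Oracle Java 6 (assumption: WebUpd8 PPA; use whichever JDK 6 you prefer)
    sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java6-installer

    # Export JAVA_HOME at the end of ~/.bashrc (path is an assumption; adjust to your install)
    echo 'export JAVA_HOME=/usr/lib/jvm/java-6-oracle' >> ~/.bashrc
    echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
    source ~/.bashrc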
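For the apt repository, SSH and the CDH3 packages, a sketch along the following lines should work. The lucid-cdh3 repository line, the archive key URL and the hadoop-0.20-conf-pseudo package follow Cloudera's CDH3 packaging conventions, but double-check them against the CDH3 documentation for your setup:

    # Point apt at the CDH3 Lucid repository (works on Precise, as noted above)
    echo 'deb http://archive.cloudera.com/debian lucid-cdh3 contrib' | \
        sudo tee /etc/apt/sources.list.d/cloudera.list
    curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
    sudo apt-get update

    # SSH first - the components talk to each other over SSH
    sudo apt-get install ssh

    # Hadoop core plus the pseudo-distributed configuration package
    sudo apt-get install hadoop-0.20 hadoop-0.20-conf-pseudo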
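Inside /etc/hadoop-0.20/conf/hadoop-env.sh, set JAVA_HOME again so the daemons pick it up (the JDK path below is the same assumption as in the Java step):

    # /etc/hadoop-0.20/conf/hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/java-6-oracle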
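For formatting the namenode and fixing the directory permissions, something like the sketch below is the usual recipe with the CDH3 packages. The owners used here (the hdfs user and the hadoop group) and the exact paths are assumptions based on how CDH3 lays out its directories, so adapt them to whatever directory your error message names:

    # Format HDFS as the hdfs user (CDH3 runs the HDFS daemons as 'hdfs')
    sudo -u hdfs hadoop namenode -format

    # Fix-up for "Could not create a directory in /var/log/hadoop-0.20*"-style errors:
    # make the log directory writable by the hadoop group
    sudo chown -R root:hadoop /var/log/hadoop-0.20
    sudo chmod -R 775 /var/log/hadoop-0.20

    # If the namenode complains about its storage directory instead, the same
    # treatment on the cache directory usually helps
    sudo chown -R hdfs:hadoop /var/lib/hadoop-0.20/cache
    sudo chmod -R 775 /var/lib/hadoop-0.20/cache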
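Starting the daemons is just a loop over the init scripts; a loop like this is presumably what the post's start.sh did (the first comment below points out it should call start, not stop). jps and the Hadoop 0.20 web UIs are a quick way to verify everything is up:

    # Start every hadoop-0.20 daemon (namenode, datanode, secondarynamenode,
    # jobtracker, tasktracker)
    for service in /etc/init.d/hadoop-0.20-*
    do
        sudo $service start
    done

    # Verify: all five daemons should show up in jps
    # (jps ships with the JDK; use its full path if it is not on root's PATH)
    sudo jps

    # Web UIs (Hadoop 0.20 defaults):
    #   NameNode:   http://localhost:50070
    #   JobTracker: http://localhost:50030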

2 comments:

  1. The command to start all Hadoop processes should use $service start instead of $service stop in the start.sh file.

  2. I get a lot of great information here, and this is what I am searching for on Hadoop. Thank you for your sharing. I have bookmarked this page for my future reference. Thanks so much for the work you have put into this post.
    Hadoop Training in hyderabad
