Wednesday, June 19, 2013

CDH3 Hadoop Installation on Ubuntu 12.04 LTS

I had been struggling for a while to get Hadoop running in pseudo-distributed mode because of a few very small pitfalls, most of which, at a high level, come down to Linux permissions. A look at the Cloudera installation guide shows that every directory has a pre-defined owner and group. So I decided to quickly write up this post so that you can have your local cluster up and running within minutes.

First, install Oracle Java 6 and set the JAVA_HOME environment variable appropriately. Edit ~/.bashrc and export JAVA_HOME at the end of the file.

Now that JAVA_HOME is set, the next step is to add the Cloudera apt repository so that all the required CDH3 components can be pulled in. First and foremost, install SSH, since all the components talk to each other over SSH connections. Then download the CDH3 components via apt. Cloudera does not officially support Ubuntu Precise Pangolin (Ubuntu 12.04 LTS), but I installed the Lucid packages and things work just fine.

Next, point the installation at a pseudo-distributed configuration. Although JAVA_HOME is already set as an environment variable, it also needs to be set in /etc/hadoop-0.20/conf/hadoop-env.sh, because all the components read their configuration values from the /etc/hadoop-0.20/conf directory. That's it, you are good to go!

Next, format the namenode. I faced several issues while formatting it, most likely because the /var/log/hadoop-0.20* directories were not set up properly. On startup, the namenode needs its directories laid out in a particular fashion, with pre-defined permissions on every single directory. If you hit errors such as "Could not create a directory in /var/log/hadoop-0.20*" or "could not replace the directory", run the fix-up commands sketched below and you will be good to go.

With this you are in a position to start all the daemons right away. Finally, verify that all the components are up and running. Okay, now it's time to start using the local cluster! :)
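A minimal sketch of the Java step, assuming the WebUpd8 PPA for Oracle Java 6 (the PPA name and the /usr/lib/jvm/java-6-oracle path are assumptions; any Oracle/Sun JDK 6 install works as long as JAVA_HOME points at it):

    # Install Oracle Java 6 (assumption: WebUpd8 PPA; use whichever JDK 6 you prefer)
    sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java6-installer

    # Export JAVA_HOME at the end of ~/.bashrc (path is an assumption; adjust to your install)
    echo 'export JAVA_HOME=/usr/lib/jvm/java-6-oracle' >> ~/.bashrc
    echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
    source ~/.bashrc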
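For the apt repository, SSH and the CDH3 packages, a sketch along the following lines should work. The lucid-cdh3 repository line, the archive key URL and the hadoop-0.20-conf-pseudo package follow Cloudera's CDH3 packaging conventions, but double-check them against the CDH3 documentation for your setup:

    # Point apt at the CDH3 Lucid repository (works on Precise, as noted above)
    echo 'deb http://archive.cloudera.com/debian lucid-cdh3 contrib' | \
        sudo tee /etc/apt/sources.list.d/cloudera.list
    curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
    sudo apt-get update

    # SSH first - the components talk to each other over SSH
    sudo apt-get install ssh

    # Hadoop core plus the pseudo-distributed configuration package
    sudo apt-get install hadoop-0.20 hadoop-0.20-conf-pseudo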
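Inside /etc/hadoop-0.20/conf/hadoop-env.sh, set JAVA_HOME again so the daemons pick it up (the JDK path below is the same assumption as in the Java step):

    # /etc/hadoop-0.20/conf/hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/java-6-oracle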
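For formatting the namenode and fixing the directory permissions, something like the sketch below is the usual recipe with the CDH3 packages. The owners used here (the hdfs user and the hadoop group) and the exact paths are assumptions based on how CDH3 lays out its directories, so adapt them to whatever directory your error message names:

    # Format HDFS as the hdfs user (CDH3 runs the HDFS daemons as 'hdfs')
    sudo -u hdfs hadoop namenode -format

    # Fix-up for "Could not create a directory in /var/log/hadoop-0.20*"-style errors:
    # make the log directory writable by the hadoop group
    sudo chown -R root:hadoop /var/log/hadoop-0.20
    sudo chmod -R 775 /var/log/hadoop-0.20

    # If the namenode complains about its storage directory instead, the same
    # treatment on the cache directory usually helps
    sudo chown -R hdfs:hadoop /var/lib/hadoop-0.20/cache
    sudo chmod -R 775 /var/lib/hadoop-0.20/cache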
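Starting the daemons is just a loop over the init scripts; a loop like this is presumably what the post's start.sh did (the first comment below points out it should call start, not stop). jps and the Hadoop 0.20 web UIs are a quick way to verify everything is up:

    # Start every hadoop-0.20 daemon (namenode, datanode, secondarynamenode,
    # jobtracker, tasktracker)
    for service in /etc/init.d/hadoop-0.20-*
    do
        sudo $service start
    done

    # Verify: all five daemons should show up in jps
    # (jps ships with the JDK; use its full path if it is not on root's PATH)
    sudo jps

    # Web UIs (Hadoop 0.20 defaults):
    #   NameNode:   http://localhost:50070
    #   JobTracker: http://localhost:50030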

2 comments:

  1. The command to start all Hadoop processes should use $service start instead of $service stop in the start.sh file.

  2. I get a lot of great information here, and this is what I am searching for on Hadoop. Thank you for your sharing. I have bookmarked this page for my future reference. Thanks so much for the work you have put into this post.
    Hadoop Training in hyderabad
