Thursday, August 1, 2013

Hadoop Chain Mapper

Sometimes, before the reducer can take on its job, there is a need to pre-process the data to make it more suitable.

 
To illustrate this, we want the Mapper to operate in phases.
Let us consider two phases: the first phase does the clean-up task of removing the unwanted data, and the second phase does the actual mapping.
The output of the final (second) mapper is passed on to the reducer.


Thus we can chain several mappers together and tell the Hadoop framework to run them in sequence before the reduce step; the ChainMapper class is what makes this possible.
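Here is a minimal sketch of such a two-phase chain using the old org.apache.hadoop.mapred API (the one shipped with Hadoop 0.20 / CDH3). CleanUpMapper, TokenMapper and WordCountReducer are hypothetical placeholders for your own phase-one clean-up, phase-two mapping and reduce classes, so treat this as a template rather than a drop-in job.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

public class ChainJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ChainJob.class);
    conf.setJobName("two-phase-chain");

    // Phase 1: the clean-up mapper removes the unwanted records.
    ChainMapper.addMapper(conf, CleanUpMapper.class,
        LongWritable.class, Text.class,   // input key/value types
        LongWritable.class, Text.class,   // output key/value types
        true, new JobConf(false));

    // Phase 2: the actual mapping; its output feeds the reducer.
    ChainMapper.addMapper(conf, TokenMapper.class,
        LongWritable.class, Text.class,
        Text.class, IntWritable.class,
        true, new JobConf(false));

    // The single reducer at the end of the chain.
    ChainReducer.setReducer(conf, WordCountReducer.class,
        Text.class, IntWritable.class,
        Text.class, IntWritable.class,
        true, new JobConf(false));

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

The key/value classes passed to each addMapper call describe that mapper's input and output types, so the output types of one phase must match the input types of the next phase in the chain.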

Wednesday, June 19, 2013

CDH3 Hadoop Installation on Ubuntu 12.04 LTS

I have been struggling a bit to install Hadoop in pseudo-distributed mode because of a few small pitfalls, and at a high level most of them come down to Linux permissions. If you look at the Cloudera installation guide you will notice that all the directories are expected to have pre-defined users and groups. So I decided to quickly write up this post so that you can have your local cluster up and running within minutes. (A consolidated sketch of the commands is at the end of this post.)

First, install Oracle Java 6 and set the JAVA_HOME environment variable appropriately: edit ~/.bashrc and export JAVA_HOME at the end of the file.

Now that JAVA_HOME is set, the next step is to update the apt repository so it can pull in all the required CDH3 components. Before that, install SSH, since all the components talk to each other over SSH. Then download the CDH3 components via apt. Cloudera does not support Ubuntu Precise Pangolin (Ubuntu 12.04 LTS), but I installed the Lucid packages and things work just fine.

Next we point the installation to pseudo-distributed mode. Even though JAVA_HOME is already set as an environment variable, it also needs to be set in /etc/hadoop-0.20/conf/hadoop-env.sh; all the components read their configuration from the /etc/hadoop-0.20/conf directory.

That's it, you are almost good to go. Next, format the namenode. I faced several issues at this step, probably because the /var/log/hadoop-0.20* directories were not set up properly. On startup, the namenode needs its directories laid out in a particular fashion, with pre-defined permissions on every single directory. If you hit namenode errors like "Could not create a directory in /var/log/hadoop-0.20*" or "could not replace the directory", run a few fix-up commands to recreate those directories with the right ownership and you will be good to go.

With this you are ready to start all the daemons right away. Verify that all the components are up and running, and then it is time to start using your local cluster! :)
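The original command snippets did not survive here, so the following is only a rough reconstruction of the sequence, assuming Cloudera's CDH3 Lucid repository, the hadoop-0.20-conf-pseudo package and the standard hdfs/mapred users in the hadoop group; double-check the package names, paths and ownership against the Cloudera guide and your own machine.

# Assumes Oracle Java 6 is already installed and JAVA_HOME is exported in ~/.bashrc.

# Add Cloudera's CDH3 repository (the Lucid packages work fine on Precise) and its key.
echo "deb http://archive.cloudera.com/debian lucid-cdh3 contrib" | sudo tee /etc/apt/sources.list.d/cloudera.list
curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -

# SSH first, then the pseudo-distributed CDH3 configuration package.
sudo apt-get update
sudo apt-get install ssh
sudo apt-get install hadoop-0.20-conf-pseudo

# Hadoop reads its own JAVA_HOME from hadoop-env.sh; the JDK path below is an example.
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-oracle' | sudo tee -a /etc/hadoop-0.20/conf/hadoop-env.sh

# Format the namenode as the hdfs user.
sudo -u hdfs hadoop namenode -format

# Fix-up for the "could not create a directory in /var/log/hadoop-0.20" style errors:
# make sure the log directory exists and is writable by the hadoop group.
sudo mkdir -p /var/log/hadoop-0.20
sudo chgrp -R hadoop /var/log/hadoop-0.20
sudo chmod -R g+w /var/log/hadoop-0.20

# Start every daemon and verify that they are all up.
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
sudo jps   # or ps -ef | grep java: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker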

Wednesday, May 29, 2013

A need to learn Scala?

Here are a few pointers that are increasing my urge to start learning this new platform.

1. Scala is compelling because it feels like a dynamically typed scripting language, due to its succinct syntax and type inference.

2. It still gives you all the benefits of static typing, object-oriented modelling, functional programming and an advanced type system.

3. Yet the notion is that the simplicity is deceptive.

I guess it's always good to poke in and see how deceptive it can be :)

Thursday, May 23, 2013

Google protocol buffers



Running the examples in the source tree has issues related to the classpath.
Somehow the README does not state this clearly, which can result in a lot of wasted effort.


Here is what I did to resolve the issue and run the examples successfully:

1)
vmanohar@ubuntu:~/protobuf-2.5.0/java$ mvn install
vmanohar@ubuntu:~/protobuf-2.5.0/java$ mvn package

This generates protobuf-java-2.5.0.jar in the target directory.
This jar has to be included in both the javac -classpath and java -classpath commands.

2)
Edit the Makefile in /home/vmanohar/protobuf-2.5.0/examples appropriately:
 
javac_middleman: AddPerson.java ListPeople.java protoc_middleman
    javac -classpath /home/vmanohar/protobuf-2.5.0/java/target/protobuf-java-2.5.0.jar  AddPerson.java ListPeople.java com/example/tutorial/AddressBookProtos.java
    @touch javac_middleman
The addition here is the -classpath argument pointing to the jar built in step 1.


3) Now run the make java command 


vmanohar@ubuntu:~/protobuf-2.5.0/examples$ make java

4) Compilation should now succeed.
The Makefile also auto-generates a script file, add_person_java, to ease the quick start, but it actually ends up amplifying the effort needed to find a fix. So either do not use the script or update it.

I used the following command instead:

java -classpath .:/home/vmanohar/protobuf-2.5.0/java/target/protobuf-java-2.5.0.jar  AddPerson addRecords

Here addRecords is a new file, so the example's logic ends up creating a new one: AddPerson first tries to read an existing address book and falls back to creating a fresh file, roughly as sketched below.
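A condensed sketch of that read-or-create logic (not a verbatim copy of the tutorial's AddPerson.java; the hard-coded Person record below stands in for the interactive prompts):

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;

public class AddPersonSketch {
  public static void main(String[] args) throws Exception {
    AddressBook.Builder addressBook = AddressBook.newBuilder();
    try {
      FileInputStream input = new FileInputStream(args[0]);
      addressBook.mergeFrom(input);        // load the existing records
      input.close();
    } catch (FileNotFoundException e) {
      System.out.println(args[0] + ": File not found.  Creating a new file.");
    }

    // The real AddPerson prompts on stdin here; a fixed record keeps the sketch short.
    addressBook.addPerson(Person.newBuilder()
        .setId(123)
        .setName("Peter")
        .setEmail("xyz@gmail.com")
        .build());

    FileOutputStream output = new FileOutputStream(args[0]);
    addressBook.build().writeTo(output);   // serialize the updated book back to disk
    output.close();
  }
}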




--OUTPUT

addRecords: File not found.  Creating a new file.
Enter person ID: 123
Enter name: Peter
Enter email address (blank for none): xyz@gmail.com
Enter a phone number (or leave blank to finish): 234
Is this a mobile, home, or work phone? mobile
Enter a phone number (or leave blank to finish):