
Wednesday 5 December 2012

Bigdata Baby Steps

This is the first time I have felt so determined and enthusiastic about learning Big Data, and I realized that Hadoop plays a big role in it. So I started digging through information across all the web resources. It was not a cakewalk for me, because I faced multiple problems while installing Hadoop on Windows, so I am writing them up in case it helps someone.
 Here are the problems I faced and how I resolved them. I have mentioned the source of each piece of information to give credit. I meant to cover every problem, but I do not remember all the details.




Installing Apache Hadoop for Windows
Step 1: First install CYGWIN
               I installed it, but when I tried the ssh localhost command below, it failed. Here is the solution that I followed.

$ ssh localhost
    ssh: connect to host localhost port 22: Connection refused
If you are facing this problem on Windows XP, follow these steps to open the port for ssh:
Go to the Windows Firewall section in Control Panel
Exceptions -> Add Port
Give the port name as ssh and the port number as 22
Select the TCP option
Click OK
This will let you use ssh from Cygwin.
For local application development like Hadoop on Windows, also change the scope to "localhost / my IP address" in the custom list.
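The same firewall exception can also be added without the GUI. This is a sketch of the Windows XP-era netsh command (run from an administrator cmd prompt; the name "ssh" is just a label, and the exact keyword spelling is from memory, so double-check against netsh's built-in help):

```bat
:: Open TCP port 22 for the Cygwin sshd -- equivalent to
:: Control Panel -> Windows Firewall -> Exceptions -> Add Port
netsh firewall add portopening protocol = TCP port = 22 name = ssh mode = ENABLE
```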

I've read about the restrictions on accessing shares while logged   
into a Windows system with the Cygwin ssh daemon.  We are interested   
in this to do remote builds, and it would be nice to access network   
shares.  We only really need one user to be able to log in, so I   
thought I'd change the CYGWIN sshd service to run as that user.   
However, when I changed the service and tried to start it, I got the   
following error message: "The CYGWIN sshd service on Local Computer   
started and then stopped."  Any ideas what's going on? 

I tried to revert to having the service started by the .\sshd user,   
but I can't get that to work now either!  I think it's because I am   
using the wrong password.  How can I change or reset the password on   
that account? 
It's not a month since Larry posted this (thanks, BTW), and this   
issue has bubbled up to the top again.  I have tried various ways to   
get the sshd service started as a domain user (instead of the local   
sshd_server user) and cannot get it to work.  What is the correct   
syntax to specify a domain user with cygrunsrv?  This is what I have   
tried: 

   cygrunsrv -I sshd -u "DOMAINNAME\USERNAME" -w PASSWORD -d "CYGWIN sshd" -p /usr/sbin/sshd -a -D -e "CYGWIN=bin tty smbntsec" -y tcpip

This successfully installs the service, and if I look at it in the   
Services panel it shows the correct username (DOMAIN\USERNAME), but   
if I try to start the service I always get the error "The Cygwin sshd   
service on Local Computer started and then stopped".  If I substitute   
sshd_server for the user and supply the correct password, the sshd   
service starts correctly.  But I want to start the service as a   
domain user so that I can access network shares and resolve some   
build issues with Visual Studio that are apparently caused by not   
being fully authenticated. 
Useful links
http://www.petrikainulainen.net/programming/apache-hadoop/install-and-configure-apache-hadoop-to-run-in-a-pseudo-distributed-mode/

http://blog.sqltrainer.com/2012/01/installing-and-configuring-apache.html

After installing CYGWIN, make sure that all your Hadoop files are placed inside the CYGWIN directory tree itself; otherwise you will get errors.

STEP 2: I tried to run the command bin/hadoop, but got the error below.

Note: if you get an error like "./bin/hadoop: line 2: $'\r': command not found", the line endings in the Hadoop script files are Windows-style (carriage returns). To repair them, run the following commands:

dos2unix bin/hadoop
dos2unix bin/*.sh
dos2unix conf/*.sh
 
http://www.infosci.cornell.edu/hadoop/windows.html
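If dos2unix happens to be missing from your Cygwin installation, sed can apply the same fix. A small sketch on a throwaway file (GNU sed's -i edits in place):

```shell
# Strip the trailing carriage return (\r) from each line -- the same
# cleanup dos2unix performs. Demonstrated on a throwaway script:
printf 'echo hello\r\n' > crlf-demo.sh   # a script saved with Windows line endings
sed -i 's/\r$//' crlf-demo.sh            # remove the carriage returns in place
sh crlf-demo.sh                          # now prints exactly: hello
```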

STEP 3: After running the above commands, at least the system was recognizing Hadoop.

I felt very happy and patted myself on the back, but then the errors below smiled at me:

b009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ bin/hadoop
: No such file or directoryn
bin/hadoop: line 60: syntax error near unexpected token `$'in\r''
'in/hadoop: line 60: `case "`uname`" in

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ dos2unix bin/hadoop
dos2unix: converting file bin/hadoop to Unix format ...

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ dos2unix bin/*.sh
dos2unix: converting file bin/hadoop-config.sh to Unix format ...
dos2unix: converting file bin/hadoop-daemon.sh to Unix format ...
dos2unix: converting file bin/hadoop-daemons.sh to Unix format ...
dos2unix: converting file bin/slaves.sh to Unix format ...
dos2unix: converting file bin/start-all.sh to Unix format ...
dos2unix: converting file bin/start-balancer.sh to Unix format ...
dos2unix: converting file bin/start-dfs.sh to Unix format ...
dos2unix: converting file bin/start-jobhistoryserver.sh to Unix format ...
dos2unix: converting file bin/start-mapred.sh to Unix format ...
dos2unix: converting file bin/stop-all.sh to Unix format ...
dos2unix: converting file bin/stop-balancer.sh to Unix format ...
dos2unix: converting file bin/stop-dfs.sh to Unix format ...
dos2unix: converting file bin/stop-jobhistoryserver.sh to Unix format ...
dos2unix: converting file bin/stop-mapred.sh to Unix format ...

sb009239@BTLAP05063 /cygdrive/c/cygwin/usr/local/hadoop-1.0.4
$ dos2unix conf/*.sh
dos2unix: converting file conf/hadoop-env.sh to Unix format ...

STEP 4: The next error was:

bin/hadoop: line 325: /cygdrive/c/Program: No such file or directory
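This one comes from the space in "Program Files": bin/hadoop expands JAVA_HOME without quotes, so the shell splits the path at the space and looks for /cygdrive/c/Program. A common workaround is to point JAVA_HOME at the DOS 8.3 short name instead (Progra~1 is only the usual default; verify the actual short name on your machine with cmd /c dir /x C:\):

```shell
# conf/hadoop-env.sh -- use the 8.3 short name to avoid the space in
# "Program Files" (the JDK path below is the one from this setup)
export JAVA_HOME=/cygdrive/c/Progra~1/Java/jdk1.6.0_31
```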


STEP 5: After that, I blindly followed the command below exactly as it is written in the "Hadoop in Action" book:

$ bin/hadoop jar hadoop-*-examples.jar
/cygdrive/c/Program Files/Java/jdk1.6.0_31
Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-*-examples.jar
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.io.FileNotFoundException: hadoop-*-examples.jar (The filename, directory name, or volume label syntax is incorrect)
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:127)
        at java.util.jar.JarFile.<init>(JarFile.java:135)
        at java.util.jar.JarFile.<init>(JarFile.java:72)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

So I tried the exact jar name:

bin/hadoop jar hadoop-examples-1.0.4.jar

Oops, still not resolved, so I tried 

$ bin/hadoop jar wordcount.jar
/cygdrive/c/Program Files/Java/jdk1.6.0_31
Exception in thread "main" java.io.IOException: Error opening job jar: wordcount.jar
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.io.FileNotFoundException: wordcount.jar (The system cannot find the file specified)
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:127)
        at java.util.jar.JarFile.<init>(JarFile.java:135)
        at java.util.jar.JarFile.<init>(JarFile.java:72)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

Finally arrived at

$ bin/hadoop jar hadoop-examples-1.0.4.jar wordcount
/cygdrive/c/Program Files/Java/jdk1.6.0_31
Usage: wordcount <in> <out>
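So wordcount wants an input and an output location. The input directory must already exist and contain some text, and the output directory must not exist yet (Hadoop refuses to overwrite it). A minimal setup, assuming you are in the Hadoop root:

```shell
# Create an input directory with a small text file for wordcount
mkdir -p input
echo "hello hadoop hello windows" > input/sample.txt
# Then run (output1 must not exist yet):
#   bin/hadoop jar hadoop-examples-1.0.4.jar wordcount input output1
```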

So I modified it to

$ bin/hadoop jar hadoop-examples-1.0.4.jar wordcount input output1
/cygdrive/c/Program Files/Java/jdk1.6.0_31
12/11/12 16:23:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/11/12 16:23:37 ERROR security.UserGroupInformation: PriviledgedActionException as:sb009239 cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-sb009239\mapred\staging\sb0092391774446851\.staging to 0700
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-sb009239\mapred\staging\sb0092391774446851\.staging to 0700
        at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
        at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
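The post stops at this error, so for completeness: the "Failed to set permissions of path ... to 0700" failure is the well-known Hadoop 1.0.x-on-Windows problem (tracked upstream as HADOOP-7682, if I have the JIRA number right). RawLocalFileSystem's chmod calls do not take effect on NTFS, so checkReturnValue throws. The workaround people circulate is to patch that one method in the Hadoop source to log a warning instead of throwing, rebuild hadoop-core with ant, and swap in the patched jar. A sketch of the patched method, from memory of the circulated patch (this weakens a safety check, so it is for local experiments only, never a real cluster):

```java
// src/core/org/apache/hadoop/fs/FileUtil.java (Hadoop 1.0.x) -- patched
// sketch: log instead of throwing when chmod fails on Windows/NTFS
private static void checkReturnValue(boolean rv, File p,
                                     FsPermission permission)
                                     throws IOException {
  if (!rv) {
    LOG.warn("Failed to set permissions of path: " + p + " to " +
             String.format("%04o", permission.toShort()));
  }
}
```

After rebuilding, replace hadoop-core-1.0.4.jar with the patched jar and rerun the wordcount job.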