Hadoop, Mahout Basic Installation, Configuration and Settings – Big Data Analysis

1> Download link for hadoop

2> Installation link for Hadoop
    — http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

3> Required software for installing Hadoop
    — Java 1.6 or a higher version

4> Adding a user called hadoop

    — to add the user, provide the following commands in a terminal
        $ sudo addgroup hadoop
        $ sudo adduser --ingroup hadoop hadoop
    — This will add the user hadoop and the group hadoop to your local machine.

5> Configuring ssh for that user

    — Hadoop requires SSH access to manage its nodes, both remote machines and the local machine
    — To set this up, provide the following commands in a terminal
        $ su - hadoop
        $ ssh-keygen -t rsa -P ""
        — you will be asked for a file in which to save the key; just press Enter to accept the default
        $ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys //to enable SSH access to the local machine
        $ ssh localhost //to test the SSH setup
        — you will be asked whether to continue connecting; say yes
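    The key-setup commands above can be sketched as one re-runnable script. The function name `setup_ssh_key` and its directory parameter are illustrative, not part of the tutorial; the parameter exists so the sketch can target a directory other than `$HOME/.ssh`.

```shell
# Sketch of step 5 as an idempotent script (assumes ssh-keygen is installed).
# setup_ssh_key and the keydir parameter are illustrative names.
setup_ssh_key() {
  keydir=$1                      # normally $HOME/.ssh
  mkdir -p "$keydir" && chmod 700 "$keydir"
  # generate an RSA key with an empty passphrase only if none exists yet
  [ -f "$keydir/id_rsa" ] || ssh-keygen -q -t rsa -P "" -f "$keydir/id_rsa"
  # authorize the key for SSH logins to this machine
  cat "$keydir/id_rsa.pub" >> "$keydir/authorized_keys"
  chmod 600 "$keydir/authorized_keys"
}
# usage (as the hadoop user):  setup_ssh_key "$HOME/.ssh"
```

    Guarding the `ssh-keygen` call means re-running the script will not overwrite an existing key.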

6> Disabling IP-V6

    — we need to edit the sysctl.conf file, which is in /etc/; open it with
        $ sudo gedit /etc/sysctl.conf

    — At the end of the file add this line
        net.ipv6.conf.all.disable_ipv6 = 1
    — save and restart the system, or reload the sysctl settings with
        $ sudo sysctl -p
    — to check whether IPv6 has been disabled, give the command
        $ ip a | grep inet6
        — if you get no output then IPv6 is disabled
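    An alternative check is to read the kernel flag that the sysctl.conf line sets. A small sketch (the flag path is standard on Linux kernels built with IPv6 support):

```shell
# Sketch: report IPv6 status by reading the same kernel switch that
# net.ipv6.conf.all.disable_ipv6 = 1 controls.
flag=/proc/sys/net/ipv6/conf/all/disable_ipv6
if [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]; then
  echo "IPv6 is disabled"
else
  echo "IPv6 is enabled (or the flag is not present)"
fi
```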

7> Extracting Hadoop

    — extract Hadoop from the downloaded tar.gz file
    — provide the following commands

        $ cd /home/hadoop //to install in our specified path
        $ sudo tar xfz hadoop-0.20.2.tar.gz
        $ sudo mv hadoop-0.20.2 hadoop
        $ sudo chown -R hadoop:hadoop hadoop
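    The extract-and-rename steps above can be generalized into a small function. `install_tarball` and its arguments are illustrative names, not part of the tutorial; the `chown` is left out of the function because it needs root.

```shell
# Sketch: unpack a .tar.gz and rename its top-level directory to a fixed
# name (e.g. hadoop-0.20.2.tar.gz -> hadoop/). Illustrative helper only.
install_tarball() {
  tarball=$1   # e.g. hadoop-0.20.2.tar.gz
  target=$2    # e.g. hadoop
  # first entry in the archive listing gives the top-level directory name
  top=$(tar tzf "$tarball" | head -1 | cut -d/ -f1)
  tar xzf "$tarball"
  mv "$top" "$target"
}
# usage: install_tarball hadoop-0.20.2.tar.gz hadoop
#        sudo chown -R hadoop:hadoop hadoop
```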

8> Configuring Hadoop with Java

    — To configure Hadoop with Java we need to edit the hadoop-env.sh file, which is in /home/hadoop/hadoop/conf

    — Uncomment the following line in the file and point it at the Java installed on your system
        export JAVA_HOME=/usr/lib/j2sdk1.6-sun
        — in my system it is export JAVA_HOME=/usr/java/jdk1.6.0_29
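    If you are unsure where Java lives on your machine, one way to find a candidate value for JAVA_HOME is to resolve the `java` binary on the PATH. A sketch, assuming a JDK is installed and `readlink -f` is available (as on Ubuntu):

```shell
# Sketch: derive a JAVA_HOME candidate from the java binary on the PATH.
if command -v java >/dev/null; then
  # resolve symlinks, e.g. /usr/bin/java -> /etc/alternatives/java -> real JDK
  java_bin=$(readlink -f "$(command -v java)")
  candidate=${java_bin%/bin/java}   # strip the trailing /bin/java
  echo "export JAVA_HOME=$candidate"
else
  echo "java not found on PATH"
fi
```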

9> Site-specific configuration

    — All site-specific configuration is done in the core-site.xml, hdfs-site.xml and mapred-site.xml files, which are present in /home/hadoop/hadoop/conf
    — Each property below goes inside the <configuration> … </configuration> element of its file
    — The values (ports 54310/54311, /app/hadoop/tmp) follow the single-node tutorial linked above; adjust them to your own setup

    — core-site.xml

        <property>
          <name>hadoop.tmp.dir</name>
          <value>/app/hadoop/tmp</value>
          <description>A base for other temporary directories.</description>
        </property>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:54310</value>
          <description>The name of the default file system.  A URI whose
          scheme and authority determine the FileSystem implementation.  The
          uri's scheme determines the config property (fs.SCHEME.impl) naming
          the FileSystem implementation class.  The uri's authority is used to
          determine the host, port, etc. for a filesystem.</description>
        </property>

    — hdfs-site.xml

        <property>
          <name>dfs.replication</name>
          <value>1</value>
          <description>Default block replication.</description>
        </property>

    — mapred-site.xml

        <property>
          <name>mapred.job.tracker</name>
          <value>localhost:54311</value>
          <description>The host and port that the MapReduce job tracker runs at.</description>
        </property>
10> Formatting the name node

    — This is to be done the first time a Hadoop cluster is set up
    — To format the filesystem, provide the following command

        $ /home/hadoop/hadoop/bin/hadoop namenode -format

    — if formatting succeeds, the output ends with a message like
          SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1

11> To start hadoop

    — To start, change directory to /home/hadoop/hadoop and give the following command

        $ bin/start-all.sh

12> Checking whether Hadoop processes are running

    –To check provide the command as
        $ jps
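    On a healthy single-node cluster, `jps` lists five Hadoop daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself). A small checker, written as a function so it can also be fed saved `jps` output; `check_daemons` is an illustrative name:

```shell
# Sketch: check that the five Hadoop 0.20 daemons appear in jps output.
check_daemons() {
  listing=$1   # output of `jps`
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
    if printf '%s\n' "$listing" | grep -qw "$d"; then
      echo "$d: running"
    else
      echo "$d: NOT running"
    fi
  done
}
# usage: check_daemons "$(jps)"
```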

13> To stop hadoop

    –To stop provide the command as
        $ bin/stop-all.sh

14> Run a MapReduce job

    — Copy the input text files from the local filesystem into a temp folder in HDFS

        $ bin/hadoop dfs -copyFromLocal /path/of/file temp

    — To check the files

        $ bin/hadoop dfs -ls

    — To perform the word count operation on the temp folder

        $ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount temp temp-output

    –To check the output

        $ bin/hadoop dfs -cat temp-output/part-00000
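    To see what the wordcount job computes, the same word-frequency count can be reproduced locally with standard tools. A sketch on a tiny sample file (`/tmp/wc_demo.txt` is just an illustrative path); each output line is a word and its count, tab-separated, in the same shape as the part-00000 file:

```shell
# Sketch: a local equivalent of what the wordcount example computes.
printf 'hello world\nhello hadoop\n' > /tmp/wc_demo.txt
# split into one word per line, count duplicates, print "word<TAB>count"
tr -s '[:space:]' '\n' < /tmp/wc_demo.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
# output:
#   hadoop  1
#   hello   2
#   world   1
```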

15> Use the web interfaces

    — Hadoop comes with several built-in web interfaces; on a default install they are available at the following ports

    — Example : http://localhost:50030 web UI for the MapReduce job tracker
            http://localhost:50060 web UI for the task tracker
            http://localhost:50070 web UI for the HDFS name node

About ashokabhat

I am a C, C++, Java, Adobe Flex, .NET programmer, currently working as a software developer.
