Create a VMware Bundle with all Packages – Hadoop, Hive, Mahout, Tomcat

1> Installation of VMware Player
Download the latest 4.0 version of the VMware Player installer bundle, 32-bit or 64-bit according to your system configuration.

2> Download link for VMware Player
http://www.vmware.com/go/downloadplayer

3> Run the downloaded shell script (bundle); it will guide you through the rest of the Player installation
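— For example (the exact file name depends on the version you downloaded, so treat this as a sketch):
$ sudo sh VMware-Player-4.0.*.bundle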

4> Download a virtual machine appliance for Ubuntu Desktop

5> Start VMware Player, choose to open an existing virtual machine, browse to the path where the Ubuntu Desktop .vmx file is placed, select that file and click Open

6> The default user will be hadoop, with password "hadoop"

7> Installing Hadoop: download Hadoop version 0.20.2

8> Download link for Hadoop
http://www.bizdirusa.com/mirrors/apache/hadoop/common/hadoop-0.20.2/
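— For example, to fetch the tarball from the mirror above (mirrors change over time, so substitute a current one if this link is dead):
$ wget http://www.bizdirusa.com/mirrors/apache/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz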

9> Installation link for Hadoop
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

10> Required software for installing Hadoop
— Java 1.6 or a higher version
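— To check which Java version is already installed, run:
$ java -version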

11> Adding a user called hadoop

— to add the user, run the following commands in a terminal
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hadoop
— This will add the user hadoop and the group hadoop to your local machine.
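— To verify that the user and group were created:
$ id hadoop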

12> Configuring ssh for that user

— Hadoop requires SSH access to manage its nodes; even on a single-node setup it connects to localhost
— To set this up, run the following commands in a terminal
$ su - hadoop
$ ssh-keygen -t rsa -P ""
— you will be asked for the file in which to save the key; just press Enter to accept the default
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys // to enable SSH access to the local machine
$ ssh localhost // to test the SSH setup
— you will be asked whether to continue connecting; say yes

13> Disabling IP-V6

— we need to edit the sysctl.conf file in /etc/; open it with
$ sudo gedit /etc/sysctl.conf

— At the end of the file add this line
net.ipv6.conf.all.disable_ipv6 = 1
— save, then restart the system or reload sysctl.conf with
$ sudo sysctl -p
— to check whether IPv6 has been disabled, run
$ ip a | grep inet6
— if you get no output, IPv6 is disabled (note the 6: plain inet would also match IPv4 addresses)
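— Alternatively, check the kernel flag directly; it prints 1 when IPv6 is disabled:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6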

14> Extracting Hadoop

— extract Hadoop from the downloaded tar.gz file
— run the following commands to install it

$ cd /home/hadoop // to install in our chosen path
$ sudo tar xfz hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 hadoop
$ sudo chown -R hadoop:hadoop hadoop

15> Configuring hadoop with java

— To configure Hadoop with Java, edit the hadoop-env.sh file in /home/hadoop/hadoop/conf

— Uncomment the following line in the file
export JAVA_HOME=/usr/lib/j2sdk1.6-sun
— and change it to the path of the Java installation on your system
— on my system it is export JAVA_HOME=/usr/java/jdk1.6.0_29
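— If you are not sure where Java is installed, this usually reveals it (the output path is an example; JAVA_HOME is the directory above bin, or above jre/bin):
$ readlink -f $(which java)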

16> Site-specific configuration

— All site-specific configuration is done in the core-site.xml, hdfs-site.xml and mapred-site.xml files present in /home/hadoop/hadoop/conf; in each file the properties go within the <configuration> … </configuration> element

— core-site.xml

<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop/tmp-dir</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>

— hdfs-site.xml

<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hadoop/tmp-dir/name-dir</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop/tmp-dir/data-dir</value>
</property>

— mapred-site.xml

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
— note that mapred.job.tracker takes a host:port pair, not an hdfs:// URI
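— Create the temporary directory referenced above and hand it to the hadoop user (the path matches the hadoop.tmp.dir value set earlier; adjust it if you changed that value):
$ sudo mkdir -p /home/hadoop/hadoop/tmp-dir
$ sudo chown -R hadoop:hadoop /home/hadoop/hadoop/tmp-dir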

17> Formatting the name node

— This is to be done the first time a Hadoop cluster is set up
— To format the filesystem, run the following command

$ /home/hadoop/hadoop/bin/hadoop namenode -format

— if the format succeeds, the output ends with a message like
/***********************************************************
SHUTDOWN_MSG: Shutting down NameNode at 192.168.1.106/192.168.1.106
**********************************************************/

18> To start Hadoop

— Change to the Hadoop directory and run the following command

$ bin/start-all.sh

19> To check whether the Hadoop processes are running

— Run the command
$ jps
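— On a healthy single-node setup, jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself), each with its process ID.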

20> To stop Hadoop

— Run the command
$ bin/stop-all.sh

21> Run a MapReduce job

— Copy the text files into a temp folder inside HDFS

$ bin/hadoop dfs -copyFromLocal /path/of/file temp

— To check the files

$ bin/hadoop dfs -ls

— To perform the word count operation (the examples jar is named after the installed Hadoop version)

$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount temp temp-output

–To check the output

$ bin/hadoop dfs -cat temp-output/part-00000
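— To copy the whole result folder out of HDFS for local inspection (/tmp/temp-output is just an example destination):
$ bin/hadoop dfs -copyToLocal temp-output /tmp/temp-output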

22> Use the web interfaces

— Hadoop ships with built-in web interfaces on default ports; the ports can be changed in the conf/*-site.xml files if needed

— Example: http://192.168.1.106:50030 – JobTracker web interface
http://192.168.1.106:50060 – TaskTracker web interface
http://192.168.1.106:50070 – HDFS NameNode web interface

23> Installing Hive
Download Hive version 0.8.1

24> Download link for Hive
http://apache.mesi.com.ar/hive/hive-0.8.1/

25> Extract Hive (conveniently, though not necessarily, into the same directory where Hadoop is installed) and change into its directory
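— For example, assuming the tarball was downloaded to /home/hadoop (adjust the archive name to the one you downloaded):
$ cd /home/hadoop
$ sudo tar xzf hive-0.8.1.tar.gz
$ sudo chown -R hadoop:hadoop hive-0.8.1
$ cd hive-0.8.1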

26> Changing configuration files of hive

— copy conf/hive-env.sh.template to conf/hive-env.sh, open the copy and add the following properties to it.

HADOOP_HOME=/home/hadoop/hadoop
export HIVE_CONF_DIR=/home/hadoop/hive-0.8.1/conf

— open bin/hive and add the property

HADOOP_HOME=/home/hadoop/hadoop

27> Starting Hive

$ bin/hive

28> If it starts successfully, you will get a Hive terminal with a "hive>" prompt

29> To close Hive, type quit;
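— As a quick smoke test (the table name test is just an example):
hive> CREATE TABLE test (id INT);
hive> SHOW TABLES;
hive> quit;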

30> Starting Hadoop, Hive and Tomcat when Ubuntu boots

1. Create an executable sh file (saved later as /etc/init.d/hadoop-tomcat, see steps 4 and 5) which contains the following.

#!/bin/bash
# Source the Hadoop environment so JAVA_HOME etc. are set
. /home/hadoop/hadoop/conf/hadoop-env.sh
export HPATH=/home/hadoop/hadoop
export TPATH=/home/hadoop/tomcat6
export HLOCK=/var/lock/subsys
RETVAL=0
PIDFILE=$HLOCK/hadoop-hdfs-master.pid
desc="Hadoop Master daemon"
sudo chmod -R 777 /home/hadoop/hadoop

start() {
echo -n $"Starting $desc (hadoop): "
su hadoop -c "$HPATH/bin/start-all.sh"
sh $TPATH/bin/startup.sh
RETVAL=$?
echo
[ $RETVAL -eq 0 ] && touch $HLOCK/hadoop-master
return $RETVAL
}

stop() {
echo -n $"Stopping $desc (hadoop): "
su hadoop -c "$HPATH/bin/stop-all.sh"
sh $TPATH/bin/shutdown.sh
RETVAL=$?
echo
[ $RETVAL -eq 0 ] && rm -f $HLOCK/hadoop-master $PIDFILE
return $RETVAL
}

restart() {
stop
start
}

case "$1" in
start)
start
;;
stop)
stop
;;
restart)
restart
;;
*)
echo $"Usage: $0 {start|stop|restart}"
exit 1
esac

exit $RETVAL

2. In the lines above, check HPATH and TPATH, which are the Hadoop and Tomcat directory paths, and adjust them to match your installation.
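Once the script is saved and made executable (see step 4), you can test it by hand before wiring it into the boot sequence:

$ sudo /etc/init.d/hadoop-tomcat start
$ sudo /etc/init.d/hadoop-tomcat stop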

3. For Hive startup, create another executable file (saved later as /etc/init.d/hive-thrift) which contains the following.

#!/bin/bash
# Create pid and log files and hand them to the hadoop user
sudo touch /var/run/hive-thrift.pid
sudo touch /var/run/hive-thrift-java.pid
sudo touch /var/log/hive-thrift.log
sudo chown hadoop:admin /var/run/hive-thrift.pid
sudo chown hadoop:admin /var/log/hive-thrift.log
sudo chown hadoop:admin /var/run/hive-thrift-java.pid

# Paths to configuration, binaries, etc.
prog="hive-thrift"
HIVE_LOG=/var/log/hive-thrift.log
HIVE_USER="hadoop"
HIVE_BIN=/home/hadoop/hive-0.8.1
if [ ! -d $HIVE_BIN ]; then
echo "Directory not found: $HIVE_BIN"
exit 1
fi

pidfile=${PIDFILE-/var/run/hive-thrift.pid}
pidfile_java=${PIDFILE_JAVA-/var/run/hive-thrift-java.pid}
RETVAL=0

start() {
echo -n $"Starting $prog: "
cd $HIVE_BIN
sudo -u $HIVE_USER sh -c "bin/hive --service hiveserver" >> $HIVE_LOG 2>&1 &
runuser_pid=$!
echo $runuser_pid > $pidfile

# Find the java child of the launcher process and record its pid too
java_pid=$(ps -eo pid,ppid,fname | awk "{ if (\$2 == $runuser_pid && \$3 ~ /java/) { print \$1 } }")
echo $java_pid > $pidfile_java
disown -ar

RETVAL=$?
echo
return $RETVAL
}

stop() {
echo -n $"Stopping $prog: "
if kill `cat $pidfile` && kill `cat $pidfile_java`; then
RETVAL=0
echo "OK"
else
RETVAL=1
echo "failed"
fi
echo
[ $RETVAL = 0 ] && rm -f ${pidfile} ${pidfile_java}
}

status_fn() {
if [ -f $pidfile_java ] && kill -0 `cat $pidfile_java` 2>/dev/null; then
echo "$prog is running"
exit 0
else
echo "$prog is stopped"
exit 1
fi
}

case "$1" in
start)
start
;;
stop)
stop
;;
status)
status_fn
;;
restart)
stop
start
;;
*)
echo $"Usage: $prog {start|stop|restart|status}"
RETVAL=3
esac

exit $RETVAL

4. Once both files are created, save them in /etc/init.d/.
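Then make them executable (the file names hadoop-tomcat and hive-thrift match the ones used in step 5):

$ sudo chmod +x /etc/init.d/hadoop-tomcat /etc/init.d/hive-thrift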

5. After saving both files, register them with rc.d using the following commands:

$ sudo update-rc.d hadoop-tomcat defaults

$ sudo update-rc.d hive-thrift defaults

6. To check whether the startup scripts are enabled, use the following command (on Ubuntu you may need to install the chkconfig package first):

$ chkconfig --list | grep hadoop-tomcat

You should get the following result for the above command:

hadoop-tomcat 0:off 1:off 2:on 3:on 4:on 5:on 6:off

Notice that runlevels 2, 3, 4 and 5 are on; these are the runlevels that trigger the script at startup. Do the same for hive-thrift to check whether it is on or not.
