Create VMware Bundle with all Packages – Hadoop, Hive, Mahout, Tomcat

1> Installation of VMware Player
Download the latest 4.0 version, 32-bit or 64-bit, of the VMware Player shell-script installer according to your system configuration.

2> Download link for VMware Player

3> Run the shell script, which will guide you through the rest of the Player installation

4> Download a virtual machine appliance for Ubuntu Desktop

5> Start VMware Player and select "Open an existing virtual machine", go to the path where the Ubuntu Desktop .vmx file is placed, select that file and click Open

6> The default user will be hadoop, with password "hadoop"

7> Installing Hadoop: download Hadoop version 0.20.2

8> Download link for hadoop

9> Installation link for Hadoop

10> Required software for installing Hadoop
— Java 1.6 or a higher version

11> Adding a user called hadoop

— To add the user, provide the following commands in a terminal
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hadoop
— This will add the user hadoop and the group hadoop to your local machine.

12> Configuring ssh for that user

— This is required by Hadoop to manage its nodes, whether remote or local
— To do this, provide the following commands in a terminal
$ su - hadoop
$ ssh-keygen -t rsa -P ""
— you will be asked for the file in which to save the key; just press Enter
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys //to enable SSH access to the local machine
$ ssh localhost //to test the SSH
— you will be asked whether to continue connecting; say yes

13> Disabling IPv6

— We need to change the sysctl.conf file, which is in /etc/; provide the command
$ sudo gedit /etc/sysctl.conf

— At the end of the file add this line
net.ipv6.conf.all.disable_ipv6 = 1
— save, then restart the system or reload the sysctl.conf file
$ sudo sysctl -p
— to check whether IPv6 has been disabled, give the command
$ ip a | grep inet6
— if you get no output, then IPv6 is disabled
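On many kernels the all switch alone does not disable IPv6 on the loopback or on newly created interfaces; a fuller /etc/sysctl.conf fragment (the two extra keys are the standard Linux sysctl switches, added here as a precaution) is:

```
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
```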

14> Extracting Hadoop

— extract Hadoop from the tar.gz file
— provide the following commands to install

$ cd /home/hadoop //to install in our specified path
$ sudo tar xfz hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 hadoop
$ sudo chown -R hadoop:hadoop hadoop

15> Configuring Hadoop with Java

— To configure Hadoop with Java, edit the hadoop-env.sh file, which is in /home/hadoop/hadoop/conf

— Uncomment the following line in the file
export JAVA_HOME=/usr/lib/j2sdk1.6-sun
— and set the path to the Java installed on your system
— on my system it is export JAVA_HOME=/usr/java/jdk1.6.0_29
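If you are not sure where your JDK lives, JAVA_HOME can be derived from whichever java is on the PATH — a sketch, assuming java resolves through symlinks to a real JDK:

```shell
# Resolve the real java binary through any symlinks, then strip the /bin/java suffix
JAVA_BIN=$(readlink -f "$(command -v java)")
JAVA_HOME=${JAVA_BIN%/bin/java}
echo "export JAVA_HOME=$JAVA_HOME"
```

Paste the echoed line into the conf file from the step above.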

16> Site-specific configuration

— All site-specific configuration is done in the core-site.xml, hdfs-site.xml and mapred-site.xml files present in /home/hadoop/hadoop/conf

— core-site.xml sets fs.default.name, the name of the default file system: a URI whose scheme and authority determine the FileSystem implementation (the URI's scheme determines the config property fs.SCHEME.impl naming the FileSystem implementation class; the URI's authority is used to determine the host, port, etc. for a filesystem). It also sets hadoop.tmp.dir, a base for other temporary directories.

— hdfs-site.xml holds the HDFS settings and mapred-site.xml the MapReduce (JobTracker) settings; each property goes inside that file's <configuration> element
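The property blocks themselves did not survive in this post. As a sketch, the standard pseudo-distributed values from the Hadoop 0.20 documentation would be (host names, ports and the tmp path are assumptions — adjust to your machine):

```xml
<!-- conf/core-site.xml : default filesystem and temp dir (values assumed) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml : single-node setup, so keep one replica per block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml : JobTracker address (port assumed) -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```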


17> Formatting the name node

— This is to be done the first time a Hadoop cluster is set up
— To format the filesystem, provide the following command

$ /home/hadoop/hadoop/bin/hadoop namenode -format

— if it formats successfully, the last lines will include a message such as
SHUTDOWN_MSG: Shutting down NameNode at <your hostname/IP>

18> To start Hadoop

— To start, change directory to hadoop and give the following command

$ bin/start-all.sh

19> To check whether the Hadoop processes are running

— To check, provide the command
$ jps

20> To stop Hadoop

— To stop, provide the command
$ bin/stop-all.sh

21> Run a MapReduce job

— Copy the input text files into a temp folder inside HDFS

$ bin/hadoop dfs -copyFromLocal /path/of/file temp

— To check the files

$ bin/hadoop dfs -ls

— To perform the word-count operation

$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount temp temp-output

— To check the output

$ bin/hadoop dfs -cat temp-output/part-00000
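For intuition, the wordcount job computes on the cluster what this local pipeline computes on one machine — split into words (map), sort (shuffle), count identical neighbours (reduce). A sketch for intuition only, not the MapReduce code itself:

```shell
# map: one word per line; shuffle: sort; reduce: count duplicates
echo "to be or not to be" | tr ' ' '\n' | sort | uniq -c | sort -rn
```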

22> Use the web interfaces

— Hadoop's built-in web interfaces listen on fixed default ports: 50070 for the HDFS NameNode, 50030 for the JobTracker and 50060 for the TaskTracker

— Example: in a browser inside the VM, http://localhost:50070/ shows the NameNode status; change the site configuration files above if you need different ports

23> Installing Hive
Download Hive version 0.8.1

24> Download link for hive

25> Extract Hive to the same directory where Hadoop is installed (not strictly necessary) and change directory into it

26> Changing Hive's configuration files

— open the configuration script in conf/ and add the following property to it.

export HIVE_CONF_DIR=/home/hadoop/hive-0.8.1/conf

— open bin/hive and point it at your Hadoop installation:

export HADOOP_HOME=/home/hadoop/hadoop
27> Starting Hive

$ bin/hive

28> If it starts successfully you will get a Hive terminal with a hive> prompt

29> To close Hive, type quit;

30> Starting hadoop, hive and tomcat at the start of ubuntu

1. Create an executable sh file which contains the following (start-all.sh/stop-all.sh and Tomcat's startup.sh/shutdown.sh are the stock script names for Hadoop 0.20 and Tomcat 6):

#!/bin/sh
# init script: bring the Hadoop daemons and Tomcat up and down together
. /home/hadoop/hadoop/conf/hadoop-env.sh
export HPATH=/home/hadoop/hadoop
export TPATH=/home/hadoop/tomcat6
export HLOCK=/var/lock/subsys
desc="Hadoop Master daemon"
sudo chmod -R 777 /home/hadoop/hadoop

start() {
    echo -n $"Starting $desc (hadoop): "
    su hadoop -c "$HPATH/bin/start-all.sh"
    sh $TPATH/bin/startup.sh
    RETVAL=$?
    [ $RETVAL -eq 0 ] && touch $HLOCK/hadoop-master
    return $RETVAL
}

stop() {
    echo -n $"Stopping $desc (hadoop): "
    su hadoop -c "$HPATH/bin/stop-all.sh"
    sh $TPATH/bin/shutdown.sh
    RETVAL=$?
    [ $RETVAL -eq 0 ] && rm -f $HLOCK/hadoop-master
    return $RETVAL
}

case "$1" in
    start)   start ;;
    stop)    stop ;;
    restart) stop; start ;;
    *)       echo $"Usage: $0 {start|stop|restart}"; exit 1 ;;
esac

exit $RETVAL

2. In the lines above check HPATH and TPATH, which are the hadoop and tomcat directory paths.

3. For Hive start-up create another executable file which contains the following (the pid-file names are reconstructed; adjust to taste):

#!/bin/sh
# init script: start/stop the Hive Thrift server (hiveserver)
prog=hive-thrift
HIVE_HOME=/home/hadoop/hive-0.8.1
HIVE_BIN=$HIVE_HOME/bin/hive
HIVE_LOG=/var/log/hive-thrift.log
pidfile=/var/run/hive-thrift.pid
pidfile_java=/var/run/hive-thrift-java.pid

sudo touch $pidfile
sudo touch $pidfile_java
sudo touch $HIVE_LOG
sudo chown hadoop:admin $pidfile
sudo chown hadoop:admin $pidfile_java
sudo chown hadoop:admin $HIVE_LOG

# Paths to configuration, binaries, etc
if [ ! -f $HIVE_BIN ]; then
    echo "File not found: $HIVE_BIN"
    exit 1
fi

start() {
    echo -n $"Starting $prog: "
    cd $HIVE_HOME
    sudo -u hadoop sh -c "bin/hive --service hiveserver" >> $HIVE_LOG 2>&1 &
    runuser_pid=$!
    echo $runuser_pid > $pidfile
    # the wrapper shell forks a java process; record its pid too
    java_pid=$(ps -eo pid,ppid,fname | awk "{ if (\$2 == $runuser_pid && \$3 ~ /java/) { print \$1 } }")
    echo $java_pid > $pidfile_java
    disown -ar
    RETVAL=$?
    return $RETVAL
}

stop() {
    echo -n $"Stopping $prog: "
    if kill `cat $pidfile` && kill `cat $pidfile_java`; then
        RETVAL=0
        rm -f ${pidfile} ${pidfile_java}
    fi
}

status_fn() {
    if [ -f $pidfile_java ] && kill -0 `cat $pidfile_java` 2>/dev/null; then
        echo "hive-thrift is running"
        exit 0
    else
        echo "hive-thrift is stopped"
        exit 1
    fi
}

case "$1" in
    start)   start ;;
    stop)    stop ;;
    restart) stop; start ;;
    status)  status_fn ;;
    *)       echo $"Usage: $prog {start|stop|restart|status}"; exit 1 ;;
esac

exit $RETVAL
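The subtlest part of the start-up above is the awk call that picks out the java child of the wrapper shell by matching its parent pid. The same trick generalized (the helper name find_child is mine) can be tried on any process tree:

```shell
# Print the pids of children of a given parent whose command name matches a pattern
find_child() {
    parent=$1; pattern=$2
    ps -eo pid,ppid,fname | awk -v p="$parent" -v pat="$pattern" '$2 == p && $3 ~ pat { print $1 }'
}

# Example: list children of PID 1 (output depends on the system)
find_child 1 "."
```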

4. Once both files are created, save them in /etc/init.d/

5. After saving both files, update rc.d using the following commands.

sudo update-rc.d hadoop-tomcat defaults

sudo update-rc.d hive-thrift defaults

6. To check whether the start-up entries are on, use the following command (install chkconfig first, as Ubuntu does not ship it by default).

chkconfig --list | grep hadoop-tomcat

You should get the following result for the above command:

hadoop-tomcat 0:off 1:off 2:on 3:on 4:on 5:on 6:off

Notice that runlevels 2, 3, 4 and 5 are on, which triggers start-up. Do the same for hive-thrift to check whether it is on or not.


About ashokabhat

I am a C, C++, Java, Adobe Flex and .NET programmer, currently working as a software developer.