Hadoop Installation on Single Machine
To download and install Hadoop, the prerequisites are:
1. A 64-bit Linux-based OS, such as Ubuntu, CentOS, Fedora, etc. I preferred to use Ubuntu 12.04 LTS, and later 14.04 LTS (the upcoming version).
2. Java 1.6 or 1.7 JDK (a quick check follows below).
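Before proceeding, it helps to confirm the JDK is installed and note where it lives (the /usr/lib/jvm path below is an assumption matching the JAVA_HOME used later in this post; yours may differ):
>java -version
>ls /usr/lib/jvm/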
Go to the Downloads folder:
>cd Downloads
Un-zip the Hadoop tar file (the rest of this post uses version 1.2.1, so the file name should match your download):
>sudo tar xzf hadoop-1.2.1.tar.gz
I created a folder in /home/hduser/:
>mkdir Installations
Move the un-zipped Hadoop folder into the Installations directory:
>sudo mv /home/hduser/Downloads/hadoop-1.2.1 /home/hduser/Installations/
Give some permissions to the hadoop folder:
>sudo addgroup hadoop
>sudo chown -R hduser:hadoop /home/hduser/Installations/hadoop-1.2.1
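If the hduser account is not yet a member of the new hadoop group, you can add it as well (this assumes hduser is the account you are working as, per the paths used in this post):
>sudo adduser hduser hadoop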
Restart the terminal, then open the .bashrc file:
>gksudo gedit .bashrc
Add the following lines to the end of this file:

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45/

# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hduser/Installations/hadoop-1.2.1

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
Save the file and exit the terminal.
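If you prefer not to reopen the terminal, reloading the file in the current session should also work:
>source ~/.bashrc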
Now it's time to modify the configurations in core-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh.
In hadoop-env.sh, export JAVA_HOME as shown in the sketch below.
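A minimal sketch of the change, reusing the JDK path already set in .bashrc above (uncomment and edit the existing JAVA_HOME line in conf/hadoop-env.sh; your JDK path may differ):
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45/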
In the core-site.xml file, add the fs.default.name property between the <configuration> tags, as sketched below. This property names the default file system: a URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class, and the URI's authority is used to determine the host, port, etc. for a filesystem.
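A minimal sketch of that property; the NameNode port 54310 is an assumption here, so use whichever free port you plan to run HDFS on:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>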
In hdfs-site.xml, we need to set the replication factor, as sketched below. The actual number of replications can be specified when the file is created; the default is used if replication is not specified at create time.
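A sketch of that property; a replication factor of 1 is the usual choice on a single machine, since there is only one DataNode to hold the blocks:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>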
In mapred-site.xml, configure the JobTracker address, as sketched below. This is the host and port that the MapReduce job tracker runs at; if "local", then jobs are run in-process as a single map and reduce task.
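A sketch of that property; the JobTracker port 54311 is an assumption, so any free port works:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>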
Open the terminal and go through the following steps for first use:
>cd $HADOOP_HOME
>bin/hadoop namenode -format
>start-all.sh
>jps
After running jps, you should find five daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker), which means the cluster is ready.
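A typical jps listing looks roughly like the following (the PIDs are illustrative, and jps also lists itself):
>jps
2287 NameNode
2407 DataNode
2538 SecondaryNameNode
2628 JobTracker
2757 TaskTracker
2881 Jps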
Go to the web browser for the Hadoop GUI:
http://localhost:50070
In a coming blog post, I am going to cover the multi-node cluster setup.
Happy Hadooping !!