Installing Hadoop 2.6 on Ubuntu 14.04 (Single-Node Cluster)

Let’s see how to install a single-node Hadoop cluster backed by the Hadoop Distributed File System (HDFS) on Ubuntu 14.04.

1. Update the package source list:
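On Ubuntu this is done with apt-get:

  $ sudo apt-get update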

2. Check if Java is installed
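Hadoop 2.6 needs a working JDK. Check the installed version and, if none is present, install one (default-jdk, which pulls in OpenJDK 7 on Ubuntu 14.04, is just one choice):

  $ java -version
  $ sudo apt-get install default-jdk
  $ java -version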

3. Add a dedicated Hadoop user
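A common convention is a dedicated hadoop group with an hduser account in it (the names themselves are just conventions):

  $ sudo addgroup hadoop
  $ sudo adduser --ingroup hadoop hduser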

4. Install ssh

How to install ssh and sshd?

ssh : The command we use to connect to remote machines – the client.
sshd : The daemon that is running on the server and allows clients to connect to the server.
The ssh client is usually pre-installed on Linux, but to get the sshd daemon running we need to install the SSH server first. Use this command to do that:
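On Ubuntu, the ssh metapackage pulls in both the OpenSSH client and server:

  $ sudo apt-get install ssh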

Let’s check if it is installed properly:
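Both binaries should now resolve to a path:

  $ which ssh
  $ which sshd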

5. Create and setup ssh certificates

Hadoop requires ssh access to manage its nodes, i.e. remote machines plus our local machine. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost.

So we need ssh up and running on our machine, configured to allow public key authentication.

Hadoop uses ssh (to access its nodes) which would normally require the user to enter a password. However, this requirement can be eliminated by creating and setting up ssh certificates using the following commands. If asked for a filename just leave it blank and press the enter key to continue.
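Run these as hduser; the empty passphrase (-P "") keeps the login non-interactive:

  $ ssh-keygen -t rsa -P ""
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys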

The second command adds the newly created key to the list of authorized keys so that Hadoop can use ssh without prompting for a password.

We can check if ssh works:
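Connecting to localhost should now succeed without a password prompt (you may have to accept the host key fingerprint the first time):

  $ ssh localhost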

6. Install Hadoop
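First download and unpack a Hadoop 2.6 release as hduser; the mirror URL below is only an example, any Apache mirror (or the archive) will do:

  $ wget http://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
  $ tar xvzf hadoop-2.6.0.tar.gz
  $ cd hadoop-2.6.0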

We want to move the Hadoop installation to the /usr/local/hadoop directory. Attempting this with sudo as hduser fails, because hduser is not yet in the sudoers file.

To resolve this error, log in as root (or another sudo-capable user) and add hduser to sudo:
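For example, from an account that already has sudo rights:

  $ sudo adduser hduser sudo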

Now that hduser has sudo privileges, let’s move the Hadoop installation to the /usr/local/hadoop directory:
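Switch back to hduser and, from the extracted hadoop-2.6.0 directory, move the files and hand ownership to hduser:

  $ su hduser
  $ sudo mkdir -p /usr/local/hadoop
  $ sudo mv * /usr/local/hadoop
  $ sudo chown -R hduser:hadoop /usr/local/hadoop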

7. Setup Configuration Files

We need to modify the following files to complete the Hadoop setup:

  1. ~/.bashrc
  2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
  3. /usr/local/hadoop/etc/hadoop/core-site.xml
  4. /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
  5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml

7.1. ~/.bashrc:
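Append the following Hadoop-related environment variables to the end of ~/.bashrc; the JAVA_HOME path below assumes the OpenJDK 7 package installed earlier and may differ on your machine:

  #HADOOP VARIABLES START
  export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  export HADOOP_INSTALL=/usr/local/hadoop
  export PATH=$PATH:$HADOOP_INSTALL/bin
  export PATH=$PATH:$HADOOP_INSTALL/sbin
  export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
  export HADOOP_COMMON_HOME=$HADOOP_INSTALL
  export HADOOP_HDFS_HOME=$HADOOP_INSTALL
  export YARN_HOME=$HADOOP_INSTALL
  export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
  export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
  #HADOOP VARIABLES END

Reload the file so the variables take effect in the current shell:

  $ source ~/.bashrc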

7.2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh

We need to set JAVA_HOME by modifying the hadoop-env.sh file.
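Open the file and point JAVA_HOME at the same JDK location used in ~/.bashrc (again, the exact path depends on your JDK):

  export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64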

7.3. /usr/local/hadoop/etc/hadoop/core-site.xml:

The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting up. This file can be used to override the default settings that Hadoop starts with.

Open the file and enter the following between the <configuration></configuration> tags:
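A minimal single-node example follows; the temporary directory and the NameNode port (54310) are conventions you can change. Create the temporary directory first and give hduser ownership of it:

  $ sudo mkdir -p /app/hadoop/tmp
  $ sudo chown hduser:hadoop /app/hadoop/tmp

Then add:

   <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
   </property>

   <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system.</description>
   </property>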

7.4. /usr/local/hadoop/etc/hadoop/mapred-site.xml

By default, the /usr/local/hadoop/etc/hadoop/ folder contains a mapred-site.xml.template file, which has to be copied (or renamed) to mapred-site.xml:
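For example:

  $ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml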

The mapred-site.xml file is used to specify which framework is being used for MapReduce. We need to enter the following content between the <configuration></configuration> tags:
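The classic single-node setting points MapReduce at a local job tracker address; the port (54311) is only a convention, and on a pure YARN setup you may see mapreduce.framework.name set to yarn instead:

   <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at.
    If "local", then jobs are run in-process as a single map and reduce task.
    </description>
   </property>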

7.5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml

The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the cluster that is being used. It is used to specify the directories that will hold the namenode and datanode data on that host.

Before editing this file, we need to create two directories which will contain the namenode and the datanode for this Hadoop installation.
This can be done using the following commands:
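For example:

  $ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
  $ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
  $ sudo chown -R hduser:hadoop /usr/local/hadoop_store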

Open the file and enter the following content between the <configuration></configuration> tags:
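A sketch for a single-node setup (replication of 1, and the two directories created above):

   <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.</description>
   </property>

   <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
   </property>

   <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
   </property>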

8. Format the New Hadoop Filesystem

Now, the Hadoop file system needs to be formatted so that we can start to use it. The format command should be issued with write permission, since it creates a current directory under the /usr/local/hadoop_store/hdfs/namenode folder:
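Run it as hduser:

  $ hadoop namenode -format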

Note that the hadoop namenode -format command should be executed only once, before we start using Hadoop. If it is executed again after Hadoop has been used, it will destroy all the data on the Hadoop file system.

9. Starting Hadoop

Now it’s time to start the newly installed single-node cluster. We can use start-all.sh, or start-dfs.sh and start-yarn.sh separately:
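With /usr/local/hadoop/sbin on the PATH (from the ~/.bashrc entries above), either form works:

  $ start-all.sh

or

  $ start-dfs.sh
  $ start-yarn.sh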

Let’s check if it’s really up and running:
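The jps tool lists the running Java processes; on a healthy single-node setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself):

  $ jps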

The output means that we now have a functional instance of Hadoop running on our VPS (Virtual private server). Another way to check is using netstat:
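For example:

  $ sudo netstat -plten | grep java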

10. Stopping Hadoop

We run stop-all.sh, or stop-dfs.sh and stop-yarn.sh, to stop all the daemons running on our machine:
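For example:

  $ stop-all.sh

or

  $ stop-dfs.sh
  $ stop-yarn.sh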

11. Hadoop Web Interfaces

Let’s start Hadoop again and take a look at its web UIs:

11.1. http://localhost:50070/ – web UI of the NameNode daemon

(Screenshot: NameNode web UI)

11.2. http://localhost:50090/ – web UI of the SecondaryNameNode daemon

(Screenshot: SecondaryNameNode web UI)
