How to Install and Configure Hadoop on Ubuntu 22.04

Apache Hadoop is an open-source software framework for storing and processing very large data sets across clusters of computers. It is designed to scale from a single server to thousands of machines, each providing local storage and computation. Hadoop is developed and maintained by the Apache Software Foundation.

Hadoop can be extended with a number of related libraries and tools, such as Apache HBase and Apache Flink. This post demonstrates how to install and configure Hadoop as a single-node cluster on Ubuntu 22.04 (Jammy Jellyfish).

How to Install Hadoop on Ubuntu 22.04?

Install Hadoop on Ubuntu 22.04 from its official binary tar package by following the instructions below.

Step 1: Open the Terminal

First, open the terminal, for example with the CTRL+ALT+T shortcut.

Step 2: Update Ubuntu’s Packages

Next, update the package index and upgrade all installed packages to their latest versions:

$ sudo apt update && sudo apt upgrade -y

Step 3: Install Java

Hadoop runs on Java, so install the default JDK and JRE (OpenJDK 11 on Ubuntu 22.04):

$ sudo apt install default-jdk default-jre -y

Verify the installation by displaying the Java version:

$ java --version
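Hadoop 3.3 supports Java 8 and Java 11 at runtime, so a setup script may want to check the installed major version before continuing. The following is a minimal sketch; `java_major` and `check_java` are hypothetical helper names, and it parses `java -version`, which (unlike `java --version`) also works on Java 8 and writes its output to standard error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major version from a Java version string,
# e.g. "11.0.20" -> 11, "1.8.0_382" -> 8, "17" -> 17.
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;  # Java 8 and earlier use the legacy "1.x" numbering
  esac
  echo "${v%%[._-]*}"    # keep everything before the first '.', '_' or '-'
}

# Hypothetical helper: report whether the java on PATH is version 8 or 11.
check_java() {
  local raw major
  # The first line of `java -version` looks like: openjdk version "11.0.20" 2023-07-18
  raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  major=$(java_major "$raw")
  case "$major" in
    8|11) echo "Java $major found (supported by Hadoop 3.3)" ;;
    *)    echo "Java '$raw' found; Hadoop 3.3 expects Java 8 or 11" >&2; return 1 ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  check_java || echo "Install a supported JDK before continuing." >&2
fi
```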

For convenience, create a separate user named “hadoop” to run the Hadoop services:

$ sudo adduser hadoop

Add the hadoop user to the sudo group with the command:

$ sudo usermod -aG sudo hadoop

Then switch to the hadoop user in the same terminal:

$ sudo su - hadoop

As the hadoop user, install the OpenSSH server and client. Hadoop's start scripts use SSH to launch the daemons, even on a single machine:

$ sudo apt install openssh-server openssh-client -y

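Generate an RSA key pair with ssh-keygen. Press Enter to accept the default file location and leave the passphrase empty so that Hadoop can log in without a password: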

$ ssh-keygen -t rsa

Append the public key to the “authorized_keys” file:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Restrict the permissions of the authorized_keys file (sudo is not needed, because the file belongs to the hadoop user):

$ chmod 640 ~/.ssh/authorized_keys

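Verify that you can log in to localhost over SSH without a password. On the first connection, type yes to accept the host key, then run exit to return to the original shell: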

$ ssh localhost
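The SSH steps above can also be scripted so that they are safe to re-run; running the `cat ... >> ~/.ssh/authorized_keys` command twice, for instance, adds the same key twice. The sketch below uses a hypothetical helper, `setup_passwordless_ssh`, which creates a key with an empty passphrase only if none exists yet, appends the public key only if it is missing, and applies the permissions used above. It assumes an RSA key at the default `id_rsa` path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: idempotent passwordless SSH setup for the current user.
# $1 is the SSH directory (defaults to ~/.ssh).
setup_passwordless_ssh() {
  local dir="${1:-$HOME/.ssh}"
  mkdir -p "$dir"
  chmod 700 "$dir"
  # Generate an RSA key pair with an empty passphrase (-N "") only if none exists.
  if [ ! -f "$dir/id_rsa" ]; then
    ssh-keygen -t rsa -N "" -f "$dir/id_rsa" -q
  fi
  touch "$dir/authorized_keys"
  # Append the public key only if that exact line is not already present.
  if ! grep -qxF "$(cat "$dir/id_rsa.pub")" "$dir/authorized_keys"; then
    cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  fi
  chmod 640 "$dir/authorized_keys"
}

# Only touch the real ~/.ssh when running as the hadoop user.
if [ "$(id -un)" = "hadoop" ]; then
  setup_passwordless_ssh "$HOME/.ssh"
fi
```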

Step 4: Download the tar Package of Hadoop

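Next, download the Hadoop tar package from the official Apache website with the wget command. The “stable” directory on downloads.apache.org always points to the newest stable release, which may no longer be 3.3.6, so use the version-specific archive URL instead:

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz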
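Apache publishes a SHA-512 checksum file next to each release file, and checking it guards against a truncated or corrupted download. The sketch below is a hypothetical helper, `verify_sha512`; it assumes only that the checksum file contains the 128-digit hexadecimal digest somewhere in it, possibly split into groups by spaces, and it uses `sha512sum` from GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify FILE against the SHA-512 digest found in CHECKSUM_FILE.
# Returns 0 on a match, 1 otherwise.
verify_sha512() {
  local file="$1" checksum_file="$2" expected actual
  # Remove whitespace (some checksum files split the digest into groups),
  # then take the first run of 128 hex digits and lowercase it.
  expected=$(tr -d ' \t\r\n' < "$checksum_file" \
    | grep -oE '[0-9a-fA-F]{128}' | head -n 1 | tr 'A-F' 'a-f')
  actual=$(sha512sum "$file" | cut -d ' ' -f 1)
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}
```

For example, append `.sha512` to the URL you downloaded the tarball from, fetch that file with wget, and run `verify_sha512 hadoop-3.3.6.tar.gz hadoop-3.3.6.tar.gz.sha512` before extracting.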

Step 5: Extract the Compressed tar Package of Hadoop

Extract the downloaded tar package in the current directory:

$ sudo tar -xzvf hadoop-3.3.6.tar.gz

Move the extracted folder to “/usr/local/hadoop” with the mv command:

$ sudo mv hadoop-3.3.6 /usr/local/hadoop

Create a directory for Hadoop's log files:

$ sudo mkdir /usr/local/hadoop/logs

Give the hadoop user ownership of the Hadoop directory:

$ sudo chown -R hadoop:hadoop /usr/local/hadoop
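The commands in Step 5 can also be combined into one function that extracts the tarball straight into the target directory, which avoids the separate mv. This is a sketch rather than an official procedure: `install_hadoop` is a hypothetical name, it relies on `tar --strip-components=1` to drop the top-level `hadoop-3.3.6/` folder, and it must run with write access to the target (as root, for example via `sudo bash`, when the target is /usr/local/hadoop). Extracting over an existing installation would mix files from two versions, so use an empty target.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract a Hadoop tarball into PREFIX, create PREFIX/logs,
# and optionally hand ownership to OWNER (for example hadoop:hadoop).
install_hadoop() {
  local tarball="$1" prefix="$2" owner="${3:-}"
  mkdir -p "$prefix"
  # --strip-components=1 drops the top-level hadoop-x.y.z/ directory,
  # so bin/, etc/, sbin/ and so on land directly in PREFIX.
  tar -xzf "$tarball" -C "$prefix" --strip-components=1
  mkdir -p "$prefix/logs"
  if [ -n "$owner" ]; then
    chown -R "$owner" "$prefix"
  fi
}

# Example (as root): install_hadoop hadoop-3.3.6.tar.gz /usr/local/hadoop hadoop:hadoop
```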

Step 6: Configure the Hadoop Environment Variables

First, find the installation directory of Java; you will need it for the JAVA_HOME variable in Step 7:

$ dirname $(dirname $(readlink -f $(which java)))

Open the ~/.bashrc file with the nano text editor (any other text editor works as well):

$ nano ~/.bashrc

Add the following lines at the end of the file to set the Hadoop environment variables:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Save the changes with CTRL+S, exit nano with CTRL+X, and then reload the ~/.bashrc file so that the variables take effect in the current shell:

$ source ~/.bashrc
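Because these lines are added by hand, it is easy to add them twice. A minimal sketch that appends the same variables only once, using a marker comment; the function name `add_hadoop_env` and the marker text are hypothetical choices:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the Hadoop variables from Step 6 to FILE
# (default ~/.bashrc), but only if they have not been added before.
add_hadoop_env() {
  local file="${1:-$HOME/.bashrc}"
  local marker="# >>> hadoop environment >>>"
  touch "$file"
  # The marker line records that the block is already present.
  if grep -qxF "$marker" "$file"; then
    echo "Hadoop variables already present in $file"
    return 0
  fi
  {
    echo "$marker"
    # The quoted 'EOF' keeps $HADOOP_HOME and $PATH literal, so they are
    # expanded each time FILE is sourced rather than when it is written.
    cat <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
EOF
  } >> "$file"
}
```

Run `add_hadoop_env ~/.bashrc` as the hadoop user, then `source ~/.bashrc` as above.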

Step 7: Set JAVA_HOME for Hadoop

Open Hadoop's environment script, hadoop-env.sh, with the nano text editor:

$ sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

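Add the following lines at the end of the file. The first sets JAVA_HOME (replace the path with the output of the dirname command from Step 6 if it differs); the second adds the JAR files in Hadoop's lib directory to the classpath: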

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"

Save the file, exit nano, and change to Hadoop's lib directory:

$ cd /usr/local/hadoop/lib

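Now download the JavaBeans Activation API JAR, which some Hadoop 3.x components use but which is no longer bundled with the JDK since Java 11. JFrog has sunset the JCenter/Bintray repository, so the same file is downloaded here from Maven Central:

$ sudo wget https://repo1.maven.org/maven2/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar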

Confirm the installation by displaying the Hadoop version:

$ hadoop version
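The JAVA_HOME line added to hadoop-env.sh in Step 7 uses the default path of OpenJDK 11 on 64-bit x86 (amd64) systems; on another architecture or Java version the directory name differs. The sketch below (hypothetical helper names) derives the directory from the java binary, the same way as the dirname/readlink command in Step 6, and sets it in hadoop-env.sh by replacing an existing uncommented `export JAVA_HOME=` line instead of adding a second one. It relies on GNU `sed -i`, which Ubuntu ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JDK directory for a java binary (default: the one on PATH).
java_home_from_bin() {
  local bin="${1:-$(command -v java)}"
  # /usr/bin/java -> /usr/lib/jvm/<jdk>/bin/java -> /usr/lib/jvm/<jdk>
  dirname "$(dirname "$(readlink -f "$bin")")"
}

# Hypothetical helper: set JAVA_HOME in ENV_FILE (e.g. $HADOOP_HOME/etc/hadoop/hadoop-env.sh).
set_java_home() {
  local env_file="$1" java_home="$2"
  if grep -q '^export JAVA_HOME=' "$env_file"; then
    # Replace the existing uncommented line (assumes the path contains no '|').
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
  else
    echo "export JAVA_HOME=$java_home" >> "$env_file"
  fi
}

# Example: set_java_home "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" "$(java_home_from_bin)"
```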

How to Configure Hadoop on Ubuntu 22.04?

First, create the directories in which the NameNode and DataNode will store their data, and give the hadoop user ownership of them:

$ sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode} && sudo chown -R hadoop:hadoop /home/hadoop/hdfs

After this, open the core-site.xml file:

$ sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml

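Add the following property between the <configuration> and </configuration> tags. It sets the default file system to HDFS on port 9000; you can replace 0.0.0.0 with your machine's hostname or with localhost. fs.defaultFS is the current name of the deprecated fs.default.name key:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:9000</value>
    <description>The default file system URI</description>
</property>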

Next, set the replication factor and the NameNode and DataNode directories in the hdfs-site.xml file:

$ sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

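Add the following properties inside the <configuration> element. A replication factor of 1 suits a single-node setup, and dfs.namenode.name.dir and dfs.datanode.data.dir are the current names of the deprecated dfs.name.dir and dfs.data.dir keys:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
</property>

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
</property>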

Save and close the file, then open the mapred-site.xml file:

$ nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following property inside the <configuration> element so that MapReduce jobs run on YARN, then save the file:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

The last file to edit is yarn-site.xml, again with the nano text editor:

$ nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

Add the following property inside the <configuration> element to enable the shuffle service that MapReduce needs, then save the file and exit the text editor:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
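Instead of editing the four configuration files by hand, they can also be generated. The sketch below uses a hypothetical helper, `write_site_xml`, that writes a complete *-site.xml file (including the `<configuration>` root element) from name=value pairs, plus a second hypothetical helper that writes the single-node values from this guide. It overwrites the target files, removing any other properties in them, and does not escape XML special characters, so it only suits simple values like these. It uses the current key names fs.defaultFS, dfs.namenode.name.dir, and dfs.datanode.data.dir, which replace the deprecated fs.default.name, dfs.name.dir, and dfs.data.dir.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Hadoop *-site.xml file from name=value pairs.
# WARNING: overwrites FILE. Values are not XML-escaped, so avoid <, > and &.
write_site_xml() {
  local file="$1" pair
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    for pair in "$@"; do
      # Text before the first '=' is the property name, the rest is the value.
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
        "${pair%%=*}" "${pair#*=}"
    done
    echo '</configuration>'
  } > "$file"
}

# Hypothetical helper: write the four single-node files used in this guide into CONF_DIR.
write_single_node_config() {
  local conf="$1"
  write_site_xml "$conf/core-site.xml" fs.defaultFS=hdfs://0.0.0.0:9000
  write_site_xml "$conf/hdfs-site.xml" \
    dfs.replication=1 \
    dfs.namenode.name.dir=file:///home/hadoop/hdfs/namenode \
    dfs.datanode.data.dir=file:///home/hadoop/hdfs/datanode
  write_site_xml "$conf/mapred-site.xml" mapreduce.framework.name=yarn
  write_site_xml "$conf/yarn-site.xml" yarn.nodemanager.aux-services=mapreduce_shuffle
}
```

Run it as the hadoop user with `write_single_node_config "$HADOOP_HOME/etc/hadoop"`.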

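Before starting Hadoop for the first time, format the NameNode. This initializes the namenode directory; do not repeat it on a cluster that already stores data, because it erases the HDFS metadata: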

$ hdfs namenode -format
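Since reformatting erases the HDFS metadata, a setup script should format only once. A formatted NameNode directory contains a `current/VERSION` file, and the sketch below uses that as its check. The function name `namenode_formatted` is hypothetical, and the default directory is the one configured in hdfs-site.xml above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR already holds a formatted NameNode.
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# NameNode directory from hdfs-site.xml in this guide (override with NN_DIR=...).
NN_DIR="${NN_DIR:-/home/hadoop/hdfs/namenode}"

if command -v hdfs >/dev/null 2>&1; then
  if namenode_formatted "$NN_DIR"; then
    echo "NameNode in $NN_DIR is already formatted; skipping."
  else
    hdfs namenode -format
  fi
fi
```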

Start the HDFS daemons (NameNode, DataNode, and SecondaryNameNode):

$ start-dfs.sh

Then start the YARN daemons (ResourceManager and NodeManager):

$ start-yarn.sh

To list the running Hadoop Java processes, execute the jps command:

$ jps
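On this single-node setup, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, plus Jps itself. A small sketch that reads jps output and prints any of these daemons that are missing; `missing_daemons` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon that is not running. Returns 1 if any are missing.
missing_daemons() {
  local running d status=0
  # jps prints "<pid> <MainClassName>"; keep only the class names.
  running=$(awk '{print $2}')
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x matches whole lines, so "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$running" | grep -qx "$d"; then
      echo "$d"
      status=1
    fi
  done
  return "$status"
}

# Example: jps | missing_daemons || echo "Some daemons are not running; check \$HADOOP_HOME/logs"
```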

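If the UFW firewall is enabled, allow port 8088, which the YARN ResourceManager web interface uses:

$ sudo ufw allow 8088/tcp

Then open a browser and go to “localhost:8088” to view the ResourceManager web interface.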
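To check from the terminal that the ResourceManager web interface is answering, for example in a script that runs right after start-yarn.sh, a small polling loop with curl works. This is a sketch; the function name and the default of 30 attempts are arbitrary choices, and curl may need to be installed first with `sudo apt install curl`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try URL up to TIMEOUT times (default 30), one second apart,
# until it responds. Returns 0 on success, 1 if it never responds.
wait_for_url() {
  local url="$1" timeout="${2:-30}" i=0
  while [ "$i" -lt "$timeout" ]; do
    # -s: silent, -f: fail on HTTP errors (status >= 400), -o: discard the body.
    if curl -sf -o /dev/null --max-time 2 "$url"; then
      echo "$url is up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "$url did not respond after $timeout attempts" >&2
  return 1
}

# Example: wait_for_url http://localhost:8088
```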

To stop the HDFS and YARN daemons, use the command:

$ stop-dfs.sh && stop-yarn.sh

That completes the installation and configuration of Hadoop on Ubuntu 22.04.

Conclusion

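To install Hadoop on Ubuntu 22.04, install Java and OpenSSH, download the Hadoop tar package from the official Apache website, extract it to /usr/local/hadoop, and set the required environment variables. Then configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml, format the NameNode, and start the daemons with start-dfs.sh and start-yarn.sh. This post explained each of these installation and configuration steps.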