What is LVM?
LVM is a tool for logical volume management, which includes allocating disks, striping, mirroring, and resizing logical volumes.
With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes. LVM physical volumes can be placed on other block devices, which might span two or more disks.
What is Hadoop?
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Let's get started:
- First, launch an instance in AWS and configure it as the Master
Launch an instance in AWS.
Now we need to configure it as the Master. To do that, we first need two software packages installed on the instance:
1. jdk-8u171-linux-x64.rpm
2. hadoop-1.2.1-1.x86_64.rpm
To install the JDK, use the command rpm -ivh jdk-8u171-linux-x64.rpm
Now, to install the Hadoop package, use the command rpm -ivh hadoop-1.2.1-1.x86_64.rpm --force
Create a directory, e.g. mkdir /master
Now go inside the directory /etc/hadoop
Configure the hdfs-site.xml file; to configure it, open it with an editor: vi hdfs-site.xml
Now add the following between the <configuration> </configuration> tags 👇
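A minimal sketch of the NameNode property, assuming the /master directory created above is meant to hold the NameNode metadata:

```xml
<property>
    <!-- directory where the NameNode stores its metadata -->
    <name>dfs.name.dir</name>
    <value>/master</value>
</property>
```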
Now configure the core-site.xml file 👇
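Again a sketch; the port (9001 here) is an assumption, and 0.0.0.0 makes the NameNode listen on all interfaces:

```xml
<property>
    <!-- URL the NameNode listens on; clients and datanodes connect here -->
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
</property>
```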
Now format the directory that we created, named /master, using the command hadoop namenode -format
Now start the service using the command hadoop-daemon.sh start namenode
and use jps to check whether it has started or not.
- Configure the Slave node (Datanode)
As above, install the JDK and Hadoop packages on the datanode.
Create a directory, e.g. mkdir /slave2. On the slave node we don't need to format the directory.
After that, go to the directory /etc/hadoop and configure the hdfs-site.xml and core-site.xml files.
hdfs-site.xml file configuration 👇
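A sketch of the datanode side, assuming the /slave2 directory created above is the storage directory the datanode contributes:

```xml
<property>
    <!-- local directory the datanode uses to store HDFS blocks -->
    <name>dfs.data.dir</name>
    <value>/slave2</value>
</property>
```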
core-site.xml file configuration 👇
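A sketch again; the placeholder must be replaced with the master's actual IP, and the port must match the one set on the master (9001 was our assumption):

```xml
<property>
    <!-- points the datanode at the NameNode -->
    <name>fs.default.name</name>
    <value>hdfs://<Master-IP>:9001</value>
</property>
```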
Here, that IP is the master node's IP.
Now start the Datanode using the command hadoop-daemon.sh start datanode, and use jps to check whether it has started or not.
Now go to the Master node and use the command hadoop dfsadmin -report to check whether the datanode is connected or not.
One more thing you can see here is the storage provided by the Datanode to the Master node, which is 46.9 GB; we will be changing that using LVM.
Now let's start with our main topic 🤘
For this demonstration, I have attached two storage devices, named HDD1 and HDD2, to the VM.
Now check how many storage devices are attached to the OS.
To check that, use the command fdisk -l
- Create Physical Volume (PV)
To create a Physical Volume, use the command pvcreate storage1_name, and the same for storage2:
pvcreate /dev/sdb
pvcreate /dev/sdc
To display a Physical Volume, use the command pvdisplay storage_name
- Create Volume Group (VG)
To create a Volume Group, use the command: vgcreate vg_name storage1_name storage2_name
vgcreate ARTHvg /dev/sdb /dev/sdc
To display a Volume Group, use the command: vgdisplay vg_name
vgdisplay ARTHvg
- Create Logical Volume (LV)
To create a Logical Volume, use the command: lvcreate --size lv_size --name lv_name vg_name
lvcreate --size 5G --name myLV11 ARTHvg
To display Logical Volumes, use the command: lvdisplay
To format the LV, use the command: mkfs.ext4 /dev/vg_name/lv_name
mkfs.ext4 /dev/ARTHvg/myLV11
Mount the directory that you created while configuring the Datanode.
To mount the directory, use the command: mount /dev/vg_name/lv_name /directory_name
mount /dev/ARTHvg/myLV11 /slave2
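To confirm the mount worked (a quick check, not part of the original steps):

```bash
df -h /slave2   # should show /dev/mapper/ARTHvg-myLV11 with ~5 GB
```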
Now go to the Master node
and use the command hadoop dfsadmin -report to check the storage provided by the datanode.
Now you can see that the storage provided to the master node is 4.86 GB, whereas earlier it was 46.9 GB.
That's how we can integrate LVM Partition with Hadoop and provide Elasticity to Datanode Storage.
⭐Automating LVM Partition using Python Script 👇
⭐Source code of Python Script👇
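Below is a minimal sketch of such a script, assuming a menu-driven flow over the same LVM commands used above (the menu options, prompts, and the lvextend step for resizing are assumptions; run it as root):

```python
#!/usr/bin/env python3
# Sketch of an LVM automation script: drives pvcreate/vgcreate/lvcreate,
# formats and mounts the LV, and can extend it later. Run as root.
import subprocess

def run(cmd):
    """Echo a shell command, run it, and stop on failure."""
    print(f"+ {cmd}")
    subprocess.run(cmd, shell=True, check=True)

def main():
    print("1. Create Physical Volume")
    print("2. Create Volume Group")
    print("3. Create, format and mount Logical Volume")
    print("4. Extend Logical Volume")
    choice = input("Enter your choice: ")

    if choice == "1":
        disk = input("Enter disk name (e.g. /dev/sdb): ")
        run(f"pvcreate {disk}")
        run(f"pvdisplay {disk}")
    elif choice == "2":
        vg = input("Enter Volume Group name: ")
        disks = input("Enter disk names separated by spaces: ")
        run(f"vgcreate {vg} {disks}")
        run(f"vgdisplay {vg}")
    elif choice == "3":
        vg = input("Enter Volume Group name: ")
        lv = input("Enter Logical Volume name: ")
        size = input("Enter size (e.g. 5G): ")
        mnt = input("Enter mount directory (e.g. /slave2): ")
        run(f"lvcreate --size {size} --name {lv} {vg}")
        run(f"mkfs.ext4 /dev/{vg}/{lv}")
        run(f"mount /dev/{vg}/{lv} {mnt}")
    elif choice == "4":
        vg = input("Enter Volume Group name: ")
        lv = input("Enter Logical Volume name: ")
        size = input("Enter size to add (e.g. 2G): ")
        run(f"lvextend --size +{size} /dev/{vg}/{lv}")
        run(f"resize2fs /dev/{vg}/{lv}")  # grow the ext4 filesystem online
    else:
        print("Invalid choice")

if __name__ == "__main__":
    main()
```

Option 4 is what gives the elasticity: lvextend grows the logical volume and resize2fs grows the ext4 filesystem on it while mounted, after which hadoop dfsadmin -report on the master reflects the new datanode capacity.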
If you find it helpful, then follow me on GitHub and give a star to the repo. 😉
Thank you!!