Elasticity in the Storage of a Distributed Hadoop Cluster

Amit Kumar
Jan 13, 2022

🤔🤔 Have you ever thought about how we can automate the storage size as and when required? Yes!! It is possible 🤩

Today we will implement a powerful concept for elasticity in storage, i.e., Logical Volume Management (LVM). Let's first discuss it in some depth, then move on to the implementation.

What is the Problem Statement?

Suppose we have a requirement to extend the storage capacity of our DataNode without losing the previously stored data. It seems normal, but it is not!! Because when we mount a new block of storage, it removes the previous format or file system of that directory.

Hint: We will be using the concept of LVM.

Logical Volume Management is a Linux concept. In Linux, the Logical Volume Manager (LVM) is a device-mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware, to the point of being able to have their root file systems on a logical volume.

Let's move to the implementation. Right now I have already set up a Hadoop cluster with one Master and one Slave node. Here I am going to show how we can dynamically increase the storage size of this Slave node; you can then implement it on as many nodes as you want.

Step 1: Check the running and configured Hadoop Cluster
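As a quick sanity check, the report below should list both nodes and their current capacity (a minimal sketch; depending on your Hadoop version, the “hdfs dfsadmin -report” form may be preferred):

# run on the master (NameNode): lists live DataNodes and their configured capacity
hadoop dfsadmin -report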

Since we do not require much storage on our MasterNode, we will move to the Slave node. This time I have attached two external EBS volumes, one of 20 GiB and the other of 30 GiB.

Step 2: Check the available volumes.

To check the available disks on our DataNode, we use the “fdisk -l” command:
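Something like the following should show the two newly attached disks (the device names /dev/xvdf and /dev/xvdg used throughout this walkthrough are just examples; yours may differ depending on the instance type):

# list all block devices and their partition tables
fdisk -l
# a more compact view of the attached disks
lsblk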

Now my plan is to use these highlighted EBS volumes not separately but in a combined form, i.e., we want a combined capacity of 50 GiB, and here we have exactly that: 20 GiB + 30 GiB.

These separate volumes are also known as Physical Volumes (PV). When we combine them and create a single block (logically), that logical block is known as a Volume Group (VG).

This Volume Group acts as a pool of storage. From it we carve out blocks that behave like new devices; each such block is called a Logical Volume (logical, because it is not a real physical disk).

Step 3: Create a Volume Group

To create a volume group, we require physical volumes. To convert these block devices into physical volumes, we use the command “pvcreate device_path”.

Before using this command, we need to install the tool that implements this concept for us, i.e., lvm2.

Command: “yum install lvm2 -y”

Now, create two physical volumes, one on the 20 GiB disk and the other on the 30 GiB disk.

Check whether they have been created or not using the “pvdisplay” command.
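A minimal sketch, assuming the two EBS disks showed up as /dev/xvdf (20 GiB) and /dev/xvdg (30 GiB); substitute your own device paths:

# initialize both disks as LVM physical volumes
pvcreate /dev/xvdf /dev/xvdg
# verify: both PVs should be listed with their sizes
pvdisplay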

So we have two physical volumes; let's bind them into a single logical block. To create one, we use the command “vgcreate vg_name pv_name_1 pv_name_2”.

Let's look at its status using the command “vgdisplay vg_name”.
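Continuing with the same example device names, and using a hypothetical volume group name hadoop_vg:

# pool the two physical volumes into one volume group of ~50 GiB
vgcreate hadoop_vg /dev/xvdf /dev/xvdg
# verify: "VG Size" should show roughly 50 GiB
vgdisplay hadoop_vg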

An interesting fact about a volume group is that we can add as many physical volumes to it as we want.

So we are all set with the volume group. Now we are ready to create logical partitions and use them. 🤩🤩 We can create as many partitions as we want.

Step 4: Create a Logical Volume of a specific size (as per the requirement).

To create a logical volume, we use the command “lvcreate --size size --name lv_name vg_name”.

To get information about this logical volume, we use the command “lvdisplay vg_name/lv_name”.
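As a sketch, carving a 35 GiB logical volume (the size this article later extends to 40 GiB) out of the hypothetical hadoop_vg group, with a hypothetical LV name hadoop_lv:

# create a 35 GiB logical volume named hadoop_lv inside hadoop_vg
lvcreate --size 35G --name hadoop_lv hadoop_vg
# verify: shows the LV path, its size, and the VG it belongs to
lvdisplay hadoop_vg/hadoop_lv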

Note: We can have multiple Volume Groups in a system.

Step 5: Use the created Logical Volume by mounting it on the Slave node's storage folder.

In my case, the storage folder is /dn.

To mount this logical block on the storage directory, we use the command “mount block_name directory_name”, but before mounting we need to format it first!! To format, I am using the ext4 file system (mkfs.ext4).
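For example, with the hypothetical LV from above (the device appears under /dev/<vg_name>/<lv_name>):

# format the logical volume with an ext4 file system
mkfs.ext4 /dev/hadoop_vg/hadoop_lv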

Now our volume is ready to be mounted!!
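Assuming the same hypothetical names and the /dn storage directory used by this DataNode:

# mount the formatted logical volume on the DataNode storage directory
mount /dev/hadoop_vg/hadoop_lv /dn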

To check whether it has been mounted or not, we use the “lsblk” command.
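Roughly, with the names assumed above:

# the LV should appear with MOUNTPOINT /dn
lsblk
# optionally, confirm the usable size of the mount
df -h /dn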

Quickly, let's check whether it has been updated in the cluster or not.

Now let's store some data in the cluster.

To put the data, I have used “hadoop fs -put filename /”.
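For instance, with a hypothetical file called notes.txt:

# upload a local file to the HDFS root and list it back
hadoop fs -put notes.txt /
hadoop fs -ls /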

Suppose there is a situation where we have completely used the capacity of this storage block and we want some more storage to be added to it without losing the previously stored data. Here comes the benefit of being able to extend the storage size.

Step 6: Extend the storage size

To extend the storage, we use the command “lvextend --size +size lv_path”.
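A sketch with the hypothetical names from above, growing the LV by 5 GiB (from 35 GiB to 40 GiB) out of the free space still left in the volume group:

# extend the logical volume by 5 GiB using free extents from hadoop_vg
lvextend --size +5G /dev/hadoop_vg/hadoop_lv
# verify: "LV Size" should now read ~40 GiB
lvdisplay hadoop_vg/hadoop_lv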

Let's check its status.

The details of the logical volume now show that its size has increased from 35 GB to 40 GB. But right now we have a 35 GB formatted block and a 5 GB unformatted block, and obviously we cannot store data in the unformatted part.

The actual structure of the logical block will be: 35 GB of formatted space followed by 5 GB of unformatted space.

Now this is the situation where we cannot use mkfs to format that part, because if we do, the complete device will be reformatted and we will lose the data we previously stored. So here we use the “resize2fs path_of_lv” command.
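Again with the hypothetical device path (resize2fs can grow a mounted ext4 file system online, so no unmount is required):

# grow the ext4 file system to fill the newly extended logical volume, keeping existing data
resize2fs /dev/hadoop_vg/hadoop_lv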

Now, if we check the allocation status, we will see the updated block size.
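For example:

# the mount should now report the full ~40 GiB
df -h /dn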

Let's check the status of the Hadoop cluster.

Wait a min 🤔!! We haven't checked whether our previous data is still there or not!
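A quick way to confirm, assuming the hypothetical file uploaded earlier:

# the previously uploaded file should still be listed in HDFS
hadoop fs -ls /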

🤩🤩 Yeah!!! We have the complete data. You can match the timestamps of the previously stored data to confirm that it is all still available.

I hope you liked this concept, and I believe it will help you solve great industry use cases. 😇

💫Keep Sharing, Keep Learning💫

Thank You !!!
