Configuring a Hadoop Cluster using Ansible!

This article shows how to launch a Hadoop cluster on top of AWS EC2 instances using Ansible.

Arpit Bisane
3 min read · Mar 7, 2021

Ansible

Ansible is an automation tool used for configuration management. It is written in Python and does its work through modules, of which it ships thousands. The modules contain the logic for each kind of task, so Ansible gets its intelligence from them.

Hadoop

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation.

Update the configuration file of Ansible

/etc/ansible/ansible.cfg

EC2 instances use key-based SSH authentication, so the configuration file must specify the path to the private key.

The most important part is privilege escalation. Configuring anything on an instance requires root privileges, but ec2-user is an ordinary user with limited privileges. Privilege escalation lets Ansible run its tasks as root through sudo.

Ansible-Configuration-file
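The settings described above can be sketched as follows. This is a minimal example, not the author's exact file: the key path is a placeholder, and disabling host key checking is an assumption that is commonly made for freshly launched instances.

```ini
[defaults]
# Log in as the default user of the AMI (ec2-user, as mentioned above).
remote_user = ec2-user
# Placeholder path: point this at the private key of your EC2 key pair.
private_key_file = /path/to/your-key.pem
# Assumption: skip the interactive host key prompt for new instances.
host_key_checking = False

[privilege_escalation]
# Run tasks as root through sudo.
become = True
become_method = sudo
become_user = root
become_ask_pass = False
```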

How to create an Ansible Role?

Roles let you automatically load related vars_files, tasks, handlers, and other Ansible artifacts based on a known file structure. Once you group your content in roles, you can easily reuse them and share them with other users.

Run the following command to create an Ansible role.

ansible-galaxy init name_of_role

Create Ansible-Role for Launching 3 AWS-instances.

The tasks of this role launch one master node and two slave nodes. The role launches the AWS instances and dynamically adds their IPs to the corresponding Ansible host groups.

→ ec2 is an Ansible module that launches AWS instances.

→ add_host is an Ansible module that adds hosts to the in-memory inventory at runtime. Its hostname parameter holds the public IP of each instance.

→ wait_for is another Ansible module, used here to check whether the instances are ready. It uses the public DNS name of each instance to check whether the SSH service has started on port 22. Once an instance accepts SSH connections, the next task is executed.
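A minimal sketch of such a tasks file is shown here, assuming the legacy ec2 module (newer releases of the amazon.aws collection provide ec2_instance instead) and AWS credentials supplied through environment variables. All variable names, tag values, and the group names hadoop_master and hadoop_slave are hypothetical.

```yaml
# tasks/main.yml of the launch role. Variables such as ami_id and subnet_id
# are hypothetical and would be defined in the role's vars/main.yml.
- name: Launch the master instance
  ec2:
    key_name: "{{ key_name }}"
    instance_type: "{{ instance_type }}"
    image: "{{ ami_id }}"
    region: "{{ region }}"
    group: "{{ security_group }}"
    vpc_subnet_id: "{{ subnet_id }}"
    assign_public_ip: yes
    count: 1
    wait: yes
    instance_tags:
      Name: hadoop-master
  register: master

- name: Launch the two slave instances
  ec2:
    key_name: "{{ key_name }}"
    instance_type: "{{ instance_type }}"
    image: "{{ ami_id }}"
    region: "{{ region }}"
    group: "{{ security_group }}"
    vpc_subnet_id: "{{ subnet_id }}"
    assign_public_ip: yes
    count: 2
    wait: yes
    instance_tags:
      Name: hadoop-slave
  register: slaves

- name: Add the master's public IP to the in-memory inventory
  add_host:
    hostname: "{{ item.public_ip }}"
    groups: hadoop_master
  loop: "{{ master.instances }}"

- name: Add the slaves' public IPs to the in-memory inventory
  add_host:
    hostname: "{{ item.public_ip }}"
    groups: hadoop_slave
  loop: "{{ slaves.instances }}"

- name: Wait until SSH is reachable on every instance
  wait_for:
    host: "{{ item.public_dns_name }}"
    port: 22
    state: started
  loop: "{{ master.instances + slaves.instances }}"
```

Because add_host only changes the inventory held in memory, the groups exist only for the duration of the playbook run that launched the instances.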

Ansible-Role for Configuring Hadoop-Master-Node.
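A master role along these lines might look like the sketch below. It assumes Java and Hadoop are already installed with the hdfs command on the PATH, since the installation source is not stated here, and it uses Hadoop 3 command syntax (older releases start daemons with hadoop-daemon.sh instead). The variables namenode_dir and hadoop_conf_dir are hypothetical.

```yaml
# tasks/main.yml of the master role (illustrative sketch only).
- name: Create the NameNode metadata directory
  file:
    path: "{{ namenode_dir }}"
    state: directory

- name: Render hdfs-site.xml and core-site.xml from the role's templates
  template:
    src: "{{ item }}.j2"
    dest: "{{ hadoop_conf_dir }}/{{ item }}"
  loop:
    - hdfs-site.xml
    - core-site.xml

- name: Format the NameNode only if it has not been formatted before
  command: hdfs namenode -format -nonInteractive
  args:
    creates: "{{ namenode_dir }}/current/VERSION"

- name: Start the NameNode daemon
  command: hdfs --daemon start namenode
```

The creates argument makes the format step safe to re-run: once the VERSION file exists, Ansible skips the command instead of wiping the metadata.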

hdfs-site.xml template for Hadoop-Master-node.
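As an illustration, a Jinja2 template for the master's hdfs-site.xml could be as small as the following. The variable namenode_dir is hypothetical, and the property name is the Hadoop 2 and later spelling (Hadoop 1 uses dfs.name.dir).

```xml
<?xml version="1.0"?>
<!-- templates/hdfs-site.xml.j2 for the master: where the NameNode keeps its metadata -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>{{ namenode_dir }}</value>
  </property>
</configuration>
```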

core-site.xml template for Hadoop-Master-node.
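A matching core-site.xml template for the master might look like this sketch. Binding to 0.0.0.0 is an assumption that suits EC2, where the public IP is not configured on the instance's own network interface. The variable namenode_port is hypothetical, and Hadoop 1 names the property fs.default.name.

```xml
<?xml version="1.0"?>
<!-- templates/core-site.xml.j2 for the master: address the NameNode listens on -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:{{ namenode_port }}</value>
  </property>
</configuration>
```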

Ansible-Role for Configuring Hadoop-Slave-Node.
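The role for the slave nodes needs no format step. A minimal sketch, under the same assumptions as before (Hadoop 3 command syntax, Hadoop already installed, hypothetical variables datanode_dir and hadoop_conf_dir):

```yaml
# tasks/main.yml of the slave role (illustrative sketch only).
- name: Create the DataNode storage directory
  file:
    path: "{{ datanode_dir }}"
    state: directory

- name: Render hdfs-site.xml and core-site.xml from the role's templates
  template:
    src: "{{ item }}.j2"
    dest: "{{ hadoop_conf_dir }}/{{ item }}"
  loop:
    - hdfs-site.xml
    - core-site.xml

- name: Start the DataNode daemon
  command: hdfs --daemon start datanode
```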

hdfs-site.xml template for Hadoop-Slave-node.
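On a slave node, the hdfs-site.xml template points at the directory where data blocks are stored. A possible sketch, with datanode_dir as a hypothetical variable (Hadoop 1 names the property dfs.data.dir):

```xml
<?xml version="1.0"?>
<!-- templates/hdfs-site.xml.j2 for a slave: where the DataNode stores its blocks -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>{{ datanode_dir }}</value>
  </property>
</configuration>
```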

core-site.xml template for Hadoop-Slave-node.
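The slave's core-site.xml has to point at the master. One way to sketch this is to read the master's IP from the in-memory host group filled by add_host. The group name hadoop_master and the variable namenode_port are hypothetical and must match whatever the launch role and the master template use.

```xml
<?xml version="1.0"?>
<!-- templates/core-site.xml.j2 for a slave: address of the master's NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://{{ groups['hadoop_master'][0] }}:{{ namenode_port }}</value>
  </property>
</configuration>
```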

Ansible-Playbook to execute all the roles.
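A playbook that ties the roles together could look like the sketch below. The role names ec2_launch, hadoop_master, and hadoop_slave and the group names are hypothetical; the groups must match those passed to add_host in the launch role.

```yaml
# First play: launch the instances from the control node.
# become is switched off here on the assumption that the AWS API calls
# do not need root on the local machine.
- hosts: localhost
  become: no
  roles:
    - ec2_launch

# Second play: configure the master on the host added to hadoop_master.
- hosts: hadoop_master
  roles:
    - hadoop_master

# Third play: configure both slaves on the hosts added to hadoop_slave.
- hosts: hadoop_slave
  roles:
    - hadoop_slave
```

All three plays have to live in one playbook run, because the host groups created by add_host are not saved between runs.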

Successfully configured a Hadoop cluster on top of AWS cloud using Ansible!

Thanks for reading this blog!

Let’s connect on LinkedIn: Arpit Bisane