Important Hadoop Configurations

In this article, we will learn about the important Hadoop configuration files.
hadoop-env.sh
Environment variables used by the scripts that run Hadoop.
Exploring core-site.xml
Configuration settings for Hadoop core, such as I/O settings that are common to HDFS and MapReduce.
Reference Link: https://hadoop.apache.org/docs/r2.6.2/hadoop-project-dist/hadoop-common/core-default.xml
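Every *-site.xml file follows the same layout: a configuration root element containing property entries, each with a name and a value. As a minimal sketch, the Python snippet below reads such a file into a dictionary. The helper name parse_hadoop_config is hypothetical, and the hdfs://localhost:9000 value for fs.defaultFS is only an assumed placeholder for a single-node setup.

```python
import xml.etree.ElementTree as ET


def parse_hadoop_config(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Assumed sample content; the host and port are placeholders.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>"""

config = parse_hadoop_config(SAMPLE)
print(config["fs.defaultFS"])
```

The same helper works for hdfs-site.xml, mapred-site.xml, and yarn-site.xml, since they share this structure.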
Exploring hdfs-site.xml
Configuration settings for the HDFS daemons: the NameNode, the secondary NameNode, and the DataNodes.
This file configures the replication factor, block size, storage directories, permissions, and security level.
dfs.block.size
This property sets the HDFS block size in bytes. The default is 128 MB (134217728 bytes). In Hadoop 2.x this name is deprecated in favor of dfs.blocksize.
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
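To see where the value 134217728 comes from and what it means for a file, here is a small Python sketch. The function names block_count and last_block_bytes are hypothetical helpers, not part of Hadoop.

```python
MB = 1024 * 1024
BLOCK_SIZE_BYTES = 128 * MB  # 134217728, the value used in the property


def block_count(file_size_bytes, block_size_bytes=BLOCK_SIZE_BYTES):
    """Number of HDFS blocks needed to hold a file of the given size."""
    # Integer ceiling division; an empty file needs no blocks.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def last_block_bytes(file_size_bytes, block_size_bytes=BLOCK_SIZE_BYTES):
    """Size of the final block, which only holds the remaining bytes."""
    if file_size_bytes == 0:
        return 0
    remainder = file_size_bytes % block_size_bytes
    return remainder or block_size_bytes


print(BLOCK_SIZE_BYTES)                  # 134217728
print(block_count(300 * MB))             # 3
print(last_block_bytes(300 * MB) // MB)  # 44
```

A 300 MB file is therefore split into two full 128 MB blocks and one 44 MB block.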
dfs.replication
This property sets the default replication factor, that is, the number of copies HDFS keeps of each block. The default is 3.
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
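The replication factor directly multiplies the storage a file consumes. The following Python sketch illustrates the arithmetic; the helper names are hypothetical, and the calculation ignores metadata and any per-file replication overrides.

```python
MB = 1024 * 1024


def raw_storage_bytes(file_size_bytes, replication=3):
    """Bytes consumed across the cluster when each block is stored `replication` times."""
    return file_size_bytes * replication


def tolerated_replica_losses(replication=3):
    """How many copies of a block can be lost while one copy still survives."""
    return max(replication - 1, 0)


print(raw_storage_bytes(300 * MB) // MB)  # 900
print(tolerated_replica_losses())         # 2
```

With the value 3 shown above, a 300 MB file occupies 900 MB of raw disk space across the DataNodes.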
dfs.permissions
If true, permission checking is enabled in HDFS. If false, permission checking is turned off. In Hadoop 2.x this name is deprecated in favor of dfs.permissions.enabled.
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
dfs.namenode.name.dir
This property determines where on the local filesystem the NameNode stores the name table (fsimage).
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/npntraining/dfs/namenode</value>
</property>
dfs.datanode.data.dir
This property determines where on the local filesystem a DataNode stores its blocks.
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/npntraining/dfs/datanode</value>
</property>
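The individual property snippets in this section all sit inside a single configuration root element in hdfs-site.xml. As a rough sketch, the Python code below assembles such a file from a dictionary. The build_config helper is hypothetical, and the DataNode path is an assumed directory kept separate from the NameNode one.

```python
import xml.etree.ElementTree as ET


def build_config(properties):
    """Render a {name: value} mapping as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


hdfs_site = build_config({
    "dfs.replication": 3,
    "dfs.namenode.name.dir": "/home/npntraining/dfs/namenode",
    # Assumed path: a directory distinct from the NameNode directory.
    "dfs.datanode.data.dir": "/home/npntraining/dfs/datanode",
})
print(hdfs_site)
```

The output is a single unindented line, which Hadoop reads the same way as a hand-formatted file.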
Exploring mapred-site.xml
Configuration settings for MapReduce jobs, which on YARN are scheduled by the ResourceManager and run by the NodeManagers.
mapreduce.framework.name
The runtime framework for executing MapReduce jobs. Can be one of local, classic, or yarn.
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
mapreduce.map.memory.mb
The amount of memory to request from the scheduler for each map task.
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
</property>
mapreduce.map.cpu.vcores
The number of virtual cores to request from the scheduler for each map task.
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
</property>
mapreduce.reduce.memory.mb
The amount of memory to request from the scheduler for each reduce task.
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1024</value>
</property>
mapreduce.reduce.cpu.vcores
The number of virtual cores to request from the scheduler for each reduce task.
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>1</value>
</property>
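The memory and vcore requests above determine how many tasks can run side by side on one node. Here is a simplified Python sketch of that calculation. The node capacity (8192 MB, 4 vcores) is an assumed example, the function name is hypothetical, and the estimate ignores scheduler rounding, the ApplicationMaster container, and whether the scheduler enforces vcores at all.

```python
def max_concurrent_tasks(node_memory_mb, node_vcores,
                         task_memory_mb=1024, task_vcores=1):
    """Upper bound on tasks that fit on one node, limited by the scarcer resource."""
    by_memory = node_memory_mb // task_memory_mb
    by_vcores = node_vcores // task_vcores
    return min(by_memory, by_vcores)


# Assumed node: 8192 MB of memory and 4 vcores available to containers.
print(max_concurrent_tasks(8192, 4))                       # 4
print(max_concurrent_tasks(8192, 4, task_memory_mb=4096))  # 2
```

In the first case vcores are the limit, and in the second case memory is, which is why the two settings are usually tuned together.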
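Exploring yarn-site.xml
Configuration settings for the YARN daemons: the ResourceManager and the NodeManager.
mapreduce.framework.name
Setting this property to yarn runs MapReduce jobs on YARN. It is normally set in mapred-site.xml, as shown above.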
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
masters
A list of machines (one per line) that each run a secondary NameNode.
slaves
A list of machines (one per line) that each run a DataNode and a NodeManager.
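Both files are plain text with one hostname per line. A minimal Python sketch for reading such a list follows. The hostnames are made-up examples, the function name is hypothetical, and treating text after # as a comment is an assumption of this sketch.

```python
def parse_host_list(text):
    """Return the hostnames from a one-host-per-line file such as masters or slaves."""
    hosts = []
    for line in text.splitlines():
        # Assumption: anything after '#' is a comment; blank lines are skipped.
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


# Hypothetical slaves file content.
SLAVES = """worker1.example.com
worker2.example.com

# worker3 is out for maintenance
worker4.example.com
"""

print(parse_host_list(SLAVES))
```

Each hostname in the resulting list would run a DataNode and a NodeManager.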
...



