Introduction
The Hadoop Distributed File System (HDFS) is a distributed file system. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications with very large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project (Nutch itself was a Lucene subproject at the time); today HDFS is part of the Apache Hadoop project.
Official website
https://hadoop.apache.org/
Software preparation
- hadoop-3.3.4.tar.gz
Download: https://archive.apache.org/dist/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
- jdk-8u361-linux-x64.tar.gz
Download: https://share.weiyun.com/uwm6F1la
Environment list
Hostname | IP |
master | 192.168.199.201 |
slave0 | 192.168.199.202 |
slave1 | 192.168.199.203 |
Disable the firewall and its autostart (run on all three nodes)
systemctl disable firewalld --now
Disable SELinux
# Permanently disable (takes effect after a reboot)
sed -i 's/=enforcing/=disabled/g' /etc/selinux/config
# Disable temporarily for the current session
setenforce 0
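- Optionally verify the current mode (it should report Permissive now, or Disabled after a reboot):
getenforce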
Set the hostnames
# IP:192.168.199.201
hostnamectl set-hostname master
# IP:192.168.199.202
hostnamectl set-hostname slave0
# IP:192.168.199.203
hostnamectl set-hostname slave1
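- Optionally confirm the new name on each node:
hostname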
Edit the hosts file (on all three nodes)
cat >> /etc/hosts <<EOF
192.168.199.201 master
192.168.199.202 slave0
192.168.199.203 slave1
EOF
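- Optionally verify name resolution from master:
ping -c 1 slave0
ping -c 1 slave1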
Configure passwordless SSH login
- Generate the key pair (press Enter to accept the defaults)
ssh-keygen -t rsa
- Copy the public key to master (enter the root password when prompted)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@master
- Copy the public key to slave0 (enter the root password when prompted)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave0
- Copy the public key to slave1 (enter the root password when prompted)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave1
- On master, run the commands below; if none of them asks for a password, the configuration succeeded (type exit to leave each session)
ssh master
ssh slave0
ssh slave1
Install the JDK
- Create the java directory
mkdir /usr/local/java
cd /usr/local/java
- Upload the prepared jdk-8u361-linux-x64.tar.gz to this directory and extract it
tar xzf jdk-8u361-linux-x64.tar.gz
- Configure the environment variables
echo "export JAVA_HOME=/usr/local/java/jdk1.8.0_361" >> /root/.bash_profile
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> /root/.bash_profile
source /root/.bash_profile
- Verify that the variables take effect
[root@master ~]# java -version
java version "1.8.0_361"
Java(TM) SE Runtime Environment (build 1.8.0_361-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.361-b10, mixed mode)
- Copy the JDK and .bash_profile to slave0 and slave1
scp -r /usr/local/java root@slave0:/usr/local
scp -r /root/.bash_profile root@slave0:/root
ssh root@slave0 "source /root/.bash_profile"
scp -r /usr/local/java root@slave1:/usr/local
scp -r /root/.bash_profile root@slave1:/root
ssh root@slave1 "source /root/.bash_profile"
Hadoop installation and environment configuration
1. Upload hadoop-3.3.4.tar.gz to /opt, extract it, and set the owner and group
cd /opt/
tar xzf hadoop-3.3.4.tar.gz
mv hadoop-3.3.4 hadoop
chown -R root:root hadoop
2. Create the data directories
mkdir -p /opt/hadoop/{tmp,hdfs/{name,data}}
3. Configure hadoop-env.sh
sed -i 's@# export JAVA_HOME=@export JAVA_HOME=\/usr\/local\/java\/jdk1.8.0_361\/@g' /opt/hadoop/etc/hadoop/hadoop-env.sh
grep JAVA_HOME= /opt/hadoop/etc/hadoop/hadoop-env.sh
4. Configure core-site.xml
vim /opt/hadoop/etc/hadoop/core-site.xml
# Add the following between the <configuration> tags:
<!-- File system URI used by Hadoop, i.e. the address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<!-- Storage directory for files generated at Hadoop runtime; the default is /tmp/hadoop-${user.name} -->
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/tmp</value>
</property>
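For reference, the complete file should look roughly like this after editing (the <configuration> element already exists in the template):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/tmp</value>
</property>
</configuration>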
5. Configure hdfs-site.xml
vim /opt/hadoop/etc/hadoop/hdfs-site.xml
# Add the following between the <configuration> tags:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Number of replicas kept for each HDFS block</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/hdfs/name</value>
<description>NameNode metadata directory; for safety, several directories on different disks are usually configured</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/hdfs/data</value>
<description>DataNode data storage directory</description>
</property>
- Address of the SecondaryNameNode; in production it is usually placed on a node other than the NameNode
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
<description>HTTP address of the SecondaryNameNode (ideally a different node from the NameNode)</description>
</property>
6. Configure yarn-site.xml
vim /opt/hadoop/etc/hadoop/yarn-site.xml
# Add the following between the <configuration> tags:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Shuffle service that YARN provides for MapReduce programs</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:18030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:18141</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:18088</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
7. Configure mapred-site.xml
vim /opt/hadoop/etc/hadoop/mapred-site.xml
# Add the following between the <configuration> tags:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
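Note: on Hadoop 3.x, MapReduce jobs submitted to YARN can fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster" when HADOOP_MAPRED_HOME is not visible to the containers. If you hit that error, a common fix is to also add the following properties to mapred-site.xml (the path assumes the /opt/hadoop install location used in this guide):
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>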
8. Configure workers
- Before Hadoop 3.0.0 this file was named slaves
cat > /opt/hadoop/etc/hadoop/workers <<EOF
master
slave0
slave1
EOF
9. Configure the Hadoop environment variables
echo "export HADOOP_HOME=/opt/hadoop" >> /root/.bash_profile
echo "export PATH=\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin:\$PATH" >> /root/.bash_profile
echo "export HDFS_NAMENODE_USER=root" >> /root/.bash_profile
echo "export HDFS_DATANODE_USER=root" >> /root/.bash_profile
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> /root/.bash_profile
echo "export YARN_RESOURCEMANAGER_USER=root" >> /root/.bash_profile
echo "export YARN_NODEMANAGER_USER=root" >> /root/.bash_profile
10. Copy the Hadoop directory and profile to slave0 and slave1
scp -r /opt/hadoop root@slave0:/opt
scp -r /root/.bash_profile root@slave0:/root
ssh root@slave0 "source /root/.bash_profile"
ssh root@slave0 "/bin/bash /opt/hadoop/useradd.sh"
scp -r /opt/hadoop root@slave1:/opt
scp -r /root/.bash_profile root@slave1:/root
ssh root@slave1 "source /root/.bash_profile"
ssh root@slave1 "/bin/bash /opt/hadoop/useradd.sh"
11. Format the file system
- Run this on master only, and only once
source /root/.bash_profile
hdfs namenode -format
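- If formatting succeeds, the NameNode metadata directory is populated; a quick check:
ls /opt/hadoop/hdfs/name/current
# should contain files such as VERSION and an fsimage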
12. Start Hadoop
[root@master ~]# start-all.sh
Starting namenodes on [master]
Last login: Tue Oct 11 23:18:57 CST 2022 from master on pts/1
Starting datanodes
Last login: Tue Oct 11 23:53:33 CST 2022 on pts/0
slave0: WARNING: /opt/hadoop/logs does not exist. Creating.
slave1: WARNING: /opt/hadoop/logs does not exist. Creating.
Starting secondary namenodes [master]
Last login: Tue Oct 11 23:53:35 CST 2022 on pts/0
Starting resourcemanager
Last login: Tue Oct 11 23:53:44 CST 2022 on pts/0
Starting nodemanagers
Last login: Tue Oct 11 23:54:16 CST 2022 on pts/0
[root@master ~]# jps
2631 SecondaryNameNode
2935 ResourceManager
2280 NameNode
2424 DataNode
3067 NodeManager
3619 Jps
[root@master ~]# ssh slave0 "/usr/local/java/jdk1.8.0_361/bin/jps"
1795 DataNode
1908 NodeManager
2015 Jps
[root@master ~]# ssh slave1 "/usr/local/java/jdk1.8.0_361/bin/jps"
1747 DataNode
1862 NodeManager
1965 Jps
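- Optionally run a quick smoke test from master (the examples jar ships with the 3.3.4 distribution; the path assumes the /opt/hadoop install location used in this guide):
hdfs dfsadmin -report | grep "Live datanodes"
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 2 10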
13. Stop Hadoop
stop-all.sh
14. Start and stop the JobHistory server
mapred --daemon start historyserver
mapred --daemon stop historyserver
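- While the history server is running, jps should show a JobHistoryServer process:
jps | grep JobHistoryServer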
15. Web interfaces
Once the Hadoop cluster is up and running, check the web UIs of the components as described below:
Daemon | Web Interface | Notes |
NameNode | http://nn_host:port/ | Default HTTP port is 9870. |
ResourceManager | http://rm_host:port/ | Default HTTP port is 8088 (changed to 18088 in this guide via yarn.resourcemanager.webapp.address). |
MapReduce JobHistory Server | http://jhs_host:port/ | Default HTTP port is 19888. |
For this cluster:
http://192.168.199.201:9870
http://192.168.199.201:18088
http://192.168.199.201:19888
That completes the configuration. If all of the commands above run successfully and the web pages load, Hadoop has been deployed successfully!
Source: Centos 7之搭建Hadoop (https://mp.weixin.qq.com/s?__biz=Mzk0NTQ3OTk3MQ==&mid=2247486447&idx=1&sn=02bf9445d7e23023ae75e5f0097df9a3&chksm=c31583a3f4620ab57eab9f8e36e42d9592a92ce48ddd198be4a4fd4a7be96d58dc8267a0245d&token=355315523&lang=zh_CN#rd)