1、Hadoop是一个由Apache基金会所开发的分布式系统基础架构。用户可以不需要了解分布式底层细节的情况下,开发分布式程序。充分利用集群进行高速运算和存储。
2、下载Hadoop,我们在清华大学的镜像站下载
Index of /apache/hadoop/core/hadoop-3.3.6 (tsinghua.edu.cn)
3、下载之后解压缩
4、安装相应的环境
4.1、jdk1.8或者jdk11,
4.2、配置hadoop环境
5、修改hadoop配置
5.1、修改start-all.cmd中的配置
@rem start hdfs daemons if hdfs is present
if exist %HADOOP_HDFS_HOME%\sbin\start-dfs.cmd (
call %HADOOP_HDFS_HOME%\sbin\start-dfs.cmd --config %HADOOP_CONF_DIR%
)
@rem start yarn daemons if yarn is present
if exist %HADOOP_YARN_HOME%\sbin\start-yarn.cmd (
call %HADOOP_YARN_HOME%\sbin\start-yarn.cmd --config %HADOOP_CONF_DIR%
)
5.2、修改yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
</configuration>
5.3、mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5.4、hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/D:/bigdata/hadoop/data/namenode</value> //注意前面部分路径修改为自己的
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/D:/bigdata/hadoop/data/datanode</value> //注意前面部分路径修改为自己的
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
5.5、core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/D:/bigdata/hadoop/data/tmp</value> //注意前面部分路径修改为自己的
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.http.authentication.simple.anonymous.allowed</name>
<value>true</value>
</property>
</configuration>
5.6、需要拷贝winutils下的winutils.exe,hadoop.dll到hadoop的bin下面
在/ect/hadoop/hadoop-env.cmd中设置set JAVA_HOME=D:/Java/jdk1.8.0_311,jdk的环境地址
修改start-yarn.cmd的配置
setlocal enabledelayedexpansion
echo starting yarn daemons
if not defined HADOOP_BIN_PATH (
rem set HADOOP_BIN_PATH=%~dp0
set HADOOP_BIN_PATH=%HADOOP_HOME%\bin
)
if "%HADOOP_BIN_PATH:~-1%" == "\" (
set HADOOP_BIN_PATH=%HADOOP_BIN_PATH:~0,-1%
)
set DEFAULT_LIBEXEC_DIR=%HADOOP_BIN_PATH%\..\libexec
if not defined HADOOP_LIBEXEC_DIR (
set HADOOP_LIBEXEC_DIR=%DEFAULT_LIBEXEC_DIR%
)
call %HADOOP_LIBEXEC_DIR%\yarn-config.cmd %*
if "%1" == "--config" (
shift
shift
)
@rem start resourceManager
start "Apache Hadoop Distribution" %HADOOP_HOME%\bin\yarn resourcemanager
@rem start nodeManager
start "Apache Hadoop Distribution" %HADOOP_HOME%\bin\yarn nodemanager
@rem start proxyserver
@rem start "Apache Hadoop Distribution" %HADOOP_HOME%\yarn proxyserver
endlocal
5.7、输入hdfs namenode -format格式化
hdfs namenode -format
5.8、进入hadoop/sbin目录执行start-all.cmd
D:\bigdata\hadoop\sbin>start-all.cmd
This script is Deprecated. Instead use start-dfs.cmd and start-yarn.cmd
starting yarn daemons
5.9、jps查看
D:\bigdata\hadoop\sbin>jps
8448 Jps
28360 NameNode
29592 ResourceManager
18108 NodeManager
20940 DataNode
5.10、启动成功后界面
http://localhost:8088/cluster
http://localhost:9870/
单机版的hadoop就启动成功