Big Data Development: An Introduction to Hadoop YARN with Hands-On Practice

Table of Contents

    • YARN Basics
    • YARN Architecture
    • Schedulers in YARN
    • Hands-On Example: Configuring and Using Multiple YARN Resource Queues

YARN Basics

  • Provides resource sharing across the Hadoop cluster
  • Supports not only MapReduce but also other compute engines such as Spark and Flink, as illustrated by the sketch below
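
Applications from different engines all show up in the same YARN application list. A minimal sketch of filtering that list by framework type, assuming MapReduce and Spark jobs have already been submitted to this cluster:

```shell
# List running applications, filtered by the framework that submitted them
[root@hadoop01 hadoop-3.2.0]# bin/yarn application -list -appTypes MAPREDUCE,SPARK
```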

YARN Architecture

  • YARN is mainly responsible for managing and scheduling cluster resources. It uses a master/slave architecture: there can be two master nodes (for high availability) and any number of worker nodes
  • ResourceManager: the master node, mainly responsible for managing and allocating cluster resources
  • NodeManager: a worker node, mainly responsible for managing the resources of its own machine
  • When a NodeManager starts it registers with the ResourceManager, and the registration includes the total CPU and memory the node can allocate; this can be checked from the command line, as shown in the sketch below

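The registered NodeManagers and the resources they report can be inspected from the command line. A minimal sketch, assuming the hadoop01/hadoop02/hadoop03 cluster used later in this article (the node ID passed to -status is only a placeholder; copy a real one from the -list output):

```shell
# List the NodeManagers currently registered with the ResourceManager
[root@hadoop01 hadoop-3.2.0]# bin/yarn node -list

# Show the report for a single node, including its total memory and vCore capacity
[root@hadoop01 hadoop-3.2.0]# bin/yarn node -status hadoop02:45454
```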

Schedulers in YARN

  • FIFO Scheduler: first-in, first-out scheduling
  • Capacity Scheduler: a multi-queue version of the FIFO Scheduler
  • Fair Scheduler: multiple queues whose resources are shared among multiple users

The cluster uses the Capacity Scheduler by default, which can be confirmed on the YARN web UI; a quick command-line check is shown below. In real-world work the Capacity Scheduler is also the one used most often.

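Besides the web UI, the active scheduler can be queried through the ResourceManager REST API. A minimal sketch, assuming the ResourceManager web application runs on hadoop01 at the default port 8088:

```shell
# The schedulerInfo "type" field in the response names the scheduler in use
# (capacityScheduler on a default installation) along with its queues
[root@hadoop01 hadoop-3.2.0]# curl -s http://hadoop01:8088/ws/v1/cluster/scheduler
```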

Hands-On Example: Configuring and Using Multiple YARN Resource Queues

  • Add an online queue (for real-time jobs that run continuously) and an offline queue (for offline/batch jobs)
  • Submit a job to the offline queue

Modify the configuration: add online and offline to yarn.scheduler.capacity.root.queues and give the three queues capacities of 70% (default), 10% (online) and 20% (offline). Note that the capacities of queues under the same parent must add up to 100.

[root@hadoop01 hadoop-3.2.0]# vim etc/hadoop/capacity-scheduler.xml 
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
      Maximum number of applications that can be pending and running.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run 
      application masters i.e. controls number of concurrent running
      applications.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description>
      The ResourceCalculator implementation to be used to compare 
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare 
      multi-dimensional resources such as Memory, CPU etc.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,online,offline</value>
    <description>
      The queues at the this level (root is the root queue).
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>70</value>
    <description>Default queue target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.online.capacity</name>
    <value>10</value>
    <description>online queue target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.offline.capacity</name>
    <value>20</value>
    <description>offline queue target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>70</value>
    <description>
      The maximum capacity of the default queue. 
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.online.maximum-capacity</name>
    <value>10</value>
    <description>
      The maximum capacity of the online queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.offline.maximum-capacity</name>
    <value>20</value>
    <description>
      The maximum capacity of the offline queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
    <value>*</value>
    <description>
      The ACL of who can submit applications with configured priority.
      For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
    </description>
  </property>

   <property>
     <name>yarn.scheduler.capacity.root.default.maximum-application-lifetime
     </name>
     <value>-1</value>
     <description>
        Maximum lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        This will be a hard time limit for all applications in this
        queue. If positive value is configured then any application submitted
        to this queue will be killed after exceeds the configured lifetime.
        User can also specify lifetime per application basis in
        application submission context. But user lifetime will be
        overridden if it exceeds queue maximum lifetime. It is point-in-time
        configuration.
        Note : Configuring too low value will result in killing application
        sooner. This feature is applicable only for leaf queue.
     </description>
   </property>

   <property>
     <name>yarn.scheduler.capacity.root.default.default-application-lifetime
     </name>
     <value>-1</value>
     <description>
        Default lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        If the user has not submitted application with lifetime value then this
        value will be taken. It is point-in-time configuration.
        Note : Default lifetime can't exceed maximum lifetime. This feature is
        applicable only for leaf queue.
     </description>
   </property>

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler 
      attempts to schedule rack-local containers.
      When setting this parameter, the size of the cluster should be taken into account.
      We use 40 as the default value, which is approximately the number of nodes in one rack.
      Note, if this value is -1, the locality constraint in the container request
      will be ignored, which disables the delay scheduling.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
    <value>-1</value>
    <description>
      Number of additional missed scheduling opportunities over the node-locality-delay
      ones, after which the CapacityScheduler attempts to schedule off-switch containers,
      instead of rack-local ones.
      Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
      attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
      after 40+20=60 missed opportunities.
      When setting this parameter, the size of the cluster should be taken into account.
      We use -1 as the default value, which disables this feature. In this case, the number
      of missed opportunities for assigning off-switch containers is calculated based on
      the number of containers and unique locations specified in the resource request,
      as well as the size of the cluster.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
      A list of mappings that will be used to assign jobs to queues
      The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
      Typically this list will be used to map users to queues,
      for example, u:%user:%user maps all users to queues with the same name
      as the user.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
    <value>1</value>
    <description>
      Controls the number of OFF_SWITCH assignments allowed
      during a node's heartbeat. Increasing this value can improve
      scheduling rate for OFF_SWITCH containers. Lower values reduce
      "clumping" of applications on particular nodes. The default is 1.
      Legal values are 1-MAX_INT. This config is refreshable.
    </description>
  </property>


  <property>
    <name>yarn.scheduler.capacity.application.fail-fast</name>
    <value>false</value>
    <description>
      Whether RM should fail during recovery if previous applications'
      queue is no longer valid.
    </description>
  </property>

</configuration>

Copy the configuration file to the other two machines

[root@hadoop01 hadoop]# scp -rq capacity-scheduler.xml hadoop02:/home/soft/hadoop-3.2.0/etc/hadoop/
[root@hadoop01 hadoop]# scp -rq capacity-scheduler.xml hadoop03:/home/soft/hadoop-3.2.0/etc/hadoop/
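
As an alternative to the full restart below, Capacity Scheduler queue changes such as adding new queues can usually be applied to a running ResourceManager; a minimal sketch:

```shell
# Re-read capacity-scheduler.xml on the ResourceManager without restarting YARN
[root@hadoop01 hadoop-3.2.0]# bin/yarn rmadmin -refreshQueues
```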

Restart the cluster

[root@hadoop01 hadoop-3.2.0]# sbin/start-all.sh
Starting namenodes on [hadoop01]
Last login: Thu Mar  7 12:25:50 CST 2024 on pts/1
Starting datanodes
Last login: Thu Mar  7 12:28:23 CST 2024 on pts/1
Starting secondary namenodes [hadoop01]
Last login: Thu Mar  7 12:28:26 CST 2024 on pts/1
Starting resourcemanager
Last login: Thu Mar  7 12:28:31 CST 2024 on pts/1
Starting nodemanagers
Last login: Thu Mar  7 12:28:38 CST 2024 on pts/1
You have new mail in /var/spool/mail/root
[root@hadoop01 hadoop-3.2.0]# jps
37651 SecondaryNameNode
37380 NameNode
37896 ResourceManager
34569 JobHistoryServer
38221 Jps

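Besides the YARN web UI on port 8088, the new queues can be verified from the command line; a minimal sketch:

```shell
# List all queues with their configured capacity, current capacity and state
[root@hadoop01 hadoop-3.2.0]# bin/mapred queue -list

# Or inspect a single queue
[root@hadoop01 hadoop-3.2.0]# bin/yarn queue -status offline
```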

The job to be submitted is a standard WordCount implementation:

package com.example.hadoop.demo.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCountJob {

    /**
     * Map phase
     */
    public static class MyMapProcess extends Mapper<LongWritable, Text, Text, LongWritable> {
        /**
         * map function
         * @param k1 byte offset of the current line within the input split
         * @param v1 content of the current line
         * @param context
         * @throws IOException
         * @throws InterruptedException
         */
        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException {
            // split each input line into words
            String[] words = v1.toString().split(" ");
            for (String word : words) {
                // emit each word as a <k2, v2> pair with count 1
                Text k2 = new Text(word);
                LongWritable v2 = new LongWritable(1L);
                context.write(k2, v2);
            }
        }
    }

    /**
     * Reduce phase
     * Sums the counts of each <k2, {v2...}> group and emits <k3, v3>
     */
    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

        @Override
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException {
            long sum = 0L;
            for (LongWritable v2 : v2s) {
                sum += v2.get();
            }
            context.write(k2, new LongWritable(sum));
        }
    }

    /**
     * Assemble the job: map + reduce
     */
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        if (args.length < 2) {
            System.out.println("Please provide an input path and an output path");
            return;
        }
        Configuration entries = new Configuration();
        // GenericOptionsParser picks up -D options such as mapreduce.job.queuename
        String[] remainingArgs = new GenericOptionsParser(entries, args).getRemainingArgs();

        Job job = Job.getInstance(entries);
        // required so YARN can locate the jar that contains this class
        job.setJarByClass(WordCountJob.class);
        // input path; can be a file or a directory
        FileInputFormat.setInputPaths(job, new Path(remainingArgs[0]));

        // output path; must be a directory that does not exist yet
        FileOutputFormat.setOutputPath(job, new Path(remainingArgs[1]));

        // map settings
        job.setMapperClass(MyMapProcess.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // reduce settings
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // submit the job and wait for it to finish
        job.waitForCompletion(true);
    }
}

Submit the packaged jar to the offline queue by passing -Dmapreduce.job.queuename=offline:

```shell
[root@hadoop01 hadoop-3.2.0]# bin/hadoop jar demo-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.example.hadoop.demo.mapreduce.WordCountJob -Dmapreduce.job.queuename=offline /test/hello.txt /pointqueue
```
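
After the job finishes, both the word-count result and the queue the application actually ran in can be checked; a minimal sketch:

```shell
# The reducer writes its result to part-r-00000 under the output directory
[root@hadoop01 hadoop-3.2.0]# bin/hdfs dfs -cat /pointqueue/part-r-00000

# The "Queue" column of the listing should show "offline" for this application
[root@hadoop01 hadoop-3.2.0]# bin/yarn application -list -appStates FINISHED
```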
