CUDA系列-Mem-9

这里写目录标题

  • Static Architecture
    • .Abstractions provided by CUSW_UNIT_MEM_MANAGER
      • Memory Object (CUmemobj)
    • Memory Descriptor(CUmemdesc)
    • Memory Block(CUmemblock)
      • Memory Bins
      • Suballocations in Memory Block
      • Functional description
    • Memory Manager

你可能觉得奇怪,这些不是cuda programming model中的内容啊,其实这是cuda runtimes ,还记得那份泄漏出来的代码吗?
This section describes static aspects of the CUSW_UNIT_MEM_MANAGER unit’s architecture.


Static Architecture

The CUSW_UNIT_MEM_MANAGER provides abstractions for allocating memory by other units of CUDA driver or from user space applications through CUSW_UNIT_CUDART. It also provides abstractions to share the memory allocated in the Host with the Device.

CUDA driver sees memory allocations in the form of “Memory objects”. A Memory object represents the chunk of memory allocated. It abstracts all layers involved in the memory management and provides APIs to other units of CUDA driver to allocate/free and map/unmap memory along with other auxiliary functionalities.

To avoid fragmentation of memory, the memory objects are allocated from a bigger fixed size “Memory Block”. Each Memory block has a “Memory Descriptor” which contains all the memory attributes associated with a memory block. The “Memory Manager” maintains all the memory blocks which are allocated in a given CUDA context.

The diagram below shows how CUSW_UNIT_MEM_MANAGER interfaces with different units of CUDA driver. Since memory management needs support from the underlying drivers, CUSW_UNIT_MEM_MANAGER depends on the NVRM driver to accomplish its tasks in memory management. In line with the top-level architectural philosophy (refer to the CUDA Architecture document), the unit also contains parts of Hardware Abstraction Layer (HAL) and Driver Model Abstraction layer (DMAL), which it uses internally.
在这里插入图片描述

.Abstractions provided by CUSW_UNIT_MEM_MANAGER

CUSW_UNIT_MEM_MANAGER primarily consists of three types of abstractions. The Memory object, Memory Descriptor and the Memory Manager. This section describes each abstraction in detail.

Memory Object (CUmemobj)

A memory object represents a memory allocation. It contains the size, device virtual address, host virtual address and other related information about a memory allocation. It is also possible for one context to share memory with other, in which case the memory object also contains the sharing information. Memory object abstracts all underlying implementation and is the only way for other units in the CUDA driver to interface with CUSW_UNIT_MEM_MANAGER.
Functional description
Memory Object abstraction provides functionalities to

Map/unmap an existing memory object to host memory.

Get parent block’s CUmemflags/CUmemdesc for a given memory object.

To get the data needed to share given memory object with another context.

To get the absolute device virtual address of memory object.

To get the memory object for a given Virtual Address or Range.

To get the user-visible device pointer/host pointer for a given memory object and vice versa.

To get the logical byte size of a given memory object.

To get the context of a given memory object.

To get the shared instance of a given memory object.

To check if the memory allocated in host/device and its associated cache attributes.

To check if the memory allocated is pinned/managed memory.

To execute the given cache operation on cpu side on the memory region pointed by given memory object.

To Mark/Unmark the given memory object to ensure that synchronous memory operations on the given memory object are always fully synchronous.

Memory Descriptor(CUmemdesc)

Memory descriptor contains memory attributes associated with a memory allocation.

While requesting for memory, the other units of CUDA driver can provide specific attributes for the memory allocation request. To get/set the attributes for a memory allocation, other units of the CUDA driver can use the interfaces provided by the Memory object.

Memory Block(CUmemblock)

Memory block is a superblock which encapsulates the physical allocation.

Each Memory Block is associated with a Memory Descriptor as explained above which describes the attributes of the memory block. It contains a OS specific structure(dmal) to hold OS driver specific handles and data. The size of a particular memory block is based on specified memory type(like pushbuffer, generic etc) and arch specific requirements for the MMU. Generally the size of a memory block is the size of HUGE Page supported by the Device. Memory block can be created with shared allocation from another context.

Memory Bins

An effective method of avoiding fragmentation of memory is by grouping similar sized allocations together. Since the size of the memory block is big, sub-allocations can be made inside a memory block. Memory bins try to achieve that by creating bins of varying size and associating each memory block with it. For example, 5 bins are created with size 1KB, 4KB, 16KB, 64KB and 256KB during initialization of memory manager. During a memory allocation request when a new memblock is created, based on the allocation request size one of the above memory bin is assigned to the memblock. Future similar sized allocations requests are serviced by suballocating from that memblock.

Suballocations in Memory Block

During a memory allocation request, efforts are made to find an already existing suitable Memory Block which has the same memory attributes(in the form of Memory descriptor) as that of the incoming request and has free area in which the requested size worth of memory can be safely allocated. If such a memory block is found and it belongs to the same memory bin as that of requested size then memory will be sub allocated in that block and a memory object is created to represent the same and returned, otherwise new block is created.

Functional description

Memory Block abstraction provides some functionalities which are used internally and not exposed to components outside CUDA memory manager.

Alloc/Free memblock

Allocate UVA for a given memblock

Map/Unmap given memblock to host

Map/Unmap given memblock to device

To get memory range based on the kind of device mapping chosen

To get the UVA address for a given memblock

Memory Manager

Memory allocations happen within a CUDA Context. So each CUDA context has an instance of Memory Manager which has data structures to track all memory allocations done in a given CUDA context. It also provides synchronization primitives to protect the common data structures from concurrent access.

CUSW_UNIT_MEM_MANAGER maintains different Virtual Address regions within which the VA is assigned for a particular memory allocation request. The choice of the VA region depends mainly on page size and the requested memory type.

The available memory ranges are

Dptr32 - It is the 32 bit device pointer range to which memory is mapped for special allocations that need 32 bit addresses due to some H/W unit requirements

Function Memory - Function memory range for CUDA GPU function code

Small Page Region - If the requested page size is 4KB then allocations are made in this region

Big Page Region - If the requested page size is device specific page size (64KB or 128KB depending on device architecture) then allocations are made in this region.

The class diagram below depicts the relationship between various data types involved in memory management in cuda driver.
在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/723788.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

分班查询,一键发布,老师们都在用的分班查询系统

老师们开学季马上又要到了,回想起了每年埋头苦干,对着一堆堆的学生名单,一个个手动分配班级,再一个个通知家长和学生的日子,那种手忙脚乱,生怕出错的紧张感,是不是还历历在目?每次分…

工作人员能从轧钢测径仪上获取哪些有效信息?

轧钢测径仪安装在轧钢生产线中,无论是热轧还是冷轧,都不能阻挡测径仪的高速无损高精检测。它采用八轴测量系统,能全方位检测外径尺寸,并且配备了测控软件系统,为工作人员提供更加丰富的产线信息。 普通轧钢测径仪能获…

汽车IVI中控开发入门及进阶(三十一):视频知识扫盲

有效的视频资源管理需要集成许多不同的底层技术,共同为用户提供给定应用程序的最佳体验。其中许多技术是从早期电视广播中使用的技术演变而来的。其他方法,如用于通过网络流式传输视频的压缩方法,相对较新且不断发展。 以下详细概述了与图形和视频处理和传输相关的一些基本…

微信小程序入门1

什么是微信小程序? 与传统的原生应用相比,微信小程序是一种全新的连接用户与服务的应用,它可以在微信内被便捷地获取和传播,同时具有良好的用户体验。微信小程序是运行在微信中的应用,是一种不需要下载即可使用的应用…

大众标准化杂志大众标准化杂志社大众标准化编辑部2024年第10期目录

计量技术 计量标准监管体系下的新型技术和新业态监管研究 姚雪艳; 1-3 基于电力能源计量技术节能降耗应用研究 肖冰雪; 4-6 标准课堂《大众标准化》投稿:cnqikantg126.com 现代药物分析技术在中药鉴定中的应用与标准化探索 秦士慧;李婷; 7-9 基于装…

分布式操作系统入门:可的哥(Codigger)引领新潮流

早期,大型机系统盛行,随后个人计算机如Windows、Mac OS等操作系统普及。随着技术发展和计算需求增长,单机系统的局限显现,推动了分布式操作系统的崛起。操作系统演进显露出从单机到多机、从集中到分散的趋势。传统单机系统在处理大…

集合系列(二十六) -利用LinkedHashMap实现一个LRU缓存

一、什么是 LRU LRU是 Least Recently Used 的缩写,即最近最少使用,是一种常用的页面置换算法,选择最近最久未使用的页面予以淘汰。 简单的说就是,对于一组数据,例如:int[] a {1,2,3,4,5,6},…

【Linux】Centos升级到国产操作系统OpenAnolis

一、前言 Anolis OS 7生态上和依赖管理上保持跟CentOS7.x兼容,一键式迁移脚本centos2anolis.py,实现CentOS7.x到Anolis OS 7的平滑迁移 使用迁移脚本前需要注意如下事项: 迁移涉及到软件包的重新安装,是不可逆过程,…

破解神器!Mathtype7.6最新破解版下载体验教程

大家好啊,我是你们的种草机👩‍💼🚀!今天来分享一个超级给力的学术利器——Mathtype7.6最新破解版。作为一名经常和公式打交道的科研狗🐶🔬,这个软件简直是救星啊! Math…

cs144 LAB1 基于滑动窗口的碎片字节流重组器

一.StreamReassembler.capacity 的意义 StreamReassembler._capacity 的含义: ByteStream 的空间上限是 capacityStreamReassembler 用于暂存未重组字符串片段的缓冲区空间 StreamReassembler.buffer 上限也是 capacity蓝色部分代表了已经被上层应用读取的已重组数…

XL5300 dTOF测距模块 加镜头后可达7.6米测距距离 ±4%测距精度

XL5300 直接飞行时间(dToF)传感器是一个整体方案dTOF 模组,应用设计简单。片内集成了单光子雪崩二极管(SPAD)接收阵列以及VCSEL激光发射器。利用自主研发的 SPAD 和独特的ToF 采集与处理技术,XL5300模块可实…

无线麦克风哪个品牌音质最好,领夹麦克风品牌排行榜前十名推荐

​在数字化时代的背景下,声音的传播与记录变得日益重要。无论是会议室、教室还是户外场所,无线领夹麦克风凭借其便携性和稳定的连接性能,成为人们沟通表达的首选工具。面对众多选择,我为你精选了几款性能卓越且性价比高的无线领夹…

Minillama3->pt训练

GitHub - leeguandong/MiniLLaMA3: llama3的迷你版本,包括了数据,tokenizer,pt的全流程llama3的迷你版本,包括了数据,tokenizer,pt的全流程. Contribute to leeguandong/MiniLLaMA3 development by creating an account on GitHub.https://github.com/leeguandong/MiniLL…

org.springframework.boot:spring-boot-starter-parent:pom:2.3.4.RELEAS

前言 git上拉了一个项目构建过程中无论是clean还是install都报错 注:很看不惯某博主一点简单的经验分享都要开VIP才能查看的作风 org.springframework.boot:spring-boot-starter-parent:pom:2.3.4.RELEASE failed to transfer from https://maven.aliyun.com/rep…

关于INCA的几个实用功能

01--VUI窗口设计 这个可以按照自己的想法设计INCA观测或标定窗口 首先进入到INCA的环境内,点击实验→加载VUI窗口 选择空的窗口 打开后如下所示: 点击UI开发模式,如下图 如下: 添加标定量、观测量、示波器 窗口的大小需要在开发…

破除“数据孤岛”新策略:Data Fabric(数据编织)和逻辑数据平台

今天,我们已经进入到一个数据爆发的时代,仅 2022 年,我国数据产量就高达 8.1ZB,同比增长 22.7%,数据产量位居世界第二。数据作为新型生产资料,是企业数智化运营的基础,已快速融入到生产、分配、…

资源宝库网站!人人必备的神器!

面对网络中海量的内容,一个高效、便捷的网络导航工具,可以帮助我们快速查找使用网络资源。无论是职场精英还是学生党,使用导航网站都可以帮助我们提升效率。下面小编就来和大家分享一款资源宝库网站-办公人导航-实用的办公生活导航网站&#…

C++240618

1> 思维导图 2> 完善对话框,点击登录对话框, 如果账号和密码匹配,则弹出信息对话框,给出**提示”登录成功“** ,提供一个 **OK按钮**,用户点击**OK后**,**关闭登录界面**, 跳转…

【AI】如何改换Ollama的模型存储位置

【背景】 ollama在构筑AI应用时是用于统一管理模型库的核心组成部分。默认存放ollama模型库的位置是C盘的用户文件夹的.llama-》model下。但是这样C盘很容易占满。 插一句话,越来越觉得不分区有不分区的方便。 好了,有没有办法改变ollama的默认模型存放…

HarmonyOS 角落里的知识 —— 状态管理

一、前言 在探索 HarmonyOS 的过程中,我们发现了许多有趣且实用的功能和特性。有些总是在不经意间或者触类旁通的找到。或者是某些开发痛点。其中,状态管理是ArkUI开发非常核心的一个东西,我们进行了大量的使用和测试遇到了许多奇奇怪怪的问…