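Global memory is located off-chip.
Of all the GPU memory spaces it has the largest capacity and the highest latency, and it is the most heavily used.
Data in global memory is visible to all threads and reachable from the host through the CUDA runtime API. Unless it is freed explicitly, it lives as long as the program does.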
Dynamic global memory
Host code uses the CUDA runtime API: cudaMalloc allocates global memory and cudaFree releases it.
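The allocate, copy, launch, copy back, free sequence for dynamic global memory can be sketched as follows. This is a minimal illustration rather than code from the original post: the kernel addOne and the helper runDynamicGlobalDemo are hypothetical names chosen for this example, and the block size of 256 is an arbitrary assumption.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to every element.
__global__ void addOne(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: returns true if every element came back incremented.
bool runDynamicGlobalDemo(int n)
{
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;

    const size_t bytes = n * sizeof(int);
    int* d = nullptr;

    // Allocate dynamic global memory on the device.
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;                          // assumed block size
        const int blocks = (n + threads - 1) / threads;
        addOne<<<blocks, threads>>>(d, n);
        // The blocking cudaMemcpy waits for the kernel on the default stream.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // Release the global memory; it is not freed automatically before exit.
    cudaFree(d);

    for (int i = 0; i < n; ++i) ok = ok && (h[i] == i + 1);
    return ok;
}

int main()
{
    printf("dynamic global memory demo %s\n",
           runDynamicGlobalDemo(1000) ? "passed" : "failed");
    return 0;
}
```

Unlike a __device__ variable, the pointer returned by cudaMalloc is an ordinary value on the host, so it has to be passed to the kernel as an argument.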
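Static global memory
Static global memory is declared with the __device__ qualifier, and its size is fixed at compile time. The declaration must sit at file scope, outside every host function and kernel. Kernels can read and write static global memory directly.
Host code cannot access a __device__ variable directly. It has to go through:
cudaMemcpyToSymbol: copies a host variable into a static global variable
cudaMemcpyFromSymbol: copies a static global variable back into a host variable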
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>

static void CheckCudaErrorAux(const char*, unsigned, const char*, cudaError_t);
#define CUDA_CHECK_RETURN(value) CheckCudaErrorAux(__FILE__, __LINE__, #value, value)

// Static global memory: declared at file scope, outside any function.
__device__ int offsetx = 1;
__device__ int d_iVal[2];

// A kernel can use __device__ variables directly.
__global__ void kernel()
{
    d_iVal[0] += offsetx;
    d_iVal[1] -= offsetx;
    printf("offsetx = %d, d_iVal = (%d, %d)\n", offsetx, d_iVal[0], d_iVal[1]);
}

int main()
{
    int h_iV[2] = { 10, 20 };
    printf("h_iV = (%d, %d)\n", h_iV[0], h_iV[1]);

    // Host -> static global variable.
    CUDA_CHECK_RETURN(cudaMemcpyToSymbol(d_iVal, h_iV, sizeof(int) * 2));

    dim3 grid(1);
    dim3 block(1);
    kernel<<<grid, block>>>();
    CUDA_CHECK_RETURN(cudaGetLastError());        // catch launch errors
    CUDA_CHECK_RETURN(cudaDeviceSynchronize());   // wait for the kernel to finish

    // Static global variable -> host.
    CUDA_CHECK_RETURN(cudaMemcpyFromSymbol(h_iV, d_iVal, sizeof(int) * 2));
    printf("h_iV = (%d, %d)\n", h_iV[0], h_iV[1]);

    // Reset the device once, at the very end of the process.
    CUDA_CHECK_RETURN(cudaDeviceReset());
    return 0;
}

// Print the failing statement, the error name and description, then abort.
static void CheckCudaErrorAux(const char* file, unsigned line, const char* statement, cudaError_t err)
{
    if (err == cudaSuccess)
        return;
    std::cerr << statement << " returned: " << cudaGetErrorName(err) << " \t : "
              << cudaGetErrorString(err) << "(" << err << ") at " << file << ":" << line << std::endl;
    exit(1);
}
The following excerpts from the official CUDA documentation describe cudaDeviceReset and memory release:
Function declaration of cudaDeviceReset()
__host__ cudaError_t cudaDeviceReset ( void )
Destroy all allocations and reset all state on the current device in the current process.
What happens when cudaDeviceReset() is called
When a host thread calls cudaDeviceReset(), this destroys the primary context of the device the host thread currently operates on (i.e., the current device as defined in Device Selection). The next runtime function call made by any host thread that has this device as current will create a new primary context for this device.
Calling cudaDeviceReset() to release memory
The memcheck tool can detect leaks of allocated memory.
Memory leaks are device side allocations that have not been freed by the time the context is destroyed. The memcheck tool tracks device memory allocations created using the CUDA driver or runtime APIs. Starting in CUDA 5, allocations that are created dynamically on the device heap by calling malloc() inside a kernel are also tracked.
For an accurate leak checking summary to be generated, the application’s CUDA context must be destroyed at the end. This can be done explicitly by calling cuCtxDestroy() in applications using the CUDA driver API, or by calling cudaDeviceReset() in applications programmed against the CUDA run time API.
The --leak-check full option must be specified to enable leak checking.
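The official documentation appears to mean that resources acquired through CUDA (the CUDA context) should indeed be released with cuCtxDestroy or cudaDeviceReset.
The lesson from the error discussed in this post, however, is that if you are not yet familiar with CUDA, you need to be careful about when you release CUDA resources, especially when you call reset. For example, reset once, when the whole process is about to exit. GPU and CPU execution is asynchronous, and as far as I understand so far, many CUDA functions and tools act on the whole GPU device rather than on one particular host "thread".
cudaDeviceReset resets the GPU device that is currently selected through cudaSetDevice: every resource the current process has allocated on that device is destroyed.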
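One way to follow the advice of resetting only at process exit is to join every host thread that uses the device before calling cudaDeviceReset. The sketch below is an assumption-laden illustration, not the code from the referenced posts: worker and runWorkersThenReset are hypothetical names, device 0 is assumed to exist, and on some Linux toolchains std::thread may require passing -Xcompiler -pthread to nvcc.

```cuda
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical worker: allocates, clears and frees a buffer on device 0.
void worker(std::atomic<int>* failures)
{
    int* d = nullptr;
    const size_t bytes = 1024 * sizeof(int);
    if (cudaSetDevice(0) != cudaSuccess || cudaMalloc(&d, bytes) != cudaSuccess) {
        ++(*failures);
        return;
    }
    if (cudaMemset(d, 0, bytes) != cudaSuccess) ++(*failures);
    if (cudaFree(d) != cudaSuccess) ++(*failures);
}

// Hypothetical helper: runs the workers, then resets the device exactly once.
// Returns the number of CUDA calls that failed.
int runWorkersThenReset(int numThreads)
{
    std::atomic<int> failures{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; ++i)
        threads.emplace_back(worker, &failures);

    // Wait for every thread first: the reset destroys the primary context
    // that all host threads using this device share.
    for (auto& t : threads) t.join();

    if (cudaDeviceReset() != cudaSuccess) ++failures;
    return failures.load();
}

int main()
{
    printf("failed CUDA calls: %d\n", runWorkersThenReset(4));
    return 0;
}
```

If the reset were issued from inside one worker while the others were still running, their device pointers would belong to a context that no longer exists, which matches the kind of multithreaded failure the referenced post describes.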
References:
"On incorrect use of cudaDeviceReset() causing CUDA errors and process crashes under multithreading", CSDN blog
"cudaDeviceReset", CSDN blog