A Deep Dive into OsLayer in the stressapptest Source: The OS-Specific Abstraction Interface Explained

1. Class Overview
2. Class Attributes
3. Main Methods
4. Functional Architecture
- 4.1 The Initialize() Interface
- 4.2 The VirtualToPhysical() Function
- 4.3 The FlushPageCache(void) Function
- 4.4 The FastFlush() Function
- 4.5 The FindDimm(uint64, char *, int) Function
- 4.6 The FindHugePages() Function
- 4.7 The FindFreeMemSize() Function
- 4.8 The AllocateTestMem(int64, uint64) Function
- 4.9 PciOpen/Read/Write()
- 4.10 Open/Read/WriteMSR()
- 4.11 The CpuStressWorkload() Function
5. Summary

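1. Class Overview

The OsLayer class defines a layer of OS-specific abstract interfaces. Its main purpose is to wrap low-level operating system calls behind a uniform API and to hide the differences between underlying systems and CPU architectures.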

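2. Class Attributes

These members manage the test memory and error reporting.

testmem_: address of the allocated test memory block.
testmemsize_: size of the allocated memory block.
totalmemsize_: total amount of memory targeted for testing; FindFreeMemSize() computes this value once and caches it.
min_hugepages_bytes_: minimum number of bytes of hugepages required.
error_diagnoser_: pointer to the error diagnoser object.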

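3. Main Methods

These methods wrap common low-level system operations.

Initialize(): initializes the class, detects CPU features, and so on.
VirtualToPhysical(): translates a virtual address to a physical address.
AllocateTestMem(): allocates the test memory.
FreeTestMem(): frees the test memory.
PrepareTestMem(): prepares the test memory for use.
ReleaseTestMem(): releases the resources taken by the preparation step.
ErrorReport(): reports errors.
PciRead/Write(): read and write PCI device configuration space.
ReadMSR/WriteMSR(): read and write model-specific registers (MSRs).
CpuStressWorkload(): a generic CPU stress workload.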

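4. Functional Architecture

The OsLayer class defines a platform-independent layer of low-level system operations and exposes it through a simple API.

4.1 The Initialize() Interface

OsLayer::Initialize() performs the initialization work for the OsLayer class. Source code: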

// OsLayer initialization.
bool OsLayer::Initialize() {
 if (!clock_) {
   clock_ = new Clock();
 }

 time_initialized_ = clock_->Now();
 // Detect asm support.
 GetFeatures();

 if (num_cpus_ == 0) {
   num_nodes_ = 1;
   num_cpus_ = sysconf(_SC_NPROCESSORS_ONLN);
   num_cpus_per_node_ = num_cpus_ / num_nodes_;
 }
 logprintf(5, "Log: %d nodes, %d cpus.\n", num_nodes_, num_cpus_);
 cpu_sets_.resize(num_nodes_);
 cpu_sets_valid_.resize(num_nodes_);
 // Create error diagnoser.
 error_diagnoser_ = new ErrorDiag();
 if (!error_diagnoser_->set_os(this))
   return false;
 return true;
}

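What it does:

If no clock object clock_ has been set, it creates a new Clock object; it then records the initialization time in time_initialized_.
It calls GetFeatures() to detect the features supported by the hardware.
If num_cpus_ is 0, it assumes a single node and obtains the number of online CPUs with sysconf(_SC_NPROCESSORS_ONLN).
It resizes cpu_sets_ and cpu_sets_valid_, the per-node CPU sets, to num_nodes_ entries.
It creates an ErrorDiag object for error diagnosis.
It calls ErrorDiag's set_os() method with the OsLayer object, which ties the two objects together; Initialize() returns false if that call fails.

Initialize() thus sets up OsLayer's internal data structures such as cpu_sets_, links the object with ErrorDiag, and calls GetFeatures() to collect CPU capability information for the other interfaces. It can be seen as the base initialization step for everything else in OsLayer.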
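The CPU-count fallback can be tried in isolation. The following is a minimal sketch, not code from stressapptest: OnlineCpuCount() and CpusPerNode() are hypothetical helper names, and the fallback to 1 when sysconf() fails is an assumption added for illustration.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical helper (not in stressapptest): number of online CPUs,
// falling back to 1 if sysconf() cannot report a positive value.
int OnlineCpuCount() {
  long n = sysconf(_SC_NPROCESSORS_ONLN);
  return n > 0 ? static_cast<int>(n) : 1;
}

// Hypothetical helper: the same integer division Initialize() uses to
// derive num_cpus_per_node_ from num_cpus_ and num_nodes_.
int CpusPerNode(int num_cpus, int num_nodes) {
  assert(num_nodes > 0);  // Initialize() sets num_nodes_ to 1 by default.
  return num_cpus / num_nodes;
}
```

With the default of one node, CpusPerNode(OnlineCpuCount(), 1) simply returns the online CPU count.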

OsLayer::GetFeatures() detects which instruction set features the hardware supports. Source code:

// Get HW core features from cpuid instruction.
void OsLayer::GetFeatures() {
#if defined(STRESSAPPTEST_CPU_X86_64) || defined(STRESSAPPTEST_CPU_I686)
 unsigned int eax = 1, ebx, ecx, edx;
 cpuid(&eax, &ebx, &ecx, &edx);
 has_clflush_ = (edx >> 19) & 1;
 has_vector_ = (edx >> 26) & 1;  // SSE2 caps bit.

 logprintf(9, "Log: has clflush: %s, has sse2: %s\n",
           has_clflush_ ? "true" : "false",
           has_vector_ ? "true" : "false");
#elif defined(STRESSAPPTEST_CPU_PPC)
 // All PPC implementations have cache flush instructions.
 has_clflush_ = true;
#elif defined(STRESSAPPTEST_CPU_MIPS)
 // All MIPS implementations have cache flush instructions.
 has_clflush_ = true;
#elif defined(STRESSAPPTEST_CPU_ARMV7A) || defined(STRESSAPPTEST_CPU_AARCH64)
 // TODO(nsanders): add detect from /proc/cpuinfo or /proc/self/auxv.
 // For now assume neon and don't run -W if you don't have it.
 has_vector_ = true; // NEON.
#elif defined(STRESSAPPTEST_CPU_LOONGARCH)
 has_clflush_ = true;
#else
#warning "Unsupported CPU type: unable to determine feature set."
#endif
}

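The function combines compile-time architecture macros with platform-specific probing to determine which instructions the CPU supports, so that later code can choose the right ones. Only the x86 branch probes the hardware at run time; the other branches hard-code their answers at compile time. In detail:

The compile-time macros determine which CPU type the code was built for, such as x86 or ARM.
On x86, it executes the CPUID instruction with eax = 1 and tests feature bits in edx: bit 19 for CLFLUSH and bit 26 for SSE2.
On PPC, MIPS and LoongArch, it assumes that cache flush instructions are always present; on ARMv7-A and AArch64, it assumes NEON; on any other architecture, it emits a compile-time warning.
The results are stored in the member variables has_clflush_ and has_vector_.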
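The bit tests in the x86 branch can be separated from the CPUID instruction itself. The sketch below is illustrative only: CpuFeatures, TestBit() and DecodeCpuid1Edx() are hypothetical names, and the function takes the EDX value as a parameter instead of executing CPUID.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this example.
struct CpuFeatures {
  bool has_clflush;
  bool has_sse2;
};

// Returns true if the given bit of value is set.
bool TestBit(uint32_t value, unsigned bit) {
  assert(bit < 32);  // Shifting a 32-bit value by 32 or more is undefined.
  return ((value >> bit) & 1u) != 0;
}

// Decodes the two EDX bits of CPUID leaf 1 that GetFeatures() inspects:
// bit 19 (CLFLUSH) and bit 26 (SSE2).
CpuFeatures DecodeCpuid1Edx(uint32_t edx) {
  CpuFeatures f;
  f.has_clflush = TestBit(edx, 19);
  f.has_sse2 = TestBit(edx, 26);
  return f;
}
```

Keeping the decoding separate from the instruction makes the bit positions easy to check without x86 hardware.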

4.2 The VirtualToPhysical() Function

OsLayer::VirtualToPhysical() translates a virtual address into the corresponding physical address. Source code:

// Translates user virtual to physical address.
uint64 OsLayer::VirtualToPhysical(void *vaddr) {
 uint64 frame, paddr, pfnmask, pagemask;
 int pagesize = sysconf(_SC_PAGESIZE);
 off_t off = ((uintptr_t)vaddr) / pagesize * 8;
 int fd = open(kPagemapPath, O_RDONLY);

 if (fd < 0)
   return 0;

 if (lseek(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) {
   int err = errno;
   string errtxt = ErrorString(err);
   logprintf(0, "Process Error: failed to access %s with errno %d (%s)\n",
             kPagemapPath, err, errtxt.c_str());
   if (fd >= 0)
     close(fd);
   return 0;
 }
 close(fd);

 /* Check if page is present and not swapped. */
 if (!(frame & (1ULL << 63)) || (frame & (1ULL << 62)))
   return 0;

 /* pfn is bits 0-54. */
 pfnmask = ((1ULL << 55) - 1);
 /* Pagesize had better be a power of 2. */
 pagemask = pagesize - 1;

 paddr = ((frame & pfnmask) * pagesize) | ((uintptr_t)vaddr & pagemask);
 return paddr;
}

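The function reads the pagemap file named by kPagemapPath (the kernel's per-process /proc/pid/pagemap interface) to obtain the physical page frame number for the virtual address, and then derives the physical address with bit operations.

What it does:

It computes the offset of the pagemap entry for vaddr: the virtual page number times 8, because each entry is 8 bytes. The code does not validate vaddr itself.
It uses lseek and read to fetch that 64-bit entry, frame, from the pagemap file.
It returns 0 unless bit 63 (page present) is set and bit 62 (page swapped) is clear.
It masks out the page frame number in bits 0-54, multiplies it by the page size, and ORs in the offset of vaddr within the page.
It returns the physical address on success and 0 on failure.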
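The arithmetic can be checked without touching /proc. The following sketch mirrors the bit layout used in the listing above; PagemapOffset() and PagemapEntryToPhysical() are hypothetical names, and the entry value is passed in as a parameter.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of the pagemap entry for vaddr: one 8-byte entry per page.
uint64_t PagemapOffset(uint64_t vaddr, uint64_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  return vaddr / page_size * 8;
}

// Converts a 64-bit pagemap entry into a physical address, or returns 0
// if the page is not present (bit 63 clear) or is swapped (bit 62 set).
uint64_t PagemapEntryToPhysical(uint64_t entry, uint64_t vaddr,
                                uint64_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  const uint64_t kPresent = 1ULL << 63;
  const uint64_t kSwapped = 1ULL << 62;
  const uint64_t kPfnMask = (1ULL << 55) - 1;  // PFN lives in bits 0-54.
  if (!(entry & kPresent) || (entry & kSwapped))
    return 0;
  return ((entry & kPfnMask) * page_size) | (vaddr & (page_size - 1));
}
```

For example, with 4096-byte pages, a present entry whose frame number is 0x1234 and a virtual address ending in 0xabc yield the physical address 0x1234abc.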

4.3 The FlushPageCache(void) Function

This function drops the page cache so that subsequent reads come from the disk rather than from the cache.

// Flush the page cache to ensure reads come from the disk.
bool OsLayer::FlushPageCache(void) {
  if (!use_flush_page_cache_)
    return true;

  // First, ask the kernel to write the cache to the disk.
  sync();

  // Second, ask the kernel to empty the cache by writing "1" to
  // "/proc/sys/vm/drop_caches".
  static const char *drop_caches_file = "/proc/sys/vm/drop_caches";
  int dcfile = open(drop_caches_file, O_WRONLY);
  if (dcfile < 0) {
    int err = errno;
    string errtxt = ErrorString(err);
    logprintf(3, "Log: failed to open %s - err %d (%s)\n",
              drop_caches_file, err, errtxt.c_str());
    return false;
  }

  ssize_t bytes_written = write(dcfile, "1", 1);
  close(dcfile);

  if (bytes_written != 1) {
    int err = errno;
    string errtxt = ErrorString(err);
    logprintf(3, "Log: failed to write %s - err %d (%s)\n",
              drop_caches_file, err, errtxt.c_str());
    return false;
  }
  return true;
}

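The function first checks the use_flush_page_cache_ flag and returns true immediately if flushing is disabled. Otherwise it calls sync() to ask the kernel to write dirty cache contents to disk, then opens "/proc/sys/vm/drop_caches" and writes "1" to it to empty the page cache. It returns true if both steps succeed; if the open or the write fails, it logs the error and returns false.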

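4.4 The FastFlush() Function

FastFlush() flushes a cache line quickly in performance-critical code. The implementation is selected at compile time and executes different instructions depending on the CPU architecture.

On PPC, it uses the dcbf and sync instructions.
On X86_64 and I686, it calls the _mm_clflush intrinsic with an _mm_mfence before and after it.
On MIPS, it calls syscall() with __NR_cacheflush.
On ARMV7A, it calls syscall() with __ARM_NR_cacheflush over a 32-byte cache line.
On AARCH64, it uses inline assembly (asm volatile) to issue the sequence dc cvau, dsb ish, ic ivau, dsb ish, isb.
On LOONGARCH, it issues the ibar 0 barrier instruction.

On any other CPU architecture, the compiler emits a warning and the function body is empty, so no cache flush is performed.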

  // Fast flush, for use in performance critical code.
  // This is bound at compile time, and will not pick up
  // any runtime machine configuration info.
  inline static void FastFlush(void *vaddr) {
#ifdef STRESSAPPTEST_CPU_PPC
    asm volatile("dcbf 0,%0; sync" : : "r" (vaddr));
#elif defined(STRESSAPPTEST_CPU_X86_64) || defined(STRESSAPPTEST_CPU_I686)
    // Put mfence before and after clflush to make sure:
    // 1. The write before the clflush is committed to memory bus;
    // 2. The read after the clflush is hitting the memory bus.
    //
    // From Intel manual:
    // CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
    // to be ordered by any other fencing, serializing or other CLFLUSH
    // instruction. For example, software can use an MFENCE instruction to
    // insure that previous stores are included in the write-back.
    _mm_mfence();
    _mm_clflush(vaddr);
    _mm_mfence();
#elif defined(STRESSAPPTEST_CPU_MIPS)
    syscall(__NR_cacheflush, vaddr, 32, 0);
#elif defined(STRESSAPPTEST_CPU_ARMV7A)
    // ARMv7a cachelines are 8 words (32 bytes).
    syscall(__ARM_NR_cacheflush, vaddr, reinterpret_cast<char*>(vaddr) + 32, 0);
#elif defined(STRESSAPPTEST_CPU_AARCH64)
    asm volatile("dc cvau, %0" : : "r" (vaddr));
    asm volatile("dsb ish");
    asm volatile("ic ivau, %0" : : "r" (vaddr));
    asm volatile("dsb ish");
    asm volatile("isb");
#elif defined(STRESSAPPTEST_CPU_LOONGARCH)
    // Reference linux kernel: arch/loongarch/mm/cache.c
    asm volatile("ibar 0");
#else
  #warning "Unsupported CPU type: Unable to force cache flushes."
#endif
  }

4.5 The FindDimm(uint64, char *, int) Function

FindDimm() translates a physical address into the name of a memory module or chip.

// Translate physical address to memory module/chip name.
// Assumes interleaving between two memory channels based on the XOR of
// all address bits in the 'channel_hash' mask, with repeated 'channel_width_'
// blocks with bits distributed from each chip in that channel.
int OsLayer::FindDimm(uint64 addr, char *buf, int len) {
  if (!channels_) {
    snprintf(buf, len, "DIMM Unknown");
    return -1;
  }

  // Find channel by XORing address bits in channel_hash mask.
  uint32 low = static_cast<uint32>(addr & channel_hash_);
  uint32 high = static_cast<uint32>((addr & channel_hash_) >> 32);
  vector<string>& channel = (*channels_)[
      __builtin_parity(high) ^ __builtin_parity(low)];

  // Find dram chip by finding which byte within the channel
  // by address mod channel width, then divide the channel
  // evenly among the listed dram chips. Note, this will not work
  // with x4 dram.
  int chip = (addr % (channel_width_ / 8)) /
             ((channel_width_ / 8) / channel.size());
  string name = channel[chip];
  snprintf(buf, len, "%s", name.c_str());
  return 1;
}

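The function first checks whether any memory channels have been configured. If not, it writes "DIMM Unknown" into buf and returns -1.

Next, it selects the channel: it ANDs the address with the channel_hash_ mask and takes the parity of the result, that is, the XOR of all the masked address bits, as the index into channels_. It then locates the DRAM chip by taking the address modulo the channel width in bytes and dividing the channel's bytes evenly among the chips listed for that channel. As the source comment notes, this scheme does not work with x4 DRAM.

Finally, the function copies the name of the DRAM chip into buf and returns 1 to indicate success.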
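The two index calculations can be sketched as plain functions. This is an illustration under assumptions, not the stressapptest code: Parity64(), ChannelIndex() and ChipIndex() are hypothetical names, and the mask and width values in the usage note are made-up examples. Parity64() stands in for the two __builtin_parity calls in the original.

```cpp
#include <cassert>
#include <cstdint>

// XOR of all bits in v: 1 if an odd number of bits are set, else 0.
int Parity64(uint64_t v) {
  int parity = 0;
  while (v) {
    parity ^= 1;
    v &= v - 1;  // Clear the lowest set bit.
  }
  return parity;
}

// Channel index: parity of the address bits selected by the hash mask.
int ChannelIndex(uint64_t addr, uint64_t channel_hash) {
  return Parity64(addr & channel_hash);
}

// Chip index within a channel: the byte lane (addr modulo the channel
// width in bytes) divided by the number of bytes each chip supplies.
int ChipIndex(uint64_t addr, int channel_width_bits, int chips_per_channel) {
  const int bytes = channel_width_bits / 8;
  assert(bytes > 0 && chips_per_channel > 0 &&
         bytes % chips_per_channel == 0);
  return static_cast<int>((addr % bytes) / (bytes / chips_per_channel));
}
```

With a 64-bit channel built from eight x8 chips, ChipIndex(0x45, 64, 8) selects chip 5; with four x16 chips, the same address selects chip 2.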

4.6 The FindHugePages() Function

FindHugePages() reads the number of hugepages configured in the system from the kernel's proc interface.

// Read the number of hugepages out of the kernel interface in proc.
int64 OsLayer::FindHugePages() {
  char buf[65] = "0";

  // This is a kernel interface to query the number of hugepages
  // available in the system.
  static const char *hugepages_info_file = "/proc/sys/vm/nr_hugepages";
  int hpfile = open(hugepages_info_file, O_RDONLY);

  ssize_t bytes_read = read(hpfile, buf, 64);
  close(hpfile);

  if (bytes_read <= 0) {
    logprintf(12, "Log: /proc/sys/vm/nr_hugepages "
                  "read did not provide data\n");
    return 0;
  }

  if (bytes_read == 64) {
    logprintf(0, "Process Error: /proc/sys/vm/nr_hugepages "
                 "is surprisingly large\n");
    return 0;
  }

  // Add a null termination to be string safe.
  buf[bytes_read] = '\0';
  // Read the page count.
  int64 pages = strtoull(buf, NULL, 10);  // NOLINT

  return pages;
}

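The function opens "/proc/sys/vm/nr_hugepages" read-only, reads up to 64 bytes into buf, and closes the file descriptor. The return value of open() is not checked; if the open fails, read() fails as well and is caught by the check that follows.

It then inspects the number of bytes read. If nothing was read, it writes a low-priority log message (level 12) and returns 0. If exactly 64 bytes were read, meaning the content is unexpectedly long, it logs a "Process Error" and also returns 0.

Otherwise it appends a null terminator so that buf is a valid C string, converts the contents with strtoull, and returns the result as an int64.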
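The parsing step is easy to exercise on its own. A minimal sketch follows; ParseHugePageCount() and HugePageBytes() are hypothetical helpers, and the 2MB page size in the usage note is the same assumption the stressapptest source makes elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Parses the text read from /proc/sys/vm/nr_hugepages. strtoull stops at
// the first non-digit, so the trailing newline is ignored; text without
// leading digits yields 0.
int64_t ParseHugePageCount(const std::string& text) {
  return static_cast<int64_t>(std::strtoull(text.c_str(), nullptr, 10));
}

// Converts a hugepage count into bytes for a given hugepage size.
int64_t HugePageBytes(int64_t pages, int64_t hugepage_size) {
  assert(pages >= 0 && hugepage_size > 0);
  return pages * hugepage_size;
}
```

For instance, a file containing "128" followed by a newline parses to 128 pages, which at 2MB each is 256MB.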

4.7 The FindFreeMemSize() Function

FindFreeMemSize() determines how much system memory to target for the test.

int64 OsLayer::FindFreeMemSize() {
  int64 size = 0;
  int64 minsize = 0;
  if (totalmemsize_ > 0)
    return totalmemsize_;

  int64 pages = sysconf(_SC_PHYS_PAGES);
  int64 avpages = sysconf(_SC_AVPHYS_PAGES);
  int64 pagesize = sysconf(_SC_PAGESIZE);
  int64 physsize = pages * pagesize;
  int64 avphyssize = avpages * pagesize;

  // Assume 2MB hugepages.
  int64 hugepagesize = FindHugePages() * 2 * kMegabyte;

  if ((pages == -1) || (pagesize == -1)) {
    logprintf(0, "Process Error: sysconf could not determine memory size.\n");
    return 0;
  }

  // We want to leave enough stuff for things to run.
  // If the user specified a minimum amount of memory to expect, require that.
  // Otherwise, if more than 2GB is present, leave 192M + 5% for other stuff.
  // If less than 2GB is present use 85% of what's available.
  // These are fairly arbitrary numbers that seem to work OK.
  //
  // TODO(nsanders): is there a more correct way to determine target
  // memory size?
  if (hugepagesize > 0) {
    if (min_hugepages_bytes_ > 0) {
      minsize = min_hugepages_bytes_;
    } else {
      minsize = hugepagesize;
    }
  } else {
    if (physsize < 2048LL * kMegabyte) {
      minsize = ((pages * 85) / 100) * pagesize;
    } else {
      minsize = ((pages * 95) / 100) * pagesize - (192 * kMegabyte);
    }
    // Make sure that at least reserve_mb_ is left for the system.
    if (reserve_mb_ > 0) {
      int64 totalsize = pages * pagesize;
      int64 reserve_kb = reserve_mb_ * kMegabyte;
      if (reserve_kb > totalsize) {
        logprintf(0, "Procedural Error: %lld is bigger than the total memory "
                  "available %lld\n", reserve_kb, totalsize);
      } else if (reserve_kb > totalsize - minsize) {
        logprintf(5, "Warning: Overriding memory to use: original %lld, "
                  "current %lld\n", minsize, totalsize - reserve_kb);
        minsize = totalsize - reserve_kb;
      }
    }
  }

  // Use hugepage sizing if available.
  if (hugepagesize > 0) {
    if (hugepagesize < minsize) {
      logprintf(0, "Procedural Error: Not enough hugepages. "
                   "%lldMB available < %lldMB required.\n",
                hugepagesize / kMegabyte,
                minsize / kMegabyte);
      // Require the calculated minimum amount of memory.
      size = minsize;
    } else {
      // Require that we get all hugepages.
      size = hugepagesize;
    }
  } else {
    // Require the calculated minimum amount of memory.
    size = minsize;
  }

  logprintf(5, "Log: Total %lld MB. Free %lld MB. Hugepages %lld MB. "
               "Targeting %lld MB (%lld%%)\n",
            physsize / kMegabyte,
            avphyssize / kMegabyte,
            hugepagesize / kMegabyte,
            size / kMegabyte,
            size * 100 / physsize);

  totalmemsize_ = size;
  return size;
}

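If totalmemsize_ has already been set, the function returns it immediately. Otherwise it queries sysconf for the number of physical pages, the number of available physical pages, and the page size, and from them computes the total and the available physical memory. It also calls FindHugePages() and, assuming 2MB hugepages, converts the count into a size in bytes.

It then works out the target size as follows:

If hugepages are configured (hugepagesize > 0), the minimum required size minsize is min_hugepages_bytes_ when the user specified one, and hugepagesize otherwise.
If no hugepages are configured, minsize is 85% of physical memory on systems with less than 2GB, and 95% minus 192MB on larger systems. If reserve_mb_ is set and the memory left over would be smaller than that reserve, minsize is lowered to the total minus the reserve. (The variable named reserve_kb in the source actually holds bytes.)
Finally, size is set to hugepagesize when hugepages are configured and cover minsize; otherwise it is set to minsize, with an error logged in the hugepage case.

The function stores the result in totalmemsize_ and returns it.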
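The sizing rules are easier to follow as a pure function. The sketch below reproduces the arithmetic from the listing without the sysconf calls, the logging, or the caching; TargetMemSize() is a hypothetical name, and kMegabyte is assumed to be 1024 * 1024 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to match the constant of the same name in stressapptest.
const int64_t kMegabyte = 1024LL * 1024LL;

// Hypothetical pure version of the sizing logic in FindFreeMemSize().
int64_t TargetMemSize(int64_t pages, int64_t pagesize, int64_t hugepage_bytes,
                      int64_t min_hugepages_bytes, int64_t reserve_mb) {
  assert(pages > 0 && pagesize > 0);
  const int64_t total = pages * pagesize;

  if (hugepage_bytes > 0) {
    // Hugepage path: use all hugepages unless the user demands more.
    const int64_t minsize =
        min_hugepages_bytes > 0 ? min_hugepages_bytes : hugepage_bytes;
    return hugepage_bytes < minsize ? minsize : hugepage_bytes;
  }

  int64_t minsize;
  if (total < 2048LL * kMegabyte) {
    minsize = ((pages * 85) / 100) * pagesize;  // Under 2GB: use 85%.
  } else {
    minsize = ((pages * 95) / 100) * pagesize - 192 * kMegabyte;
  }
  // Honor the reserve only if it fits and would otherwise be violated.
  const int64_t reserve_bytes = reserve_mb * kMegabyte;
  if (reserve_mb > 0 && reserve_bytes <= total &&
      reserve_bytes > total - minsize) {
    minsize = total - reserve_bytes;
  }
  return minsize;
}
```

On an 8GB machine with 4096-byte pages and no hugepages, a 1024MB reserve lowers the target to exactly 7168MB, because the default rule would leave only about 600MB free.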

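4.8 The AllocateTestMem(int64, uint64) Function

AllocateTestMem() allocates a memory block of length bytes using one of several strategies (hugepages, POSIX shared memory, anonymous mmap, or memalign) and returns whether the allocation succeeded.

Source code: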

// Allocate the target memory. This may be from malloc, hugepage pool
// or other platform specific sources.
bool OsLayer::AllocateTestMem(int64 length, uint64 paddr_base) {
  // Try hugepages first.
  void *buf = 0;

  sat_assert(length >= 0);

  if (paddr_base)
    logprintf(0, "Process Error: non zero paddr_base %#llx is not supported,"
              " ignore.\n", paddr_base);

  // Determine optimal memory allocation path.
  bool prefer_hugepages = false;
  bool prefer_posix_shm = false;
  bool prefer_dynamic_mapping = false;

  // Are there enough hugepages?
  int64 hugepagesize = FindHugePages() * 2 * kMegabyte;
  // TODO(nsanders): Is there enough /dev/shm? Is there enough free memory?
  if ((length >= 1400LL * kMegabyte) && (address_mode_ == 32)) {
    prefer_dynamic_mapping = true;
    prefer_posix_shm = true;
    logprintf(3, "Log: Prefer POSIX shared memory allocation.\n");
    logprintf(3, "Log: You may need to run "
                 "'sudo mount -o remount,size=100\% /dev/shm.'\n");
  } else if (hugepagesize >= length) {
    prefer_hugepages = true;
    logprintf(3, "Log: Prefer using hugepage allocation.\n");
  } else {
    logprintf(3, "Log: Prefer plain malloc memory allocation.\n");
  }

#ifdef HAVE_SYS_SHM_H
  // Allocate hugepage mapped memory.
  if (prefer_hugepages) {
    do { // Allow break statement.
      int shmid;
      void *shmaddr;

      if ((shmid = shmget(2, length,
              SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) {
        int err = errno;
        string errtxt = ErrorString(err);
        logprintf(3, "Log: failed to allocate shared hugepage "
                      "object - err %d (%s)\n",
                  err, errtxt.c_str());
        logprintf(3, "Log: sysctl -w vm.nr_hugepages=XXX allows hugepages.\n");
        break;
      }

      shmaddr = shmat(shmid, NULL, 0);
      if (shmaddr == reinterpret_cast<void*>(-1)) {
        int err = errno;
        string errtxt = ErrorString(err);
        logprintf(0, "Log: failed to attach shared "
                     "hugepage object - err %d (%s).\n",
                  err, errtxt.c_str());
        if (shmctl(shmid, IPC_RMID, NULL) < 0) {
          int err = errno;
          string errtxt = ErrorString(err);
          logprintf(0, "Log: failed to remove shared "
                       "hugepage object - err %d (%s).\n",
                    err, errtxt.c_str());
        }
        break;
      }
      use_hugepages_ = true;
      shmid_ = shmid;
      buf = shmaddr;
      logprintf(0, "Log: Using shared hugepage object 0x%x at %p.\n",
                shmid, shmaddr);
    } while (0);
  }

  if ((!use_hugepages_) && prefer_posix_shm) {
    do {
      int shm_object;
      void *shmaddr = NULL;

      shm_object = shm_open("/stressapptest", O_CREAT | O_RDWR, S_IRWXU);
      if (shm_object < 0) {
        int err = errno;
        string errtxt = ErrorString(err);
        logprintf(3, "Log: failed to allocate shared "
                      "smallpage object - err %d (%s)\n",
                  err, errtxt.c_str());
        break;
      }

      if (0 > ftruncate(shm_object, length)) {
        int err = errno;
        string errtxt = ErrorString(err);
        logprintf(3, "Log: failed to ftruncate shared "
                      "smallpage object - err %d (%s)\n",
                  err, errtxt.c_str());
        break;
      }

      // 32 bit linux apps can only use ~1.4G of address space.
      // Use dynamic mapping for allocations larger than that.
      // Currently perf hit is ~10% for this.
      if (prefer_dynamic_mapping) {
        dynamic_mapped_shmem_ = true;
      } else {
        // Do a full mapping here otherwise.
        shmaddr = mmap(NULL, length, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_NORESERVE | MAP_LOCKED | MAP_POPULATE,
                       shm_object, 0);
        if (shmaddr == reinterpret_cast<void*>(-1)) {
          int err = errno;
          string errtxt = ErrorString(err);
          logprintf(0, "Log: failed to map shared "
                       "smallpage object - err %d (%s).\n",
                    err, errtxt.c_str());
          break;
        }
      }

      use_posix_shm_ = true;
      shmid_ = shm_object;
      buf = shmaddr;
      char location_message[256] = "";
      if (dynamic_mapped_shmem_) {
        sprintf(location_message, "mapped as needed");
      } else {
        sprintf(location_message, "at %p", shmaddr);
      }
      logprintf(0, "Log: Using posix shared memory object 0x%x %s.\n",
                shm_object, location_message);
    } while (0);
    shm_unlink("/stressapptest");
  }
#endif  // HAVE_SYS_SHM_H

  if (!use_hugepages_ && !use_posix_shm_) {
    // If the page size is what SAT is expecting explicitly perform mmap()
    // allocation.
    if (sysconf(_SC_PAGESIZE) >= 4096) {
      void *map_buf = mmap(NULL, length, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (map_buf != MAP_FAILED) {
        buf = map_buf;
        mmapped_allocation_ = true;
        logprintf(0, "Log: Using mmap() allocation at %p.\n", buf);
      }
    }
    if (!mmapped_allocation_) {
      // Use memalign to ensure that blocks are aligned enough for disk direct
      // IO.
      buf = static_cast<char*>(memalign(4096, length));
      if (buf) {
        logprintf(0, "Log: Using memaligned allocation at %p.\n", buf);
      } else {
        logprintf(0, "Process Error: memalign returned 0\n");
        if ((length >= 1499LL * kMegabyte) && (address_mode_ == 32)) {
          logprintf(0, "Log: You are trying to allocate > 1.4G on a 32 "
                       "bit process. Please setup shared memory.\n");
        }
      }
    }
  }

  testmem_ = buf;
  if (buf || dynamic_mapped_shmem_) {
    testmemsize_ = length;
  } else {
    testmemsize_ = 0;
  }

  return (buf != 0) || dynamic_mapped_shmem_;
}

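What it does:

It chooses a preferred path: POSIX shared memory with dynamic mapping when a 32-bit process asks for 1400MB or more, hugepages when enough of them are available, and plain allocation otherwise.
When HAVE_SYS_SHM_H is defined, it allocates hugepage-backed memory with shmget and shmat, or creates a POSIX shared memory object with shm_open and ftruncate.
If neither of those is in use, it falls back to an anonymous mmap, and then to memalign if mmap was not used or failed.
It records the buffer in testmem_ and its size in testmemsize_, and returns whether the allocation succeeded.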
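The preference decision at the top of the function can be modeled as a small pure function. This sketch covers only that decision, not the allocation or the fallbacks that follow it; AllocPath and ChooseAllocPath() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the three preferences set in AllocateTestMem().
enum class AllocPath { kPosixShmDynamic, kHugePages, kPlain };

// Mirrors the if / else-if chain that sets prefer_posix_shm,
// prefer_dynamic_mapping and prefer_hugepages.
AllocPath ChooseAllocPath(int64_t length, int64_t hugepage_bytes,
                          int address_mode) {
  assert(length >= 0);
  const int64_t kMegabyte = 1024LL * 1024LL;
  if (length >= 1400LL * kMegabyte && address_mode == 32)
    return AllocPath::kPosixShmDynamic;  // Too big to map in 32 bits.
  if (hugepage_bytes >= length)
    return AllocPath::kHugePages;  // Enough hugepages to cover the request.
  return AllocPath::kPlain;  // mmap() or memalign() in the real code.
}
```

Note that the 32-bit check comes first, so a large request from a 32-bit process prefers POSIX shared memory even when enough hugepages exist.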

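4.9 PciOpen/Read/Write()

These functions read and write the configuration space of devices on the PCI bus.

PciOpen opens the file under /proc/bus/pci for the given bus, device and function in read-write mode and returns its file descriptor. If the open fails, it logs an error and returns -1.
PciRead reads data from the PCI device's configuration space at the given file descriptor, offset and data width, and returns the value read. If the seek or read fails, it logs an error and returns 0.
PciWrite writes data to the PCI device's configuration space at the given file descriptor, offset and data width. If the seek fails or the write is incomplete, it logs an error.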

// Open a PCI bus-dev-func as a file and return its file descriptor.
// Error is indicated by return value less than zero.
int OsLayer::PciOpen(int bus, int device, int function) {
  char dev_file[256];

  snprintf(dev_file, sizeof(dev_file), "/proc/bus/pci/%02x/%02x.%x",
           bus, device, function);

  int fd = open(dev_file, O_RDWR);
  if (fd == -1) {
    logprintf(0, "Process Error: Unable to open PCI bus %d, device %d, "
                 "function %d (errno %d).\n",
              bus, device, function, errno);
    return -1;
  }

  return fd;
}


// Read and write functions to access PCI config.
uint32 OsLayer::PciRead(int fd, uint32 offset, int width) {
  // Strict aliasing rules lawyers will cause data corruption
  // on cast pointers in some gccs.
  union {
    uint32 l32;
    uint16 l16;
    uint8 l8;
  } datacast;
  datacast.l32 = 0;
  uint32 size = width / 8;

  sat_assert((width == 32) || (width == 16) || (width == 8));
  sat_assert(offset <= (256 - size));

  if (lseek(fd, offset, SEEK_SET) < 0) {
    logprintf(0, "Process Error: Can't seek %x\n", offset);
    return 0;
  }
  if (read(fd, &datacast, size) != static_cast<ssize_t>(size)) {
    logprintf(0, "Process Error: Can't read %x\n", offset);
    return 0;
  }

  // Extract the data.
  switch (width) {
    case 8:
      sat_assert(&(datacast.l8) == reinterpret_cast<uint8*>(&datacast));
      return datacast.l8;
    case 16:
      sat_assert(&(datacast.l16) == reinterpret_cast<uint16*>(&datacast));
      return datacast.l16;
    case 32:
      return datacast.l32;
  }
  return 0;
}

void OsLayer::PciWrite(int fd, uint32 offset, uint32 value, int width) {
  // Strict aliasing rules lawyers will cause data corruption
  // on cast pointers in some gccs.
  union {
    uint32 l32;
    uint16 l16;
    uint8 l8;
  } datacast;
  datacast.l32 = 0;
  uint32 size = width / 8;

  sat_assert((width == 32) || (width == 16) || (width == 8));
  sat_assert(offset <= (256 - size));

  // Cram the data into the right alignment.
  switch (width) {
    case 8:
      sat_assert(&(datacast.l8) == reinterpret_cast<uint8*>(&datacast));
      datacast.l8 = value;
    case 16:
      sat_assert(&(datacast.l16) == reinterpret_cast<uint16*>(&datacast));
      datacast.l16 = value;
    case 32:
      datacast.l32 = value;
  }

  if (lseek(fd, offset, SEEK_SET) < 0) {
    logprintf(0, "Process Error: Can't seek %x\n", offset);
    return;
  }
  if (write(fd, &datacast, size) != static_cast<ssize_t>(size)) {
    logprintf(0, "Process Error: Can't write %x to %x\n", datacast.l32, offset);
    return;
  }

  return;
}

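(1) PciRead() reads from the PCI device's configuration space at the file descriptor fd, offset offset and data width width. It defines an anonymous union, datacast, that overlays 8-, 16- and 32-bit views of the same storage, and computes the size in bytes as width / 8. It uses lseek to position the file at the offset, reads size bytes into datacast, and then uses a switch statement to return the member that matches the width. If the seek or read fails, it logs an error and returns 0. An invalid width or an offset beyond the 256-byte configuration space is not reported this way; it trips a sat_assert instead.

(2) PciWrite() writes value to the PCI device's configuration space at fd, offset and width. It uses the same datacast union and size calculation, stores value into the union in a switch statement, seeks to the offset with lseek, and writes size bytes with write. If the seek or write fails, it logs an error and returns. As in PciRead(), width and offset are validated by sat_assert. Note that the switch cases in the listing above have no break statements, so the 8- and 16-bit cases fall through and datacast.l32 always ends up holding value; because only size bytes are written, this still yields the intended low-order bytes on little-endian machines.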
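The range check and the width handling can be illustrated on an in-memory copy of configuration space. The sketch below is not the stressapptest code: PciAccessInRange() and ExtractConfigField() are hypothetical names, and instead of the union it assembles the value byte by byte in little-endian order, which is the byte order PCI configuration space uses.

```cpp
#include <cassert>
#include <cstdint>

// The same conditions the two sat_assert calls enforce: width must be
// 8, 16 or 32 bits, and the access must end within the first 256 bytes.
bool PciAccessInRange(uint32_t offset, int width) {
  if (width != 8 && width != 16 && width != 32)
    return false;
  const uint32_t size = static_cast<uint32_t>(width / 8);
  return offset <= 256 - size;
}

// Extracts a little-endian field from a 256-byte config space buffer.
uint32_t ExtractConfigField(const uint8_t* config, uint32_t offset,
                            int width) {
  assert(PciAccessInRange(offset, width));
  uint32_t value = 0;
  for (int i = 0; i < width / 8; ++i)
    value |= static_cast<uint32_t>(config[offset + i]) << (8 * i);
  return value;
}
```

Assembling the bytes explicitly avoids depending on the host byte order, at the cost of a short loop.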

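4.10 Open/Read/WriteMSR()

These functions open, read and write MSRs (Model Specific Registers).

OpenMSR opens the MSR device of a given core and seeks to the given address. It builds the path /dev/cpu/N/msr for core N and opens it with open. If the open fails, it returns the negative descriptor directly. Otherwise it seeks to the address; if the resulting position differs from the address, it closes the file, logs a message, and returns -1.
ReadMSR calls OpenMSR and then reads 8 bytes into the memory that data points to. It returns true on success and false on failure, logging a message when the read fails. Once the file has been opened, it is closed whether or not the read succeeded.
WriteMSR mirrors ReadMSR: it calls OpenMSR, writes the 8 bytes at data to the MSR, returns true on success and false on failure with a log message, and closes the file in either case.

In short, these functions implement open, read and write operations on a specific core's MSRs and log a message when a seek, read or write fails.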

// Open dev msr.
int OsLayer::OpenMSR(uint32 core, uint32 address) {
  char buf[256];
  snprintf(buf, sizeof(buf), "/dev/cpu/%d/msr", core);
  int fd = open(buf, O_RDWR);
  if (fd < 0)
    return fd;

  uint32 pos = lseek(fd, address, SEEK_SET);
  if (pos != address) {
    close(fd);
    logprintf(5, "Log: can't seek to msr %x, cpu %d\n", address, core);
    return -1;
  }

  return fd;
}

bool OsLayer::ReadMSR(uint32 core, uint32 address, uint64 *data) {
  int fd = OpenMSR(core, address);
  if (fd < 0)
    return false;

  // Read from the msr.
  bool res = (sizeof(*data) == read(fd, data, sizeof(*data)));

  if (!res)
    logprintf(5, "Log: Failed to read msr %x core %d\n", address, core);

  close(fd);

  return res;
}

bool OsLayer::WriteMSR(uint32 core, uint32 address, uint64 *data) {
  int fd = OpenMSR(core, address);
  if (fd < 0)
    return false;

  // Write to the msr
  bool res = (sizeof(*data) == write(fd, data, sizeof(*data)));

  if (!res)
    logprintf(5, "Log: Failed to write msr %x core %d\n", address, core);

  close(fd);

  return res;
}
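The open, seek and read sequence can also be written with a single positioned read. The sketch below is an alternative formulation under assumptions, not the stressapptest implementation: it swaps lseek() plus read() for pread(), and MsrPath() and ReadU64At() are hypothetical names. On a real system, reading /dev/cpu/N/msr typically requires the msr driver and elevated privileges.

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Builds the device path that OpenMSR() formats with snprintf().
std::string MsrPath(uint32_t core) {
  return "/dev/cpu/" + std::to_string(core) + "/msr";
}

// Reads 8 bytes at the given offset of a file. Returns false if the file
// cannot be opened or fewer than 8 bytes are read.
bool ReadU64At(const std::string& path, uint32_t address, uint64_t* data) {
  assert(data != nullptr);
  int fd = open(path.c_str(), O_RDONLY);
  if (fd < 0)
    return false;
  // pread() combines the seek and the read into one call.
  ssize_t n = pread(fd, data, sizeof(*data), static_cast<off_t>(address));
  close(fd);
  return n == static_cast<ssize_t>(sizeof(*data));
}
```

A caller would combine the two helpers, for example ReadU64At(MsrPath(0), address, &value), and treat a false return as "MSR not readable".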

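4.11 The CpuStressWorkload() Function

CpuStressWorkload() generates load on the CPU by repeatedly computing a moving average over an array of floating-point numbers.

The method initializes float_arr, an array of 100 doubles, with random values, and flips the sign of each element with a probability of about one half. Conditional compilation selects the random number generator: if HAVE_RAND_R is defined, it uses rand_r with a fixed seed; otherwise it uses srand and rand. A loop of 100,000,000 iterations then replaces each element with the average of itself and its two neighbors (wrapping around at the ends) and accumulates the results in sum. The final check on sum exists only so that the compiler cannot optimize the loops away; it logs a message in the unlikely case that sum is exactly 0.

The point is to keep the CPU busy with heavy computation for performance or load testing.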

// Generic CPU stress workload that would work on any CPU/Platform.
// Float-point array moving average calculation.
bool OsLayer::CpuStressWorkload() {
  double float_arr[100];
  double sum = 0;
#ifdef HAVE_RAND_R
  unsigned int seed = 12345;
#endif

  // Initialize array with random numbers.
  for (int i = 0; i < 100; i++) {
#ifdef HAVE_RAND_R
    float_arr[i] = rand_r(&seed);
    if (rand_r(&seed) % 2)
      float_arr[i] *= -1.0;
#else
    srand(time(NULL));
    float_arr[i] = rand();  // NOLINT
    if (rand() % 2)         // NOLINT
      float_arr[i] *= -1.0;
#endif
  }

  // Calculate moving average.
  for (int i = 0; i < 100000000; i++) {
    float_arr[i % 100] =
      (float_arr[i % 100] + float_arr[(i + 1) % 100] +
       float_arr[(i + 99) % 100]) / 3;
    sum += float_arr[i % 100];
  }

  // Artificial printf so the loops do not get optimized away.
  if (sum == 0.0)
    logprintf(12, "Log: I'm Feeling Lucky!\n");
  return true;
}
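For experimentation, the same loop can be written with the iteration count and seed as parameters. This is a sketch rather than the original: MovingAverageWorkload() is a hypothetical name, and it uses std::mt19937 in place of rand_r() or rand() so that the result depends only on the seed.

```cpp
#include <cassert>
#include <random>

// Runs the moving-average loop for a given number of iterations and
// returns the accumulated sum so that the work cannot be optimized away.
double MovingAverageWorkload(int iterations, unsigned int seed) {
  assert(iterations >= 0);
  const int kSize = 100;
  double arr[kSize];
  std::mt19937 gen(seed);

  // Fill the array with random values of random sign.
  for (int i = 0; i < kSize; i++) {
    arr[i] = static_cast<double>(gen());
    if (gen() % 2)
      arr[i] *= -1.0;
  }

  // Replace each element with the mean of itself and its two neighbors.
  double sum = 0;
  for (int i = 0; i < iterations; i++) {
    const int cur = i % kSize;
    arr[cur] = (arr[cur] + arr[(i + 1) % kSize] +
                arr[(i + kSize - 1) % kSize]) / 3;
    sum += arr[cur];
  }
  return sum;
}
```

A small iteration count is enough to confirm the behavior; the original uses 100,000,000 iterations to keep a core busy for a noticeable time.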

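5. Summary

The OsLayer module of stressapptest wraps the operating system interfaces that the tool depends on and gives the rest of the code a single adaptation layer for stress testing across platforms and CPU architectures. Its methods cover file operations, memory management, PCI bus access, MSR (Model Specific Register) access and other low-level resources. Studying the module makes it easier to understand how stressapptest works on different platforms and provides a reference for stress testing and tuning in practice.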
