[Cloud Native] Docker Container Resource Limits (CPU / Memory / Disk)

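1. Limiting a container's memory usage
2. Limiting a container's CPU usage
3. Block IO weight
4. The underlying technology behind containers
    1. cgroup
        1. Find the container ID
        2. Locate it in the cgroup filesystem
    2. namespace
        1. Mount
        2. UTS
        3. IPC
        4. PID
        5. Network
        6. User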

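1. Limiting a container's memory usage

A Docker host runs many containers, each of which needs CPU, memory, and IO resources. Virtualization technologies such as KVM and VMware let users control how much CPU and memory each virtual machine gets. Docker provides a similar mechanism for containers, so that one container cannot consume so many resources that it degrades other containers or the host itself.

Memory limits work much as they do in an operating system: the memory available to a container consists of physical memory and swap. Docker controls a container's memory usage with the following two options.

-m or --memory: sets the memory limit, for example 100M or 2G.
--memory-swap: sets the limit on memory plus swap combined.

Consider the following command: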

[root@localhost ~]# docker run -it -m 200M --memory-swap=300M centos:7

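This allows the container to use at most 200M of memory and 100M of swap, because --memory-swap is the total of memory plus swap. By default neither option is set, so the container's memory and swap usage is unlimited (passing --memory-swap=-1 explicitly also means unlimited swap).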
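The arithmetic behind these two options can be sketched in a few lines of Python. The helper names parse_size and swap_allowance are made up for illustration, and the sketch assumes binary units (1M = 1024 * 1024 bytes); it is not Docker's own parser.

```python
# Illustrative sketch only: parse_size and swap_allowance are hypothetical
# helpers, not part of Docker. Assumes binary units (1M = 1024 * 1024 bytes).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '200M' or '2G' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # no suffix: treat as a plain byte count


def swap_allowance(memory: str, memory_swap: str) -> int:
    """Return the swap (in bytes) left after memory is subtracted from the total."""
    total = parse_size(memory_swap)
    ram = parse_size(memory)
    if total < ram:
        raise ValueError("--memory-swap must not be smaller than --memory")
    return total - ram


if __name__ == "__main__":
    swap = swap_allowance("200M", "300M")
    print(swap // UNITS["M"], "M of swap")  # 100 M of swap
```

The point of the sketch is that --memory-swap is a combined ceiling, so the swap allowance is always the difference between the two values.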

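Next we use the progrium/stress image to see how memory is allocated to a container. The image runs the stress tool, which puts a container under load. Run the following command:

[root@localhost ~]# docker run -it -m 200M --memory-swap=300M progrium/stress --vm 1 --vm-bytes 299M

Options:
--vm 1: start 1 memory worker
--vm-bytes 299M: each worker allocates 299M of memory

Because 299M is within the 300M limit, the worker runs normally. The output looks like this: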

[root@localhost ~]# docker run -it -m 200M --memory-swap=300M progrium/stress --vm 1 --vm-bytes 299M
stress: info: [1] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: dbug: [1] using backoff sleep of 3000us
stress: dbug: [1] --> hogvm worker 1 [8] forked
stress: dbug: [8] allocating 313524224 bytes ...
stress: dbug: [8] touching bytes in strides of 4096 bytes ...
stress: dbug: [8] freed 313524224 bytes
stress: dbug: [8] allocating 313524224 bytes ...
stress: dbug: [8] touching bytes in strides of 4096 bytes ...
stress: dbug: [8] freed 313524224 bytes
stress: dbug: [8] allocating 313524224 bytes ...

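Allocate 299M of memory
Free 299M of memory
Allocate 299M of memory again
Free it again
Repeat indefinitely

If the worker tries to allocate more than 300M, the result is: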

[root@localhost ~]# docker run -it -m 200M --memory-swap=300M progrium/stress --vm 1 --vm-bytes 301M
stress: info: [1] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: dbug: [1] using backoff sleep of 3000us
stress: dbug: [1] --> hogvm worker 1 [7] forked
stress: dbug: [7] allocating 315621376 bytes ...
stress: dbug: [7] touching bytes in strides of 4096 bytes ...
stress: FAIL: [1] (416) <-- worker 7 got signal 9
stress: WARN: [1] (418) now reaping child worker processes
stress: FAIL: [1] (422) kill error: No such process
stress: FAIL: [1] (452) failed run completed in 0s

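Note: because the allocation exceeds the limit, the kernel kills the stress worker (signal 9), stress reports the failure, and the container exits.

If a container is started with -m but without --memory-swap, --memory-swap defaults to twice the value of -m. For example: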

docker run -it -m 200M centos

The container can use at most 200M of physical memory and 200M of swap.
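As a rough model of the behaviour seen so far, the default rule and the 299M and 301M runs above can be expressed as a small check. The function names total_limit_mb and fits are hypothetical, and the model is simplified: it ignores any other memory the container's processes use.

```python
# Illustrative sketch only: total_limit_mb and fits are hypothetical helpers.
# Sizes are in megabytes to keep the arithmetic easy to follow.
from typing import Optional


def total_limit_mb(memory_mb: int, memory_swap_mb: Optional[int] = None) -> int:
    """Memory + swap ceiling; twice the memory limit when --memory-swap is omitted."""
    if memory_swap_mb is None:
        return 2 * memory_mb
    return memory_swap_mb


def fits(request_mb: int, memory_mb: int, memory_swap_mb: Optional[int] = None) -> bool:
    """True if an allocation stays within the memory + swap ceiling."""
    return request_mb <= total_limit_mb(memory_mb, memory_swap_mb)


if __name__ == "__main__":
    print(total_limit_mb(200))   # 400, i.e. 200M memory + 200M swap
    print(fits(299, 200, 300))   # True: the 299M worker keeps looping
    print(fits(301, 200, 300))   # False: the 301M worker is killed
```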

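2. Limiting a container's CPU usage

By default, all containers share the host's CPU equally and without limit. Docker's -c or --cpu-shares option sets a container's CPU weight; if omitted, the default is 1024. Unlike a memory limit, a cpu share is not an absolute amount of CPU but a relative weight. The CPU a container actually receives depends on the ratio of its cpu share to the sum of the cpu shares of all containers.

In other words, cpu share sets a container's priority for CPU time.

For example, suppose two containers are started on the host: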

[root@localhost ~]# docker run --name 'centos-1' -c 1024 --cpus 1 centos:7
[root@localhost ~]# docker run --name 'centos-2' -c 512 --cpus 1 centos:7

Options:
-c: CPU weight
--cpus: the number of CPUs the container may use

centos-1 has a cpu share of 1024, twice that of centos-2. When both containers need CPU, centos-1 gets twice as much CPU time as centos-2.

Note that this weighted allocation applies only when CPU is contended. If centos-1 is idle, centos-2 can be given all of the available CPU so that none goes to waste.
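The weighting rule can be sketched as a simple proportion over the containers that are actually busy. The helper cpu_fractions is hypothetical and only models the idea; the real split is made by the kernel scheduler.

```python
# Illustrative sketch only: cpu_fractions is a hypothetical helper that models
# how CPU time is split by weight among the containers that are busy.
from typing import Dict, Iterable


def cpu_fractions(shares: Dict[str, int], busy: Iterable[str]) -> Dict[str, float]:
    """Split CPU among busy containers in proportion to their cpu shares."""
    active = [name for name in busy if name in shares]
    total = sum(shares[name] for name in active)
    if total == 0:
        return {}
    return {name: shares[name] / total for name in active}


if __name__ == "__main__":
    shares = {"centos-1": 1024, "centos-2": 512}
    both = cpu_fractions(shares, ["centos-1", "centos-2"])
    print({name: round(value, 3) for name, value in both.items()})
    # {'centos-1': 0.667, 'centos-2': 0.333}
    print(cpu_fractions(shares, ["centos-2"]))
    # {'centos-2': 1.0}
```

Idle containers are left out of the sum, which is why centos-2 alone ends up with the whole CPU.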

Example:

Start container_A with a cpu share of 1024:

[root@localhost ~]# docker run --name container_A -it -c 1024 progrium/stress --cpu 4
stress: info: [1] dispatching hogs: 4 cpu, 0 io, 0 vm, 0 hdd
stress: dbug: [1] using backoff sleep of 12000us
stress: dbug: [1] --> hogcpu worker 4 [7] forked
stress: dbug: [1] using backoff sleep of 9000us
stress: dbug: [1] --> hogcpu worker 3 [8] forked
stress: dbug: [1] using backoff sleep of 6000us
stress: dbug: [1] --> hogcpu worker 2 [9] forked
stress: dbug: [1] using backoff sleep of 3000us
stress: dbug: [1] --> hogcpu worker 1 [10] forked

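--cpu sets the number of stress workers. The host in this example appears to have 4 CPUs (the top output below totals roughly 400%), so 4 workers are used to saturate them. On a host with a different number of CPUs, adjust --cpu to match.

Start container_B with a cpu share of 512: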

[root@localhost ~]# docker run --name container_B -it -c 512 progrium/stress --cpu 4
stress: info: [1] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
stress: dbug: [1] using backoff sleep of 3000us
stress: dbug: [1] --> hogcpu worker 1 [7] forked

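Run top on the host to see how much CPU each container is using. The four workers started first (container_A) each get about 66% of a CPU and the four started later (container_B) about 33%, matching the 2:1 ratio of their cpu shares: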

[root@localhost ~]# top
top - 22:22:11 up 55 min,  4 users,  load average: 5.49, 2.17, 0.84
Tasks: 147 total,   9 running, 138 sleeping,   0 stopped,   0 zombie
%Cpu(s): 99.8 us,  0.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  3865552 total,  2876864 free,   279432 used,   709256 buff/cache
KiB Swap:  2097148 total,  2095436 free,     1712 used.  3284060 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 11431 root      20   0    7308     96      0 R  66.4  0.0   1:59.51 stress
 11433 root      20   0    7308     96      0 R  66.4  0.0   2:00.93 stress
 11434 root      20   0    7308     96      0 R  66.4  0.0   1:57.31 stress
 11432 root      20   0    7308     96      0 R  66.1  0.0   2:00.42 stress
 12343 root      20   0    7308     96      0 R  33.6  0.0   0:05.46 stress
 12342 root      20   0    7308     96      0 R  33.2  0.0   0:05.40 stress
 12344 root      20   0    7308     96      0 R  33.2  0.0   0:05.48 stress
 12341 root      20   0    7308     96      0 R  32.9  0.0   0:05.40 stress
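The figures in the top output can be checked against the weights with a few lines of Python. The %CPU values are copied from the output above; assigning PIDs 11431-11434 to container_A and 12341-12344 to container_B is an assumption based on the two distinct groups of values and their start times.

```python
# Illustrative sketch only. The %CPU figures are copied from the top output
# above; which PIDs belong to which container is an assumption.
container_a = [66.4, 66.4, 66.4, 66.1]  # PIDs 11431-11434
container_b = [33.6, 33.2, 33.2, 32.9]  # PIDs 12341-12344


def observed_ratio(high, low):
    """Ratio of the total CPU used by two groups of processes."""
    return sum(high) / sum(low)


if __name__ == "__main__":
    total = sum(container_a) + sum(container_b)
    print(f"total CPU: {total:.1f}%")
    print(f"A : B = {observed_ratio(container_a, container_b):.2f} : 1")
    print(f"expected from cpu shares: {1024 / 512:.2f} : 1")
```

The observed ratio comes out very close to the 2:1 implied by cpu shares of 1024 and 512.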

Now pause container_A:

[root@localhost ~]# docker pause container_A
container_A

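top now shows that, with container_A idle, container_B can use all of the host's CPU.

3. Block IO weight

By default, all containers have equal access to disk reads and writes. The --blkio-weight option changes a container's block IO priority.

Like --cpu-shares, --blkio-weight sets a relative weight; the default is 500. In the example below, container_A gets twice the disk bandwidth of container_B when both compete for the disk.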

[root@localhost ~]# docker run -it --name container_A --blkio-weight 600 centos
[root@localhost ~]# docker run -it --name container_B --blkio-weight 300 centos

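Limiting bps and iops

bps is bytes per second, the amount of data read or written each second.
iops is IO operations per second, the number of IO operations each second.

The following options control a container's bps and iops:

--device-read-bps: limits the read rate (bps) from a device.
--device-write-bps: limits the write rate (bps) to a device.
--device-read-iops: limits the read rate (iops) from a device.
--device-write-iops: limits the write rate (iops) to a device.

Example: limit the container's write rate to /dev/sda to 30MB/s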

[root@localhost ~]# docker run -it --device-write-bps /dev/sda:30MB centos:7
[root@85ec78d68f3d /]# dd if=/dev/zero of=1.txt bs=1M count=300
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 0.116481 s, 2.7 GB/s
[root@85ec78d68f3d /]# dd if=/dev/zero of=1.txt bs=1M count=300 oflag=direct
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 9.91958 s, 31.7 MB/s

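Here dd measures the container's disk write speed. The container's filesystem lives on the host's /dev/sda, so writing a file in the container is a write to the host's /dev/sda. The first dd run goes through the page cache and reports 2.7 GB/s, unaffected by the limit. The second run uses oflag=direct to write with direct IO, which is required for --device-write-bps to take effect, and is held to about 30MB/s.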
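The dd result can be sanity-checked with a little arithmetic. This sketch assumes Docker reads "30MB" as 30 * 1024 * 1024 bytes per second and that dd reports decimal megabytes; both are assumptions made for the calculation, and the function names are hypothetical.

```python
# Illustrative sketch only. Assumes Docker reads "30MB" as 30 * 1024 * 1024
# bytes per second and that dd's "MB/s" is decimal (10**6 bytes).
BYTES_WRITTEN = 314572800      # from the dd output: 300 blocks of 1M
LIMIT_BPS = 30 * 1024 * 1024   # --device-write-bps /dev/sda:30MB
OBSERVED_SECONDS = 9.91958     # from the direct-IO dd run


def expected_seconds(total_bytes: int, limit_bps: int) -> float:
    """Minimum time to write total_bytes under a bytes-per-second cap."""
    return total_bytes / limit_bps


def throughput_mb_per_s(total_bytes: int, seconds: float) -> float:
    """Throughput in decimal megabytes per second, as dd reports it."""
    return total_bytes / seconds / 1_000_000


if __name__ == "__main__":
    print(expected_seconds(BYTES_WRITTEN, LIMIT_BPS))                      # 10.0
    print(round(throughput_mb_per_s(BYTES_WRITTEN, OBSERVED_SECONDS), 1))  # 31.7
```

Under these assumptions the write should take about 10 seconds, which is in line with the 9.9 seconds and 31.7 MB/s that dd reported.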

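4. The underlying technology behind containers

To better understand how containers behave, this section looks at the technologies they are built on.

cgroup and namespace are the two most important. cgroup enforces resource limits; namespace provides resource isolation.

1. cgroup

cgroup is short for Control Group. Through cgroups, Linux can limit the CPU, memory, and IO resources a process uses. The --cpu-shares, -m, and --device-write-bps options seen earlier are really just cgroup settings.

What does a cgroup actually look like? It can be found under /sys/fs/cgroup. As an example, start a container with --cpu-shares=512: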

1. Find the container ID

[root@localhost ~]# docker ps -a --no-trunc
CONTAINER ID                                                       IMAGE      COMMAND       CREATED         STATUS                        PORTS     NAMES
85ec78d68f3d9daa9b5604845c18364760c5f1be506098e6cac81717eb0e3e69   centos:7   "/bin/bash"   5 minutes ago   Exited (127) 19 seconds ago             youthful_merkle

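2. Locate it in the cgroup filesystem

Under /sys/fs/cgroup/cpu/docker (the cgroup v1 layout), a cgroup directory is created for each container, named after the container's long ID: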

[root@localhost ~]# ls /sys/fs/cgroup/cpu/docker/414090cec75558162c3679b913af2c739b44bd6228f8184662e7a2e253f12a69/
cgroup.clone_children  cpuacct.stat           cpu.cfs_period_us      cpu.rt_runtime_us      notify_on_release
cgroup.event_control   cpuacct.usage          cpu.cfs_quota_us       cpu.shares             tasks
cgroup.procs           cpuacct.usage_percpu   cpu.rt_period_us       cpu.stat

The directory contains all of the CPU-related cgroup settings. The cpu.shares file holds the --cpu-shares value, which is 512 here. Likewise, /sys/fs/cgroup/memory/docker and /sys/fs/cgroup/blkio/docker hold the memory and block IO cgroup settings.
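Reading the setting back is just a matter of reading a file. The sketch below assumes the cgroup v1 directory layout shown above (cgroup v2 hosts and other cgroup drivers use different paths and file names); read_cpu_shares is a hypothetical helper, not a Docker API.

```python
# Illustrative sketch only: read_cpu_shares is a hypothetical helper. It
# assumes the cgroup v1 layout shown above; cgroup v2 hosts and other cgroup
# drivers use different paths and file names.
from pathlib import Path

DEFAULT_ROOT = "/sys/fs/cgroup/cpu/docker"


def read_cpu_shares(container_id: str, cgroup_root: str = DEFAULT_ROOT) -> int:
    """Return the cpu.shares value stored in a container's cgroup directory."""
    shares_file = Path(cgroup_root) / container_id / "cpu.shares"
    return int(shares_file.read_text().strip())


if __name__ == "__main__":
    root = Path(DEFAULT_ROOT)
    if root.is_dir():
        # Print the short ID and cpu.shares of every container cgroup found.
        for entry in sorted(root.iterdir()):
            if (entry / "cpu.shares").is_file():
                print(entry.name[:12], read_cpu_shares(entry.name))
    else:
        print(f"{root} not found (not a cgroup v1 Docker host?)")
```

For a container started with --cpu-shares=512, the value read back should be 512.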

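2. namespace

Inside every container we see resources such as a filesystem and a network interface that appear to belong to the container alone. Take the network interface: each container believes it has its own dedicated interface, even if the host has only one physical NIC. This makes a container look much more like an independent machine.

The Linux technology behind this is the namespace. A namespace manages a resource that is globally unique on the host and lets each container believe it is the only user of that resource. In other words, namespaces isolate resources between containers.

Six Linux namespaces are involved, one for each kind of resource: Mount, UTS, IPC, PID, Network, and User. Each is discussed below.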
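On Linux, the namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns, so they can be inspected with a short script. The helpers list_namespaces and shared below are hypothetical names for this sketch; reading another process's links usually requires root.

```python
# Illustrative sketch only: list_namespaces and shared are hypothetical
# helpers. On Linux, /proc/<pid>/ns holds one symbolic link per namespace.
import os
from typing import Dict


def list_namespaces(pid: str = "self", proc_root: str = "/proc") -> Dict[str, str]:
    """Map namespace names (mnt, uts, ipc, pid, net, user, ...) to their IDs."""
    ns_dir = os.path.join(proc_root, pid, "ns")
    return {
        name: os.readlink(os.path.join(ns_dir, name))
        for name in sorted(os.listdir(ns_dir))
    }


def shared(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, bool]:
    """For each namespace present in both maps, report whether the IDs match."""
    return {name: a[name] == b[name] for name in a if name in b}


if __name__ == "__main__":
    if os.path.isdir("/proc/self/ns"):
        for name, ident in list_namespaces().items():
            print(f"{name:10} {ident}")
    else:
        print("/proc/self/ns not available on this system")
```

Comparing the result for a shell on the host with the result for a container's main process (using its host PID) would show different IDs for each namespace the container has of its own.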

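1. Mount

The Mount namespace makes a container appear to own an entire filesystem.

A container has its own / directory and can run mount and umount. These operations take effect only inside that container and do not affect the host or other containers.

2. UTS

Put simply, the UTS namespace gives a container its own hostname. By default the hostname is the container's short ID; it can be set with -h or --hostname.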

[root@localhost ~]# docker run -it -h test1 centos:7
[root@test1 /]#

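3. IPC

The IPC namespace gives a container its own shared memory and semaphores for inter-process communication, kept separate from the IPC of the host and of other containers.

4. PID

As mentioned earlier, a container runs on the host as a process. For example, two containers are currently running on this host: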

[root@localhost ~]# docker inspect centos-2 -f '{{.State.Pid}}'
14300
[root@localhost ~]# docker inspect youthful_merkle -f '{{.State.Pid}}'
14568

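These are the PIDs of the containers' main processes as seen from the host. Inside a container, processes are numbered differently: the process with PID 1 in the container is not the host's init process. In other words, each container has its own independent set of PIDs, which is what the PID namespace provides.

5. Network

The Network namespace gives a container its own network interfaces, IP addresses, routes, and related resources.

6. User

The User namespace lets a container manage its own users; users created inside the container are not visible to the host.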