[k8s Advanced Scheduling: Taints and Tolerations]

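1. Scheduling Concepts

In Kubernetes, scheduling means making sure that Pods are matched to suitable nodes so that the kubelet can run them. Preemption is the process of terminating lower-priority Pods so that higher-priority Pods can be scheduled. Eviction is the process of proactively terminating one or more Pods on a resource-starved node.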

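2. CronJob Scheduled Tasks

A CronJob runs tasks periodically in k8s and uses the same schedule syntax as crontab on Linux.

Note: a CronJob fires according to the clock of the kube-controller-manager, so make sure the controller-manager's time is accurate.

2.1 Configuration file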

apiVersion: batch/v1
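kind: CronJob # scheduled task
metadata:
  name: cron-job-test # name of the CronJob
spec:
  concurrencyPolicy: Allow # Concurrency policy. Allow: runs may overlap; Forbid: skip the new run while the previous one is still running; Replace: cancel the run that has not finished and start the new one
  failedJobsHistoryLimit: 1 # how many failed jobs to keep
  successfulJobsHistoryLimit: 3 # how many successful jobs to keep
  suspend: false # whether the CronJob is suspended; if true, no new runs are started
#  startingDeadlineSeconds: 30 # deadline in seconds for starting a run that missed its scheduled time; a run that misses the deadline is skipped and counted as failed. Should not be less than 10
  schedule: "* * * * *" # schedule in cron syntax (here: every minute)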
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: busybox
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
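
As a variation on the manifest above, the following is a minimal sketch of a job that runs once a day and must never overlap with itself. The name nightly-report, the schedule, the echoed message, and the 300-second deadline are assumptions chosen for illustration; only the field names come from the batch/v1 CronJob API.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report            # hypothetical name
spec:
  schedule: "0 2 * * *"           # 02:00 every day, evaluated against the kube-controller-manager's clock
  concurrencyPolicy: Forbid       # skip a run if the previous one is still active
  startingDeadlineSeconds: 300    # assumed value: give up on a run that cannot start within 5 minutes
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command: ["/bin/sh", "-c", "date; echo nightly run"]
          restartPolicy: OnFailure
```

Forbid is a reasonable choice when a run can take longer than the interval between runs and two copies working at once would conflict.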

2.2 Running the CronJob

[root@k8s-master job]# kubectl create -f cron-job-pd.yaml
cronjob.batch/cron-job-test created


[root@k8s-master job]# kubectl get cronjobs
NAME            SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cron-job-test   * * * * *   False     0        <none>          11s


[root@k8s-master job]# kubectl get po
NAME                           READY   STATUS      RESTARTS       AGE
configfile-po                  0/1     Completed   0              26h
dns-test                       1/1     Running     2 (36h ago)    3d21h
emptydir-volume-pod            2/2     Running     44 (58m ago)   23h
fluentd-59k8k                  1/1     Running     1 (36h ago)    3d3h
fluentd-hhtls                  1/1     Running     1 (36h ago)    3d3h
host-volume-pod                1/1     Running     0              23h
nfs-volume-pod-1               1/1     Running     0              21h
nfs-volume-pod-2               1/1     Running     0              21h
nginx-deploy-6fb8d6548-8khhv   1/1     Running     29 (54m ago)   29h
nginx-deploy-6fb8d6548-fd9tx   1/1     Running     29 (54m ago)   29h
nginx-sc-0                     1/1     Running     0              3h20m

[root@k8s-master job]# kubectl get cronjobs
NAME            SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cron-job-test   * * * * *   False     0        42s             2m16s

[root@k8s-master job]# kubectl get po
NAME                           READY   STATUS      RESTARTS       AGE
configfile-po                  0/1     Completed   0              26h
cron-job-test-28484150-wkbgp   0/1     Completed   0              2m4s
cron-job-test-28484151-886j6   0/1     Completed   0              64s
cron-job-test-28484152-srjb4   0/1     Completed   0              4s
dns-test                       1/1     Running     2 (36h ago)    3d21h
emptydir-volume-pod            2/2     Running     46 (35s ago)   23h
fluentd-59k8k                  1/1     Running     1 (36h ago)    3d3h
fluentd-hhtls                  1/1     Running     1 (36h ago)    3d3h
host-volume-pod                1/1     Running     0              23h
nfs-volume-pod-1               1/1     Running     0              21h
nfs-volume-pod-2               1/1     Running     0              21h
nginx-deploy-6fb8d6548-8khhv   1/1     Running     29 (56m ago)   29h
nginx-deploy-6fb8d6548-fd9tx   1/1     Running     29 (56m ago)   29h
nginx-sc-0                     1/1     Running     0              3h22m

[root@k8s-master job]# kubectl logs -f cron-job-test-28484150-wkbgp
Tue Feb 27 15:50:19 UTC 2024
Hello from the Kubernetes cluster

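3. Init Containers (InitContainer)

Compared with a postStart hook, an init container has two advantages. First, an init container is guaranteed to finish before the main container's ENTRYPOINT runs, whereas postStart runs asynchronously and gives no such guarantee. Second, postStart is best suited to running a few commands, while an init container is a full container in its own right, so it can perform more complex initialization in a different base image.

3.1 Configure initContainers in the Pod template: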

spec:
  template:
    spec:   
      initContainers:
      - image: nginx:1.20 
        imagePullPolicy: IfNotPresent
        command: ["sh", "-c", "sleep 10; echo 'inited' >> /init.log"]
        name: init-test

3.2 Add the initContainers block to the existing Deployment

3.3 After the Deployment is updated, the new Pods go through an Init phase

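4. Taints and Tolerations

Node affinity is a property of Pods that attracts them to a set of nodes, either as a preference or as a hard requirement. Taints are the opposite: they allow a node to repel a set of Pods.
Tolerations are applied to Pods. A toleration allows the scheduler to schedule the Pod onto nodes with matching taints. Tolerations allow scheduling but do not guarantee it: the scheduler also evaluates other parameters as part of its function.
Taints and tolerations work together to keep Pods off unsuitable nodes. One or more taints can be applied to a node, which means the node will not accept any Pod that does not tolerate those taints.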

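4.1 Taints

A taint is set on a node. Once a node is tainted, Kubernetes keeps Pods off that node unless the Pod declares that it tolerates the taint. A node can carry several taints, in which case a Pod must tolerate all of them to be scheduled there.
Taint effects:
- NoSchedule: Pods that do not tolerate the taint are not scheduled onto the node, but Pods already running on the node are not evicted.
- NoExecute: Pods that do not tolerate the taint are evicted immediately; Pods that tolerate it stay on the node.
  - If the toleration does not set tolerationSeconds, the Pod can keep running indefinitely.
  - With tolerationSeconds: 3600, the Pod keeps running on the node for another 3600 seconds and is then evicted (and recreated elsewhere if a controller manages it).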
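
The tolerationSeconds behaviour described for NoExecute can be sketched as a toleration on the Pod. This fragment assumes a node tainted with tag=test:NoExecute, which is a hypothetical variant of the taint used later in this article; tolerationSeconds is only meaningful together with the NoExecute effect.

```yaml
tolerations:
- key: tag
  operator: Equal
  value: test
  effect: NoExecute
  tolerationSeconds: 3600   # stay bound for up to 3600 s after the taint appears, then get evicted
```

Leaving tolerationSeconds out of the same toleration lets the Pod stay on the tainted node indefinitely.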

4.1.1 Taint the node k8s-node-01

[root@k8s-master volume]# kubectl taint node  k8s-node-01   tag=test:NoSchedule
node/k8s-node-01 tainted
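
For reference, a taint added this way is stored on the Node object under spec.taints. The following is a sketch of the relevant part of the Node manifest, with all other fields omitted; it illustrates the structure and is not output captured from this cluster.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: k8s-node-01
spec:
  taints:
  - key: tag
    value: test
    effect: NoSchedule
```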


[root@k8s-master volume]# kubectl describe no  k8s-node-01
Name:               k8s-node-01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    ingress=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s-node-01
                    kubernetes.io/os=linux
                    type=microsvc
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"66:39:6c:7a:92:99"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.10.10.178
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/cri-dockerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 19 Feb 2024 22:58:42 +0800


# The newly added taint shows up here
Taints:             tag=test:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  k8s-node-01
  AcquireTime:     <unset>
  RenewTime:       Wed, 28 Feb 2024 01:17:51 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 26 Feb 2024 11:32:55 +0800   Mon, 26 Feb 2024 11:32:55 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Wed, 28 Feb 2024 01:17:30 +0800   Mon, 26 Feb 2024 11:32:41 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Wed, 28 Feb 2024 01:17:30 +0800   Mon, 26 Feb 2024 11:32:41 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Wed, 28 Feb 2024 01:17:30 +0800   Mon, 26 Feb 2024 11:32:41 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Wed, 28 Feb 2024 01:17:30 +0800   Tue, 27 Feb 2024 01:59:54 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.10.10.177
  Hostname:    k8s-node-01
Capacity:
  cpu:                2
  ephemeral-storage:  62575768Ki
  hugepages-2Mi:      0
  memory:             3861288Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  57669827694
  hugepages-2Mi:      0
  memory:             3758888Ki
  pods:               110
System Info:
  Machine ID:                 9ee2b84718d0437fa9ea4380bdb34024
  System UUID:                A90F4D56-48C7-6739-A05A-A22B33EC7C5F
  Boot ID:                    6cb5ab07-c82b-4404-8f48-7b9abafe52f1
  Kernel Version:             3.10.0-1160.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://25.0.3
  Kubelet Version:            v1.25.0
  Kube-Proxy Version:         v1.25.0
PodCIDR:                      10.2.2.0/24
PodCIDRs:                     10.2.2.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                               ------------  ----------  ---------------  -------------  ---
  default                     fluentd-59k8k                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d4h
  default                     nginx-deploy-69ccc996f9-wqp55      100m (5%)     200m (10%)  128Mi (3%)       128Mi (3%)     61m
  ingress-nginx               ingress-nginx-controller-jn65t     100m (5%)     0 (0%)      90Mi (2%)        0 (0%)         2d5h
  kube-flannel                kube-flannel-ds-glkkb              100m (5%)     0 (0%)      50Mi (1%)        0 (0%)         8d
  kube-system                 coredns-c676cc86f-pdsl6            100m (5%)     0 (0%)      70Mi (1%)        170Mi (4%)     6d14h
  kube-system                 kube-proxy-n2w92                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         8d
  kube-system                 metrics-server-7bb86dcf48-hfpb5    100m (5%)     0 (0%)      200Mi (5%)       0 (0%)         3d3h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                500m (25%)   200m (10%)
  memory             538Mi (14%)  298Mi (8%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:              <none>

4.1.2 Check the taints on the master

Earlier, ingress-nginx could not be installed on the master node because the master node is tainted.
The master node's taint is: Taints: node-role.kubernetes.io/control-plane:NoSchedule
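
A Pod that is meant to run on the master despite this taint would need a matching toleration in its spec. A minimal sketch, assuming the taint shown above is the only one on the node:

```yaml
tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Exists          # the taint has no value, so match on the key alone
  effect: NoSchedule
```

The toleration only permits scheduling onto the master; it does not force the Pod there, so the scheduler may still pick another node.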

[root@k8s-master volume]# kubectl  describe no k8s-master
Name:               k8s-master
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    ingress=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s-master
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"c2:fd:ef:b4:ea:aa"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.10.10.100
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 19 Feb 2024 22:04:42 +0800
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  k8s-master
  AcquireTime:     <unset>
  RenewTime:       Wed, 28 Feb 2024 01:21:44 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 22 Feb 2024 18:30:31 +0800   Thu, 22 Feb 2024 18:30:31 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Wed, 28 Feb 2024 01:18:30 +0800   Mon, 19 Feb 2024 22:04:38 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Wed, 28 Feb 2024 01:18:30 +0800   Mon, 19 Feb 2024 22:04:38 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Wed, 28 Feb 2024 01:18:30 +0800   Mon, 19 Feb 2024 22:04:38 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Wed, 28 Feb 2024 01:18:30 +0800   Mon, 19 Feb 2024 23:35:15 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.10.10.100
  Hostname:    k8s-master
Capacity:
  cpu:                2
  ephemeral-storage:  62575768Ki
  hugepages-2Mi:      0
  memory:             3861288Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  57669827694
  hugepages-2Mi:      0
  memory:             3758888Ki
  pods:               110
System Info:
  Machine ID:                 9ee2b84718d0437fa9ea4380bdb34024
  System UUID:                AE134D56-9F2E-B64D-9BA2-6368B1379B3A
  Boot ID:                    e0bf44a5-6a0d-4fc0-923d-f5d63089b93f
  Kernel Version:             3.10.0-1160.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://25.0.3
  Kubelet Version:            v1.25.0
  Kube-Proxy Version:         v1.25.0
PodCIDR:                      10.2.0.0/24
PodCIDRs:                     10.2.0.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                  ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-tpm8x                 100m (5%)     0 (0%)      50Mi (1%)        0 (0%)         8d
  kube-system                 coredns-c676cc86f-q7hcw               100m (5%)     0 (0%)      70Mi (1%)        170Mi (4%)     6d14h
  kube-system                 etcd-k8s-master                       100m (5%)     0 (0%)      100Mi (2%)       0 (0%)         8d
  kube-system                 kube-apiserver-k8s-master             250m (12%)    0 (0%)      0 (0%)           0 (0%)         8d
  kube-system                 kube-controller-manager-k8s-master    200m (10%)    0 (0%)      0 (0%)           0 (0%)         8d
  kube-system                 kube-proxy-xtllb                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         8d
  kube-system                 kube-scheduler-k8s-master             100m (5%)     0 (0%)      0 (0%)           0 (0%)         8d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                850m (42%)  0 (0%)
  memory             220Mi (5%)  170Mi (4%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>

4.1.3 Remove a taint

Appending a minus sign (-) to the taint removes it:

[root@k8s-master volume]# kubectl  taint  no k8s-master  node-role.kubernetes.io/control-plane:NoSchedule-
node/k8s-master untainted

4.1.4 Check which nodes the Pods are running on

[root@k8s-master volume]# kubectl get po  -o wide
NAME                            READY   STATUS    RESTARTS        AGE     IP          NODE          NOMINATED NODE   READINESS GATES
dns-test                        1/1     Running   2 (37h ago)     3d22h   10.2.1.58   k8s-node-02   <none>           <none>
fluentd-59k8k                   1/1     Running   1 (37h ago)     3d5h    10.2.2.34   k8s-node-01   <none>           <none>
fluentd-hhtls                   1/1     Running   1 (37h ago)     3d4h    10.2.1.59   k8s-node-02   <none>           <none>
nginx-deploy-69ccc996f9-stgcl   1/1     Running   1 (8m23s ago)   68m     10.2.1.78   k8s-node-02   <none>           <none>
nginx-deploy-69ccc996f9-wqp55   1/1     Running   1 (8m8s ago)    68m     10.2.2.71   k8s-node-01   <none>           <none>

4.1.5 Test: delete the nginx Pods and check whether they can be recreated on the master

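4.1.6 Test: taint the master again, this time with the NoExecute effect

The NoExecute effect evicts the Pods currently running on the node that do not tolerate the taint, so they are recreated on other nodes.
With the master tainted and node1 tainted as well, the new Pods end up on node2.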

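4.2 Tolerations

A toleration is set on a Pod. If a Pod has no tolerations, it is not scheduled onto tainted nodes; only a Pod whose tolerations match every taint on a node can be scheduled onto that node.

4.2.1 k8s-node-01 carries a taint with the NoSchedule effect

The taint's key-value pair is: tag=test
The effect is NoSchedule: "Pods that do not tolerate the taint are not scheduled onto the node, but Pods already running on the node are not evicted."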

[root@k8s-master volume]# kubectl  describe   no k8s-node-01  |  grep  -i  tain
Taints:             tag=test:NoSchedule
  Container Runtime Version:  docker://25.0.3

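4.2.2 Configure a toleration under the Pod's spec with effect NoSchedule and operator Equal

operator: "Equal" means the toleration matches a taint only when the key, value, and effect are all the same; only then can the Pod be scheduled onto that node.

tolerations:
  - key: tag   # key of the taint
    value: test  # value of the taint
    effect: "NoSchedule"    # effect of the taint
    operator: "Equal"   # value must equal the taint's value; use Exists to match on the key alone, in which case value is omitted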
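
For comparison, here is a minimal sketch of the Exists variant in a complete Pod manifest. The Pod name toleration-demo is hypothetical, and the taint it targets is the tag=test:NoSchedule taint on k8s-node-01 from this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo     # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx:1.20
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: tag
    operator: Exists        # matches the key "tag" with any value; value must be left out
    effect: NoSchedule
```

Exists is convenient when the taint's value may change or is not known in advance, while Equal is stricter and ties the Pod to one specific key-value pair.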