【containerd Error Fix Series】failed to create shim task, OCI runtime create failed, unable to retrieve OCI...
Environment
# cat /etc/redhat-release
CentOS Linux release 8.0.1905 (Core)
# uname -r
4.18.0-348.rt7.130.el8.x86_64
Problem and symptoms
1. All pods are stuck in the ContainerCreating state.
2. Check the error details:
kubectl describe -n <namespace> po <pod-name>
The containerd process logs a large number of errors, mainly:
failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/c4847070fad34a8da9b16b5c20cdc38e28a15cfcf9913d712e4fe60d8c9029f7/log.json: no such file or directory): runc did not terminate successfully: exit status 127: unknown
3. Solution
Check the installed libseccomp version:
# sudo rpm -qa | grep libseccomp
libseccomp-2.3.3-3.el8.x86_64
Remove the old libseccomp (--nodeps skips the dependency check, because other installed packages depend on it):
sudo rpm -e libseccomp-2.3.3-3.el8.x86_64 --nodeps
sudo rpm -qa | grep libseccomp
Install a newer libseccomp. First check which versions the enabled repositories provide:
yum provides libseccomp
Download the packages and install them:
wget https://repo.almalinux.org/almalinux/8/BaseOS/x86_64/os/Packages/libseccomp-2.5.2-1.el8.x86_64.rpm \
https://repo.almalinux.org/almalinux/8/AppStream/x86_64/os/Packages/libseccomp-devel-2.5.2-1.el8.x86_64.rpm \
--no-check-certificate
yum -y install *.rpm
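The error reports that runc exited with status 127. By shell convention 127 means the command could not be found, and the dynamic linker uses the same status when an existing binary fails to load a shared library or symbol, which is consistent with a libseccomp that is too old. The sketch below checks whether runc can start at all after the upgrade; it assumes bash and runc on PATH, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether a container runtime binary can start at all.
# Hypothetical helper names; assumes bash.

# describe_exit CODE: human-readable meaning of common exit statuses
describe_exit() {
  case "$1" in
    0)   echo "ok" ;;
    126) echo "found but not executable" ;;
    127) echo "not found or failed to load shared libraries" ;;
    *)   echo "exited with status $1" ;;
  esac
}

# probe_binary [BIN]: run "BIN --version" quietly and report the result
probe_binary() {
  local bin="${1:-runc}" rc=0
  "$bin" --version >/dev/null 2>&1 || rc=$?
  echo "$bin: $(describe_exit "$rc")"
  return "$rc"
}
```

Run `probe_binary runc` on the node; once runc can load its libraries it should report `runc: ok`.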
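Explanation
libseccomp must be newer than the 2.3.x installed here (2.4 or later).
The containerd.io package requires libseccomp 2.4.0 or later.
I have not yet found an official reference for this requirement; I will add it once I do.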
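A minimal sketch for comparing the installed libseccomp version against the 2.4.0 minimum above. It assumes an RPM-based system and GNU `sort -V`; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed libseccomp against an assumed minimum of 2.4.0.

# version_ge A B: succeed if version A >= version B (GNU sort -V ordering)
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_libseccomp [MIN]: ask rpm for the installed version and compare
check_libseccomp() {
  local min="${1:-2.4.0}" have
  if ! have=$(rpm -q --queryformat '%{VERSION}\n' libseccomp 2>/dev/null); then
    echo "libseccomp is not installed or rpm is unavailable" >&2
    return 2
  fi
  have="${have%%$'\n'*}"   # keep the first line if several arches are installed
  if version_ge "$have" "$min"; then
    echo "libseccomp $have >= $min"
  else
    echo "libseccomp $have < $min, upgrade required"
    return 1
  fi
}
```

On the host described here, 2.3.3 sorts below 2.4.0, so `check_libseccomp` should flag it before the upgrade and accept 2.5.2 afterwards.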
【containerd Error Fix Series】failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: failed to write "204800":
Problem and symptoms
1. Cause: the pod's CPU and memory settings exceed the resources configured for the VM:
resources:
  limits:
    cpu: 2048m
    memory: 2Gi
  requests:
    cpu: 256m
    memory: 2Gi
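The number in the error title appears to come from the CPU limit. Assuming the default CFS period of 100000 µs, Kubernetes converts a CPU limit in millicores to a CFS quota as millicores × period / 1000, so `cpu: 2048m` becomes 204800, the value runc failed to write. The kernel can reject such a write, for example when it exceeds the quota of a parent cgroup. A small helper (hypothetical name, millicore form only) for the arithmetic:

```shell
#!/usr/bin/env bash
# Sketch: CPU limit in millicores -> CFS quota in microseconds,
# assuming the default 100000 us period. Hypothetical helper name.

# cfs_quota_for LIMIT [PERIOD_US]: LIMIT like "2048m" or "100m"
cfs_quota_for() {
  local milli="${1%m}" period="${2:-100000}"
  echo $(( milli * period / 1000 ))
}
```

`cfs_quota_for 2048m` prints 204800, while the reduced limit below, `cfs_quota_for 100m`, prints 10000.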
2. Solution: lower the values so they fit within the VM:
resources:
  limits:
    cpu: 100m
    memory: 1Gi
  requests:
    cpu: 90m
    memory: 1Gi
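Before picking new values, it can help to compare the CPU limit with what each node actually offers. A sketch, assuming kubectl access; the jsonpath query prints each node's allocatable CPU, and the helpers (hypothetical names, CPU only, memory not handled) decide whether a limit fits:

```shell
#!/usr/bin/env bash
# Sketch: does a CPU limit fit within each node's allocatable CPU?
# Helper names are hypothetical.

# to_millicores QTY: "1500m" -> 1500, "2" -> 2000, "0.5" -> 500
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;
  esac
}

# check_limit_fits LIMIT: reads "NODE CPU" lines on stdin
check_limit_fits() {
  local want node cpu
  want=$(to_millicores "$1")
  while read -r node cpu; do
    [ -n "$node" ] || continue
    if [ "$want" -le "$(to_millicores "$cpu")" ]; then
      echo "$node: fits"
    else
      echo "$node: too small for $1"
    fi
  done
}

# Example (needs a cluster):
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.allocatable.cpu}{"\n"}{end}' \
#   | check_limit_fits 2048m
```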
Alternatively, tune the system settings:
vi /etc/sysctl.conf
net.core.rmem_default = 256960
net.core.rmem_max = 513920
net.core.wmem_default = 256960
net.core.wmem_max = 513920
net.core.netdev_max_backlog = 2000
net.core.somaxconn = 2048
net.core.optmem_max = 81920
net.ipv4.tcp_mem = 131072 262144 524288
net.ipv4.tcp_rmem = 8760 256960 4088000
net.ipv4.tcp_wmem = 8760 256960 4088000
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle = 1  (removed in Linux 4.12; this 4.18 kernel no longer has it, so sysctl -p would report an error)
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 32768 65535
net.ipv4.tcp_max_syn_backlog = 2048
fs.file-max = 10240000
sysctl -p
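After `sysctl -p`, a minimal sketch to confirm which settings are actually live: it reads sysctl.conf-style `key = value` lines on stdin and compares each value with the matching file under /proc/sys. Keys the running kernel does not expose (tcp_tw_recycle, for example) are reported separately. The function names are hypothetical; usage: `check_sysctl < /etc/sysctl.conf`.

```shell
#!/usr/bin/env bash
# Sketch: compare "key = value" lines (stdin) with the running kernel.

# key_to_proc_path KEY: net.core.somaxconn -> /proc/sys/net/core/somaxconn
key_to_proc_path() {
  echo "/proc/sys/$(tr . / <<< "$1")"
}

check_sysctl() {
  local key val path live
  local -a parts
  while IFS='=' read -r key val; do
    key="${key//[[:space:]]/}"
    # skip blank lines and comments
    [[ -z "$key" || "$key" == \#* ]] && continue
    read -ra parts <<< "$val"; val="${parts[*]}"   # normalize whitespace
    path=$(key_to_proc_path "$key")
    if [ ! -r "$path" ]; then
      echo "$key: missing or unreadable on this kernel"
      continue
    fi
    read -ra parts < "$path" || true; live="${parts[*]}"
    [ "$val" = "$live" ] || echo "$key: expected '$val', live '$live'"
  done
}
```

Only mismatches and missing keys are printed, so no output means every listed value is in effect.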
ulimit -n shows the maximum number of open files, and ulimit -u the maximum number of user processes.
To make the limits permanent, edit /etc/security/limits.conf:
* soft nofile 204800
* hard nofile 204800
* soft nproc 204800
* hard nproc 204800
* applies to all users
nproc is the maximum number of processes
nofile is the maximum number of open files
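limits.conf is applied by pam_limits when a session starts, so the new values only appear in new logins; services started by systemd, such as containerd, generally take their limits from the unit's LimitNOFILE=/LimitNPROC= settings instead. A minimal sketch (hypothetical function names) to check whether the current soft and hard limits reach the 204800 target:

```shell
#!/usr/bin/env bash
# Sketch: are the current ulimit values at least TARGET? Hypothetical helpers.

# limit_at_least CURRENT TARGET: CURRENT may be a number or "unlimited"
limit_at_least() {
  [ "$1" = "unlimited" ] && return 0
  [ "$1" -ge "$2" ]
}

# report_limits [TARGET]: soft (S) and hard (H) values for open files (n) and processes (u)
report_limits() {
  local target="${1:-204800}" opt kind val
  for opt in n u; do
    for kind in S H; do
      val=$(ulimit -"$kind$opt")
      if limit_at_least "$val" "$target"; then
        echo "ulimit -$kind$opt = $val (ok)"
      else
        echo "ulimit -$kind$opt = $val (below $target)"
      fi
    done
  done
}
```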
Deploying Nginx
1. Create the namespace
kubectl create ns grtmnt
2. Create the log directory
mkdir /home/ops/logs
3. Create nginx.yaml with the content below and apply it
vi nginx.yaml
kubectl apply -f nginx.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
  namespace: grtmnt
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 32002
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: grtmnt
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  minReadySeconds: 5
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        livenessProbe:
          tcpSocket:
            port: 80
        imagePullPolicy: Always
        env:
        - name: TZ
          value: "Asia/Shanghai"
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: logs_nginx-stdout
          value: "stdout"
        - name: logs_nginx-logs
          value: "/data/logs/nginx/*.log"
        volumeMounts:
        - name: logs
          mountPath: /home/ops/logs
        - name: time
          mountPath: /etc/localtime
        resources:
          limits:
            cpu: 100m
            memory: 1Gi
          requests:
            cpu: 90m
            memory: 1Gi
      volumes:
      - name: time
        hostPath:
          path: /etc/localtime
      - name: logs
        emptyDir: {}
      restartPolicy: Always
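Once applied, the Deployment and the NodePort Service can be checked from the command line. A sketch, assuming kubectl and curl are available and that NODE_IP (hypothetical) is the address of any cluster node; it waits for the rollout, then polls nodePort 32002 until nginx answers with HTTP 200. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: wait for the grtmnt/nginx rollout, then poll the NodePort.

# retry TRIES DELAY CMD...: run CMD until it succeeds or TRIES attempts are used up
retry() {
  local tries="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= tries; i++ )); do
    "$@" && return 0
    (( i < tries )) && sleep "$delay"
  done
  return 1
}

# http_ok URL: succeed if URL answers with HTTP 200 (needs curl)
http_ok() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")" = "200" ]
}

# verify_nginx NODE_IP: rollout status first, then up to 10 HTTP checks 3 s apart
verify_nginx() {
  local node_ip="${1:?usage: verify_nginx NODE_IP}"
  kubectl -n grtmnt rollout status deployment/nginx --timeout=120s || return 1
  retry 10 3 http_ok "http://${node_ip}:32002/"
}
```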
Example:
mkdir /html/
vi nginx-dep.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:v1.0
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/usr/local/nginx/html/"
          name: nginx-vol
      volumes:
      - name: nginx-vol
        hostPath:
          path: /html/
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30080
  type: NodePort
I wrote a small script to publish page updates for the nginx service and to start/stop the pods:
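#!/usr/bin/env bash
# Publish /html/index.html to the nodes and manage the nginx Deployment/Service.
dir=/html/index.html                        # page to publish
tdir=/html/                                 # hostPath directory on each node
nginx=/root/k8s/deployment/nginx-dep.yml    # Deployment manifest
svc=/root/k8s/deployment/nginx-svc.yml      # Service manifest

case "$1" in
  scp)
    # Copy the page to each node that can run an nginx replica
    scp "$dir" node1:"$tdir"
    scp "$dir" node2:"$tdir"
    ;;
  delete)
    # Errors (e.g. objects already deleted) are silenced
    kubectl delete -f "$nginx" 2>/dev/null
    kubectl delete -f "$svc" 2>/dev/null
    ;;
  reload)
    # Copy the page, then delete and re-create the Deployment and Service
    scp "$dir" node1:"$tdir"
    scp "$dir" node2:"$tdir"
    sleep 3
    kubectl delete -f "$nginx" 2>/dev/null
    kubectl delete -f "$svc" 2>/dev/null
    sleep 3
    kubectl apply -f "$nginx"
    kubectl apply -f "$svc"
    ;;
  *)
    echo "Usage: $0 {scp|delete|reload}"
    exit 1
    ;;
esac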
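The script hard-codes node1 and node2. Because the example uses a hostPath volume, every node that may run a replica needs its own copy of index.html. A sketch of a variant that asks the cluster for worker nodes, assuming node names are also reachable SSH host names and that control-plane nodes carry the node-role.kubernetes.io/control-plane label (older clusters use node-role.kubernetes.io/master); all function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy a file to every worker node reported by kubectl.

# strip_node_prefix: "node/node1" -> "node1" (reads stdin)
strip_node_prefix() {
  sed 's#^node/##'
}

# list_workers: names of nodes without the control-plane role label
list_workers() {
  kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o name | strip_node_prefix
}

# push_page SRC DST: reads node names on stdin and copies SRC to NODE:DST
push_page() {
  local src="$1" dst="$2" node rc=0
  while read -r node; do
    [ -n "$node" ] || continue
    # </dev/null keeps scp from consuming the node list on stdin
    scp "$src" "${node}:${dst}" </dev/null || { echo "copy to $node failed" >&2; rc=1; }
  done
  return "$rc"
}

# Usage: list_workers | push_page /html/index.html /html/
```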
kubectl apply -f nginx-dep.yml
kubectl get deploy -o wide