只有一个问题,原来的httpGet存活、就绪检测一直不通过,于是改为tcpSocket后pod正常。
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
修改后的yaml文件,镜像修改为阿里云
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
rbac.authorization.k8s.io/aggregate-to-admin: "true"
rbac.authorization.k8s.io/aggregate-to-edit: "true"
rbac.authorization.k8s.io/aggregate-to-view: "true"
name: system:aggregated-metrics-reader
rules:
- apiGroups:
- metrics.k8s.io
resources:
- pods
- nodes
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
rules:
- apiGroups:
- ""
resources:
- nodes/metrics
verbs:
- get
- apiGroups:
- ""
resources:
- pods
- nodes
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server-auth-reader
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server:system:auth-delegator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:auth-delegator
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:metrics-server
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
ports:
- name: https
port: 443
protocol: TCP
targetPort: https
selector:
k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: metrics-server
strategy:
rollingUpdate:
maxUnavailable: 0
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
# - --cert-dir=/tmp
# - --secure-port=4443
# - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
# - --kubelet-use-node-status-port
# - --metric-resolution=15s
- --cert-dir=/tmp
- --secure-port=4443
- --metric-resolution=30s
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --requestheader-username-headers=X-Remote-User
- --requestheader-group-headers=X-Remote-Group
- --requestheader-extra-headers-prefix=X-Remote-Extra-
image: registry.aliyuncs.com/google_containers/metrics-server:v0.6.4
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
tcpSocket:
port: 4443
periodSeconds: 10
name: metrics-server
ports:
- containerPort: 4443
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
tcpSocket:
port: 4443
initialDelaySeconds: 20
periodSeconds: 10
resources:
requests:
cpu: 100m
memory: 200Mi
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /tmp
name: tmp-dir
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-cluster-critical
serviceAccountName: metrics-server
volumes:
- emptyDir: {}
name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
labels:
k8s-app: metrics-server
name: v1beta1.metrics.k8s.io
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: kube-system
version: v1beta1
versionPriority: 100
kubectl apply -f components.yaml
部署kube-prometheus
兼容1.27的为main分支
只克隆main分支
git clone --single-branch --branch main https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
如果直接运行kubectl apply -f manifests/setup/,会有元数据注解太长的错误
Error from server (Invalid): error when creating "manifests/setup/0prometheusCustomResourceDefinition.yaml": CustomResourceDefinition.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
应该使用以下方式
1. kubectl apply -f manifests/setup/
2. kubectl apply --server-side -f manifests/setup
CRD正常创建后然后再执行:
kubectl apply -f manifests
镜像下载使用这个项目的方法:https://github.com/DaoCloud/public-image-mirror,在镜像前面加:m.daocloud.io
prometheus-k8s-0 pod报权限不足问题
ts=2023-08-21T03:14:34.582Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope"
处理:
修改prometheus-clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.26.0
name: prometheus-k8s
rules:
- apiGroups:
- ""
resources:
- nodes/metrics
- services
- endpoints
- pods
verbs:
- get
- list
- watch
- nonResourceURLs:
- /metrics
verbs:
- get
跨节点pind不通monitoring下的pod ip,prometheus target异常问题处理,
删除kube-prometheus/manifests下的netowrkprolic
alertmanager-networkPolicy.yaml
blackboxExporter-networkPolicy.yaml
grafana-networkPolicy.yaml
kubeStateMetrics-networkPolicy.yaml
nodeExporter-networkPolicy.yaml
prometheusAdapter-networkPolicy.yaml
prometheus-networkPolicy.yaml
prometheusOperator-networkPolicy.yaml
使用ServiceMonitor添加监控:
以ingress-nginx为例
修改ingress-nginx.yaml的service
apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/port: '10254' #新增内容
prometheus.io/scrape: 'true' #新增内容
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
app.kubernetes.io/version: 1.8.1
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
externalTrafficPolicy: Local
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
ports:
- appProtocol: http
name: http
port: 80
protocol: TCP
targetPort: http
- appProtocol: https
name: https
port: 443
protocol: TCP
targetPort: https
- appProtocol: http #新增内容
name: prometheus #新增内容
port: 10254 #新增内容
protocol: TCP #新增内容
targetPort: 10254 #新增内容
selector:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
type: LoadBalancer
# ingress-nginx-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: ingress-nginx # ServiceMonitor名称
namespace: monitoring # ServiceMonitor所在名称空间
spec:
endpoints: # prometheus所采集Metrics地址配置,endpoints为一个数组,可以创建多个,但是每个endpoints包含三个字段interval、path、port
- interval: 15s # prometheus采集数据的周期,单位为秒
path: /metrics # prometheus采集数据的路径
port: prometheus # prometheus采集数据的端口,这里为port的name,主要是通过spec.selector中选择对应的svc,在选中的svc中匹配该端口
namespaceSelector: # 需要发现svc的范围
any: true # 有且仅有一个值true,当该字段被设置时,表示监听所有符合selector所选择的svc
selector:
matchLabels: # 选择svc的标签
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
app.kubernetes.io/version: 1.8.1
kubectl apply -f ingress-nginx-monitor.yaml
确认ingress-nginx servicemonitors已添加
kubectl -n monitoring get servicemonitors.monitoring.coreos.com
NAME AGE
alertmanager-main 70m
blackbox-exporter 70m
coredns 70m
grafana 70m
ingress-nginx 24m
kube-apiserver 70m
kube-controller-manager 70m
kube-scheduler 70m
kube-state-metrics 70m
kubelet 70m
node-exporter 70m
prometheus-adapter 70m
prometheus-k8s 70m
prometheus-operator 70m
监控外部ETCD:
etcd配置文件添加:ETCD_LISTEN_METRICS_URLS=“http://0.0.0.0:2381”
etcd-job.yaml
- job_name: "etcd-server"
scrape_interval: 30s
scrape_timeout: 10s
static_configs:
- targets: ["172.16.0.157:2381"]
labels:
instance: etcdserver
创建secret
kubectl create secret generic etcd-secret --from-file=etcd-job.yaml -n monitoring
kube-prometheus/manifests/prometheus-prometheus.yaml 末尾添加配置:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/instance: k8s
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.46.0
name: k8s
namespace: monitoring
spec:
alerting:
alertmanagers:
- apiVersion: v2
name: alertmanager-main
namespace: monitoring
port: web
enableFeatures: []
externalLabels: {}
image: quay.io/prometheus/prometheus:v2.46.0
nodeSelector:
kubernetes.io/os: linux
podMetadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/instance: k8s
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.46.0
podMonitorNamespaceSelector: {}
podMonitorSelector: {}
probeNamespaceSelector: {}
probeSelector: {}
replicas: 2
resources:
requests:
memory: 400Mi
ruleNamespaceSelector: {}
ruleSelector: {}
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: prometheus-k8s
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
version: 2.46.0
additionalScrapeConfigs: #新增
name: etcd-secret #新增
key: etcd-job.yaml #新增
kubectl apply -f kube-prometheus/manifests/prometheus-prometheus.yaml
grafana导入 3070模板
http存活检测
kind: Probe
apiVersion: monitoring.coreos.com/v1
metadata:
name: example-com-website
namespace: monitoring
spec:
interval: 60s
module: http_2xx
prober:
url: blackbox-exporter.monitoring.svc.cluster.local:19115
targets:
staticConfig:
static:
- https://www.baidu.com
添加proxy监控,默认为http
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
app.kubernetes.io/name: kube-proxy
app.kubernetes.io/part-of: kube-prometheus
name: kube-proxy
namespace: monitoring
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
interval: 30s
port: http-metrics
scheme: http
tlsConfig:
insecureSkipVerify: true
jobLabel: app.kubernetes.io/name
namespaceSelector:
matchNames:
- kube-system
selector:
matchLabels:
app.kubernetes.io/name: kube-proxy
---
apiVersion: v1
kind: Endpoints
metadata:
name: kube-proxy
namespace: kube-system
labels:
app.kubernetes.io/name: kube-proxy
subsets:
- addresses:
- ip: 172.16.0.157
targetRef:
kind: Node
name: master1
- ip: 172.16.0.124
targetRef:
kind: Node
name: node1
- ip: 172.16.0.46
targetRef:
kind: Node
name: node2
ports:
- name: http-metrics
port: 10249
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
name: kube-proxy
namespace: kube-system
labels:
app.kubernetes.io/name: kube-proxy
spec:
type: ClusterIP
clusterIP: None
ports:
- name: http-metrics
port: 10249
targetPort: 10249
protocol: TCP