Istio Observability


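Preface

Istio generates detailed telemetry for all service communication within a mesh. This telemetry provides observability into service behavior, letting operators troubleshoot, maintain, and optimize applications without placing any additional burden on developers. With Istio, operators gain a thorough understanding of how monitored services interact with other services and with the Istio components themselves.

Istio generates the following types of telemetry to provide observability across the whole service mesh:

Metrics: Istio generates a set of service metrics based on the four golden signals of monitoring (latency, traffic, errors, and saturation), plus more detailed metrics for the mesh control plane. It also ships a default set of mesh monitoring dashboards built on these metrics.
Tracing (distributed traces): Istio generates distributed trace spans for each service, so operators can understand the dependencies and call flows between services in the mesh.
Logs (access logs): As traffic flows into a service in the mesh, Istio can generate a full record of each request, including source and destination metadata. This lets operators audit service behavior down to the level of an individual workload instance.

The following sections look at how Istio metrics, distributed tracing, and access logs work in turn.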

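Metrics

Metrics provide a way to monitor and understand behavior in aggregate. To monitor service behavior, Istio generates metrics for all service traffic entering, leaving, and within the mesh. These metrics describe behavior such as overall traffic volume, error rates, and request response times. Besides the services in the mesh, it is also important to monitor the mesh itself: Istio components export metrics on their own internal behavior, giving insight into the function and health of the mesh control plane.

Metric categories

Overall, Istio metrics fall into three levels: proxy-level, service-level, and control plane.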

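1. Proxy-level metrics

Istio metrics collection starts with the Envoy sidecar proxies. Each proxy generates a rich set of metrics about all traffic passing through it, both inbound and outbound. The proxies also provide detailed statistics about their own administrative functions, including configuration and health information.

Envoy-generated metrics monitor the mesh at the granularity of Envoy resources such as listeners and clusters. As a result, monitoring Envoy metrics requires understanding how mesh services map to Envoy resources.

Istio lets operators select which Envoy metrics are generated and collected at each workload instance. By default, Istio enables only a small subset of the Envoy-generated statistics, to avoid overwhelming metrics backends and to reduce the CPU overhead of metrics collection. Operators can easily expand the set of collected proxy metrics when needed. This allows targeted debugging of networking behavior while keeping the overall cost of monitoring across the mesh down.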

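2. Service-level metrics

In addition to proxy-level metrics, Istio provides a set of service-oriented metrics for monitoring service communication. These metrics cover the four basic service monitoring needs: latency, traffic, errors, and saturation. Istio also ships with a default set of dashboards for monitoring service behavior based on these metrics. By default, the standard Istio metrics are exported to Prometheus. Use of the service-level metrics is entirely optional: operators can turn off their generation and collection to suit their own needs.

3. Control plane metrics

The Istio control plane also provides a collection of self-monitoring metrics, which allow you to monitor the behavior of Istio itself.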

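Querying metrics with Prometheus

Istio uses Prometheus by default to collect and store metrics. Prometheus is an open-source systems monitoring and alerting toolkit. It collects metrics from multiple sources and lets operators query them with the PromQL query language.

First make sure the Istio prometheus addon is enabled. If it is not, enable it with the following commands: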

[root@master1 ~]#cd istio-1.19.3/
[root@master1 istio-1.19.3]#ls samples/addons/
extras  grafana.yaml  jaeger.yaml  kiali.yaml  loki.yaml  prometheus.yaml  README.md

# Deploy the addons
kubectl apply -f samples/addons

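The command above installs Kiali together with Prometheus, Grafana, and Jaeger. This setup is only suitable for test environments; in production, install Prometheus separately so that it can be configured and tuned for your needs.

After installation, check the status of the Prometheus service with the following commands: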

$ kubectl get svc prometheus -n istio-system
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
prometheus   ClusterIP   10.106.228.196   <none>        9090/TCP   25d
$ kubectl get pods -n istio-system -l app=prometheus
NAME                         READY   STATUS    RESTARTS       AGE
prometheus-5d5d6d6fc-2gtxm   2/2     Running   0              25d

First visit the application at http://$GATEWAY_URL/productpage in a browser, and then open the Prometheus UI to view metrics. In a Kubernetes environment, the following command opens the Prometheus UI:

istioctl dashboard prometheus
# You can also create an Ingress or Gateway to access the Prometheus UI

[root@master1 istio-1.19.3]#istioctl dashboard prometheus
http://localhost:9090
Failed to open browser; open http://localhost:9090 in your browser.

[root@master1 istio-1.19.3]#istioctl dashboard prometheus --address 0.0.0.0
http://0.0.0.0:9090
Failed to open browser; open http://0.0.0.0:9090 in your browser.


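Once the UI is open, you can query any metric on the page, for example istio_requests_total.

istio_requests_total is a COUNTER metric that records the total number of requests handled by an Istio proxy.

You can then write PromQL statements to suit your needs. For example, the following statement queries the total number of requests to the productpage service: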

istio_requests_total{destination_service="productpage.default.svc.cluster.local"}

To query the total number of requests to v3 of the reviews service:

istio_requests_total{destination_service="reviews.default.svc.cluster.local", destination_version="v3"}

This query returns the current total count of all requests to v3 of the reviews service.
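To illustrate how these label selectors narrow down the series, here is a rough Python sketch that mimics the `=` matcher used above and the `=~` regex matcher used in the next query. It is not how Prometheus is implemented; the sample label sets and the `select` helper are assumptions made up for this example. One detail worth noting is that PromQL regex matchers are fully anchored, which `re.fullmatch` reproduces.

```python
import re

# Hypothetical label sets standing in for istio_requests_total series.
SERIES = [
    {"destination_service": "productpage.default.svc.cluster.local", "destination_version": "v1"},
    {"destination_service": "reviews.default.svc.cluster.local", "destination_version": "v3"},
    {"destination_service": "reviews.default.svc.cluster.local", "destination_version": "v1"},
]


def select(series, equal=None, regex=None):
    """Return the series whose labels satisfy every `=` and `=~` matcher."""
    equal = equal or {}
    regex = regex or {}
    matched = []
    for labels in series:
        ok_equal = all(labels.get(k, "") == v for k, v in equal.items())
        # PromQL regexes are fully anchored, hence fullmatch rather than search.
        ok_regex = all(re.fullmatch(p, labels.get(k, "")) for k, p in regex.items())
        if ok_equal and ok_regex:
            matched.append(labels)
    return matched


reviews_v3 = select(SERIES, equal={
    "destination_service": "reviews.default.svc.cluster.local",
    "destination_version": "v3",
})
productpage = select(SERIES, regex={"destination_service": "productpage.*"})
print(len(reviews_v3), len(productpage))
```

Because of the anchoring, a pattern such as `reviews` on its own would match nothing here; it needs the trailing `.*`.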

The per-second rate of requests with a 200 response to all instances of the productpage service, averaged over the past 5 minutes:

rate(istio_requests_total{destination_service=~"productpage.*", response_code="200"}[5m])

In the Graph tab, you can see a graphical representation of the query results.
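To make the rate() query above more concrete, the following Python sketch computes a per-second rate from raw counter samples. It is a simplification under stated assumptions: real Prometheus also extrapolates to the edges of the time window, and the sample values and the `simple_rate` helper are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter from (timestamp_seconds, value) samples.

    Simplification: real Prometheus also extrapolates to the window edges.
    """
    if len(samples) < 2:
        return None  # not enough data points in the window
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value means the counter was reset (for example the proxy
        # restarted), so the whole current value counts as new increase.
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


# Hypothetical samples scraped every 15 seconds, with one counter reset.
window = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(simple_rate(window))
```

This is why a COUNTER such as istio_requests_total is usually wrapped in rate() before graphing: the raw value only ever grows (or resets), while the rate shows the current request throughput.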


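For more on PromQL, see Querying Basics in the official Prometheus documentation, or our "Prometheus 入门到实战" course. PromQL is not the focus here, so we will not cover it in detail.

Although we have not configured anything here, Istio already collects a set of metrics by default, which is why we can query them directly.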

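Visualizing metrics with Grafana

Prometheus provides a basic UI for querying metrics, but it is not a complete monitoring system. More often, we use Grafana to visualize the metrics.

Again, first make sure the Istio grafana addon is enabled. If it is not, enable it with the following command: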

kubectl apply -f samples/addons

Also make sure the Prometheus service is running. Once Grafana is installed, check its status with the following commands:

$ kubectl -n istio-system get svc grafana
NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
grafana   ClusterIP   10.96.197.74   <none>        3000/TCP   25d
$ kubectl -n istio-system get pods -l app=grafana
NAME                       READY   STATUS    RESTARTS       AGE
grafana-5f9b8c6c5d-jv65v   1/1     Running   0              25d

Then open the Grafana UI with the following command:

istioctl dashboard grafana
# You can also create an Ingress or Gateway to access Grafana

istioctl dashboard grafana --address 0.0.0.0

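The Grafana UI then opens in the browser. Grafana comes preconfigured with a Prometheus data source by default, so we can query metrics through it right away.

Grafana also comes with several built-in Istio dashboards that we can use directly. For example, open the Istio Mesh Dashboard to view mesh-wide metrics.

At this point the dashboard shows some data, but not much, because we have not generated much traffic yet.

Use the following command to send 100 requests to the productpage service: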

for i in $(seq 1 100); do curl -s -o /dev/null "http://$GATEWAY_URL/productpage"; done

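Look at the Istio Mesh Dashboard again; it should now reflect the generated traffic.

We can also view metrics for an individual Service or Workload, for example the productpage workload.

This view gives detailed metrics for each workload, as well as for its inbound workloads (the workloads sending requests to it) and its outbound services (the services it sends requests to).

The Istio Dashboard consists of three main sections:

Mesh summary view: provides a global summary of the mesh and shows the HTTP/gRPC and TCP workloads in the mesh.
Individual services view: provides request and response metrics for each individual HTTP/gRPC and TCP service in the mesh, along with metrics about the client and service workloads for that service.
Individual workloads view: provides request and response metrics for each individual HTTP/gRPC and TCP workload in the mesh, along with metrics about the inbound workloads and outbound services for that workload.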

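How metrics are collected

The example above shows that once the Istio Prometheus addon is installed, metrics are collected automatically, even though we configured nothing. So how does Istio collect them? And if we want to use our own Prometheus to collect the metrics, how should we configure it?

Start by looking at the configuration of the Istio Prometheus addon. Running cat samples/addons/prometheus.yaml shows the manifest: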

# Source: prometheus/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    component: "server"
    app: prometheus
    release: prometheus
    chart: prometheus-19.6.1
    heritage: Helm
  name: prometheus
  namespace: istio-system
spec:
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    component: "server"
    app: prometheus
    release: prometheus
  sessionAffinity: None
  type: "ClusterIP"
---
# Source: prometheus/templates/deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: "server"
    app: prometheus
    release: prometheus
    chart: prometheus-19.6.1
    heritage: Helm
  name: prometheus
  namespace: istio-system
spec:
  selector:
    matchLabels:
      component: "server"
      app: prometheus
      release: prometheus
  replicas: 1
  strategy:
    type: Recreate
    rollingUpdate: null
  template:
    metadata:
      labels:
        component: "server"
        app: prometheus
        release: prometheus
        chart: prometheus-19.6.1
        heritage: Helm
        sidecar.istio.io/inject: "false"
    spec:
      enableServiceLinks: true
      serviceAccountName: prometheus
      containers:
        - name: prometheus-server-configmap-reload
          image: "jimmidyson/configmap-reload:v0.8.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://127.0.0.1:9090/-/reload
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
        - name: prometheus-server
          image: "prom/prometheus:v2.41.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --storage.tsdb.retention.time=15d
            - --config.file=/etc/config/prometheus.yml # configuration file
            - --storage.tsdb.path=/data
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
              scheme: HTTP
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 4
            failureThreshold: 3
            successThreshold: 1
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 15
            timeoutSeconds: 10
            failureThreshold: 3
            successThreshold: 1
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: storage-volume
              mountPath: /data
              subPath: ""
      dnsPolicy: ClusterFirst
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus
        - name: storage-volume
          emptyDir: {} # ephemeral storage
# Part of the configuration is omitted

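The manifest shows that the core Prometheus configuration file is set with --config.file=/etc/config/prometheus.yml, and that this file is mounted into the container as a volume from the ConfigMap named prometheus.

So the part to focus on is this ConfigMap, shown below: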

# Source: prometheus/templates/cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    component: "server"
    app: prometheus
    release: prometheus
    chart: prometheus-19.6.1
    heritage: Helm
  name: prometheus
  namespace: istio-system
data:
  allow-snippet-annotations: "false"
  alerting_rules.yml: |
    {}
  alerts: |
    {}
  prometheus.yml: |
    global:
      evaluation_interval: 1m
      scrape_interval: 15s
      scrape_timeout: 10s
    rule_files:
    - /etc/config/recording_rules.yml
    - /etc/config/alerting_rules.yml
    - /etc/config/rules
    - /etc/config/alerts
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: kubernetes-nodes
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - regex: (.+)
        replacement: /api/v1/nodes/$1/proxy/metrics
        source_labels:
        - __meta_kubernetes_node_name
        target_label: __metrics_path__
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: kubernetes-nodes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - regex: (.+)
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        source_labels:
        - __meta_kubernetes_node_name
        target_label: __metrics_path__
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
    - honor_labels: true
      job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: drop
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (.+?)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: service
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: node
    - honor_labels: true
      job_name: kubernetes-service-endpoints-slow
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (.+?)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: service
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: node
      scrape_interval: 5m
      scrape_timeout: 30s
    - honor_labels: true
      job_name: prometheus-pushgateway
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - action: keep
        regex: pushgateway
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
    - honor_labels: true
      job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: service
    - honor_labels: true
      job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: drop
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
        replacement: '[$2]:$1'
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: replace
        regex: (\d+);((([0-9]+?)(\.|$)){4})
        replacement: $2:$1
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - action: drop
        regex: Pending|Succeeded|Failed|Completed
        source_labels:
        - __meta_kubernetes_pod_phase
    - honor_labels: true
      job_name: kubernetes-pods-slow
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
        replacement: '[$2]:$1'
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: replace
        regex: (\d+);((([0-9]+?)(\.|$)){4})
        replacement: $2:$1
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - action: drop
        regex: Pending|Succeeded|Failed|Completed
        source_labels:
        - __meta_kubernetes_pod_phase
      scrape_interval: 5m
      scrape_timeout: 30s
  recording_rules.yml: |
    {}
  rules: |
    {}
---

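This configuration file defines ten scrape jobs. The main ones are:

prometheus: scrapes the metrics of the Prometheus server itself.
kubernetes-apiservers: scrapes the metrics of the Kubernetes API servers.
kubernetes-nodes: scrapes the metrics of the Kubernetes nodes.
kubernetes-nodes-cadvisor: scrapes cAdvisor metrics from the Kubernetes nodes, mainly container CPU, memory, network, and disk metrics.
kubernetes-service-endpoints: scrapes metrics from Kubernetes service endpoints.
kubernetes-pods: scrapes metrics from Kubernetes Pods.

The remaining jobs are kubernetes-service-endpoints-slow, prometheus-pushgateway, kubernetes-services, and kubernetes-pods-slow.

Pay particular attention to the kubernetes-pods job, because most of our metrics data is served by the Envoy sidecar in each Pod.

The configuration shows that this job uses pod-based service discovery:

First, it keeps only targets whose __meta_kubernetes_pod_annotation_prometheus_io_scrape source label is true. In other words, a Pod is kept as a scrape target only if it carries the prometheus.io/scrape annotation with the value true; otherwise it is dropped. Pods annotated with prometheus.io/scrape_slow set to true are also dropped here, because the kubernetes-pods-slow job handles them.
Then the prometheus.io/scheme annotation sets the protocol to http or https.
The prometheus.io/path annotation sets the scrape path.
The prometheus.io/port annotation sets the scrape port.
Annotations of the form prometheus.io/param_<name> are mapped to __param_<name> labels, which Prometheus sends as URL parameters on the scrape request.
Pod labels are then mapped to Prometheus labels through labelmap, and the Pod's namespace and name are written to the namespace and pod labels.
Finally, the Pod phase is checked: targets whose phase is Pending, Succeeded, Failed, or Completed are dropped, so in practice only Running Pods are scraped.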
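The relabeling steps above can be sketched in Python. This is only an approximation of what the kubernetes-pods job does, not Prometheus code: the `scrape_url` helper, the pod dictionary layout, and the pod IP are assumptions for illustration, while the IPv4 regex and the dropped phases are copied from the job configuration.

```python
import re

# Phases dropped by the final relabel rule of the kubernetes-pods job.
DROPPED_PHASES = {"Pending", "Succeeded", "Failed", "Completed"}
# IPv4 rule from the job: the joined source labels look like "<port>;<pod ip>".
ADDRESS_RE = re.compile(r"(\d+);((([0-9]+?)(\.|$)){4})")


def scrape_url(pod):
    """Return the URL such a job would scrape, or None if the pod is dropped."""
    ann = pod.get("annotations", {})
    if ann.get("prometheus.io/scrape") != "true":
        return None  # "keep" rule: only pods that opt in
    if ann.get("prometheus.io/scrape_slow") == "true":
        return None  # "drop" rule: left to the kubernetes-pods-slow job
    if pod.get("phase") in DROPPED_PHASES:
        return None  # "drop" rule on the pod phase
    scheme = ann.get("prometheus.io/scheme", "http")
    if scheme not in ("http", "https"):
        scheme = "http"  # the (https?) rule ignores any other value
    path = ann.get("prometheus.io/path", "/metrics")
    joined = "{};{}".format(ann.get("prometheus.io/port", ""), pod["ip"])
    match = ADDRESS_RE.fullmatch(joined)
    if match is None:
        # Simplification: real Prometheus would keep the discovered address.
        return None
    return "{}://{}:{}{}".format(scheme, match.group(2), match.group(1), path)


# Hypothetical pod carrying the annotations Istio injects.
productpage = {
    "ip": "10.244.1.10",
    "phase": "Running",
    "annotations": {
        "prometheus.io/scrape": "true",
        "prometheus.io/port": "15020",
        "prometheus.io/path": "/stats/prometheus",
    },
}
print(scrape_url(productpage))
print(scrape_url(dict(productpage, phase="Pending")))
```

The same three annotations are what you would rely on when pointing your own Prometheus at the mesh: any job with equivalent relabel rules discovers the sidecar-injected Pods without per-Pod configuration.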

For example, querying istio_requests_total{source_app="productpage", destination_app="details"} returns series such as the following:

istio_requests_total{
    app="details",
    connection_security_policy="mutual_tls",
    destination_app="details",
    destination_canonical_revision="v1",
    destination_canonical_service="details",
    destination_cluster="Kubernetes",
    destination_principal="spiffe://cluster.local/ns/default/sa/bookinfo-details",
    destination_service="details.default.svc.cluster.local",
    destination_service_name="details",
    destination_service_namespace="default",
    destination_version="v1",
    destination_workload="details-v1",
    destination_workload_namespace="default",
    instance="10.244.2.74:15020",
    job="kubernetes-pods",
    namespace="default",
    pod="details-v1-5f4d584748-9fflw",
    pod_template_hash="5f4d584748",
    reporter="destination",
    request_protocol="http",
    response_code="200",
    response_flags="-",
    security_istio_io_tlsMode="istio",
    service_istio_io_canonical_name="details",
    service_istio_io_canonical_revision="v1",
    source_app="productpage",
    source_canonical_revision="v1",
    source_canonical_service="productpage",
    source_cluster="Kubernetes",
    source_principal="spiffe://cluster.local/ns/default/sa/bookinfo-productpage",
    source_version="v1",
    source_workload="productpage-v1",
    source_workload_namespace="default",
    version="v1"
}  362

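This series counts the requests from the productpage service to the details service. The result shows that it comes from the scrape job job="kubernetes-pods", which means the metric was scraped from a Pod found through service discovery.

Let's look at the productpage Pod: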

$ kubectl get pods productpage-v1-564d4686f-l8kxr -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    istio.io/rev: default
    kubectl.kubernetes.io/default-container: productpage
    kubectl.kubernetes.io/default-logs-container: productpage
    prometheus.io/path: /stats/prometheus
    prometheus.io/port: "15020"
    prometheus.io/scrape: "true"
    sidecar.istio.io/status: '{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}'
  labels:
    app: productpage
    pod-template-hash: 564d4686f
    security.istio.io/tlsMode: istio
    service.istio.io/canonical-name: productpage
    service.istio.io/canonical-revision: v1
    version: v1
  name: productpage-v1-564d4686f-l8kxr
  namespace: default
spec:
  containers:
  - image: docker.io/istio/examples-bookinfo-productpage-v1:1.18.0
    imagePullPolicy: IfNotPresent
# ......

The manifest shows that the Pod carries the following annotations:

prometheus.io/path: /stats/prometheus
prometheus.io/port: "15020"
prometheus.io/scrape: "true"

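These annotations drive Prometheus service discovery. prometheus.io/scrape: "true" marks the Pod as a scrape target, while prometheus.io/path: /stats/prometheus and prometheus.io/port: "15020" set the scrape path and port. Once Prometheus discovers the Pod, it scrapes the metrics from <pod ip>:15020/stats/prometheus. Port 15020 is served by the Istio agent (pilot-agent) running in the istio-proxy container, which merges Envoy's own statistics (exposed by Envoy on port 15090 at /stats/prometheus) with the agent's metrics and, where configured, the application's metrics.

The labels defined on the Pod are also mapped to Prometheus labels. Beyond these Pod labels, the Envoy sidecar adds many labels of its own, mainly describing the destination and the source, which make the metrics easy to query. The main labels added by the Envoy sidecar are: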

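reporter: identifies the reporter of the request. It is set to destination if the report comes from a server Istio proxy, and to source if it comes from a client Istio proxy or a gateway.
source_workload: identifies the name of the source workload, or unknown if the source information is missing.
source_workload_namespace: identifies the namespace of the source workload, or unknown if the source information is missing.
source_principal: identifies the peer principal of the traffic source. It is set when peer authentication is used.
source_app: identifies the source application based on the app label of the source workload, or unknown if the source information is missing.
source_version: identifies the version of the source workload, or unknown if the source information is missing.
destination_workload: identifies the name of the destination workload, or unknown if the destination information is missing.
destination_workload_namespace: identifies the namespace of the destination workload, or unknown if the destination information is missing.
destination_principal: identifies the peer principal of the traffic destination. It is set when peer authentication is used.
destination_app: identifies the destination application based on the app label of the destination workload, or unknown if the destination information is missing.
destination_version: identifies the version of the destination workload, or unknown if the destination information is missing.
destination_service: identifies the destination service host responsible for an incoming request, for example details.default.svc.cluster.local.
destination_service_name: identifies the destination service name, for example details.
destination_service_namespace: identifies the namespace of the destination service.
request_protocol: identifies the protocol of the request. It is set to the request or connection protocol.
response_code: identifies the response code of the request. This label is present only on HTTP metrics.
connection_security_policy: identifies the service authentication policy of the request. When Istio secures the communication, it is set to mutual_tls if the report comes from the server Istio proxy. If the report comes from the client Istio proxy, it is set to unknown, because the security policy cannot be properly populated there.
response_flags: additional details about the response or connection from the proxy.
Canonical Service: a workload belongs to exactly one canonical service, whereas the same workload can belong to multiple services. A canonical service has a name and a revision, which results in the following labels:
source_canonical_service
source_canonical_revision
destination_canonical_service
destination_canonical_revision
destination_cluster: the cluster name of the destination workload, set by global.multiCluster.clusterName at cluster install time.
source_cluster: the cluster name of the source workload, set by global.multiCluster.clusterName at cluster install time.
grpc_response_status: identifies the gRPC response status. This label is present only on gRPC metrics.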
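One practical consequence of the reporter label is that a request between two sidecar-injected workloads is typically recorded twice, once by the client proxy and once by the server proxy. The following Python sketch shows why aggregations usually filter on reporter. The sample values and the `sum_by` helper are hypothetical; they only mimic what a PromQL sum by (...) would do.

```python
from collections import defaultdict

# Hypothetical istio_requests_total samples: (labels, counter value).
SAMPLES = [
    ({"reporter": "source", "source_workload": "productpage-v1",
      "destination_service_name": "details", "response_code": "200"}, 362),
    ({"reporter": "destination", "source_workload": "productpage-v1",
      "destination_service_name": "details", "response_code": "200"}, 362),
    ({"reporter": "destination", "source_workload": "productpage-v1",
      "destination_service_name": "reviews", "response_code": "200"}, 350),
    ({"reporter": "destination", "source_workload": "productpage-v1",
      "destination_service_name": "reviews", "response_code": "503"}, 12),
]


def sum_by(samples, label, **matchers):
    """Sum sample values per value of `label`, keeping only matching series."""
    totals = defaultdict(int)
    for labels, value in samples:
        if all(labels.get(k) == v for k, v in matchers.items()):
            totals[labels.get(label, "")] += value
    return dict(totals)


# Without a reporter filter, the details requests are counted twice.
unfiltered = sum_by(SAMPLES, "destination_service_name")
# Restricting to one reporter gives one count per request.
by_destination = sum_by(SAMPLES, "destination_service_name", reporter="destination")
print(unfiltered)
print(by_destination)
```

In PromQL the equivalent habit is to add reporter="destination" (or reporter="source") to the selector before summing.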

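Istio metrics come in two types, COUNTER and DISTRIBUTION, which correspond to the familiar counter and histogram.

For HTTP, HTTP/2, and gRPC traffic, Istio generates the following metrics:

Request count (istio_requests_total): a COUNTER that records the total number of requests handled by an Istio proxy.
Request duration (istio_request_duration_milliseconds): a DISTRIBUTION that measures the duration of requests.
Request size (istio_request_bytes): a DISTRIBUTION that measures HTTP request body sizes.
Response size (istio_response_bytes): a DISTRIBUTION that measures HTTP response body sizes.
gRPC request message count (istio_request_messages_total): a COUNTER that records the total number of gRPC messages sent from a client.
gRPC response message count (istio_response_messages_total): a COUNTER that records the total number of gRPC messages sent from a server.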
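To show what a DISTRIBUTION metric looks like in practice, here is a hedged Python sketch of estimating a latency percentile from cumulative histogram buckets, in the spirit of PromQL's histogram_quantile. In Prometheus a histogram such as istio_request_duration_milliseconds is exposed as _bucket series with an le (upper bound) label; the bucket values below are made up, and the function simplifies several edge cases.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Uses linear interpolation inside the bucket that contains the target
    rank; edge cases are simplified compared with PromQL.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # rank falls in the +Inf bucket
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative buckets: (le in milliseconds, request count).
duration_buckets = [(5, 10), (10, 40), (25, 90), (50, 100), (math.inf, 100)]
p50 = histogram_quantile(0.5, duration_buckets)
p99 = histogram_quantile(0.99, duration_buckets)
print(p50, p99)
```

The result is an estimate whose precision depends on the bucket boundaries, which is the trade-off a histogram makes in exchange for being cheap to aggregate across many proxies.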

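For TCP traffic, Istio generates the following metrics:

TCP bytes sent (istio_tcp_sent_bytes_total): a COUNTER that measures the total number of bytes sent during the response on a TCP connection.
TCP bytes received (istio_tcp_received_bytes_total): a COUNTER that measures the total number of bytes received during the request on a TCP connection.
TCP connections opened (istio_tcp_connections_opened_total): a COUNTER that records the total number of TCP connections opened.
TCP connections closed (istio_tcp_connections_closed_total): a COUNTER that records the total number of TCP connections closed.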