RESOURCES:
==> v1/ServiceAccount
NAME                   SECRETS  AGE
metric-metrics-server  1        1s

==> v1/ClusterRole
NAME                                     AGE
system:metrics-server-aggregated-reader  1s
system:metric-metrics-server             1s

==> v1/ClusterRoleBinding
NAME                                         AGE
metric-metrics-server:system:auth-delegator  1s
system:metric-metrics-server                 1s

==> v1beta1/RoleBinding
NAME                               AGE
metric-metrics-server-auth-reader  1s

==> v1/Service
NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP  PORT(S)  AGE
metric-metrics-server  ClusterIP  10.103.214.219  <none>       443/TCP  1s

==> v1/Deployment
NAME                   DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
metric-metrics-server  1        1        1           0          1s

==> v1beta1/APIService
NAME                    AGE
v1beta1.metrics.k8s.io  1s

==> v1/Pod(related)
NAME                                    READY  STATUS             RESTARTS  AGE
metric-metrics-server-697bd98b8b-kvg2d  0/1    ContainerCreating  0         1s

NOTES:
The metric server has been deployed.

In a few minutes you should be able to list metrics using the following command:

  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"

Once the deployment has finished, check whether the Pod's logs look normal:
$ kubectl get pods -n kube-system -l release=metric
NAME                                     READY   STATUS    RESTARTS   AGE
metric-metrics-server-697bd98b8b-kvg2d   1/1     Running   0          58m
$ kubectl logs -f metric-metrics-server-697bd98b8b-kvg2d -n kube-system
I0521 17:31:54.580374 1 serving.go:273] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
[restful] 2019/05/21 17:31:55 log.go:33: [restful/swagger] listing is available at https://:8443/swaggerapi
[restful] 2019/05/21 17:31:55 log.go:33: [restful/swagger] https://:8443/swaggerui/ is mapped to folder /swagger-ui/
I0521 17:31:55.112171 1 serve.go:96] Serving securely on [::]:8443
E0521 17:32:55.229771 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ydzs-node2: unable to fetch metrics from kubelet ydzs-node2 (ydzs-node2): Get https://ydzs-node2:10250/stats/summary/: dial tcp: lookup ydzs-node2 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:ydzs-master: unable to fetch metrics from kubelet ydzs-master (ydzs-master): Get https://ydzs-master:10250/stats/summary/: dial tcp: lookup ydzs-master on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:ydzs-node1: unable to fetch metrics from kubelet ydzs-node1 (ydzs-node1): Get https://ydzs-node1:10250/stats/summary/: dial tcp: lookup ydzs-node1 on 10.96.0.10:53: no such host]

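The Pod's logs contain errors of the form xxx: no such host. An error like this almost always points to a DNS resolution failure. The log shows that metrics-server fetches data from each kubelet on port 10250 and addresses the kubelet by hostname. When we deployed the cluster we added hostname-to-IP mappings to /etc/hosts on every node, but the metrics-server Pod has no such hosts entries, so it cannot resolve those hostnames. There are two ways to solve this problem.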
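To see at a glance which node names fail to resolve, the lookup failures can be pulled out of the log text. The following is only a minimal sketch: the function name failed_hosts is hypothetical, and it assumes the log lines keep the "lookup <host> on <dns-server>: no such host" shape shown above.

```shell
# Read metrics-server log text on stdin and print each hostname
# that failed DNS resolution, one per line, without duplicates.
failed_hosts() {
  # Keep only the "lookup <host> on <server>: no such host" fragments,
  # then print the second word of each fragment (the hostname).
  grep -o 'lookup [^ ]* on [^ ]*: no such host' | awk '{print $2}' | sort -u
}
```

It could be fed from the Pod shown earlier, for example with kubectl logs metric-metrics-server-697bd98b8b-kvg2d -n kube-system | failed_hosts.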
E0521 17:55:34.650303 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ydzs-master: unable to fetch metrics from kubelet ydzs-master (10.151.30.11): Get https://10.151.30.11:10250/stats/summary/: x509: cannot validate certificate for 10.151.30.11 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:ydzs-node2: unable to fetch metrics from kubelet ydzs-node2 (10.151.30.23): Get https://10.151.30.23:10250/stats/summary/: x509: cannot validate certificate for 10.151.30.23 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:ydzs-node1: unable to fetch metrics from kubelet ydzs-node1 (10.151.30.22): Get https://10.151.30.22:10250/stats/summary/: x509: cannot validate certificate for 10.151.30.22 because it doesn't contain any IP SANs]
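When the cluster was deployed, the certificates issued by the CA did not include the IP address of each node. As a result, when metrics-server contacts a kubelet by IP, verification fails because the certificate has no matching IP (error: x509: cannot validate certificate for 10.151.30.11 because it doesn't contain any IP SANs). We can add the --kubelet-insecure-tls flag to skip certificate verification.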
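One way to pass the flag is through a Helm values override. This is a sketch under assumptions rather than the exact procedure: the file name metrics-values.yaml is arbitrary, and it assumes the chart used for the metric release exposes an args list for extra container arguments.

```shell
# Write a values override that sets the metrics-server container arguments.
cat > metrics-values.yaml <<'EOF'
args:
  # Assumed to be in use already: the log above shows kubelets reached by IP.
  - --kubelet-preferred-address-types=InternalIP
  # Skip verification of the kubelet serving certificate (it has no IP SANs).
  - --kubelet-insecure-tls
EOF
```

Assuming the chart is stable/metrics-server, the override could then be applied with something like helm upgrade metric stable/metrics-server -f metrics-values.yaml. Skipping verification is the quick fix; reissuing the kubelet certificates with IP SANs keeps verification enabled.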