多读书多实践,勤思考善领悟

二进制搭建高可用的kubernetes集群(四)

本文于2201天之前发表,文中内容可能已经过时。

接上文:手动搭建高可用的kubernetes集群(三)


10. 部署Dashboard 插件

官方文件目录:kubernetes/cluster/addons/dashboard

使用的文件如下:

1
2
$ ls *.yaml
dashboard-controller.yaml dashboard-rbac.yaml dashboard-service.yaml

  • 新加了 dashboard-rbac.yaml 文件,定义 dashboard 使用的 RoleBinding。

由于 kube-apiserver 启用了 RBAC 授权,而官方源码目录的 dashboard-controller.yaml 没有定义授权的 ServiceAccount,所以后续访问 kube-apiserver 的 API 时会被拒绝,前端界面提示:

image403

解决办法是:定义一个名为dashboard 的ServiceAccount,然后将它和Cluster Role view 绑定:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
$ cat > dashboard-rbac.yaml<<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
name: dashboard
namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
name: dashboard
subjects:
- kind: ServiceAccount
name: dashboard
namespace: kube-system
roleRef:
kind: ClusterRole
name: cluster-admin
apiGroup: rbac.authorization.k8s.io
EOF

配置dashboard-controller

1
2
20a21
> serviceAccountName: dashboard
  • 使用名为 dashboard 的自定义 ServiceAccount

配置dashboard-service

1
2
3
$ diff dashboard-service.yaml.orig dashboard-service.yaml
10a11
> type: NodePort
  • 指定端口类型为 NodePort,这样外界可以通过地址 nodeIP:nodePort 访问 dashboard

    执行所有定义文件

    1
    2
    3
    4
    5
    $ pwd
    /home/ych/k8s-repo/dashboard
    $ ls *.yaml
    dashboard-controller.yaml dashboard-rbac.yaml dashboard-service.yaml
    $ kubectl create -f .

检查执行结果

查看分配的 NodePort

1
2
3
$ kubectl get services kubernetes-dashboard -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard NodePort 10.254.104.90 <none> 80:31202/TCP 1m

  • NodePort 31202映射到dashboard pod 80端口;

检查 controller

1
2
3
4
5
6
$ kubectl get deployment kubernetes-dashboard  -n kube-system
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kubernetes-dashboard 1 1 1 1 3m

$ kubectl get pods -n kube-system | grep dashboard
kubernetes-dashboard-6667f9b4c-4xbpz 1/1 Running 0 3m

访问dashboard

  1. kubernetes-dashboard 服务暴露了 NodePort,可以使用 http://NodeIP:nodePort 地址访问 dashboard
  2. 通过 kube-apiserver 访问 dashboard
  3. 通过 kubectl proxy 访问 dashboard
imagedashboard ui

由于缺少 Heapster 插件,当前 dashboard 不能展示 Pod、Nodes 的 CPU、内存等 metric 图形

注意:如果你的后端apiserver是高可用的集群模式的话,那么Dashboard的apiserver-host最好手动指定,不然,当你apiserver某个节点挂了的时候,Dashboard可能不能正常访问,如下配置

1
2
3
4
5
6
image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.7.1
ports:
- containerPort: 9090
protocol: TCP
args:
- --apiserver-host=http://<api_server_ha_addr>:8080

11. 部署Heapster 插件

到heapster release 页面下载最新版的heapster

1
2
$ wget https://github.com/kubernetes/heapster/archive/v1.4.3.tar.gz
$ tar -xzvf v1.4.3.tar.gz

部署相关文件目录:/home/ych/k8s-repo/heapster-1.4.3/deploy/kube-config

1
2
3
$ ls influxdb/ && ls rbac/
grafana.yaml heapster.yaml influxdb.yaml
heapster-rbac.yaml

为方便测试访问,将grafana.yaml下面的服务类型设置为type=NodePort

执行所有文件

1
2
3
4
5
6
7
8
9
10
$ kubectl create -f rbac/heapster-rbac.yaml
clusterrolebinding "heapster" created
$ kubectl create -f influxdb
deployment "monitoring-grafana" created
service "monitoring-grafana" created
serviceaccount "heapster" created
deployment "heapster" created
service "heapster" created
deployment "monitoring-influxdb" created
service "monitoring-influxdb" created

检查执行结果

检查 Deployment

1
2
3
4
$ kubectl get deployments -n kube-system | grep -E 'heapster|monitoring'
heapster 1 1 1 1 2m
monitoring-grafana 1 1 1 0 2m
monitoring-influxdb 1 1 1 1 2m

检查 Pods

1
2
3
4
$ kubectl get pods -n kube-system | grep -E 'heapster|monitoring'
heapster-7cf895f48f-p98tk 1/1 Running 0 2m
monitoring-grafana-c9d5cd98d-gb9xn 0/1 CrashLoopBackOff 4 2m
monitoring-influxdb-67f8d587dd-zqj6p 1/1 Running 0 2m

我们可以看到monitoring-grafana的POD 是没有执行成功的,通过查看日志可以看到下面的错误信息:

Failed to parse /etc/grafana/grafana.ini, open /etc/grafana/grafana.ini: no such file or directory

要解决这个问题(heapster issues)我们需要将grafana 的镜像版本更改成:gcr.io/google_containers/heapster-grafana-amd64:v4.0.2,然后重新执行,即可正常。

访问 grafana

上面我们修改grafana 的Service 为NodePort 类型:

1
2
3
$ kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
monitoring-grafana NodePort 10.254.34.89 <none> 80:30191/TCP 28m

则我们就可以通过任意一个节点加上上面的30191端口就可以访问grafana 了。

imagegrafana ui

heapster 正确安装后,我们便可以回去看我们的dashboard 是否有图表出现了:

imagedashboard

12. 安装Ingress

Ingress其实就是从kuberenets集群外部访问集群的一个入口,将外部的请求转发到集群内不同的Service 上,其实就相当于nginx、apache 等负载均衡代理服务器,再加上一个规则定义,路由信息的刷新需要靠Ingress controller来提供

Ingress controller可以理解为一个监听器,通过不断地与kube-apiserver打交道,实时的感知后端service、pod 等的变化,当得到这些变化信息后,Ingress controller再结合Ingress的配置,更新反向代理负载均衡器,达到服务发现的作用。其实这点和服务发现工具consul的consul-template非常类似。

部署traefik

Traefik是一款开源的反向代理与负载均衡工具。它最大的优点是能够与常见的微服务系统直接整合,可以实现自动化动态配置。目前支持Docker、Swarm、Mesos/Marathon、 Mesos、Kubernetes、Consul、Etcd、Zookeeper、BoltDB、Rest API等等后端模型。

imagetraefik

创建rbac
创建文件:ingress-rbac.yaml,用于service account验证

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
apiVersion: v1
kind: ServiceAccount
metadata:
name: ingress
namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: ingress
subjects:
- kind: ServiceAccount
name: ingress
namespace: kube-system
roleRef:
kind: ClusterRole
name: cluster-admin
apiGroup: rbac.authorization.k8s.io

DaemonSet 形式部署traefik
创建文件:traefik-daemonset.yaml,为保证traefik 总能提供服务,在每个节点上都部署一个traefik,所以这里使用DaemonSet 的形式

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
kind: ConfigMap
apiVersion: v1
metadata:
name: traefik-conf
namespace: kube-system
data:
traefik-config: |-
defaultEntryPoints = ["http","https"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[[entryPoints.https.tls.certificates]]
CertFile = "/ssl/ssl.crt"
KeyFile = "/ssl/ssl.key"

---
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
name: traefik-ingress
namespace: kube-system
labels:
k8s-app: traefik-ingress
spec:
template:
metadata:
labels:
k8s-app: traefik-ingress
name: traefik-ingress
spec:
terminationGracePeriodSeconds: 60
restartPolicy: Always
serviceAccountName: ingress
containers:
- image: traefik:latest
name: traefik-ingress
ports:
- name: http
containerPort: 80
hostPort: 80
- name: https
containerPort: 443
hostPort: 443
- name: admin
containerPort: 8080
args:
- --configFile=/etc/traefik/traefik.toml
- -d
- --web
- --kubernetes
- --logLevel=DEBUG
volumeMounts:
- name: traefik-config-volume
mountPath: /etc/traefik
- name: traefik-ssl-volume
mountPath: /ssl
volumes:
- name: traefik-config-volume
configMap:
name: traefik-conf
items:
- key: traefik-config
path: traefik.toml
- name: traefik-ssl-volume
secret:
secretName: traefik-ssl

注意上面的yaml 文件中我们添加了一个名为traefik-conf的ConfigMap,该配置是用来将http 请求强制跳转成https,并指定https 所需CA 文件地址,这里我们使用secret的形式来指定CA 文件的路径:

1
2
3
4
$ ls
ssl.crt ssl.key
$ kubectl create secret generic traefik-ssl --from-file=ssl.crt --from-file=ssl.key --namespace=kube-system
secret "traefik-ssl" created

创建ingress
创建文件:traefik-ingress.yaml,现在可以通过创建ingress文件来定义请求规则了,根据自己集群中的service 自己修改相应的serviceName 和servicePort

1
2
3
4
5
6
7
8
9
10
11
12
13
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: traefik-ingress
spec:
rules:
- host: traefik.nginx.io
http:
paths:
- path: /
backend:
serviceName: my-nginx
servicePort: 80

执行创建命令:

1
2
3
4
5
6
7
8
$ kubectl create -f ingress-rbac.yaml
serviceaccount "ingress" created
clusterrolebinding "ingress" created
$ kubectl create -f traefik-daemonset.yaml
configmap "traefik-conf" created
daemonset "traefik-ingress" created
$ kubectl create -f traefik-ingress.yaml
ingress "traefik-ingress" created

Traefik UI
创建文件:traefik-ui.yaml,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
apiVersion: v1
kind: Service
metadata:
name: traefik-ui
namespace: kube-system
spec:
selector:
k8s-app: traefik-ingress
ports:
- name: web
port: 80
targetPort: 8080
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: traefik-ui
namespace: kube-system
spec:
rules:
- host: traefik-ui.local
http:
paths:
- path: /
backend:
serviceName: traefik-ui
servicePort: web

测试

部署完成后,在本地/etc/hosts添加一条配置:

1
2
# 将下面的xx.xx.xx.xx替换成任意节点IP
xx.xx.xx.xx master03 traefik.nginx.io traefik-ui.local

配置完成后,在本地访问:traefik-ui.local,则可以访问到traefik的dashboard页面:

imagetraefik dashboard

同样的可以访问traefik.nginx.io,得到正确的结果页面:

imagenginx

上面配置完成后,就可以将我们的所有节点加入到一个SLB中,然后配置相应的域名解析到SLB即可。

13. 问题汇总

1. dashboard无法显示监控图
dashboard 和heapster influxdb都部署完成后 dashboard依旧无法显示监控图 通过排查 heapster log有超时错误

1
2
3
$ kubectl logs -f pods/heapster-2882613285-58d9r -n kube-system

E0630 17:23:47.339987 1 reflector.go:203] k8s.io/heapster/metrics/sources/kubelet/kubelet.go:342: Failed to list *api.Node: Get http://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: i/o timeout E0630 17:23:47.340274 1 reflector.go:203] k8s.io/heapster/metrics/heapster.go:319: Failed to list *api.Pod: Get http://kubernetes.default/api/v1/pods?resourceVersion=0: dial tcp: i/o timeout E0630 17:23:47.340498 1 reflector.go:203] k8s.io/heapster/metrics/processors/namespace_based_enricher.go:84: Failed to list *api.Namespace: Get http://kubernetes.default/api/v1/namespaces?resourceVersion=0: dial tcp: lookup kubernetes.default on 10.254.0.2:53: dial udp 10.254.0.2:53: i/o timeout E0630 17:23:47.340563 1 reflector.go:203] k8s.io/heapster/metrics/heapster.go:327: Failed to list *api.Node: Get http://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on 10.254.0.2:53: dial udp 10.254.0.2:53: i/o timeout E0630 17:23:47.340623 1 reflector.go:203] k8s.io/heapster/metrics/processors/node_autoscaling_enricher.go💯 Failed to list *api.Node: Get http://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on 10.254.0.2:53: dial udp 10.254.0.2:53: i/o timeout E0630 17:23:55.014414 1 influxdb.go:150] Failed to create infuxdb: failed to ping InfluxDB server at "monitoring-influxdb:8086" - Get http://monitoring-influxdb:8086/ping: dial tcp: lookup monitoring-influxdb on 10.254.0.2:53: read udp 172.30.45.4:48955->10.254.0.2:53: i/o timeout`

我是docker的systemd Unit文件忘记添加

1
ExecStart=/root/local/bin/dockerd --log-level=error $DOCKER_NETWORK_OPTIONS

后边的$DOCKER_NETWORK_OPTIONS,导致docker0的网段跟flannel.1不一致。

2. kube-proxy报错kube-proxy[2241]: E0502 15:55:13.889842 2241 conntrack.go:42] conntrack returned error: error looking for path of conntrack: exec: “conntrack”: executable file not found in $PATH
导致现象:kubedns启动成功,运行正常,但是service 之间无法解析,kubernetes中的DNS解析异常

解决方法:CentOS中安装conntrack-tools包后重启kubernetes 集群即可。

3. Unable to access kubernetes services: no route to host
导致现象: 在POD 内访问集群的某个服务的时候出现no route to host

1
2
$ curl my-nginx.nx.svc.cluster.local
curl: (7) Failed connect to my-nginx.nx.svc.cluster.local:80; No route to host

解决方法:清除所有的防火墙规则,然后重启docker 服务

1
2
$ iptables --flush && iptables -tnat --flush
$ systemctl restart docker

4. 使用NodePort 类型的服务,只能在POD 所在节点进行访问
导致现象: 使用NodePort 类型的服务,只能在POD 所在节点进行访问,其他节点通过NodePort 不能正常访问

解决方法: kube-proxy 默认使用的是proxy_model就是iptables,正常情况下是所有节点都可以通过NodePort 进行访问的,我这里将阿里云的安全组限制全部去掉即可,然后根据需要进行添加安全限制。

参考资料

和我一步步部署 kubernetes 集群
keepalived 配置
kubernetes issue
kubernetes heapster issue


全文完