Kubernetes Cluster Setup


evobot 2021-11-25


Based on the easzlab/kubeasz project (github.com): it installs a K8S cluster with Ansible scripts, explains how the components interact, is simple to use, and is unaffected by network restrictions in mainland China.

Architecture Planning

Hardware and Software Requirements

  1. CPU and memory: master at least 1c2g (2c4g recommended); node at least 1c2g
  2. Linux with kernel 3.10 or later; CentOS 7 / RHEL 7 recommended
  3. docker: at least 1.9, 1.12+ recommended
  4. etcd: at least 2.0, 3.0+ recommended
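The kernel requirement above can be checked before installing. A minimal sketch; `kernel_ok` is a hypothetical helper of my own, not part of kubeasz:

```shell
# Sketch: check that the running kernel meets the 3.10 minimum
# by comparing major.minor numerically (kernel_ok is a hypothetical helper).
kernel_ok() {
  major=$(echo "$1" | cut -d. -f1)
  minor=$(echo "$1" | cut -d. -f2)
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

kernel_ok "$(uname -r)" && echo "kernel OK" || echo "kernel too old, upgrade first"
```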

Node Planning

| Role | Count | Description |
| --- | --- | --- |
| deploy node | 1 | Runs the ansible/ezctl commands; a dedicated node is recommended |
| etcd node | 3 | The etcd cluster needs an odd number of members (1, 3, 5, ...); usually co-located on the master nodes |
| master node | 2 | An HA cluster needs at least 2 masters, plus one extra master VIP (virtual IP) |
| node node | 3 | Runs the application workloads; raise machine specs or add nodes as needed |

Machine Plan

| IP | Hostname | Roles |
| --- | --- | --- |
| 192.168.93.128 | master1 | deploy, master1, lb1, etcd |
| 192.168.93.129 | master2 | master2, lb2, etcd |
| 192.168.93.130 | node1 | etcd, node |
| 192.168.93.131 | node2 | node |
| 192.168.93.135 | | VIP |

Cluster Deployment

Cluster Preparation

Run the following commands on all four servers:

yum install -y epel-release
yum update
yum install python -y

Deploy Node

Installing Ansible

Install and prepare Ansible on the deploy node with the following commands:

yum install python-pip git

pip install --upgrade pip -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

# If the upgrade errors out because the pip version jump is too large,
# first upgrade to an intermediate version with the command below:
python -m pip install --upgrade pip==20.2.4

pip install --no-cache-dir ansible -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com


Deploy Node Configuration

Passwordless SSH Login

  1. Generate an SSH key pair on the deploy node with ssh-keygen
  2. Copy the key to the other machines with for ip in 129 130 131;do ssh-copy-id 192.168.93.$ip;done
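The copy loop above can be dry-run first to confirm the target list. A sketch that only prints the commands; drop the echo to actually push the key:

```shell
# Dry run: print the ssh-copy-id command for each target node
# (last octets taken from the machine plan above).
for ip in 129 130 131; do
  echo ssh-copy-id "192.168.93.${ip}"
done
```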

Orchestrating the K8S Install from the Deploy Node

# download the ezdown helper script; this example uses kubeasz 3.1.1
export release=3.1.1
wget https://github.com/easzlab/kubeasz/releases/download/${release}/ezdown
chmod +x ./ezdown
# download everything with the helper script
./ezdown -D

Once the script above finishes, all files (kubeasz code, binaries, offline images) are under the /etc/kubeasz directory.

Run /etc/kubeasz/ezctl new k8s-01 to create a cluster configuration instance:

[root@master1 kubeasz]# ./ezctl new k8s-01
2021-11-17 08:39:22 DEBUG generate custom cluster files in /etc/kubeasz/clusters/k8s-01
2021-11-17 08:39:22 DEBUG set version of common plugins
2021-11-17 08:39:22 DEBUG cluster k8s-01: files successfully created.
2021-11-17 08:39:22 INFO next steps 1: to config '/etc/kubeasz/clusters/k8s-01/hosts'
2021-11-17 08:39:22 INFO next steps 2: to config '/etc/kubeasz/clusters/k8s-01/config.yml'

As prompted, configure /etc/kubeasz/clusters/k8s-01/hosts and /etc/kubeasz/clusters/k8s-01/config.yml: edit the hosts file according to the node plan above along with the main cluster-level options; other cluster component settings can be changed in config.yml.

Cluster Parameter Configuration

Contents of /etc/kubeasz/clusters/k8s-01/hosts:

# 'etcd' cluster should have odd member(s) (1,3,5,...)
[etcd]
192.168.93.128
192.168.93.129
192.168.93.130

# master node(s)
[kube_master]
192.168.93.128
192.168.93.129

# work node(s)
[kube_node]
192.168.93.130
192.168.93.131

# [optional] harbor server, a private docker registry
# 'NEW_INSTALL': 'true' to install a harbor server; 'false' to integrate with existed one
[harbor]
#192.168.1.8 NEW_INSTALL=false

# [optional] loadbalance for accessing k8s from outside
[ex_lb]
192.168.93.128 LB_ROLE=backup EX_APISERVER_VIP=192.168.93.135 EX_APISERVER_PORT=8443
192.168.93.129 LB_ROLE=master EX_APISERVER_VIP=192.168.93.135 EX_APISERVER_PORT=8443

Contents of /etc/kubeasz/clusters/k8s-01/config.yml; the master and DNS IPs must be changed to match the hosts file:

MASTER_CERT_HOSTS:
  - "192.168.93.128"
  - "192.168.93.129"
  - "10.1.1.1"
  - "k8s.test.io"
  #- "www.test.com"

# install coredns automatically
dns_install: "yes"
corednsVer: "1.8.4"
ENABLE_LOCAL_DNS_CACHE: true
dnsNodeCacheVer: "1.17.0"
# local DNS cache address
LOCAL_DNS_CACHE: "10.68.0.2"

Installation

Installation Overview

One-shot install: ezctl setup k8s-01 all

Step-by-step install:

# run ezctl help setup for details on step-by-step installation
# ezctl setup k8s-01 01
# ezctl setup k8s-01 02
# ezctl setup k8s-01 03
# ezctl setup k8s-01 04
...
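The numbered steps can also be driven from a loop. A dry-run sketch that just prints each step command in order; drop the echo to actually execute them:

```shell
# Print (dry run) the ezctl command for setup steps 01 through 07.
for step in $(seq -f "%02g" 1 7); do
  echo ezctl setup k8s-01 "$step"
done
```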

Step-by-Step Installation

  1. Create certificates and prepare the environment

    Run ezctl setup k8s-01 01 to start the environment preparation.

  2. Install the etcd cluster

    Run ezctl setup k8s-01 02 to install the etcd cluster. When it finishes, verify the etcd cluster status:

    # $NODE_IPS holds the etcd node IP addresses
    export NODE_IPS="192.168.93.128 192.168.93.130 192.168.93.131"
    
    for ip in ${NODE_IPS}; do
        ETCDCTL_API=3 etcdctl \
        --endpoints=https://${ip}:2379 \
        --cacert=/etc/kubernetes/ssl/ca.pem \
        --cert=/etc/kubernetes/ssl/etcd.pem \
        --key=/etc/kubernetes/ssl/etcd-key.pem \
        endpoint health
    done
    

    Output like the following means the etcd nodes are working properly:

    https://192.168.93.128:2379 is healthy: successfully committed proposal: took = 17.886794ms
    https://192.168.93.130:2379 is healthy: successfully committed proposal: took = 23.42423ms
    https://192.168.93.131:2379 is healthy: successfully committed proposal: took = 18.927204ms
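    That health output can be turned into a pass/fail check with standard tools. A sketch over the sample output above; the expected count of 3 matches this cluster's etcd size:

```shell
# Count "is healthy" lines in saved etcdctl output; the cluster is OK
# only if every one of the 3 etcd endpoints reports healthy.
health_output='https://192.168.93.128:2379 is healthy: successfully committed proposal: took = 17.886794ms
https://192.168.93.130:2379 is healthy: successfully committed proposal: took = 23.42423ms
https://192.168.93.131:2379 is healthy: successfully committed proposal: took = 18.927204ms'

healthy=$(printf '%s\n' "$health_output" | grep -c 'is healthy')
if [ "$healthy" -eq 3 ]; then echo "etcd cluster OK"; else echo "only $healthy/3 healthy"; fi
```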
    
    
  3. Install the container runtime (docker)

    Running ezctl setup k8s-01 03 installs the docker service on every machine in the cluster.

  4. Install the master nodes

    The master nodes run three main components: apiserver, scheduler, and controller-manager, where:

    • apiserver exposes the REST API for cluster management, covering authentication/authorization, data validation, and cluster state changes
      • only the API Server operates on etcd directly
      • the other modules query or modify data through the API Server
      • it is the hub for data exchange and communication between the other modules
    • scheduler assigns and schedules Pods onto the cluster's nodes
      • it watches kube-apiserver for Pods that have not yet been assigned a Node
      • it assigns nodes to those Pods according to the scheduling policy
    • controller-manager is a set of controllers that watch the whole cluster's state through the apiserver and keep the cluster in its desired working state

    Run ezctl setup k8s-01 04 to install the master nodes. When it finishes, first check the master service status:

    # check process status
    systemctl status kube-apiserver
    systemctl status kube-controller-manager
    systemctl status kube-scheduler
    # check process logs
    journalctl -u kube-apiserver
    journalctl -u kube-controller-manager
    journalctl -u kube-scheduler
    

    Check the master status with kubectl get componentstatus (or kubectl get cs):

    NAME                 STATUS      MESSAGE        ERROR
    controller-manager   Unhealthy   Get "https://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused
    scheduler            Healthy     ok
    etcd-1               Healthy     {"health":"true","reason":""}
    etcd-2               Healthy     {"health":"true","reason":""}
    etcd-0               Healthy     {"health":"true","reason":""}
    

    If controller-manager shows Unhealthy, edit /etc/systemd/system/kube-controller-manager.service and change bind-address to 127.0.0.1:

    [Service]
    ExecStart=/opt/kube/bin/kube-controller-manager \
      --bind-address=127.0.0.1 \
    

    Then run systemctl daemon-reload and systemctl restart kube-controller-manager to restart the service, and check the master status again:

    [root@master1 ~]# kubectl get cs
    Warning: v1 ComponentStatus is deprecated in v1.19+
    NAME                 STATUS    MESSAGE                         ERROR
    scheduler            Healthy   ok
    etcd-1               Healthy   {"health":"true","reason":""}
    controller-manager   Healthy   ok
    etcd-2               Healthy   {"health":"true","reason":""}
    etcd-0               Healthy   {"health":"true","reason":""}
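    The unit-file edit above can also be done with sed. A sketch run against a temporary copy; the sample unit contents (the original bind address and the extra flag) are illustrative, not copied from a real install:

```shell
# Demonstrate the fix on a temp copy of the unit file:
# rewrite whatever --bind-address is set to into 127.0.0.1.
unit=$(mktemp)
cat > "$unit" <<'EOF'
[Service]
ExecStart=/opt/kube/bin/kube-controller-manager \
  --bind-address=0.0.0.0 \
  --leader-elect=true
EOF

sed -i 's/--bind-address=[0-9.]*/--bind-address=127.0.0.1/' "$unit"
patched=$(grep -- '--bind-address' "$unit")
echo "$patched"
rm -f "$unit"
```

    On the real file, point sed at /etc/systemd/system/kube-controller-manager.service and follow with the daemon-reload/restart above.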
    
  5. Install the node(s)

    kube_node machines run the cluster's workloads; the kube_master nodes must be deployed first. Each node needs the following components:

    • kubelet: the core component on a kube_node
    • kube-proxy: service publishing and load balancing
    • haproxy: forwards requests across the multiple apiservers; see the HA-2x architecture
    • calico: container networking (or another network plugin)

    Run ezctl setup k8s-01 05 to install the nodes. The node components are also installed on the master nodes, but pod scheduling stays disabled there (the masters show SchedulingDisabled). When it finishes, check the node component services:

    systemctl status kubelet	# check status
    systemctl status kube-proxy
    journalctl -u kubelet		# check logs
    journalctl -u kube-proxy
    
    [root@master1 ~]# kubectl get node
    NAME             STATUS                     ROLES    AGE   VERSION
    192.168.93.128   Ready,SchedulingDisabled   master   17m   v1.22.2
    192.168.93.129   Ready,SchedulingDisabled   master   17m   v1.22.2
    192.168.93.130   Ready                      node     16m   v1.22.2
    192.168.93.131   Ready                      node     16m   v1.22.2
    
  6. Deploy the cluster network

    First, recall the K8S network design principles; keep them in mind whenever you configure a network plugin or deploy K8S applications/services:

    • every Pod has its own IP address, and all containers in a Pod share one network namespace
    • all Pods in the cluster sit in one flat, directly connected network and can reach each other by IP
      • all containers can reach each other directly without NAT
      • all Nodes and all containers can reach each other directly without NAT
      • the IP a container sees for itself is the same one other containers see
    • a Service cluster IP is reachable only inside the cluster; external requests must come in through NodePort, LoadBalancer, or Ingress

    A Kubernetes Pod's network is created as follows:

    1. besides the containers specified at creation time, every Pod gets an infrastructure ("pause") container specified when kubelet starts, e.g. easzlab/pause-amd64 or registry.access.redhat.com/rhel7/pod-infrastructure
    2. kubelet first creates this infrastructure container, which creates the network namespace
    3. kubelet then calls the CNI driver, which invokes the specific CNI plugin according to its configuration
    4. the CNI plugin configures the infrastructure container's network
    5. finally, the other containers in the Pod share the infrastructure container's network

    Run ezctl setup k8s-01 06 to install the network plugin. When it finishes, check the network pods:

    [root@master1 ~]# kubectl get pod -n kube-system
    NAME                                         READY   STATUS             RESTARTS       AGE
    kube-flannel-ds-amd64-fvchk                  1/1     Running            1 (12m ago)    18m
    kube-flannel-ds-amd64-hm5kd                  1/1     Running            1 (11m ago)    18m
    kube-flannel-ds-amd64-k65mj                  1/1     Running            1 (13m ago)    18m
    kube-flannel-ds-amd64-twtql                  1/1     Running            1 (11m ago)    18m
    
  7. Install the main cluster add-ons

    This step installs add-ons such as DNS and the dashboard. Run ezctl setup k8s-01 07, then check the services in the kube-system namespace:

    [root@master1 ~]# kubectl get svc -n kube-system
    NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
    dashboard-metrics-scraper   ClusterIP   10.68.209.199   <none>        8000/TCP                 28m
    kube-dns                    ClusterIP   10.68.0.2       <none>        53/UDP,53/TCP,9153/TCP   28m
    kube-dns-upstream           ClusterIP   10.68.31.79     <none>        53/UDP,53/TCP            28m
    kubernetes-dashboard        NodePort    10.68.24.112    <none>        443:32707/TCP            28m
    metrics-server              ClusterIP   10.68.129.80    <none>        443/TCP                  28m
    node-local-dns              ClusterIP   None            <none>        9253/TCP                 28m
    

Testing

First inspect the cluster with kubectl cluster-info:

[root@master1 ~]# kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
KubeDNSUpstream is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns-upstream:dns/proxy
kubernetes-dashboard is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Check node/pod resource usage with the following commands:

kubectl top node
kubectl top pod --all-namespaces
[root@master1 ~]# kubectl top node
NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
192.168.93.128   130m         6%     839Mi           35%
192.168.93.129   93m          4%     766Mi           32%
192.168.93.130   67m          3%     473Mi           20%
192.168.93.131   37m          1%     390Mi           16%
[root@master1 ~]# kubectl top pod --all-namespaces
NAMESPACE     NAME                                         CPU(cores)   MEMORY(bytes)
kube-system   coredns-67cb59d684-8lflh                     2m           14Mi
kube-system   dashboard-metrics-scraper-856586f554-g6jwq   1m           10Mi
kube-system   kube-flannel-ds-amd64-fvchk                  2m           19Mi
kube-system   kube-flannel-ds-amd64-hm5kd                  2m           21Mi
kube-system   kube-flannel-ds-amd64-k65mj                  2m           23Mi
kube-system   kube-flannel-ds-amd64-twtql                  2m           12Mi
kube-system   kubernetes-dashboard-65b659dd64-qfmz8        1m           14Mi
kube-system   metrics-server-7d567f6489-s9zqs              2m           24Mi
kube-system   node-local-dns-8mmqk                         2m           9Mi
kube-system   node-local-dns-qsg7c                         2m           11Mi
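When hunting for heavy pods, `kubectl top pod` output like the above can be post-processed with standard tools. A sketch over a saved sample (a shortened excerpt of the output above):

```shell
# Find the pod using the most memory in saved `kubectl top pod` output.
top_output='NAMESPACE     NAME                              CPU(cores)   MEMORY(bytes)
kube-system   kube-flannel-ds-amd64-k65mj       2m           23Mi
kube-system   metrics-server-7d567f6489-s9zqs   2m           24Mi
kube-system   node-local-dns-8mmqk              2m           9Mi'

echo "$top_output" | awk 'NR>1 { mem=$4; sub(/Mi$/, "", mem); if (mem+0 > max+0) { max=mem; name=$2 } } END { print name, max "Mi" }'
```

On a live cluster, pipe `kubectl top pod --all-namespaces` into the same awk instead of the saved sample.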

DNS test:

  • First create the nginx server; kubectl run nginx --image=nginx --expose --port=80 quickly creates both the nginx service and pod:

    [root@master1 ~]# kubectl run nginx --image=nginx --expose --port=80
    service/nginx created
    pod/nginx created
    [root@master1 ~]# kubectl get pod
    NAME    READY   STATUS    RESTARTS   AGE
    nginx   1/1     Running   0          12s
    [root@master1 ~]# kubectl get svc
    NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
    kubernetes   ClusterIP   10.68.0.1      <none>        443/TCP   43m
    nginx        ClusterIP   10.68.177.44   <none>        80/TCP    62s
    
  • Then run a busybox pod, enter the container, and run nslookup to check DNS:

    [root@master1 ~]# kubectl run busybox --rm -it --image=busybox /bin/sh
    If you don't see a command prompt, try pressing enter.
    / # nslookup nginx.default.svc.cluster.local
    Server:         10.68.0.2
    Address:        10.68.0.2:53
    
    Name:   nginx.default.svc.cluster.local
    Address: 10.68.252.155
    
    *** Can't find nginx.default.svc.cluster.local: No answer
    

    The 10.68.0.2 above is the DNS server address, listening on port 53. Check the cluster IPs of the services in all namespaces:

    [root@master1 ~]# kubectl get svc --all-namespaces
    NAMESPACE     NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
    default       kubernetes                  ClusterIP   10.68.0.1       <none>        443/TCP                  56m
    default       nginx                       ClusterIP   10.68.252.155   <none>        80/TCP                   6m6s
    kube-system   dashboard-metrics-scraper   ClusterIP   10.68.209.199   <none>        8000/TCP                 50m
    kube-system   kube-dns                    ClusterIP   10.68.0.2       <none>        53/UDP,53/TCP,9153/TCP   50m
    kube-system   kube-dns-upstream           ClusterIP   10.68.31.79     <none>        53/UDP,53/TCP            50m
    kube-system   kubernetes-dashboard        NodePort    10.68.24.112    <none>        443:32707/TCP            50m
    kube-system   metrics-server              ClusterIP   10.68.129.80    <none>        443/TCP                  50m
    kube-system   node-local-dns              ClusterIP   None            <none>        9253/TCP                 50m
    

    Inside busybox you can ping service names directly and DNS resolves them to their cluster IPs, but short names of services in the kube-system namespace cannot be resolved, because busybox runs in the default namespace:

    [root@master1 ~]# kubectl run busybox --rm -it --image=busybox /bin/sh
    If you don't see a command prompt, try pressing enter.
    / # ping nginx
    PING nginx (10.68.252.155): 56 data bytes
    64 bytes from 10.68.252.155: seq=0 ttl=64 time=0.069 ms
    64 bytes from 10.68.252.155: seq=1 ttl=64 time=0.082 ms
    ^C
    --- nginx ping statistics ---
    2 packets transmitted, 2 packets received, 0% packet loss
    round-trip min/avg/max = 0.069/0.075/0.082 ms
    / # ping kubernetes
    PING kubernetes (10.68.0.1): 56 data bytes
    64 bytes from 10.68.0.1: seq=0 ttl=64 time=0.042 ms
    64 bytes from 10.68.0.1: seq=1 ttl=64 time=0.096 ms
    ^C
    --- kubernetes ping statistics ---
    2 packets transmitted, 2 packets received, 0% packet loss
    round-trip min/avg/max = 0.042/0.069/0.096 ms
    / # ping kubernetes-dashboard
    ping: bad address 'kubernetes-dashboard'
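    The failure above is a namespace issue: short names only resolve within the pod's own namespace, and in-cluster service DNS names follow a fixed pattern. A small sketch composing them; svc_fqdn is my own helper, and cluster.local is assumed as the cluster domain (kubeasz's default):

```shell
# Compose an in-cluster service DNS name: <service>.<namespace>.svc.<domain>.
# Cross-namespace lookups need at least <service>.<namespace>.
svc_fqdn() { printf '%s.%s.svc.cluster.local\n' "$1" "$2"; }

svc_fqdn nginx default
svc_fqdn kubernetes-dashboard kube-system
```

    So inside busybox, `ping kubernetes-dashboard.kube-system.svc.cluster.local` resolves where the short name did not.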
    

Other Features

  • To add a node, just run ezctl add-node <cluster> <node-ip>; in the example above the cluster is k8s-01

  • Backups and restores can also be done with ezctl; see ezctl -h for usage.

