This article walks through deploying a highly available Kubernetes cluster on Rocky Linux, covering system configuration, component installation, and network setup from start to finish.
System and component versions used in this deployment:
Offline installation package: https://pan.baidu.com/s/19CjX1ImiwQTWqDleWiBwgg (access code: 8888)
For a binary installation, see: https://blog.csdn.net/qq_39965541/article/details/157136178?spm=1011.2415.3001.5331
Service subnet: 10.96.0.0/12
Pod subnet: 10.244.0.0/16

cat /etc/redhat-release
# Should be: Rocky Linux release 10.1 (Red Quartz)

Configure /etc/hosts on all nodes:
echo '192.168.1.11 master01
192.168.1.12 master02
192.168.1.13 master03
192.168.1.14 worker1
192.168.1.15 worker2
192.168.1.100 master-lb' >> /etc/hosts
systemctl disable --now firewalld
setenforce 0
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/sysconfig/selinux
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
swapoff -a
sed -i.bak '/swap/s/^/#/' /etc/fstab
dnf install -y chrony    # Rocky Linux ships chrony for time sync (there is no ntpd package)
systemctl enable --now chronyd
# or, if it is already running, just check it:
systemctl status chronyd
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
echo "* soft nproc 65536" >> /etc/security/limits.conf
echo "* hard nproc 65536" >> /etc/security/limits.conf
echo "* soft memlock unlimited" >> /etc/security/limits.conf
echo "* hard memlock unlimited" >> /etc/security/limits.conf
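To confirm the time synchronization configured above is actually working, a quick sketch (chronyc ships with the chrony package):

chronyc tracking      # shows the current offset from the reference clock
chronyc sources -v    # lists the time sources and their reachability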
On the control node Master01, set up passwordless SSH and load the IPVS modules:

ssh-keygen -t rsa    # press Enter to accept all defaults
for i in master01 master02 master03; do ssh-copy-id -i ~/.ssh/id_rsa.pub $i; done
dnf install -y ipvsadm ipset sysstat conntrack libseccomp
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack
cat > /etc/modules-load.d/ipvs.conf <<EOF
ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
EOF
systemctl enable --now systemd-modules-load.service

Check that the modules loaded:
lsmod | grep -e ip_vs -e nf_conntrack

Create /etc/sysctl.d/k8s.conf on all nodes:
cat <<EOF > /etc/sysctl.d/k8s.conf
## Network tuning: enable IPv4 forwarding (CNI plugins such as Calico/Cilium depend on it)
net.ipv4.ip_forward = 1
net.ipv4.tcp_tw_reuse = 2
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.conf.all.route_localnet = 1
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_conntrack_max = 65536
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65536
# Lengthen the SYN half-open connection queue
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
net.netfilter.nf_conntrack_max = 1048576
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
# Filesystem
fs.file-max = 2097152
fs.nr_open = 52706963
fs.may_detach_mounts = 1
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
# Memory management
vm.swappiness = 0
vm.max_map_count = 262144
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
# Container support
kernel.pid_max = 4194304
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-arptables = 1
# Kubernetes requirements
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
kernel.softlockup_panic = 1
EOF
sysctl --system

Confirm the kernel modules are still loaded:
lsmod | grep --color=auto -e ip_vs -e nf_conntrack

Configure kernel parameters to forward IPv4 and let iptables see bridged traffic:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Apply the sysctl parameters without rebooting
sudo sysctl --system

Confirm that the br_netfilter and overlay modules are loaded:
lsmod | grep br_netfilter
lsmod | grep overlay

Check that these kernel parameters are set to 1:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward

Install containerd:

wget https://github.com/containerd/containerd/releases/download/v2.2.1/containerd-2.2.1-linux-amd64.tar.gz
tar xvf containerd-2.2.1-linux-amd64.tar.gz
mv bin/* /usr/local/bin/
mkdir /etc/containerd
containerd config default > /etc/containerd/config.toml
cat > /usr/lib/systemd/system/containerd.service <<EOF
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target dbus.service

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now containerd

Install runc. Download: https://github.com/opencontainers/runc/releases/download/v1.4.0/runc.amd64
install -m 755 runc.amd64 /usr/local/sbin/runc

Install the CNI plugins. Download: https://github.com/containernetworking/plugins/releases/download/v1.9.0/cni-plugins-linux-amd64-v1.9.0.tgz
mkdir -p /opt/cni/bin
tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.9.0.tgz

Install crictl. Download: https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.35.0/crictl-v1.35.0-linux-amd64.tar.gz
tar -xf crictl-v1.35.0-linux-amd64.tar.gz -C /usr/local/bin
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 30
debug: false
pull-image-on-create: false
EOF

For a detailed introduction to cgroups, see the official documentation.
Edit the corresponding section of /etc/containerd/config.toml:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc] ShimCgroup = '' # 在这行下面添加 SystemdCgroup = true # 默认是没有这行的重启 containerd:
systemctl restart containerd
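As a quick sanity check, a sketch (crictl uses the endpoint configured in /etc/crictl.yaml above) to confirm containerd is up and the CRI plugin answers:

containerd --version
crictl info | grep -i cgroup    # should reflect the SystemdCgroup setting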
On all Master nodes, install HAProxy and Keepalived:

dnf install -y haproxy keepalived

All Master nodes share the same /etc/haproxy/haproxy.cfg:
cat > /etc/haproxy/haproxy.cfg << EOF
global
    maxconn 2000
    ulimit-n 16384
    log 127.0.0.1 local0 err
    stats timeout 30s

defaults
    log     global
    mode    http
    option  httplog
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    timeout http-request 15s
    timeout http-keep-alive 15s

frontend k8s-master
    bind 0.0.0.0:8443
    mode tcp
    option tcplog
    tcp-request inspect-delay 5s
    default_backend k8s-master

backend k8s-master
    mode tcp
    balance roundrobin
    option httpchk GET /healthz
    http-check expect status 200
    option tcp-check
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server master01 192.168.1.11:6443 check
    server master02 192.168.1.12:6443 check
    server master03 192.168.1.13:6443 check
EOF
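Before continuing, it is worth validating the file. A sketch using HAProxy's built-in check mode, which parses the configuration without starting the service:

haproxy -c -f /etc/haproxy/haproxy.cfg    # prints "Configuration file is valid" on success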
Master01:

cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state MASTER
    interface ens33
    mcast_src_ip 192.168.1.11
    virtual_router_id 51
    priority 100
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        chk_apiserver
    }
}
EOF
Master02:

cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    mcast_src_ip 192.168.1.12
    virtual_router_id 51
    priority 99
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        chk_apiserver
    }
}
EOF
Master03:

cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    mcast_src_ip 192.168.1.13
    virtual_router_id 51
    priority 98
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        chk_apiserver
    }
}
EOF

Health check script /etc/keepalived/check_apiserver.sh:
cat > /etc/keepalived/check_apiserver.sh << 'EOF'
#!/bin/bash
# Note: EOF is quoted above so the shell does not expand $(...) while writing this file
err=0
for k in $(seq 1 3)
do
    check_code=$(pgrep haproxy)
    if [[ $check_code == "" ]]; then
        err=$(expr $err + 1)
        sleep 1
        continue
    else
        err=0
        break
    fi
done
if [[ $err != "0" ]]; then
    echo "systemctl stop keepalived"
    /usr/bin/systemctl stop keepalived
    exit 1
else
    exit 0
fi
EOF
chmod +x /etc/keepalived/check_apiserver.sh

Start the services:
systemctl daemon-reload
systemctl enable --now haproxy
systemctl enable --now keepalived
systemctl status keepalived haproxy
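To see which node currently holds the VIP, a sketch (ens33 is the interface name used in the keepalived configs above):

ip addr show ens33 | grep 192.168.1.100    # the VIP appears only on the active node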
Test that the VIP answers ping:

ping 192.168.1.100

Add the Kubernetes repository and install kubeadm, kubelet, and kubectl:

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.35/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.35/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable --now kubelet
kubeadm config images list

Required images (v1.35.0):
registry.k8s.io/kube-apiserver:v1.35.0
registry.k8s.io/kube-controller-manager:v1.35.0
registry.k8s.io/kube-scheduler:v1.35.0
registry.k8s.io/kube-proxy:v1.35.0
registry.k8s.io/coredns/coredns:v1.13.1
registry.k8s.io/pause:3.10.1
registry.k8s.io/etcd:3.6.6-0

Note: the command prints the etcd image version as 3.6.6, which fails to pull; appending -0 fixes it. Presumably this will be corrected in later releases.

Example of importing images:
ctr -n k8s.io image import <image-file>    # or push to your own registry and pull from there
# After importing, check with crictl; ctr works too, but its output is less readable
crictl images
# ctr has a concept of namespaces; if that is a hassle, you can install a Docker client to manage containerd
ctr -n k8s.io images ls
kubeadm config print init-defaults > kubeadm-init.yaml

Modify the generated kubeadm-init.yaml; an example follows.
The configuration below uses stacked etcd (in plain terms, built-in etcd). It can be changed to external etcd, i.e. an etcd cluster installed from binaries.
cat > ./kubeadm-init.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta4
# Bootstrap tokens (defaults are fine)
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
# Local API endpoint
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.11
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  imagePullSerial: true
  name: master01
  taints: null
# Timeouts (defaults are fine)
timeouts:
  controlPlaneComponentHealthCheck: 4m0s
  discovery: 5m0s
  etcdAPICall: 2m0s
  kubeletHealthCheck: 4m0s
  kubernetesAPICall: 1m0s
  tlsBootstrap: 5m0s
  upgradeManifests: 5m0s
---
apiServer: {}
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 87600h0m0s
certificateValidityPeriod: 8760h0m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
encryptionAlgorithm: RSA-2048
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.35.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
# Remove this line if you are not building a highly available cluster
controlPlaneEndpoint: "192.168.1.100:8443"
proxy: {}
scheduler: {}
EOF

External etcd configuration: the default stanza is
etcd:
  local:
    dataDir: /var/lib/etcd

Replace it with an external block, filling in your etcd cluster endpoints and certificate paths:
etcd:
  external:
    endpoints:
    - https://etcd-node1.example.com:2379
    - https://etcd-node2.example.com:2379
    - https://etcd-node3.example.com:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
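Before pointing kubeadm at an external cluster, it is worth checking that the apiserver's client credentials can actually reach etcd. A minimal sketch, assuming etcdctl is installed and the endpoints/certificate paths match the block above:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://etcd-node1.example.com:2379,https://etcd-node2.example.com:2379,https://etcd-node3.example.com:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key=/etc/kubernetes/pki/apiserver-etcd-client.key \
  endpoint health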
Field notes:
endpoints: the list of addresses of the external etcd members (at least two or three instances for real HA).
caFile: the etcd CA certificate, used for TLS client verification.
certFile / keyFile: the client certificate and key the apiserver uses to talk to etcd.
local and external are mutually exclusive: once you use external, remove the local etcd block from the same file.

Run the initialization:

kubeadm init --config kubeadm-init.yaml --upload-certs

If initialization fails, reset and try again:
kubeadm reset -f ; ipvsadm --clear ; rm -rf ~/.kube

After initialization succeeds, configure kubeconfig:
To start using the cluster, run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Or, if you are root:

export KUBECONFIG=/etc/kubernetes/admin.conf

You can now join any number of control-plane nodes by running the following as root on each one:
kubeadm join 192.168.1.100:8443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:3fcd0d0ac88c9a4f1321f6d15cb484b8f67b1492c10282f5faa3070b5741635f \
    --control-plane --certificate-key bf521ccd59a5d33a2d8370e0ae9f10b7f00db3412f1c066aafd0e516c80664ae

Note that the certificate-key gives access to cluster-sensitive data, so keep it secret! As a safeguard, the uploaded certificates are deleted after two hours; if needed, you can re-upload them later with "kubeadm init phase upload-certs --upload-certs".
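The token itself also expires (ttl is 24h in the config above). If you join nodes later, a sketch for regenerating fresh join credentials on Master01:

kubeadm token create --print-join-command       # prints a fresh worker join command
kubeadm init phase upload-certs --upload-certs  # prints a new certificate key for control-plane joins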
Then you can join any number of worker nodes by running the following as root on each one:
kubeadm join 192.168.1.100:8443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:3fcd0d0ac88c9a4f1321f6d15cb484b8f67b1492c10282f5faa3070b5741635f

Install Calico. Download https://github.com/projectcalico/calico/blob/v3.31.3/manifests/calico-etcd.yaml and then edit it as follows.
# Point Calico at the etcd nodes
sed -i 's#etcd_endpoints: "http://<ETCD_IP>:<ETCD_PORT>"#etcd_endpoints: "https://192.168.1.11:2379,https://192.168.1.12:2379,https://192.168.1.13:2379"#g' calico-etcd.yaml
# Fill in the certificates
ETCD_CA=`cat /etc/kubernetes/pki/etcd/ca.crt | base64 | tr -d '\n'`
ETCD_CERT=`cat /etc/kubernetes/pki/etcd/server.crt | base64 | tr -d '\n'`
ETCD_KEY=`cat /etc/kubernetes/pki/etcd/server.key | base64 | tr -d '\n'`
sed -i "s@# etcd-key: null@etcd-key: ${ETCD_KEY}@g; s@# etcd-cert: null@etcd-cert: ${ETCD_CERT}@g; s@# etcd-ca: null@etcd-ca: ${ETCD_CA}@g" calico-etcd.yaml
# Fill in the certificate paths
sed -i 's#etcd_ca: ""#etcd_ca: "/calico-secrets/etcd-ca"#g; s#etcd_cert: ""#etcd_cert: "/calico-secrets/etcd-cert"#g; s#etcd_key: "" #etcd_key: "/calico-secrets/etcd-key" #g' calico-etcd.yaml
# Set the pod network CIDR
POD_SUBNET="10.244.0.0/16"
sed -i 's@# - name: CALICO_IPV4POOL_CIDR@- name: CALICO_IPV4POOL_CIDR@g; s@#   value: "192.168.0.0/16"@  value: '"${POD_SUBNET}"'@g' calico-etcd.yaml

Double-check all the edits, then deploy:
kubectl create -f calico-etcd.yaml
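To watch the rollout, a sketch (the k8s-app=calico-node label comes from the manifest above):

kubectl get pods -n kube-system -l k8s-app=calico-node -w    # wait until every calico-node pod is Running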
Once it is deployed, the cluster status looks healthy:

[root@master01 ~]# kubectl get node
NAME       STATUS   ROLES           AGE   VERSION
master01   Ready    control-plane   25m   v1.35.0
master02   Ready    control-plane   24m   v1.35.0
master03   Ready    control-plane   23m   v1.35.0
node01     Ready    <none>          24m   v1.35.0
node02     Ready    <none>          23m   v1.35.0

Before installing the next component, the control-plane taint needs to be removed:
kubectl taint node --all node-role.kubernetes.io/control-plane:NoSchedule-

This is the official manifest; applied as-is it will complain about a missing certificate: https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
The version below adds the certificate path, the corresponding mount point, and so on. The certificate file is /etc/kubernetes/pki/front-proxy-ca.crt (generated automatically when the cluster was deployed).
Now install Metrics Server:
cat > ./components.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - appProtocol: https
    name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
        - --requestheader-username-headers=X-Remote-User
        - --requestheader-group-headers=X-Remote-Group
        - --requestheader-extra-headers-prefix=X-Remote-Extra-
        image: registry.k8s.io/metrics-server/metrics-server:v0.8.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 10250
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
        - mountPath: /etc/kubernetes/pki
          name: k8s-certs
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
      - hostPath:
          path: /etc/kubernetes/pki
        name: k8s-certs
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
EOF
kubectl create -f components.yaml
kubectl edit cm kube-proxy -n kube-system
# change mode to "ipvs"

Update the kube-proxy pods:
kubectl patch daemonset kube-proxy -n kube-system -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"$(date +'%s')\"}}}}}"

(Double quotes are needed here so the $(date ...) timestamp actually expands.) Verify the mode:
curl 127.0.0.1:10249/proxyMode
# should print ipvs
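You can also inspect the IPVS virtual servers directly, a sketch using the ipvsadm package installed earlier:

ipvsadm -Ln    # lists IPVS services; the kubernetes service VIP 10.96.0.1:443 should appear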
Note: some of the components below are versions from a year or more ago; if you need newer ones, download the latest from the official site and install them the same way, following this guide.

Set up kubectl command completion:

yum -y install bash-completion
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
# load bash-completion
source /etc/profile.d/bash_completion.sh

Static pod manifests live in /etc/kubernetes/manifests; after a change, the kubelet automatically restarts the corresponding pod. The kubelet's own configuration lives in /etc/sysconfig/kubelet and /var/lib/kubelet/config.yaml.

## Show taints
kubectl describe node | grep Taint
## Remove the taint
kubectl taint node --all node-role.kubernetes.io/control-plane:NoSchedule-

Install Kuboard and check the nodes:

kubectl apply -f https://addons.kuboard.cn/kuboard/kuboard-v3.yaml
kubectl get nodes

Make sure all nodes are Ready and the role assignment matches expectations (control-plane, worker, etc.).
kubectl get pods -n kube-system

Core services (CoreDNS, kube-proxy, calico-node, etc.) should be Running.
kubectl get componentstatuses

(Note: kubectl get cs/componentstatuses is deprecated in recent Kubernetes versions, but it still works for basic diagnostics.)
kubectl cluster-info

This prints the addresses of the API Server, DNS, and other services; make sure they are all reachable.
Check the API's health via its readiness endpoint:
kubectl get --raw='/readyz?verbose'

A response of ok means the API is ready to serve requests.
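The API server exposes liveness checks the same way; a sketch of the companion endpoints:

kubectl get --raw='/livez?verbose'    # liveness, broken down per check
kubectl get --raw='/healthz'          # aggregate health (deprecated, but still served)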
Deploy a pod with DNS tools (such as dnsutils or busybox), then run nslookup:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: docker.io/library/busybox:1.28
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
kubectl exec -ti busybox -- nslookup kubernetes.default

A successful lookup means the cluster network is working.
Deploy a test Pod / Deployment:
kubectl apply -f https://k8s.io/examples/application/deployment.yaml
kubectl get pods

Check that the pods are created and run normally.
Expose the service:
kubectl expose deployment nginx-deployment --port=80 --type=NodePort

Access a node IP plus the assigned NodePort and make sure the service responds.
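A sketch of fetching the assigned port and testing it (the service name nginx-deployment comes from the expose command above; the node IP is one of the workers in this guide):

NODE_PORT=$(kubectl get svc nginx-deployment -o jsonpath='{.spec.ports[0].nodePort}')
curl http://192.168.1.14:${NODE_PORT}    # should return the nginx welcome page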
If you have installed Metrics Server:
kubectl top nodes
kubectl top pods -n kube-system

These commands return CPU/memory usage for nodes and pods, which shows the Metrics API is working.
Use the following command to view the cluster event log; problems such as scheduling failures or services failing to start show up here early:
kubectl get events --sort-by='.metadata.creationTimestamp'

You can also dump the cluster state for diagnosis:
kubectl cluster-info dump

This article has walked through deploying a highly available Kubernetes cluster on Rocky Linux: system preparation, the containerd runtime, the HAProxy + Keepalived load-balancing layer, kubeadm initialization, the Calico network plugin, and Metrics Server.
By following these steps you end up with a fully functional, highly available Kubernetes cluster that provides a stable, reliable environment for your applications.
If you run into any problems during installation, leave a comment and I will do my best to help. Feel free to share your own installation experience and optimization tips as well!
Tip: back up the cluster configuration and etcd data regularly, so that you can restore the cluster quickly after a failure.
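A minimal snapshot sketch, assuming the stacked etcd configured above and etcdctl installed on a master (the /backup path is just an example):

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /backup/etcd-$(date +%Y%m%d).db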