kainstall - Recommending a tool: deploy a highly available Kubernetes (k8s) cluster with one command using kainstall

lework · November 12, 2020 · Last reply by zhangdongdong7 on December 7, 2021 · 650 views
This post has been marked as a featured post!

Introduction

kainstall = kubeadm install kubernetes

A shell script that uses kubeadm to deploy a Kubernetes HA cluster with a single command, helping you easily build a robust cluster fit for production use.

https://github.com/lework/kainstall

Why?

Why build this? Isn't an Ansible playbook good enough?

Because I'm lazy. Ansible playbooks are great at orchestration, but they require Python and Ansible to be installed, and you have to download several YAML files. Because I'm lazy, I wanted a simpler way to quickly deploy a distributed Kubernetes HA cluster. A shell script runs directly on the server with no external dependencies, which saves time and effort. The script is a single file of roughly 100 KB, small enough that you can install a cluster with one command, and combined with the offline package you can even install a cluster without internet access. That experience is really satisfying.

Features

  • Server initialization (a sketch of a few of these steps as manual commands follows this list).
    • Disable SELinux
    • Disable swap
    • Disable firewalld
    • Disable transparent huge pages
    • Configure the EPEL repository
    • Adjust limits
    • Configure shell history recording
    • Configure kernel parameters
    • Configure journal logging
    • Configure chrony time synchronization
    • Add an ssh-login-info banner
    • Configure audit logging
    • Install the IPVS modules
    • Update the kernel
  • Install Docker and the kube components.
  • Initialize the Kubernetes cluster, and add or remove nodes.
  • Install an ingress component: nginx or traefik.
  • Install a network component: flannel or calico (must be chosen at initialization).
  • Install a monitoring component: prometheus.
  • Install a logging component: elasticsearch.
  • Install a storage component: rook or longhorn.
  • Install a web UI component: dashboard or kubesphere.
  • Install addon components: metrics-server, nodelocaldns.
  • Upgrade the cluster to a specified Kubernetes version.
  • Renew the cluster certificates.
  • Operations tasks, such as backing up etcd snapshots.
  • Offline deployment support.
  • sudo privilege support.
  • 10-year certificate lifetime support.
  • Supports Kubernetes v1.15+.
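
For reference, here is a minimal sketch of what a few of the server initialization steps look like as manual commands on a CentOS host. The script automates all of this, and the exact settings kainstall applies may differ, so treat these as illustrations rather than the script's actual code:

setenforce 0                                                         # disable SELinux at runtime
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config  # ...and persistently
swapoff -a                                                           # disable swap now
sed -i '/ swap / s/^/#/' /etc/fstab                                  # ...and across reboots
systemctl disable --now firewalld                                    # stop and disable firewalld
echo never > /sys/kernel/mm/transparent_hugepage/enabled             # disable transparent huge pages
modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack           # load the IPVS modules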

One-command initialization

bash -c "$(curl -sSL https://cdn.jsdelivr.net/gh/lework/kainstall/kainstall.sh)"  \
  - init \
  --master 192.168.77.130,192.168.77.131,192.168.77.132 \
  --worker 192.168.77.133,192.168.77.134 \
  --user root \
  --password 123456 \
  --port 22 \
  --version 1.19.3
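
Once the cluster is up, more nodes can be added later with the add subcommand (the same subcommand appears in an Ubuntu example in the replies below). A sketch, assuming the same SSH credentials and a hypothetical extra worker at 192.168.77.135:

bash -c "$(curl -sSL https://cdn.jsdelivr.net/gh/lework/kainstall/kainstall.sh)" \
  - add \
  --worker 192.168.77.135 \
  --user root \
  --password 123456 \
  --port 22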

For more operations, see the kainstall repository.

Offline deployment

wget http://kainstall.oss-cn-shanghai.aliyuncs.com/1.19.3/centos7.tgz

bash -c "$(curl -sSL https://cdn.jsdelivr.net/gh/lework/kainstall/kainstall.sh)"  \
  - init \
  --master 192.168.77.130,192.168.77.131,192.168.77.132 \
  --worker 192.168.77.133,192.168.77.134 \
  --user root \
  --password 123456 \
  --port 22 \
  --version 1.19.3 \
  --offline-file centos7.tgz 

More offline packages: the kainstall-offline repository.

Contact

I've set up a QQ group, 467645743; feel free to join if you have questions.

Bump.

lework marked this post as a featured post · November 13, 10:58

Your shell work is really impressive. Could you share how you learned shell scripting? Do you have any other shell projects?

In reply to comeonyng43:
  1. First, learn shell syntax.
  2. Try solving real problems with shell.
  3. Study how well-written scripts are put together.
  4. Read up on best practices.
  5. Shell resources: https://cs.leops.cn/#/cheatsheet/linux/bash

Hello: I deployed a k8s cluster with the script without specifying a version, so the default version was used:

bash kainstall-centos.sh init --master 192.168.200.121,192.168.200.122 --worker 192.168.200.123,192.168.200.124 --user root --password 1qaz2wsx --port 22

When it finished, it reported the following errors:

ERROR Summary:
  [2021-08-17T00:39:00.081914830+0800]: ERROR: [waiting] ingress-nginx pod ready failed.
  [2021-08-17T00:39:35.795033905+0800]: ERROR: [apply] add kubernetes dashboard ingress failed.

Checking the log shows:

utils::retry 6 kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=60s
Warning: Permanently added '192.168.200.121' (ECDSA) to the list of known hosts.
error: timed out waiting for the condition on pods/ingress-nginx-controller-7d8f68bdbd-pk88q
Retry 1/6 exited 1, retrying in 1 seconds...
error: timed out waiting for the condition on pods/ingress-nginx-controller-7d8f68bdbd-pk88q
Retry 2/6 exited 1, retrying in 2 seconds...
error: timed out waiting for the condition on pods/ingress-nginx-controller-7d8f68bdbd-pk88q
Retry 3/6 exited 1, retrying in 4 seconds...
error: timed out waiting for the condition on pods/ingress-nginx-controller-7d8f68bdbd-pk88q
Retry 4/6 exited 1, retrying in 8 seconds...
error: timed out waiting for the condition on pods/ingress-nginx-controller-7d8f68bdbd-pk88q
Retry 5/6 exited 1, retrying in 16 seconds...
error: timed out waiting for the condition on pods/ingress-nginx-controller-7d8f68bdbd-pk88q
Retry 6/6 exited 1, no more retries left.
[2021-08-17T00:39:00.081914830+0800]: ERROR: [waiting] ingress-nginx pod ready failed.

Warning: Permanently added '192.168.200.121' (ECDSA) to the list of known hosts.
error: unable to recognize "STDIN": no matches for kind "Ingress" in version "networking.k8s.io/v1beta1"
Retry 1/6 exited 1, retrying in 1 seconds...
error: unable to recognize "STDIN": no matches for kind "Ingress" in version "networking.k8s.io/v1beta1"
Retry 2/6 exited 1, retrying in 2 seconds...
error: unable to recognize "STDIN": no matches for kind "Ingress" in version "networking.k8s.io/v1beta1"
Retry 3/6 exited 1, retrying in 4 seconds...
error: unable to recognize "STDIN": no matches for kind "Ingress" in version "networking.k8s.io/v1beta1"
Retry 4/6 exited 1, retrying in 8 seconds...
error: unable to recognize "STDIN": no matches for kind "Ingress" in version "networking.k8s.io/v1beta1"
Retry 5/6 exited 1, retrying in 16 seconds...
error: unable to recognize "STDIN": no matches for kind "Ingress" in version "networking.k8s.io/v1beta1"
Retry 6/6 exited 1, no more retries left.
[2021-08-17T00:39:35.795033905+0800]: ERROR: [apply] add kubernetes dashboard ingress failed.

Checking component status:

[root@master01 opt]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager   Healthy     ok
etcd-0               Healthy     {"health":"true","reason":""}
[root@master01 opt]#

[root@master01 opt]# kubectl get nodes
NAME               STATUS   ROLES                  AGE   VERSION
k8s-master-node1   Ready    control-plane,master   23m   v1.22.0
k8s-master-node2   Ready    control-plane,master   19m   v1.22.0
k8s-worker-node1   Ready    worker                 18m   v1.22.0
k8s-worker-node2   Ready    worker                 18m   v1.22.0
[root@master01 opt]# kubectl get ns
NAME                   STATUS   AGE
default                Active   24m
ingress-nginx          Active   15m
kube-node-lease        Active   24m
kube-public            Active   24m
kube-system            Active   24m
kubernetes-dashboard   Active   8m44s
[root@master01 opt]#

Kubernetes 1.22 removed the v1beta1 APIs, which is why the ingress controller manifests broke. This issue has now been fixed; you can reinstall the ingress component.
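
For anyone hitting the same "no matches for kind Ingress in version networking.k8s.io/v1beta1" error: on 1.22+ the manifest has to use the networking.k8s.io/v1 Ingress schema. A minimal sketch of the v1 form, not kainstall's actual manifest; the host and port values are placeholders:

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1          # was networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  ingressClassName: nginx                 # replaces the kubernetes.io/ingress.class annotation
  rules:
  - host: dashboard.example.com           # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix                  # pathType is required in v1
        backend:
          service:                        # v1 nests the backend under service.name / service.port
            name: kubernetes-dashboard
            port:
              number: 443                 # placeholder port
EOF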

The QQ link in the kainstall README doesn't open; could you share the QQ group number?

After adding elasticsearch, ES reports this error:

java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
    at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:292)
    at org.elasticsearch.node.Node.<init>(Node.java:376)
    at org.elasticsearch.node.Node.<init>(Node.java:281)
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:219)
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:219)
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:399)
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159)
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150)
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:75)
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:116)
    at org.elasticsearch.cli.Command.main(Command.java:79)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:81)
For complete error details, refer to the log at /usr/share/elasticsearch/logs/k8s-logs.log

You need to add the following under env in the es-cluster YAML:

- name: node.max_local_storage_nodes
  value: "3"
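
If you would rather patch the running workload than edit and re-apply the manifest, here is a sketch using a strategic merge patch. The StatefulSet name es-cluster comes from the reply above, but the namespace kube-logging and the container name elasticsearch are assumptions; adjust them to your deployment:

kubectl -n kube-logging patch statefulset es-cluster --patch '
spec:
  template:
    spec:
      containers:
      - name: elasticsearch               # assumed container name
        env:
        - name: node.max_local_storage_nodes
          value: "3"
'

Because the pod template changes, the pods are rolled automatically to pick up the new setting.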

Resetting the cluster and then installing again fails. Steps to reproduce:
  1. bash kainstall-ubuntu.sh init -m xxx --version 1.20.6 - works
  2. bash kainstall-ubuntu.sh add -w xxx --version 1.20.6 - works
  3. bash kainstall-ubuntu.sh reset - works
  4. bash kainstall-ubuntu.sh init -m 172.17.31.10 -w 172.17.31.11 --version 1.20.6 - fails

[2021-12-07T22:48:43.843009307+0800]: INFO:    [check] sshpass command exists.
[2021-12-07T22:48:43.844762232+0800]: INFO:    [check] wget command exists.
[2021-12-07T22:48:43.967038796+0800]: INFO:    [check] ssh 172.17.31.10 connection succeeded.
[2021-12-07T22:48:44.131261546+0800]: INFO:    [check] ssh 172.17.31.11 connection succeeded.
[2021-12-07T22:48:44.132638066+0800]: INFO:    [check] os support: ubuntu20.04 ubuntu20.10 ubuntu21.04 ubuntu18.04
[2021-12-07T22:48:44.246906858+0800]: INFO:    [check] 172.17.31.10 os support succeeded.
[2021-12-07T22:48:44.391252598+0800]: INFO:    [check] 172.17.31.11 os support succeeded.
[2021-12-07T22:48:44.394797116+0800]: INFO:    [init] Get 172.17.31.10 InternalIP.
[2021-12-07T22:48:44.512532553+0800]: INFO:    [command] get MGMT_NODE_IP value succeeded.
[2021-12-07T22:48:44.514228279+0800]: INFO:    [init] master: 172.17.31.10
[2021-12-07T22:48:52.129289350+0800]: INFO:    [init] init master 172.17.31.10 succeeded.
[2021-12-07T22:48:52.559981054+0800]: INFO:    [init] 172.17.31.10 set hostname and hostname resolution succeeded.
[2021-12-07T22:48:52.561673278+0800]: INFO:    [init] 172.17.31.10: set audit-policy file.
[2021-12-07T22:48:52.676739186+0800]: INFO:    [init] 172.17.31.10: set audit-policy file succeeded.
[2021-12-07T22:48:52.678454078+0800]: INFO:    [init] worker: 172.17.31.11
[2021-12-07T22:49:00.137476057+0800]: INFO:    [init] init worker 172.17.31.11 succeeded.
[2021-12-07T22:49:00.656946306+0800]: INFO:    [install] install docker on 172.17.31.10.
[2021-12-07T22:49:06.107280418+0800]: INFO:    [install] install docker on 172.17.31.10 succeeded.
[2021-12-07T22:49:06.109137495+0800]: INFO:    [install] install kube on 172.17.31.10
[2021-12-07T22:49:13.957090675+0800]: INFO:    [install] install kube on 172.17.31.10 succeeded.
[2021-12-07T22:49:13.959173245+0800]: INFO:    [install] install docker on 172.17.31.11.
[2021-12-07T22:49:17.974488740+0800]: INFO:    [install] install docker on 172.17.31.11 succeeded.
[2021-12-07T22:49:17.976069907+0800]: INFO:    [install] install kube on 172.17.31.11
[2021-12-07T22:49:25.850685739+0800]: INFO:    [install] install kube on 172.17.31.11 succeeded.
[2021-12-07T22:49:25.852310759+0800]: INFO:    [install] install haproxy on 172.17.31.11
[2021-12-07T22:49:28.783115900+0800]: INFO:    [install] install haproxy on 172.17.31.11 succeeded.
[2021-12-07T22:49:28.784874939+0800]: INFO:    [kubeadm init] kubeadm init on 172.17.31.10
[2021-12-07T22:49:28.786523020+0800]: INFO:    [kubeadm init] 172.17.31.10: set kubeadmcfg.yaml
[2021-12-07T22:49:28.909218319+0800]: INFO:    [kubeadm init] 172.17.31.10: set kubeadmcfg.yaml succeeded.
[2021-12-07T22:49:28.910937717+0800]: INFO:    [kubeadm init] 172.17.31.10: kubeadm init start.
[2021-12-07T22:53:32.232152130+0800]: ERROR:   [kubeadm init] 172.17.31.10: kubeadm init failed.

ERROR Summary:
  [2021-12-07T22:53:32.232152130+0800]: ERROR:   [kubeadm init] 172.17.31.10: kubeadm init failed.



  See detailed log >>> /tmp/kainstall.lZtGxGCHJV/kainstall.log

The error log:

root@i-hx0g9sad:~# tail -40  /tmp/kainstall.lZtGxGCHJV/kainstall.log
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

    Unfortunately, an error has occurred:
        timed out waiting for the condition

    This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.

    Here is one example how you may list all Kubernetes containers running in docker:
        - 'docker ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'docker logs CONTAINERID'

[2021-12-07T22:53:32.232152130+0800]: ERROR:   [kubeadm init] 172.17.31.10: kubeadm init failed.

The kubelet is reporting errors.
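
Following the suggestions in the kubeadm output above, the first things to check on 172.17.31.10 would be the kubelet status and the control-plane containers:

systemctl status kubelet
journalctl -xeu kubelet --no-pager | tail -n 50
docker ps -a | grep kube | grep -v pause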
