Compare languages | Managing control plane: FAQ


How do I add a master node to a static or hybrid cluster?	Как добавить master-узел в статическом или гибридном кластере?
It is important to have an odd number of masters to ensure a quorum.	Важно иметь нечетное количество master-узлов для обеспечения кворума.
Adding a master node to a static or hybrid cluster is no different from adding a regular node to a cluster. To do this, use the corresponding examples. All the actions needed to configure the control plane components on the new master nodes are performed automatically. Wait until the master nodes appear in the `Ready` status.	Добавление master-узла в статический или гибридный кластер ничем не отличается от добавления обычного узла в кластер. Воспользуйтесь для этого соответствующими примерами. Все необходимые действия по настройке компонентов control plane кластера на новом узле будут выполнены автоматически, дождитесь их завершения — появления master-узлов в статусе `Ready`.
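The advice about an odd number of masters comes down to majority arithmetic: a cluster of N etcd members needs N/2 + 1 of them (integer division) to be available. A minimal sketch of that arithmetic follows; the function names `quorum_size` and `fault_tolerance` are illustrative and are not part of Deckhouse or etcd.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating majority-quorum arithmetic.

# Number of members that must be available to form a majority.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a majority still remains.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the figures for one to five master nodes.
for n in 1 2 3 4 5; do
  echo "masters=$n quorum=$(quorum_size "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

With three masters the cluster survives one failure. A fourth master raises the quorum to three without raising the number of tolerated failures, which is why even counts are avoided.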

How do I add master nodes to a cloud cluster?	Как добавить master-узлы в облачном кластере?
The following describes the conversion of a single-master cluster into a multi-master.	Далее описана конвертация кластера с одним master-узлом в мультимастерный кластер.
Before adding nodes, ensure you have the required quotas in the cloud provider. It is important to have an odd number of masters to ensure a quorum.	Перед добавлением узлов убедитесь в наличии необходимых квот. Важно иметь нечетное количество master-узлов для обеспечения кворума.
Make a backup of `etcd` and the `/etc/kubernetes` directory. Transfer the archive to a server outside the cluster (e.g., to a local machine). Ensure there are no alerts in the cluster that can prevent the creation of new master nodes. Make sure that the Deckhouse queue is empty. Run the appropriate edition and version of the Deckhouse installer container on the local machine (change the container registry address if necessary):	Сделайте резервную копию `etcd` и папки `/etc/kubernetes`. Скопируйте полученный архив за пределы кластера (например, на локальную машину). Убедитесь, что в кластере нет алертов, которые могут помешать созданию новых master-узлов. Убедитесь, что очередь Deckhouse пуста. На локальной машине запустите контейнер установщика Deckhouse соответствующей редакции и версии (измените адрес container registry при необходимости):
bash DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}') DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}' | tr '[:upper:]' '[:lower:]' ) docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash	bash DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}') DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}' | tr '[:upper:]' '[:lower:]' ) docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash
In the installer container, run the following command to check the state before working:	В контейнере с инсталлятором выполните следующую команду, чтобы проверить состояние перед началом работы:
bash dhctl terraform check --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= --ssh-host	bash dhctl terraform check --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= --ssh-host
The command output should indicate that Terraform found no inconsistencies and no changes are required.	Ответ должен сообщить, что Terraform не нашел расхождений и изменений не требуется.
In the installer container, run the following command and specify the required number of replicas using the `masterNodeGroup.replicas` parameter:	В контейнере с инсталлятором выполните следующую команду и укажите требуемое количество master-узлов в параметре `masterNodeGroup.replicas`:
bash dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= \ --ssh-host	bash dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= \ --ssh-host
For Yandex Cloud, when using external addresses on master nodes, the number of array elements in the `masterNodeGroup.instanceClass.externalIPAddresses` parameter must equal the number of master nodes. If `Auto` is used (public IP addresses are provisioned automatically), the number of array elements must still equal the number of master nodes. To illustrate, with three master nodes (`masterNodeGroup.replicas: 3`) and automatic address reservation, the `masterNodeGroup.instanceClass.externalIPAddresses` parameter would look as follows: yaml externalIPAddresses: ["Auto", "Auto", "Auto"]	Для Yandex Cloud, при использовании внешних адресов на master-узлах, количество элементов массива в параметре masterNodeGroup.instanceClass.externalIPAddresses должно равняться количеству master-узлов. При использовании значения `Auto` (автоматический заказ публичных IP-адресов), количество элементов в массиве все равно должно соответствовать количеству master-узлов. Например, при трех master-узлах (`masterNodeGroup.replicas: 3`) и автоматическом заказе адресов, параметр `masterNodeGroup.instanceClass.externalIPAddresses` будет выглядеть следующим образом: yaml externalIPAddresses: ["Auto", "Auto", "Auto"]
In the installer container, run the following command to start scaling:	В контейнере с инсталлятором выполните следующую команду для запуска масштабирования:
bash dhctl converge --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= --ssh-host	bash dhctl converge --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= --ssh-host
Wait until the required number of master nodes are `Ready` and all `control-plane-manager` instances are up and running:	Дождитесь появления необходимого количества master-узлов в статусе `Ready` и готовности всех экземпляров `control-plane-manager`:
bash kubectl -n kube-system wait pod --timeout=10m --for=condition=ContainersReady -l app=d8-control-plane-manager	bash kubectl -n kube-system wait pod --timeout=10m --for=condition=ContainersReady -l app=d8-control-plane-manager
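For the Yandex Cloud case described in this section, the `externalIPAddresses` list has to grow together with `masterNodeGroup.replicas`. As a rough sketch, a small shell helper can print a YAML list with one `Auto` entry per master node, ready to paste into the configuration. The function name `external_ip_addresses` is made up for this illustration and is not a Deckhouse command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an externalIPAddresses YAML list that
# contains one "Auto" entry per master node.
external_ip_addresses() {
  local replicas="$1"
  local i
  printf '%s\n' 'externalIPAddresses:'
  for (( i = 0; i < replicas; i++ )); do
    printf '%s\n' '- "Auto"'
  done
}

# Example: three master nodes (masterNodeGroup.replicas: 3).
external_ip_addresses 3
```

The block-style list printed here is equivalent YAML to an inline list with the same number of `"Auto"` entries.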

How do I reduce the number of master nodes in a cloud cluster?	Как уменьшить число master-узлов в облачном кластере?
The following describes the conversion of a multi-master cluster into a single-master.	Далее описана конвертация мультимастерного кластера в кластер с одним master-узлом.
The steps described below must be performed from the first master node of the cluster in order (master-0). This is because the cluster is always scaled in order: for example, it is impossible to delete nodes master-0 and master-1 while leaving master-2.	Описанные ниже шаги необходимо выполнять с первого по порядку master-узла кластера (master-0). Это связано с тем, что кластер всегда масштабируется по порядку: например, невозможно удалить узлы master-0 и master-1, оставив master-2.
Make a backup of `etcd` and the `/etc/kubernetes` directory. Transfer the archive to a server outside the cluster (e.g., to a local machine). Ensure there are no alerts in the cluster that can prevent the update of the master nodes. Make sure that the Deckhouse queue is empty. Run the appropriate edition and version of the Deckhouse installer container on the local machine (change the container registry address if necessary):	Сделайте резервную копию `etcd` и папки `/etc/kubernetes`. Скопируйте полученный архив за пределы кластера (например, на локальную машину). Убедитесь, что в кластере нет алертов, которые могут помешать обновлению master-узлов. Убедитесь, что очередь Deckhouse пуста. На локальной машине запустите контейнер установщика Deckhouse соответствующей редакции и версии (измените адрес container registry при необходимости):
bash DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}') DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}' | tr '[:upper:]' '[:lower:]' ) docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash	bash DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}') DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}' | tr '[:upper:]' '[:lower:]' ) docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash
In the installer container, run the following command to check the state before working:	В контейнере с инсталлятором выполните следующую команду и укажите `1` в параметре `masterNodeGroup.replicas`:
bash dhctl terraform check --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= --ssh-host	bash dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/ \ --ssh-user= --ssh-host
The command output should indicate that Terraform found no inconsistencies and no changes are required.	Для Yandex Cloud при использовании внешних адресов на master-узлах количество элементов массива в параметре masterNodeGroup.instanceClass.externalIPAddresses должно равняться количеству master-узлов. При использовании значения `Auto` (автоматический заказ публичных IP-адресов) количество элементов в массиве все равно должно соответствовать количеству master-узлов. Например, при одном master-узле (`masterNodeGroup.replicas: 1`) и автоматическом заказе адресов параметр `masterNodeGroup.instanceClass.externalIPAddresses` будет выглядеть следующим образом: yaml externalIPAddresses: ["Auto"]
Run the following command in the installer container and set `masterNodeGroup.replicas` to `1`:	Снимите следующие лейблы с удаляемых master-узлов: `node-role.kubernetes.io/control-plane` `node-role.kubernetes.io/master` `node.deckhouse.io/group`
bash dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/ \ --ssh-user= --ssh-host	Команда для снятия лейблов:
For Yandex Cloud, when using external addresses on master nodes, the number of array elements in the `masterNodeGroup.instanceClass.externalIPAddresses` parameter must equal the number of master nodes. If `Auto` is used (public IP addresses are provisioned automatically), the number of array elements must still equal the number of master nodes. To illustrate, with one master node (`masterNodeGroup.replicas: 1`) and automatic address reservation, the `masterNodeGroup.instanceClass.externalIPAddresses` parameter would look as follows: yaml externalIPAddresses: ["Auto"]	bash kubectl label node node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master- node.deckhouse.io/group-
Remove the following labels from the master nodes to be deleted: `node-role.kubernetes.io/control-plane` `node-role.kubernetes.io/master` `node.deckhouse.io/group`	Убедитесь, что удаляемые master-узлы пропали из списка узлов кластера etcd:
Use the following command to remove labels:	bash kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list -w table
bash kubectl label node node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master- node.deckhouse.io/group-	Выполните `drain` для удаляемых узлов:
Make sure that the master nodes to be deleted are no longer listed as etcd cluster members:	bash kubectl drain --ignore-daemonsets --delete-emptydir-data
bash kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o json | jq -r '.items[] | select( .status.conditions[] | select(.type == "ContainersReady" and .status == "True")) | .metadata.name' | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list -w table	Выключите виртуальные машины, соответствующие удаляемым узлам, удалите инстансы соответствующих узлов из облака и подключенные к ним диски (`kubernetes-data-master-<N>`).
`drain` the nodes being deleted:	Удалите в кластере поды, оставшиеся на удаленных узлах:
bash kubectl drain --ignore-daemonsets --delete-emptydir-data	bash kubectl delete pods --all-namespaces --field-selector spec.nodeName= --force
Shut down the virtual machines corresponding to the nodes to be deleted, remove the instances of those nodes from the cloud and the disks connected to them (`kubernetes-data-master-<N>`).	Удалите в кластере объекты `Node` удаленных узлов:
In the cluster, delete the Pods running on the nodes being deleted:	bash kubectl delete node
bash kubectl delete pods --all-namespaces --field-selector spec.nodeName= --force	В контейнере с инсталлятором выполните следующую команду для запуска масштабирования:
In the cluster, delete the Node objects associated with the nodes being deleted:	bash dhctl converge --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= --ssh-host
bash kubectl delete node	Как убрать роль master-узла, сохранив узел?
In the installer container, run the following command to start scaling:	Сделайте резервную копию etcd и папки `/etc/kubernetes`. Скопируйте полученный архив за пределы кластера (например, на локальную машину). Убедитесь, что в кластере нет алертов, которые могут помешать обновлению master-узлов. Убедитесь, что очередь Deckhouse пуста. Снимите следующие лейблы: `node-role.kubernetes.io/control-plane` `node-role.kubernetes.io/master` `node.deckhouse.io/group`
bash dhctl converge --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= --ssh-host	Команда для снятия лейблов:
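Because the cluster is always scaled in order, the nodes to remove when scaling in follow directly from the current and target replica counts: the highest-numbered masters go first. The sketch below lists them. The `masters_to_remove` function and the bare `master-` prefix are illustrative assumptions, and real node names normally carry a cluster-specific prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the master nodes that are removed when
# scaling from CURRENT replicas down to TARGET, highest index first.
masters_to_remove() {
  local current="$1" target="$2" i
  for (( i = current - 1; i >= target; i-- )); do
    echo "master-$i"
  done
}

# Example: converting a three-master cluster into a single-master one.
masters_to_remove 3 1
```

For the three-to-one conversion covered in this section, the helper prints master-2 and then master-1, leaving master-0 as the remaining node.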
How do I dismiss the master role while keeping the node?	bash kubectl label node node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master- node.deckhouse.io/group-
Make a backup of `etcd` and the `/etc/kubernetes` directory. Transfer the archive to a server outside the cluster (e.g., to a local machine). Ensure there are no alerts in the cluster that can prevent the update of the master nodes. Make sure that the Deckhouse queue is empty. Remove the following labels: `node-role.kubernetes.io/control-plane`, `node-role.kubernetes.io/master`, `node.deckhouse.io/group`.	Убедитесь, что удаляемый master-узел пропал из списка узлов кластера:
Use the following command to remove labels:	bash kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o json | jq -r '.items[] | select( .status.conditions[] | select(.type == "ContainersReady" and .status == "True")) | .metadata.name' | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list -w table
bash kubectl label node node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master- node.deckhouse.io/group-	Зайдите на узел и выполните следующие команды:
Make sure that the master node to be deleted is no longer listed as a member of the etcd cluster:	shell rm -f /etc/kubernetes/manifests/{etcd,kube-apiserver,kube-scheduler,kube-controller-manager}.yaml rm -f /etc/kubernetes/{scheduler,controller-manager}.conf rm -f /etc/kubernetes/authorization-webhook-config.yaml rm -f /etc/kubernetes/admin.conf /root/.kube/config rm -rf /etc/kubernetes/deckhouse rm -rf /etc/kubernetes/pki/{ca.key,apiserver,etcd/,front-proxy,sa.*} rm -rf /var/lib/etcd/member/
bash kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list -w table
Log in to the node and run the following commands:	Как изменить образ ОС в мультимастерном кластере?
shell rm -f /etc/kubernetes/manifests/{etcd,kube-apiserver,kube-scheduler,kube-controller-manager}.yaml rm -f /etc/kubernetes/{scheduler,controller-manager}.conf rm -f /etc/kubernetes/authorization-webhook-config.yaml rm -f /etc/kubernetes/admin.conf /root/.kube/config rm -rf /etc/kubernetes/deckhouse rm -rf /etc/kubernetes/pki/{ca.key,apiserver,etcd/,front-proxy,sa.*} rm -rf /var/lib/etcd/member/	Сделайте резервную копию `etcd` и папки `/etc/kubernetes`. Скопируйте полученный архив за пределы кластера (например, на локальную машину). Убедитесь, что в кластере нет алертов, которые могут помешать обновлению master-узлов. Убедитесь, что очередь Deckhouse пуста. На локальной машине запустите контейнер установщика Deckhouse соответствующей редакции и версии (измените адрес container registry при необходимости):
How do I switch to a different OS image in a multi-master cluster?	bash DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}') DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}' | tr '[:upper:]' '[:lower:]' ) docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash
Make a backup of `etcd` and the `/etc/kubernetes` directory. Transfer the archive to a server outside the cluster (e.g., to a local machine). Ensure there are no alerts in the cluster that can prevent the update of the master nodes. Make sure that the Deckhouse queue is empty. Run the appropriate edition and version of the Deckhouse installer container on the local machine (change the container registry address if necessary):	В контейнере с инсталлятором выполните следующую команду, чтобы проверить состояние перед началом работы:
bash DH_VERSION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/version}') DH_EDITION=$(kubectl -n d8-system get deployment deckhouse -o jsonpath='{.metadata.annotations.core\.deckhouse\.io\/edition}' | tr '[:upper:]' '[:lower:]' ) docker run --pull=always -it -v "$HOME/.ssh/:/tmp/.ssh/" registry.deckhouse.io/deckhouse/${DH_EDITION}/install:${DH_VERSION} bash	bash dhctl terraform check --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= \ --ssh-host --ssh-host --ssh-host
In the installer container, run the following command to check the state before working:	Ответ должен сообщить, что Terraform не нашел расхождений и изменений не требуется.
bash dhctl terraform check --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= \ --ssh-host --ssh-host --ssh-host	В контейнере с инсталлятором выполните следующую команду и укажите необходимый образ ОС в параметре `masterNodeGroup.instanceClass` (укажите адреса всех master-узлов в параметре `--ssh-host`):
The command output should indicate that Terraform found no inconsistencies and no changes are required.	bash dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= \ --ssh-host --ssh-host --ssh-host
In the installer container, run the following command and specify the required OS image using the `masterNodeGroup.instanceClass` parameter (specify the addresses of all master nodes using the `--ssh-host` parameter):	В контейнере с инсталлятором выполните следующую команду, чтобы провести обновление узлов:
bash dhctl config edit provider-cluster-configuration --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= \ --ssh-host --ssh-host --ssh-host	Внимательно изучите действия, которые планирует выполнить converge, когда запрашивает подтверждение.
Select the master node to update (enter its name):	При выполнении команды узлы будут замены на новые с подтверждением на каждом узле. Замена будет выполняться по очереди в обратном порядке (2,1,0).
bash NODE=""	bash dhctl converge --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= \ --ssh-host --ssh-host --ssh-host
Run the following command to remove the `node-role.kubernetes.io/control-plane`, `node-role.kubernetes.io/master`, and `node.deckhouse.io/group` labels from the node:	Следующие действия (П. 9-12) выполняйте поочередно на каждом master-узле, начиная с узла с наивысшим номером (с суффиксом 2) и заканчивая узлом с наименьшим номером (с суффиксом 0).
bash kubectl label node ${NODE} node-role.kubernetes.io/control-plane- node-role.kubernetes.io/master- node.deckhouse.io/group-	На созданном узле откройте журнал systemd-юнита `bashible.service`. Дождитесь окончания настройки узла — в журнале должно появиться сообщение `nothing to do`:
Make sure that the node is no longer listed as an etcd cluster member:	bash journalctl -fu bashible.service
bash kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o json | jq -r '.items[] | select( .status.conditions[] | select(.type == "ContainersReady" and .status == "True")) | .metadata.name' | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list -w table	Проверьте, что узел etcd отобразился в списке узлов кластера:
In the installer container, run the following command to upgrade the nodes:	bash kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o json | jq -r '.items[] | select( .status.conditions[] | select(.type == "ContainersReady" and .status == "True")) | .metadata.name' | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list -w table
Carefully review the actions that converge plans to perform when it asks for approval.	Убедитесь, что `control-plane-manager` функционирует на узле.
When the command is executed, the nodes will be replaced with new ones, with confirmation requested for each node. Nodes are replaced one at a time in reverse order (2, 1, 0).	bash kubectl -n kube-system wait pod --timeout=10m --for=condition=ContainersReady -l app=d8-control-plane-manager --field-selector spec.nodeName=
bash dhctl converge --ssh-agent-private-keys=/tmp/.ssh/ --ssh-user= \ --ssh-host --ssh-host --ssh-host	Перейдите к обновлению следующего узла.
Perform the following steps (9-12) on each master node in turn, starting with the node with the highest number (suffix 2) and ending with the node with the lowest number (suffix 0).
On the newly created node, open the journal of the `bashible.service` systemd unit. Wait until the node configuration is complete (the message `nothing to do` appears in the journal):	Как изменить образ ОС в кластере с одним master-узлом?
bash journalctl -fu bashible.service	Преобразуйте кластер с одним master-узлом в мультимастерный в соответствии с инструкцией. Обновите master-узлы в соответствии с инструкцией. Преобразуйте мультимастерный кластер в кластер с одним master-узлом в соответствии с инструкцией
Make sure the node is listed as an etcd cluster member:
bash kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o json | jq -r '.items[] | select( .status.conditions[] | select(.type == "ContainersReady" and .status == "True")) | .metadata.name' | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list -w table	Как посмотреть список узлов кластера в etcd?
Make sure `control-plane-manager` is running on the node:	Вариант 1
bash kubectl -n kube-system wait pod --timeout=10m --for=condition=ContainersReady -l app=d8-control-plane-manager --field-selector spec.nodeName=${NODE}	Используйте команду `etcdctl member list`.
Proceed to update the next node (repeat the steps above).	Пример:
How do I switch to a different OS image in a single-master cluster?	shell kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o json | jq -r '.items[] | select( .status.conditions[] | select(.type == "ContainersReady" and .status == "True")) | .metadata.name' | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list -w table
Convert your single-master cluster to a multi-master one, as described in the guide on adding master nodes to a cluster. Update the master nodes following the instructions. Convert your multi-master cluster to a single-master one according to the guide on excluding master nodes from the cluster.	Внимание. Последний параметр в таблице вывода показывает, что узел находится в состоянии `learner`, а не в состоянии `leader`.
How do I view the list of etcd members?	Вариант 2
Option 1	Используйте команду `etcdctl endpoint status`. Для этой команды, после флага `--endpoints` нужно подставить адрес каждого узла control-plane. В пятом столбце таблицы вывода будет указано значение `true` для лидера.
Use the `etcdctl member list` command.	Пример скрипта, который автоматически передает все адреса узлов control-plane:
Example:	shell MASTER_NODE_IPS=($(kubectl get nodes -l node-role.kubernetes.io/control-plane="" -o 'custom-columns=IP:.status.addresses[?(@.type=="InternalIP")].address' --no-headers)) unset ENDPOINTS_STRING for master_node_ip in ${MASTER_NODE_IPS[@]} do ENDPOINTS_STRING+="--endpoints https://${master_node_ip}:2379 " done kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key $(echo -n $ENDPOINTS_STRING) endpoint status -w table
shell kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o json | jq -r '.items[] | select( .status.conditions[] | select(.type == "ContainersReady" and .status == "True")) | .metadata.name' | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ member list -w table	Что делать, если что-то пошло не так?
Warning. The last column of the output table shows whether the etcd member is in the `learner` state; it does not indicate which member is the leader.	В процессе работы `control-plane-manager` автоматически создает резервные копии конфигурации и данных, которые могут пригодиться в случае возникновения проблем. Эти резервные копии сохраняются в директории `/etc/kubernetes/deckhouse/backup`. Если в процессе работы возникли ошибки или непредвиденные ситуации, вы можете использовать эти резервные копии для восстановления до предыдущего исправного состояния.
Option 2
Use the `etcdctl endpoint status` command. For this command, the address of every control-plane node must be passed after the `--endpoints` flag. The fifth column of the output table will be `true` for the leader.	Что делать, если кластер etcd не функционирует?
Example of a script that automatically passes all control-plane nodes to the command:	Если кластер etcd не функционирует и не удается восстановить его из резервной копии, вы можете попытаться восстановить его с нуля, следуя шагам ниже.
shell MASTER_NODE_IPS=($(kubectl get nodes -l node-role.kubernetes.io/control-plane="" -o 'custom-columns=IP:.status.addresses[?(@.type=="InternalIP")].address' --no-headers)) unset ENDPOINTS_STRING for master_node_ip in ${MASTER_NODE_IPS[@]} do ENDPOINTS_STRING+="--endpoints https://${master_node_ip}:2379 " done kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key $(echo -n $ENDPOINTS_STRING) endpoint status -w table	Сначала на всех узлах, которые являются частью вашего кластера etcd, кроме одного, удалите манифест `etcd.yaml`, который находится в директории `/etc/kubernetes/manifests/`. После этого только один узел останется активным, и с него будет происходить восстановление состояния мультимастерного кластера. На оставшемся узле откройте файл манифеста `etcd.yaml` и укажите параметр `--force-new-cluster` в `spec.containers.command`. После успешного восстановления кластера, удалите параметр `--force-new-cluster`.
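As an alternative to repeating the `--endpoints` flag for every node, `etcdctl` also accepts a comma-separated list of endpoints in a single flag. The sketch below builds such a list from IP addresses. The `join_endpoints` function name and the sample addresses are illustrative, and port 2379 is taken from the commands in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a list of IP addresses into one
# comma-separated value suitable for a single --endpoints flag.
join_endpoints() {
  local result="" ip
  for ip in "$@"; do
    # Prepend a comma only when the result already has an entry.
    result+="${result:+,}https://${ip}:2379"
  done
  echo "$result"
}

# Example with made-up addresses.
join_endpoints 10.0.0.1 10.0.0.2 10.0.0.3
```

In the script from this section, the value could be passed as `--endpoints "$(join_endpoints "${MASTER_NODE_IPS[@]}")"` in place of `$(echo -n $ENDPOINTS_STRING)`, which avoids relying on word splitting of an unquoted string.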
What if something went wrong?	Эта операция является деструктивной, так как она полностью уничтожает текущие данные и инициализирует кластер с состоянием, которое сохранено на узле. Все pending-записи будут утеряны.
During its operation, `control-plane-manager` automatically creates backups of configurations and data that may be useful in case of issues. These backups are stored in the `/etc/kubernetes/deckhouse/backup` directory. If errors or unforeseen situations occur during operation, you can use these backups to restore to the previous stable state.	Что делать, если etcd постоянно перезапускается с ошибкой?
What if the etcd cluster fails?	Этот способ может понадобиться, если использование параметра `--force-new-cluster` не восстанавливает работу etcd. Это может произойти, если converge master-узлов прошел неудачно, в результате чего новый master-узел был создан на старом диске etcd, изменил свой адрес в локальной сети, а другие master-узлы отсутствуют. Этот метод стоит использовать если контейнер etcd находится в бесконечном цикле перезапуска, а в его логах появляется ошибка: `panic: unexpected removal of unknown remote peer`.
If the etcd cluster is not functioning and it cannot be restored from a backup, you can attempt to rebuild it from scratch by following the steps below.	Установите утилиту etcdutl. С текущего локального снапшота базы etcd (`/var/lib/etcd/member/snap/db`) выполните создание нового снапшота:
First, on all nodes that are part of your etcd cluster except one, remove the `etcd.yaml` manifest located in the `/etc/kubernetes/manifests/` directory. The remaining node will serve as the starting point for the new multi-master cluster. On that node, edit the etcd manifest `/etc/kubernetes/manifests/etcd.yaml` and add the `--force-new-cluster` parameter to `spec.containers.command`. After the new cluster is ready, remove the `--force-new-cluster` parameter.	shell ./etcdutl snapshot restore /var/lib/etcd/member/snap/db --name \ --initial-cluster=<HOSTNAME>=https://<ADDRESS>:2380 --initial-advertise-peer-urls=https://<ADDRESS>:2380 \ --skip-hash-check=true --data-dir /var/lib/etcdtest
This operation is unsafe and breaks the guarantees given by the consensus protocol. Note that it brings the cluster to the state that was saved on the node. Any pending entries will be lost.	`<HOSTNAME>` — название master-узла; `<ADDRESS>` — адрес master-узла.
What if etcd restarts with an error?	Выполните следующие команды для использования нового снапшота:
This method may be necessary if the `--force-new-cluster` option does not restore etcd. Such a scenario can occur after an unsuccessful converge of master nodes, where a new master node was created with an old etcd disk, its internal address changed, and the other master nodes are absent. Use this method if the etcd container is stuck in an endless restart loop and its log shows the error: `panic: unexpected removal of unknown remote peer`.	shell cp -r /var/lib/etcd /tmp/etcd-backup rm -rf /var/lib/etcd mv /var/lib/etcdtest /var/lib/etcd
Install the etcdutl utility. Create a new etcd database snapshot from the current local snapshot (`/var/lib/etcd/member/snap/db`):	Найдите контейнеры `etcd` и `api-server`:
shell ./etcdutl snapshot restore /var/lib/etcd/member/snap/db --name \ --initial-cluster=<HOSTNAME>=https://<ADDRESS>:2380 --initial-advertise-peer-urls=https://<ADDRESS>:2380 \ --skip-hash-check=true --data-dir /var/lib/etcdtest	shell crictl ps -a | egrep "etcd|apiserver"
`<HOSTNAME>` — the name of the master node; `<ADDRESS>` — the address of the master node.	Удалите найденные контейнеры `etcd` и `api-server`:
Execute the following commands to use the new snapshot:	shell crictl rm
shell cp -r /var/lib/etcd /tmp/etcd-backup && rm -rf /var/lib/etcd && mv /var/lib/etcdtest /var/lib/etcd	Перезапустите master-узел.
Locate the `etcd` and `api-server` containers:	Что делать, если объем базы данных etcd достиг лимита, установленного в quota-backend-bytes?
shell crictl ps -a | egrep "etcd|apiserver"	Когда объем базы данных etcd достигает лимита, установленного параметром `quota-backend-bytes`, доступ к ней становится “read-only”. Это означает, что база данных etcd перестает принимать новые записи, но при этом остается доступной для чтения данных. Вы можете понять, что столкнулись с подобной ситуацией, выполнив команду:
Remove the `etcd` and `api-server` containers:	shell kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ endpoint status -w table --cluster
shell crictl rm	Если в поле `ERRORS` вы видите подобное сообщение `alarm:NOSPACE`, значит вам нужно предпринять следующие шаги:
Restart the master node.	Найдите строку с `--quota-backend-bytes` в файле манифеста пода etcd, раположенного по пути `/etc/kubernetes/manifests/etcd.yaml` и увеличьте значение, умножив указанный параметр в этой строке на два. Если такой строки нет — добавьте, например: `- --quota-backend-bytes=8589934592`. Эта настройка задает лимит на 8 ГБ.
What to do if the database volume of etcd reaches the limit set in quota-backend-bytes?	Сбросьте активное предупреждение (alarm) о нехватке места в базе данных. Для этого выполните следующую команду:
When the etcd database size reaches the limit set by the `quota-backend-bytes` parameter, etcd switches to "read-only" mode: it stops accepting new entries but remains available for reading. To check whether this has happened, run the following command:	shell kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ alarm disarm
shell kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ endpoint status -w table --cluster	Измените параметр maxDbSize в настройках `control-plane-manager` на тот, который был задан в манифесте.
If you see a message like `alarm:NOSPACE` in the `ERRORS` field, you need to take the following steps:	Как настроить дополнительные политики аудита?
Edit `/etc/kubernetes/manifests/etcd.yaml`: find the line with `--quota-backend-bytes` and increase its value. If there is no such line, add it, for example: `- --quota-backend-bytes=8589934592`, which sets the limit to 8 GB.	Включите параметр auditPolicyEnabled в настройках модуля:
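The example value `8589934592` is 8 GiB expressed in bytes. As a minimal sketch, a value for another limit can be computed with shell arithmetic; the `gib_to_bytes` helper is a hypothetical name used only for illustration and is not part of etcd or Deckhouse:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a size in GiB to bytes for --quota-backend-bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# 8 GiB, the example limit used in the text.
printf '%s\n' "- --quota-backend-bytes=$(gib_to_bytes 8)"
# prints "- --quota-backend-bytes=8589934592"
```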
Disarm the active alarm that occurred due to reaching the limit. To do this, execute the command:	yaml apiVersion: deckhouse.io/v1alpha1 kind: ModuleConfig metadata: name: control-plane-manager spec: version: 1 settings: apiserver: auditPolicyEnabled: true
shell kubectl -n kube-system exec -ti $(kubectl -n kube-system get pod -l component=etcd,tier=control-plane -o name | head -n1) -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ alarm disarm	Создайте Secret `kube-system/audit-policy` с YAML-файлом политик, закодированным в Base64:
Change the maxDbSize parameter in the `control-plane-manager` settings to match the value specified in the manifest.	yaml apiVersion: v1 kind: Secret metadata: name: audit-policy namespace: kube-system data: audit-policy.yaml:
How do I configure additional audit policies?	Минимальный рабочий пример `audit-policy.yaml` выглядит так:
Enable the auditPolicyEnabled flag in the module configuration:	yaml apiVersion: audit.k8s.io/v1 kind: Policy rules: level: Metadata omitStages: RequestReceived
yaml apiVersion: deckhouse.io/v1alpha1 kind: ModuleConfig metadata: name: control-plane-manager spec: version: 1 settings: apiserver: auditPolicyEnabled: true	С подробной информацией по настройке содержимого файла `audit-policy.yaml` можно ознакомиться: В официальной документации Kubernetes; В статье на Habr; В коде скрипта-генератора, используемого в GCE.
Create the `kube-system/audit-policy` Secret containing a Base64 encoded YAML file:	Как исключить встроенные политики аудита?
yaml apiVersion: v1 kind: Secret metadata: name: audit-policy namespace: kube-system data: audit-policy.yaml:	Установите параметр apiserver.basicAuditPolicyEnabled модуля в `false`.
The minimum viable example of the `audit-policy.yaml` file looks as follows:	Пример:
yaml apiVersion: audit.k8s.io/v1 kind: Policy rules: level: Metadata omitStages: RequestReceived	yaml apiVersion: deckhouse.io/v1alpha1 kind: ModuleConfig metadata: name: control-plane-manager spec: version: 1 settings: apiserver: auditPolicyEnabled: true basicAuditPolicyEnabled: false
You can find detailed information on how to configure `audit-policy.yaml` file here: The official Kubernetes documentation; The code of the generator script used in GCE.	Как вывести аудит-лог в стандартный вывод вместо файлов?
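The Secret above expects the policy file as a single Base64 string. The following sketch, which assumes a GNU-style `base64` utility on the local machine, encodes the minimal policy and prints a complete Secret manifest; the temporary file and variable names are illustrative only:

```shell
#!/usr/bin/env bash
# Write the minimal policy from the example above to a temporary file.
policy_file=$(mktemp)
cat > "$policy_file" <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  omitStages:
  - RequestReceived
EOF

# Encode and strip line wraps so the value fits on one line of the manifest.
encoded=$(base64 < "$policy_file" | tr -d '\n')

# Print the Secret manifest; review it before applying it with kubectl.
cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: audit-policy
  namespace: kube-system
data:
  audit-policy.yaml: $encoded
EOF

rm -f "$policy_file"
```

Printing the manifest instead of applying it directly leaves room to check the policy first, which matters because a typo in the Secret can prevent `apiserver` from starting.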
How to omit Deckhouse built-in policy rules?	Установите параметр apiserver.auditLog.output модуля в значение `Stdout`.
Set the apiserver.basicAuditPolicyEnabled module parameter to `false`.	Пример:
An example:	yaml apiVersion: deckhouse.io/v1alpha1 kind: ModuleConfig metadata: name: control-plane-manager spec: version: 1 settings: apiserver: auditPolicyEnabled: true auditLog: output: Stdout
yaml apiVersion: deckhouse.io/v1alpha1 kind: ModuleConfig metadata: name: control-plane-manager spec: version: 1 settings: apiserver: auditPolicyEnabled: true basicAuditPolicyEnabled: false	Как работать с журналом аудита?
How to stream audit log to stdout instead of files?	Предполагается, что на master-узлах установлен «скрейпер логов»: log-shipper, `promtail`, `filebeat`, который будет мониторить файл с логами:
Set the apiserver.auditLog.output parameter to `Stdout`.	bash /var/log/kube-audit/audit.log
An example:	Параметры ротации логов в файле журнала предустановлены и их изменение не предусмотрено:
yaml apiVersion: deckhouse.io/v1alpha1 kind: ModuleConfig metadata: name: control-plane-manager spec: version: 1 settings: apiserver: auditPolicyEnabled: true auditLog: output: Stdout	Максимальное занимаемое место на диске `1000 МБ`. Максимальная глубина записи `7 дней`.
How to deal with the audit log?	В зависимости от настроек политики (`Policy`) и количества запросов к `apiserver` логов может быть очень много, соответственно глубина хранения может быть менее 30 минут.
A log scraper (log-shipper, promtail, or filebeat) is expected to run on the master nodes and monitor the log file:	Текущая реализация функционала не гарантирует безопасность, так как существует риск временного нарушения работы control plane.
bash /var/log/kube-audit/audit.log	Если в Secret’е с конфигурационным файлом окажутся неподдерживаемые опции или опечатка, `apiserver` не сможет запуститься.
The following fixed parameters of log rotation are in use:	В случае возникновения проблем с запуском `apiserver`, потребуется вручную отключить параметры `--audit-log-*` в манифесте `/etc/kubernetes/manifests/kube-apiserver.yaml` и перезапустить `apiserver` следующей командой:
The maximum disk space is limited to `1000 MB`. Logs older than `7 days` will be deleted.	bash docker stop $(docker ps | grep kube-apiserver- | awk '{print $1}')
Depending on the `Policy` settings and the number of requests to the `apiserver`, the amount of logs collected may be high. Thus, in some cases, logs can only be kept for less than 30 minutes.	Или (в зависимости используемого вами CRI). crictl stopp $(crictl pods –name=kube-apiserver -q)
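To see how the two fixed limits interact, here is a rough estimate of how much audit history fits on disk. It is a sketch only: the `retention_minutes` helper and the write rates are hypothetical, and 1 MB is treated as 1024 KB.

```shell
#!/usr/bin/env bash
# Fixed rotation limits quoted above.
MAX_LOG_KB=$(( 1000 * 1024 ))        # 1000 MB of disk space
MAX_AGE_MINUTES=$(( 7 * 24 * 60 ))   # 7 days

# Hypothetical helper: estimated minutes of audit history kept on disk
# for a given write rate in KB per minute (a positive integer).
retention_minutes() {
  local rate_kb_per_min=$1
  local by_size=$(( MAX_LOG_KB / rate_kb_per_min ))
  if (( by_size < MAX_AGE_MINUTES )); then
    echo "$by_size"            # the size limit is reached first
  else
    echo "$MAX_AGE_MINUTES"    # the age limit is reached first
  fi
}

# Assumed rate of 40 MB/min (40960 KB/min), picked only for illustration.
echo "Estimated retention: $(retention_minutes 40960) minutes"
```

At the assumed rate the size limit is reached after 25 minutes, which matches the "less than 30 minutes" case described above.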
The current implementation of this feature isn’t safe and may lead to a temporary failure of the control plane.	После перезапуска будет достаточно времени исправить Secret или удалить его:
The `apiserver` will not be able to start if there are unsupported options or a typo in the Secret.	bash kubectl -n kube-system delete secret audit-policy
If `apiserver` is unable to start, you have to manually disable the `--audit-log-*` parameters in the `/etc/kubernetes/manifests/kube-apiserver.yaml` manifest and restart apiserver using the following command:	Как ускорить перезапуск подов при потере связи с узлом?
bash docker stop $(docker ps | grep kube-apiserver- | awk '{print $1}')	По умолчанию, если узел в течении 40 секунд не сообщает свое состояние, он помечается как недоступный. И еще через 5 минут поды узла начнут перезапускаться на других узлах. В итоге общее время недоступности приложений составляет около 6 минут.
Or (depending on your CRI): crictl stopp $(crictl pods --name=kube-apiserver -q)	В специфических случаях, когда приложение не может быть запущено в нескольких экземплярах, есть способ сократить период их недоступности:
After the restart, you will be able to fix the Secret or delete it:	Уменьшить время перехода узла в состояние `Unreachable` при потере с ним связи настройкой параметра `nodeMonitorGracePeriodSeconds`. Установить меньший таймаут удаления подов с недоступного узла в параметре `failedNodePodEvictionTimeoutSeconds`.
bash kubectl -n kube-system delete secret audit-policy	Пример:
How do I speed up the restart of Pods if the connection to the node has been lost?	yaml apiVersion: deckhouse.io/v1alpha1 kind: ModuleConfig metadata: name: control-plane-manager spec: version: 1 settings: nodeMonitorGracePeriodSeconds: 10 failedNodePodEvictionTimeoutSeconds: 50
By default, a node is marked as unavailable if it does not report its state for 40 seconds. After another 5 minutes, its Pods will be rescheduled to other nodes. Thus, the overall application unavailability lasts approximately 6 minutes.	В этом случае при потере связи с узлом приложения будут перезапущены примерно через 1 минуту.
In specific cases, if an application cannot run in multiple instances, there is a way to lower its unavailability time:	Оба упомянутых параметра напрямую влияют на использование процессора и памяти control-plane’ом. Снижая таймауты, системные компоненты чаще отправляют статусы и проверяют состояние ресурсов.
Reduce the period required for the node to become `Unreachable` if the connection to it is lost by setting the `nodeMonitorGracePeriodSeconds` parameter. Set a lower timeout for evicting Pods on a failed node using the `failedNodePodEvictionTimeoutSeconds` parameter.	При выборе оптимальных значений учитывайте графики использования ресурсов управляющих узлов. Чем меньше значения параметров, тем больше ресурсов может понадобиться для их обработки на этих узлах.
Example:	Резервное копирование и восстановление etcd
yaml apiVersion: deckhouse.io/v1alpha1 kind: ModuleConfig metadata: name: control-plane-manager spec: version: 1 settings: nodeMonitorGracePeriodSeconds: 10 failedNodePodEvictionTimeoutSeconds: 50	Что выполняется автоматически
In this case, if the connection to the node is lost, the applications will be restarted in about 1 minute.	Автоматически запускаются CronJob `kube-system/d8-etcd-backup-*` в 00:00 по UTC+0. Результат сохраняется в `/var/lib/etcd/etcd-backup.tar.gz` на всех узлах с `control-plane` в кластере (master-узлы).
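The "about 1 minute" figure is the sum of the two timeouts. A minimal sketch of that arithmetic follows; the `eviction_delay_seconds` helper is hypothetical, and the result is an approximation that ignores scheduling and container start time:

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate seconds between losing a node and
# its Pods being evicted, given both timeouts in seconds.
eviction_delay_seconds() {
  local grace_period=$1 eviction_timeout=$2
  echo $(( grace_period + eviction_timeout ))
}

# Defaults described above: 40 seconds, then another 5 minutes.
echo "default: $(eviction_delay_seconds 40 300) s"   # prints "default: 340 s"
# Values from the ModuleConfig example.
echo "example: $(eviction_delay_seconds 10 50) s"    # prints "example: 60 s"
```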
Both these parameters directly impact the CPU and memory resources consumed by the control plane. By lowering timeouts, we force system components to send statuses more frequently and check the resource state more often.
When deciding on the appropriate threshold values, consider resources consumed by the control nodes (graphs can help you with this). Note that the lower parameters are, the more resources you may need to allocate to these nodes.	Как сделать резервную копию etcd вручную
etcd backup and restore	Используя Deckhouse CLI (Deckhouse Kubernetes Platform v1.65+)
What is done automatically	Начиная с релиза Deckhouse Kubernetes Platform v1.65, стала доступна утилита `d8 backup etcd`, которая предназначена для быстрого создания снимков состояния etcd.
CronJob `kube-system/d8-etcd-backup-*` is automatically started at 00:00 UTC+0. The result is saved in `/var/lib/etcd/etcd-backup.tar.gz` on all nodes with `control-plane` in the cluster (master nodes).	bash d8 backup etcd –kubeconfig $KUBECONFIG ./etcd-backup.snapshot
How to manually backup etcd	Используя bash (Deckhouse Kubernetes Platform v1.64 и старше)
Using Deckhouse CLI (Deckhouse Kubernetes Platform v1.65+)	Войдите на любой control-plane узел под пользователем `root` и используйте следующий bash-скрипт:
Starting with Deckhouse Kubernetes Platform v1.65, a new `d8 backup etcd` tool is available for taking snapshots of etcd state.	bash
bash d8 backup etcd --kubeconfig $KUBECONFIG ./etcd-backup.snapshot	#!/usr/bin/env bash set -e
Using bash (Deckhouse Kubernetes Platform v1.64 and older)	pod=etcd-`hostname` kubectl -n kube-system exec “$pod” – /usr/bin/etcdctl –cacert /etc/kubernetes/pki/etcd/ca.crt –cert /etc/kubernetes/pki/etcd/ca.crt –key /etc/kubernetes/pki/etcd/ca.key –endpoints https://127.0.0.1:2379/ snapshot save /var/lib/etcd/${pod##/}.snapshot && mv /var/lib/etcd/”${pod##/}.snapshot” etcd-backup.snapshot && cp -r /etc/kubernetes/ ./ && tar -cvzf kube-backup.tar.gz ./etcd-backup.snapshot ./kubernetes/ rm -r ./kubernetes ./etcd-backup.snapshot
Log in to any control-plane node as the `root` user and run the following script:	В текущей директории будет создан файл `kube-backup.tar.gz` со снимком базы etcd одного из узлов кластера. Из полученного снимка можно будет восстановить состояние кластера.
bash	Рекомендуем сделать резервную копию директории `/etc/kubernetes`, в которой находятся:
#!/usr/bin/env bash	манифесты и конфигурация компонентов control-plane; PKI кластера Kubernetes.
set -e
pod=etcd-`hostname`	Данная директория поможет быстро восстановить кластер при полной потере control-plane узлов без создания нового кластера и без повторного присоединения узлов в новый кластер.
kubectl -n kube-system exec "$pod" -- /usr/bin/etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ snapshot save /var/lib/etcd/${pod##*/}.snapshot &&
mv /var/lib/etcd/"${pod##*/}.snapshot" etcd-backup.snapshot &&
cp -r /etc/kubernetes/ ./ &&
tar -cvzf kube-backup.tar.gz ./etcd-backup.snapshot ./kubernetes/
rm -r ./kubernetes ./etcd-backup.snapshot
The script creates the `kube-backup.tar.gz` file in the current directory. It contains a snapshot of the etcd database taken from one of the etcd cluster members, from which you can later restore the cluster state.	Рекомендуем хранить резервные копии снимков состояния кластера etcd, а также резервную копию директории `/etc/kubernetes/` в зашифрованном виде вне кластера Deckhouse. Для этого вы можете использовать сторонние инструменты резервного копирования файлов, например Restic, Borg, Duplicity и т.д.
Also, we recommend making a backup of the `/etc/kubernetes` directory, which contains:	О возможных вариантах восстановления состояния кластера из снимка etcd вы можете узнать в документации.
manifests and configurations of control-plane components; Kubernetes cluster PKI.	Как выполнить полное восстановление состояния кластера из резервной копии etcd?
This directory helps to quickly restore a cluster after a complete loss of control-plane nodes, without creating a new cluster and without rejoining the remaining nodes to it.	Далее описаны шаги по восстановлению кластера до предыдущего состояния из резервной копии при полной потере данных.
We recommend encrypting etcd snapshot backups, as well as the backup of the `/etc/kubernetes/` directory, and storing them outside the Deckhouse cluster. You can use a third-party file backup tool, for example Restic, Borg, or Duplicity.
See the documentation to learn about etcd disaster recovery procedures from snapshots.	Восстановление кластера с одним master-узлом
How do I perform a full recovery of the cluster state from an etcd backup?	Для корректного восстановления выполните следующие шаги на master-узле:
The following steps restore the cluster to its previous state from a backup in case of complete data loss.	Найдите утилиту `etcdctl` на master-узле и скопируйте исполняемый файл в `/usr/local/bin/`:
Restoring a single-master cluster	shell cp $(find /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/ -name etcdctl -print \| tail -n 1) /usr/local/bin/etcdctl etcdctl version
Follow these steps on the master node to restore a single-master cluster:	Должен отобразиться корректный вывод `etcdctl version` без ошибок.
Find the `etcdctl` utility on the master node and copy the executable to `/usr/local/bin/`:	Также вы можете загрузить исполняемый файл etcdctl на сервер (желательно, чтобы версия `etcdctl` была такая же, как и версия etcd в кластере):
shell cp $(find /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/ -name etcdctl -print | tail -n 1) /usr/local/bin/etcdctl && etcdctl version	shell wget "https://github.com/etcd-io/etcd/releases/download/v3.5.16/etcd-v3.5.16-linux-amd64.tar.gz" && tar -xzvf etcd-v3.5.16-linux-amd64.tar.gz && mv etcd-v3.5.16-linux-amd64/etcdctl /usr/local/bin/etcdctl
The `etcdctl version` command must complete without errors and print the version.	Проверить версию etcd в кластере можно выполнив следующую команду (команда может не сработать, если etcd и Kubernetes API недоступны):
Alternatively, you can download the etcdctl executable to the node (preferably the same version as etcd in the cluster):	shell kubectl -n kube-system exec -ti etcd-$(hostname) -- etcdctl version
shell wget "https://github.com/etcd-io/etcd/releases/download/v3.5.16/etcd-v3.5.16-linux-amd64.tar.gz" && tar -xzvf etcd-v3.5.16-linux-amd64.tar.gz && mv etcd-v3.5.16-linux-amd64/etcdctl /usr/local/bin/etcdctl	Остановите etcd.
You can check the current `etcd` version with the following command (it might not work if etcd and the Kubernetes API are already unavailable):	shell mv /etc/kubernetes/manifests/etcd.yaml ~/etcd.yaml
shell kubectl -n kube-system exec -ti etcd-$(hostname) -- etcdctl version	Сохраните текущие данные etcd.
Stop etcd.	shell cp -r /var/lib/etcd/member/ /var/lib/deckhouse-etcd-backup
shell mv /etc/kubernetes/manifests/etcd.yaml ~/etcd.yaml	Очистите директорию etcd.
Save the current etcd data.	shell rm -rf /var/lib/etcd/member/
shell cp -r /var/lib/etcd/member/ /var/lib/deckhouse-etcd-backup	Положите резервную копию etcd в файл `~/etcd-backup.snapshot`.
Clean the etcd directory.	Восстановите базу данных etcd.
shell rm -rf /var/lib/etcd/member/	shell ETCDCTL_API=3 etcdctl snapshot restore ~/etcd-backup.snapshot --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ --data-dir=/var/lib/etcd
Place the etcd backup in the `~/etcd-backup.snapshot` file.	Запустите etcd. Запуск может занять некоторое время.
Restore the etcd database.	shell mv ~/etcd.yaml /etc/kubernetes/manifests/etcd.yaml && crictl ps --label io.kubernetes.pod.name=etcd-$HOSTNAME
shell ETCDCTL_API=3 etcdctl snapshot restore ~/etcd-backup.snapshot --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key --endpoints https://127.0.0.1:2379/ --data-dir=/var/lib/etcd
Start etcd. This may take some time.	Восстановление мультимастерного кластера
shell mv ~/etcd.yaml /etc/kubernetes/manifests/etcd.yaml && crictl ps --label io.kubernetes.pod.name=etcd-$HOSTNAME	Для корректного восстановления выполните следующие шаги:
Restoring a multi-master cluster	Включите режим High Availability (HA) с помощью глобального параметра highAvailability. Это необходимо для сохранения хотя бы одной реплики Prometheus и его PVC, поскольку в режиме кластера с одним master-узлом HA по умолчанию отключён.
Follow these steps to restore a multi-master cluster:	Переведите кластер в режим с одним master-узлом в соответствии с инструкцией для облачных кластеров, или самостоятельно выведите статические master-узлы из кластера.
Explicitly set the High Availability (HA) mode by specifying the highAvailability parameter. This is necessary, for example, in order not to lose one Prometheus replica and its PVC, since HA is disabled by default in single-master mode.	На оставшемся единственном master-узле выполните шаги по восстановлению etcd из резервной копии в соответствии с инструкцией для кластера с одним master-узлом.
Switch the cluster to single-master mode according to the instructions for cloud clusters, or manually remove the static master nodes from the cluster.	Когда работа etcd будет восстановлена, удалите из кластера информацию об уже удаленных в первом пункте master-узлах, воспользовавшись следующей командой (укажите название узла):
On the single remaining master node, restore etcd from the backup following the instructions for a single-master cluster.	shell kubectl delete node
When etcd is running again, remove from the cluster the records of the master nodes that were removed earlier:	Перезапустите все узлы кластера.
shell kubectl delete node MASTER_NODE_I	Дождитесь выполнения заданий из очереди Deckhouse:
Restart all nodes of the cluster.	shell kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main
Wait for Deckhouse to process all jobs in the queue:	Переведите кластер обратно в режим мультимастерного в соответствии с инструкцией для облачных кластеров или инструкцией для статических или гибридных кластеров.
shell kubectl -n d8-system exec svc/deckhouse-leader -c deckhouse -- deckhouse-controller queue main	Как восстановить объект Kubernetes из резервной копии etcd?
Switch the cluster back to multi-master mode according to instructions for cloud clusters or instructions for static or hybrid clusters.	Чтобы получить данные определенных объектов кластера из резервной копии etcd:
How do I restore a Kubernetes object from an etcd backup?	Запустите временный экземпляр etcd. Заполните его данными из резервной копии. Получите описания нужных объектов с помощью `auger`.
To get the data of specific cluster objects from an etcd backup:	Пример шагов по восстановлению объектов из резервной копии etcd
Start a temporary instance of etcd. Fill it with data from the backup. Get the desired objects using `auger`.	В следующем примере `etcd-backup.snapshot` — резервная копия etcd (snapshot), `infra-production` — пространство имен, в котором нужно восстановить объекты.
Example of steps to restore objects from an etcd backup	Для выгрузки бинарных данных из etcd потребуется утилита auger. Ее можно собрать из исходного кода на любой машине с Docker (на узлах кластера это сделать невозможно).
In the example below, `etcd-backup.snapshot` is an etcd snapshot and `infra-production` is the namespace in which objects need to be restored.	shell git clone -b v1.0.1 --depth 1 https://github.com/etcd-io/auger cd auger make release build/auger -h Получившийся исполняемый файл `build/auger`, а также `snapshot` из резервной копии etcd нужно загрузить на master-узел, с которого будет выполняться дальнейшие действия.
To decode objects from `etcd`, you need the auger utility. It can be built from source on any machine with Docker installed (this cannot be done on cluster nodes).	Данные действия выполняются на master-узле в кластере, на который предварительно был загружен файл `snapshot` и утилита `auger`:
shell git clone -b v1.0.1 --depth 1 https://github.com/etcd-io/auger && cd auger && make release && build/auger -h	Установите полный путь до `snapshot` и до утилиты в переменных окружения:
The resulting `build/auger` executable and the `snapshot` from the etcd backup must be uploaded to the master node on which the following steps will be performed.
The following steps are performed on the master node to which the etcd `snapshot` file and the `auger` tool were copied:	shell SNAPSHOT=/root/etcd-restore/etcd-backup.snapshot; AUGER_BIN=/root/auger; chmod +x $AUGER_BIN
Set the full paths to the snapshot file and to the tool in environment variables:	Запустите под с временным экземпляром etcd:
shell SNAPSHOT=/root/etcd-restore/etcd-backup.snapshot; AUGER_BIN=/root/auger; chmod +x $AUGER_BIN	Создайте манифест пода. Он будет запускаться именно на текущем master-узле, выбрав его по переменной `$HOSTNAME`, и смонтирует `snapshot` по пути `$SNAPSHOT` для загрузки во временный экземпляр etcd:
Run a Pod with a temporary instance of `etcd`. Create the Pod manifest. The Pod is scheduled on the current master node, selected by the `$HOSTNAME` variable, and mounts the snapshot file located at `$SNAPSHOT`, which is then restored into the temporary `etcd` instance:	shell cat <<EOF >etcd.pod.yaml apiVersion: v1 kind: Pod metadata: name: etcdrestore namespace: default spec: nodeName: $HOSTNAME tolerations: operator: Exists initContainers: command: etcdctl snapshot restore "/tmp/etcd-snapshot" image: $(kubectl -n kube-system get pod -l component=etcd -o jsonpath="{.items[*].spec.containers[*].image}" | cut -f 1 -d ' ') imagePullPolicy: IfNotPresent name: etcd-snapshot-restore volumeMounts: name: etcddir mountPath: /default.etcd name: etcd-snapshot mountPath: /tmp/etcd-snapshot readOnly: true containers: command: etcd image: $(kubectl -n kube-system get pod -l component=etcd -o jsonpath="{.items[*].spec.containers[*].image}" | cut -f 1 -d ' ') imagePullPolicy: IfNotPresent name: etcd-temp volumeMounts: name: etcddir mountPath: /default.etcd volumes: name: etcddir emptyDir: {} name: etcd-snapshot hostPath: path: $SNAPSHOT type: File EOF
shell	Запустите под:
cat <<EOF >etcd.pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcdrestore
  namespace: default
spec:
  nodeName: $HOSTNAME
  tolerations:
  - operator: Exists
  initContainers:
  - command:
    - etcdctl
    - snapshot
    - restore
    - "/tmp/etcd-snapshot"
    image: $(kubectl -n kube-system get pod -l component=etcd -o jsonpath="{.items[*].spec.containers[*].image}" | cut -f 1 -d ' ')
    imagePullPolicy: IfNotPresent
    name: etcd-snapshot-restore
    volumeMounts:
    - name: etcddir
      mountPath: /default.etcd
    - name: etcd-snapshot
      mountPath: /tmp/etcd-snapshot
      readOnly: true
  containers:
  - command:
    - etcd
    image: $(kubectl -n kube-system get pod -l component=etcd -o jsonpath="{.items[*].spec.containers[*].image}" | cut -f 1 -d ' ')
    imagePullPolicy: IfNotPresent
    name: etcd-temp
    volumeMounts:
    - name: etcddir
      mountPath: /default.etcd
  volumes:
  - name: etcddir
    emptyDir: {}
  - name: etcd-snapshot
    hostPath:
      path: $SNAPSHOT
      type: File
EOF
Create Pod from the resulting manifest:	shell kubectl create -f etcd.pod.yaml
shell kubectl create -f etcd.pod.yaml	Установите нужные переменные. В текущем примере:
Set environment variables. In this example:	`infra-production` - пространство имен, в котором мы будем искать ресурсы.
`infra-production`: the namespace in which to search for resources.	`/root/etcd-restore/output` - каталог для восстановленных манифестов.
`/root/etcd-restore/output`: the directory for the recovered resource manifests.	`/root/auger` - путь до исполняемого файла утилиты `auger`:
`/root/auger`: the path to the `auger` executable.	shell FILTER=infra-production; BACKUP_OUTPUT_DIR=/root/etcd-restore/output; mkdir -p $BACKUP_OUTPUT_DIR && cd $BACKUP_OUTPUT_DIR
shell FILTER=infra-production; BACKUP_OUTPUT_DIR=/root/etcd-restore/output; mkdir -p $BACKUP_OUTPUT_DIR && cd $BACKUP_OUTPUT_DIR	Следующие команды отфильтруют список нужных ресурсов по переменной `$FILTER` и выгрузят их в каталог `$BACKUP_OUTPUT_DIR`:
Commands below will filter needed resources by `$FILTER` and output them into `$BACKUP_OUTPUT_DIR` directory:	shell files=($(kubectl -n default exec etcdrestore -c etcd-temp – etcdctl –endpoints=localhost:2379 get / –prefix –keys-only \| grep “$FILTER”)) for file in “${files[@]}” do OBJECT=$(kubectl -n default exec etcdrestore -c etcd-temp – etcdctl –endpoints=localhost:2379 get “$file” –print-value-only \| $AUGER_BIN decode) FILENAME=$(echo $file \| sed -e “s#/registry/##g;s#/#_#g”) echo “$OBJECT” > “$BACKUP_OUTPUT_DIR/$FILENAME.yaml” echo $BACKUP_OUTPUT_DIR/$FILENAME.yaml done
shell	Удалите из полученных описаний объектов информацию о времени создания (`creationTimestamp`), `UID`, `status` и прочие оперативные данные, после чего восстановите объекты:
files=($(kubectl -n default exec etcdrestore -c etcd-temp -- etcdctl --endpoints=localhost:2379 get / --prefix --keys-only | grep "$FILTER"))
for file in "${files[@]}"
do
  OBJECT=$(kubectl -n default exec etcdrestore -c etcd-temp -- etcdctl --endpoints=localhost:2379 get "$file" --print-value-only | $AUGER_BIN decode)
  FILENAME=$(echo $file | sed -e "s#/registry/##g;s#/#_#g")
  echo "$OBJECT" > "$BACKUP_OUTPUT_DIR/$FILENAME.yaml"
  echo $BACKUP_OUTPUT_DIR/$FILENAME.yaml
done
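The file names produced by the loop come from the `sed` expression, which drops the `/registry/` prefix of the etcd key and turns the remaining slashes into underscores. A small sketch applying the same expression to one sample key follows; the `key_to_filename` wrapper is a hypothetical name, and the key is an assumed example of how a Deployment is stored:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed expression used in the loop.
key_to_filename() {
  echo "$1" | sed -e "s#/registry/##g;s#/#_#g"
}

key_to_filename "/registry/deployments/infra-production/supercronic"
# prints "deployments_infra-production_supercronic"
```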
Remove `creationTimestamp`, `UID`, `status`, and other operational fields from the resulting YAML files, then restore the objects:	bash kubectl create -f deployments_infra-production_supercronic.yaml
bash kubectl create -f deployments_infra-production_supercronic.yaml	Удалите под с временным экземпляром etcd:
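Removing the operational fields by hand is error-prone when many files were exported. The sketch below shows one possible way to do it with `awk`. The `strip_operational_fields` helper is hypothetical, it assumes the decoded YAML uses two-space indentation, and treating `resourceVersion` as an operational field is an assumption beyond the fields named in the text. Review the result before creating objects from it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a manifest without the top-level status block
# and without creationTimestamp, uid, and resourceVersion under metadata.
strip_operational_fields() {
  awk '
    /^status:/ { skip = 1; next }      # start of the top-level status block
    skip && /^[^ ]/ { skip = 0 }       # the next top-level key ends the block
    skip { next }                      # drop lines inside the status block
    /^  (creationTimestamp|uid|resourceVersion):/ { next }
    { print }
  ' "$1"
}

# Usage (file name taken from the example above):
# strip_operational_fields deployments_infra-production_supercronic.yaml > cleaned.yaml
```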
Delete the Pod with a temporary instance of etcd:	bash kubectl -n default delete pod etcdrestore
bash kubectl -n default delete pod etcdrestore	Как выбирается узел, на котором будет запущен под?
How the node to run the Pod on is selected	За распределение подов по узлам отвечает планировщик Kubernetes (компонент `scheduler`). Он проходит через две основные фазы — `Filtering` и `Scoring` (на самом деле, фаз больше, например, `pre-filtering` и `post-filtering`, но в общем можно выделить две ключевые фазы).
The Kubernetes scheduler component selects the node to run the Pod on.	Общее устройство планировщика Kubernetes
The selection process involves two phases, namely `Filtering` and `Scoring`. They are supposed to efficiently distribute the Pods between the nodes.	Планировщик состоит из плагинов, которые работают в рамках какой-либо фазы (фаз).
Although there are some additional phases, such as `pre-filtering`, `post-filtering`, and so on, you can safely narrow them down to the global phases mentioned above, as they merely increase flexibility and help to optimize things.	Примеры плагинов:
The structure of the Kubernetes scheduler	ImageLocality — отдает предпочтение узлам, на которых уже есть образы контейнеров, которые используются в запускаемом поде. Фаза: `Scoring`. TaintToleration — реализует механизм taints and tolerations. Фазы: `Filtering`, `Scoring`. NodePorts — проверяет, есть ли у узла свободные порты, необходимые для запуска пода. Фаза: `Filtering`.
The Scheduler comprises plugins that function in either or both phases.	С полным списком плагинов можно ознакомиться в документации Kubernetes.
Examples of plugins:	Логика работы
ImageLocality: favors nodes that already have the container images used by the Pod. Phase: `Scoring`. TaintToleration: implements taints and tolerations. Phases: `Filtering`, `Scoring`. NodePorts: checks whether the ports required by the Pod are available on the node. Phase: `Filtering`.	Профили планировщика
The full list of plugins is available in the Kubernetes documentation.	Есть два преднастроенных профиля планировщика:
Working logic
Scheduler profiles
There are two predefined scheduler profiles:
- `default-scheduler`: the default profile, which places pods on the least loaded nodes;
- `high-node-utilization`: a profile that places pods on the most loaded nodes.
To specify a scheduler profile, set the `spec.schedulerName` parameter in the pod manifest.
Example of using a profile:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scheduler-example
  labels:
    name: scheduler-example
spec:
  # Selects the scheduler profile that handles this pod.
  schedulerName: high-node-utilization
  containers:
  - name: example-pod
    image: registry.k8s.io/pause:2.0
```
Pod scheduling stages
The selection process starts with the `Filtering` phase. During this phase, filter plugins select the nodes that satisfy the filter conditions (for example, `taints`, `nodePorts`, `nodeName`, `unschedulable`). If the nodes are in different zones, the scheduler alternates between zones when selecting nodes so that the Pods do not all end up in the same zone.
Suppose there are two zones with the following nodes:
```text
Zone 1: Node 1, Node 2, Node 3, Node 4
Zone 2: Node 5, Node 6
```
In this case, the nodes will be selected in the following order:
```text
Node 1, Node 5, Node 2, Node 6, Node 3, Node 4
```
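The zone alternation described above can be illustrated with a small round-robin sketch. This is not the scheduler's actual code; the function name `interleave_by_zone` and the dictionary layout are assumptions made only for this example.

```python
from itertools import zip_longest


def interleave_by_zone(zones):
    """Return nodes in round-robin order across zones.

    `zones` maps a zone name to its ordered list of nodes. Zones are
    visited in insertion order; zones that run out of nodes are skipped.
    """
    missing = object()  # sentinel meaning "this zone has no more nodes"
    ordered = []
    for round_ in zip_longest(*zones.values(), fillvalue=missing):
        ordered.extend(node for node in round_ if node is not missing)
    return ordered


zones = {
    "Zone 1": ["Node 1", "Node 2", "Node 3", "Node 4"],
    "Zone 2": ["Node 5", "Node 6"],
}
order = interleave_by_zone(zones)
print(", ".join(order))
```

With the two zones from the example, the sketch yields the same order as listed above: once Zone 2 is exhausted, the remaining nodes of Zone 1 follow one after another.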
Note that, as an optimization, Kubernetes does not score every node that passes filtering, only a subset of them. By default, the size of this subset is a linear function of the cluster size: for clusters with 50 nodes or fewer, 100% of nodes are considered; for a cluster with 100 nodes, 50%; and for a cluster with 5000 nodes, 10%. The minimum value is 5%, which applies to clusters with more than 5000 nodes. Therefore, with the default settings, a node may be left out of the list of candidates even if it meets all the conditions.
This logic can be changed (see the `percentageOfNodesToScore` parameter in the Kubernetes documentation), but Deckhouse does not provide such an option.
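The percentages quoted above are consistent with a linear rule of the form `50 - n / 125` with a floor of 5%. The sketch below assumes that rule and takes the 50-node cutoff from the text. The constants and the function name `nodes_to_score_percent` are assumptions for illustration; they are not stated in this document and may differ between Kubernetes versions.

```python
def nodes_to_score_percent(num_nodes, small_cluster_limit=50):
    """Approximate the default share of nodes considered for scoring.

    Assumed rule: 100% up to `small_cluster_limit` nodes, then
    50 - num_nodes / 125 (integer division), never below 5%.
    """
    if num_nodes <= small_cluster_limit:
        return 100
    percent = 50 - num_nodes // 125
    return max(percent, 5)


for n in (50, 100, 5000, 10000):
    print(n, nodes_to_score_percent(n))
```

Under these assumptions the function returns 100 for 50 nodes, 50 for 100 nodes, 10 for 5000 nodes, and 5 for 10000 nodes, matching the figures in the paragraph above.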
Once the nodes that meet the filter conditions have been selected, the `Scoring` phase begins. Each plugin evaluates the list of filtered nodes and assigns a score to each node. This phase takes into account the resources available on the nodes and other factors: `pod capacity`, `affinity`, `volume provisioning`, and so on. The scores from the different plugins are summed, and the node with the highest total score is selected. If several nodes share the highest score, one of them is selected at random.
Finally, the scheduler assigns the Pod to the selected node.
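The scoring step can be sketched as follows: sum the per-plugin scores for each node, then pick at random among the nodes that share the highest total. The function `pick_node`, the node names, and the score values are hypothetical and serve only to illustrate the logic described above; real plugins compute their scores from cluster state.

```python
import random


def pick_node(nodes, score_plugins, rng=random):
    """Sum per-plugin scores for each node and pick a top-scoring node.

    `score_plugins` is a list of callables taking a node name and
    returning a number. Ties are broken at random via `rng.choice`.
    """
    if not nodes:
        raise ValueError("no nodes passed the Filtering phase")
    totals = {node: sum(plugin(node) for plugin in score_plugins) for node in nodes}
    best = max(totals.values())
    candidates = [node for node, total in totals.items() if total == best]
    return rng.choice(candidates), totals


# Hypothetical per-plugin scores, for illustration only.
image_scores = {"node-a": 10, "node-b": 40, "node-c": 20}
resource_scores = {"node-a": 30, "node-b": 30, "node-c": 50}

chosen, totals = pick_node(
    ["node-a", "node-b", "node-c"],
    [image_scores.get, resource_scores.get],
)
print(chosen, totals)
```

In this example `node-b` and `node-c` both reach a total of 70, so either of them may be chosen.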
Documentation
- General description of the scheduler.
- Plugin system.
- Node filtering details.
- Scheduler source code.
How to change or extend the scheduler logic
To change the scheduler logic, you can use the extender mechanism (Extenders).
Each such plugin (extender) is a webhook that must satisfy the following requirements:
- It uses TLS.
- It is accessible through a service within the cluster.
- It supports the standard `Verbs` (`filterVerb = filter`, `prioritizeVerb = prioritize`).
It is also assumed that all connected plugins can cache node information (`nodeCacheCapable: true`).
You can connect an `extender` using the KubeSchedulerWebhookConfiguration resource.
When the `failurePolicy: Fail` option is used, an error in the webhook causes the scheduler to stop working, and new pods cannot be started.
How does kubelet certificate rotation work?
You can read about configuring and enabling kubelet certificate rotation in the official Kubernetes documentation.
The `/var/lib/kubelet/config.yaml` file contains the kubelet configuration and specifies the paths to the certificate (`tlsCertFile`) and the private key (`tlsPrivateKeyFile`).
Kubelet handles server certificates using the following logic:
- If `tlsCertFile` and `tlsPrivateKeyFile` are not empty, kubelet uses them as the default certificate and key.
  - When a client accesses the kubelet API by IP address (e.g., https://10.1.1.2:10250/), the default private key (`tlsPrivateKeyFile`) is used to establish the TLS connection. In this case, certificate rotation will not work.
  - When a client accesses the kubelet API by host name (e.g., https://k8s-node:10250/), a dynamically generated private key from the `/var/lib/kubelet/pki/` directory is used to establish the TLS connection. In this case, certificate rotation will work.
- If `tlsCertFile` and `tlsPrivateKeyFile` are empty, a dynamically generated private key from the `/var/lib/kubelet/pki/` directory is used to establish the TLS connection. In this case, certificate rotation will work.
Since Deckhouse Kubernetes Platform uses IP addresses to make requests to the kubelet API, the `tlsCertFile` and `tlsPrivateKeyFile` fields are not used in the kubelet configuration; instead, kubelet uses a dynamic certificate that it generates itself. Also, the `operator-trivy` module disables the CIS benchmark checks `AVD-KCV-0088` and `AVD-KCV-0089`, which verify whether the `--tls-cert-file` and `--tls-private-key-file` arguments were passed to kubelet.
Kubelet uses a client TLS certificate (`/var/lib/kubelet/pki/kubelet-client-current.pem`), with which it can request a new client certificate or a new server certificate (`/var/lib/kubelet/pki/kubelet-server-current.pem`) from kube-apiserver.
When 5-10% of the certificate's lifetime remains (a random value from this range), kubelet requests a new certificate from kube-apiserver. For a description of the algorithm, see the official Kubernetes documentation.
To ensure that kubelet has time to install the new certificate before the current one expires, we recommend setting the certificate lifetime to more than 1 hour. The lifetime is set using the `--cluster-signing-duration` argument in the `/etc/kubernetes/manifests/kube-controller-manager.yaml` manifest. By default, this value is 1 year (8760 hours).
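To see why very short lifetimes leave little room for renewal, the arithmetic can be sketched as follows. The sketch takes the 5-10% range from the text above as given; the function name `renewal_window` and its parameters are assumptions made for illustration and are not part of kubelet.

```python
def renewal_window(lifetime_seconds, min_left_percent=5, max_left_percent=10):
    """Return (earliest, latest) renewal request times in seconds after issuance.

    Assumes a renewal is requested when between `max_left_percent` and
    `min_left_percent` of the certificate lifetime remains.
    """
    earliest = lifetime_seconds * (100 - max_left_percent) // 100
    latest = lifetime_seconds * (100 - min_left_percent) // 100
    return earliest, latest


one_hour = 3600
earliest, latest = renewal_window(one_hour)
# Time left before expiry when the request is sent, in minutes.
print((one_hour - earliest) // 60, (one_hour - latest) // 60)
```

For a 1-hour certificate, the request is sent when only 3 to 6 minutes remain, which is why longer lifetimes are recommended.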
If the client certificate has expired, kubelet cannot make requests to kube-apiserver and therefore cannot renew its certificates. In this case, the node is marked as `NotReady` and recreated.
How to manually update control plane component certificates?
The cluster's master nodes may be powered off for an extended period, during which the control plane component certificates may expire. After the nodes are powered back on, the certificates are not renewed automatically and must be renewed manually.
Control plane component certificates are renewed using the `kubeadm` utility. To renew the certificates, do the following on each master node:
Find the `kubeadm` utility on the master node and create a symbolic link using the following command:
```shell
# Locate the kubeadm binary under containerd's data directory and link it into PATH.
ln -s $(find /var/lib/containerd -name kubeadm -type f -executable -print) /usr/bin/kubeadm
```
Renew the certificates:
```shell
kubeadm certs renew all
```
