How do I add master nodes to a cloud cluster (single-master to multi-master)?

See the control-plane-manager module FAQ.

How do I reduce the number of master nodes in a cloud cluster (multi-master to single-master)?

See the control-plane-manager module FAQ.

Static nodes

You can add a static node to the cluster manually (an example) or by using Cluster API Provider Static.

How do I add a static node to a cluster (Cluster API Provider Static)?

To add a static node to a cluster (bare metal server or virtual machine), follow these steps:

Prepare the required resources:

Allocate a server or virtual machine and ensure that the node has the necessary network connectivity with the cluster.

If necessary, install additional operating system packages and configure the mount points that will be used on the node.

Create a user with sudo privileges:

Add a new user (in this example, caps) with sudo privileges:

```shell
useradd -m -s /bin/bash caps
usermod -aG sudo caps
```

Allow the user to run sudo commands without entering a password. To do this, add the following line to the sudo configuration on the server (edit the /etc/sudoers file, run the sudo visudo command, or use another method):

```shell
caps ALL=(ALL) NOPASSWD: ALL
```

Set UsePAM to yes in /etc/ssh/sshd_config on the server and restart the sshd service:

```shell
sudo systemctl restart sshd
```
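The UsePAM step shows only the service restart. A minimal sketch of the edit itself follows, assuming GNU sed and grep and a conventional sshd_config layout. The function name set_usepam_yes is hypothetical, and it takes the file path as an argument so that it can be tried on a copy first.

```shell
# Hypothetical helper: make sure an sshd_config-style file contains "UsePAM yes".
# Assumes GNU sed (in-place editing with -i and the case-insensitive I flag).
set_usepam_yes() {
  local cfg
  cfg="$1"
  if grep -qiE '^[[:space:]]*#?[[:space:]]*UsePAM[[:space:]]' "$cfg"; then
    # Rewrite an existing UsePAM line, including a commented-out one.
    sed -i -E 's/^[[:space:]]*#?[[:space:]]*UsePAM[[:space:]].*/UsePAM yes/I' "$cfg"
  else
    # The file has no UsePAM line at all: append one.
    printf 'UsePAM yes\n' >> "$cfg"
  fi
}
```

On the server this could be run as root with /etc/ssh/sshd_config as the argument. Running sshd -t afterwards checks the configuration for errors before the service is restarted.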

Generate a pair of SSH keys with an empty passphrase on the server:

```shell
ssh-keygen -t rsa -f caps-id -C "" -N ""
```

The public and private keys of the caps user will be stored in the caps-id.pub and caps-id files in the current directory on the server.

Add the generated public key to the /home/caps/.ssh/authorized_keys file of the caps user by executing the following commands in the keys directory on the server:

```shell
mkdir -p /home/caps/.ssh
cat caps-id.pub >> /home/caps/.ssh/authorized_keys
chmod 700 /home/caps/.ssh
chmod 600 /home/caps/.ssh/authorized_keys
chown -R caps:caps /home/caps/
```

Create the SSHCredentials resource.
Create the StaticInstance resource.
Create the NodeGroup resource with the Static nodeType, specify the desired number of nodes in the group and, if necessary, the filter for StaticInstance.

An example of adding a static node.
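The three resources from the steps above might look like the following sketch. The function render_caps_manifests, the role: worker label, and the resource names are hypothetical. The field names (privateSSHKey, credentialsRef, staticInstances.labelSelector) and API versions reflect the CAPS resources as I understand them, so check them against the CRDs installed in your cluster before applying anything.

```shell
# Hypothetical helper: print SSHCredentials, StaticInstance, and NodeGroup manifests.
# Arguments: <instance name> <server address> <base64-encoded private key>.
# Field names and API versions are assumptions; verify them against your cluster's CRDs.
render_caps_manifests() {
  local name address key_b64
  name="$1"; address="$2"; key_b64="$3"
  cat <<EOF
apiVersion: deckhouse.io/v1alpha1
kind: SSHCredentials
metadata:
  name: ${name}-credentials
spec:
  user: caps
  privateSSHKey: "${key_b64}"
---
apiVersion: deckhouse.io/v1alpha1
kind: StaticInstance
metadata:
  name: ${name}
  labels:
    role: worker
spec:
  address: "${address}"
  credentialsRef:
    kind: SSHCredentials
    name: ${name}-credentials
---
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: worker
spec:
  nodeType: Static
  staticInstances:
    count: 1
    labelSelector:
      matchLabels:
        role: worker
EOF
}
```

The output can be reviewed first and then piped to kubectl apply -f -. The labelSelector plays the role of the StaticInstance filter mentioned in the last step.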

How do I add a batch of static nodes to a cluster manually?

Use an existing NodeGroup custom resource or create a new one (example of the NodeGroup called worker). The nodeType parameter for static nodes in the NodeGroup must be Static or CloudStatic.

You can automate the bootstrap process with any automation platform you prefer. The following is an example for Ansible.

Pick one of the Kubernetes API server endpoints. Note that this IP address must be accessible from the nodes that are being added to the cluster:

```shell
kubectl -n default get ep kubernetes -o json | jq '.subsets[0].addresses[0].ip + ":" + (.subsets[0].ports[0].port | tostring)' -r
```

Check the Kubernetes version. If the version is 1.25 or higher, create a node-group token:

```shell
kubectl create token node-group --namespace d8-cloud-instance-manager --duration 1h
```

Save the token and add it to the token: field of the Ansible playbook in the next steps.

If the Kubernetes version is lower than 1.25, get a Kubernetes API token for a special ServiceAccount that Deckhouse manages:

```shell
kubectl -n d8-cloud-instance-manager get $(kubectl -n d8-cloud-instance-manager get secret -o name | grep node-group-token) \
  -o json | jq '.data.token' -r | base64 -d && echo ""
```
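Choosing between the two token commands depends only on the minor version, so the check can be scripted. The sketch below is one way to do it. The function name token_method and its two output labels are hypothetical and only illustrate the 1.25 threshold described above.

```shell
# Hypothetical helper: report which token command applies to a Kubernetes
# version string such as "v1.27.3" or "1.24.0".
token_method() {
  local version major rest minor
  version="${1#v}"            # drop a leading "v" if present
  major="${version%%.*}"      # text before the first dot
  rest="${version#*.}"        # text after the first dot
  minor="${rest%%.*}"         # minor version, possibly with a suffix
  minor="${minor%%[!0-9]*}"   # strip suffixes such as the "+" in "27+"
  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 25 ]; }; then
    echo "create-token"       # use: kubectl create token node-group ...
  else
    echo "secret"             # use: read the node-group-token Secret
  fi
}
```

The server version string can be obtained, for example, with kubectl version -o json | jq -r '.serverVersion.gitVersion' and passed to the function as its only argument.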

Create an Ansible playbook, replacing the vars with the values obtained in the previous steps:

```yaml
- hosts: all
  become: yes
  gather_facts: no
  vars:
    kube_apiserver: <KUBE_APISERVER>  # endpoint (ip:port) obtained earlier
    token: <TOKEN>                    # token obtained earlier
  tasks:
    - name: Check if node is already bootstrapped
      stat:
        path: /var/lib/bashible
      register: bootstrapped
    - name: Get bootstrap secret
      uri:
        url: "https://{{ kube_apiserver }}/api/v1/namespaces/d8-cloud-instance-manager/secrets/manual-bootstrap-for-{{ node_group }}"
        return_content: yes
        method: GET
        status_code: 200
        body_format: json
        headers:
          Authorization: "Bearer {{ token }}"
        validate_certs: no
      register: bootstrap_secret
      when: bootstrapped.stat.exists == False
    - name: Run bootstrap.sh
      shell: "{{ bootstrap_secret.json.data['bootstrap.sh'] | b64decode }}"
      args:
        executable: /bin/bash
      ignore_errors: yes
      when: bootstrapped.stat.exists == False
    - name: wait
      wait_for_connection:
        delay: 30
      when: bootstrapped.stat.exists == False
```

Define one more variable, node_group. Its value must match the name of the NodeGroup to which the node will belong. The variable can be passed in different ways, for example, by using an inventory file:

```text
[system]
system-0
system-1

[system:vars]
node_group=system

[worker]
worker-0
worker-1

[worker:vars]
node_group=worker
```

Run the playbook with the inventory file.
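For larger batches, the inventory can be generated rather than written by hand. A minimal sketch follows, assuming each Ansible group is named after its NodeGroup as in the example inventory. The function name render_inventory_group and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: print one inventory section for a NodeGroup and its hosts.
# Usage: render_inventory_group <node_group> <host>...
render_inventory_group() {
  local group
  group="$1"
  shift
  printf '[%s]\n' "$group"                  # group header
  printf '%s\n' "$@"                        # one host per line
  printf '\n[%s:vars]\nnode_group=%s\n' "$group" "$group"
}
```

For example, { render_inventory_group system system-0 system-1; echo; render_inventory_group worker worker-0 worker-1; } > inventory.ini writes both groups to a file, and ansible-playbook -i inventory.ini bootstrap.yml then runs the playbook against it.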

How do I clean up a static node manually?

This method is valid for both manually configured nodes (using the bootstrap script) and nodes configured using CAPS.

To decommission a node from the cluster and clean up the server (VM), run the following command on the node:

```shell
bash /var/lib/bashible/cleanup_static_node.sh --yes-i-am-sane-and-i-understand-what-i-am-doing
```

Can I delete a StaticInstance?

A StaticInstance that is in the Pending state can be deleted with no adverse effects.

To delete a StaticInstance in any state other than Pending (Running, Cleaning, Bootstrapping), you need to:

Add the label "node.deckhouse.io/allow-bootstrap": "false" to the StaticInstance.

Example command for adding a label:

```shell
d8 k label staticinstance d8cluster-worker node.deckhouse.io/allow-bootstrap=false
```

Wait until the StaticInstance status becomes Pending.

To check the status of the StaticInstance, use the command:

```shell
d8 k get staticinstances
```

Delete the StaticInstance.

Example command for deleting a StaticInstance:

```shell
d8 k delete staticinstance d8cluster-worker
```

Decrease the NodeGroup.spec.staticInstances.count field by 1.
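The "wait until Pending" step can be scripted as a polling loop. The sketch below is generic: wait_for_phase is a hypothetical helper that repeatedly runs whatever command is passed to it and compares the printed value with the wanted phase.

```shell
# Hypothetical helper: poll a command until it prints the wanted phase,
# or give up after a number of attempts.
# Usage: wait_for_phase <phase> <attempts> <sleep_seconds> <command...>
wait_for_phase() {
  local want attempts pause i phase
  want="$1"; attempts="$2"; pause="$3"
  shift 3
  i=1
  phase=""
  while [ "$i" -le "$attempts" ]; do
    # A failing command counts as "phase unknown" for this attempt.
    phase="$("$@" 2>/dev/null)" || phase=""
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  echo "timed out waiting for phase ${want} (last seen: ${phase:-unknown})" >&2
  return 1
}
```

A possible invocation is wait_for_phase Pending 60 5 d8 k get staticinstance d8cluster-worker -o jsonpath='{.status.currentStatus.phase}'. The jsonpath expression is an assumption about where the phase is stored, so compare it with the output of d8 k get staticinstance d8cluster-worker -o yaml first.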

How do I change the IP address of a StaticInstance?

You cannot change the IP address in the StaticInstance resource. If an incorrect address is specified in StaticInstance, you have to delete the StaticInstance and create a new one.

How do I migrate a manually configured static node under CAPS control?

You need to clean up the node, then hand over the node under CAPS control.

How do I change the NodeGroup of a static node?

Note that if a node is under CAPS control, you cannot change the NodeGroup membership of such a node. The only alternative is to delete StaticInstance and create a new one.

To switch an existing manually created static node to another NodeGroup, you need to change its group label:

```shell
kubectl label node --overwrite <node_name> node.deckhouse.io/group=<new_node_group_name>
kubectl label node <node_name> node-role.kubernetes.io/<old_node_group_name>-
```

Applying the changes will take some time.

How do I clean up a node before adding it to a cluster?

This is only needed if you have to move a static node from one cluster to another. Be aware that these operations remove local storage data. If you just need to change a NodeGroup, follow this instruction.

If the node you are cleaning up has LINSTOR/DRBD storage pools, first evict resources from the node and remove the node from LINSTOR/DRBD using the instruction.

Delete the node from the Kubernetes cluster:

```shell
kubectl drain <node> --ignore-daemonsets --delete-local-data
kubectl delete node <node>
```

Run the cleanup script on the node:

```shell
bash /var/lib/bashible/cleanup_static_node.sh --yes-i-am-sane-and-i-understand-what-i-am-doing
```

Run the bootstrap.sh script after the node reboots.

How do I know if something went wrong?

If a node in a NodeGroup is not updated (the value of UPTODATE in the output of the kubectl get nodegroup command is less than the value of NODES), or you suspect other problems that may be related to the node-manager module, check the logs of the bashible service. The bashible service runs on each node managed by the node-manager module.

To view the logs of the bashible service on a specific node, run the following command:

```shell
journalctl -fu bashible
```

Example of output when the bashible service has performed all necessary actions:

```console
May 25 04:39:16 kube-master-0 systemd[1]: Started Bashible service.
May 25 04:39:16 kube-master-0 bashible.sh[1976339]: Configuration is in sync, nothing to do.
May 25 04:39:16 kube-master-0 systemd[1]: bashible.service: Succeeded.
```
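The UPTODATE versus NODES comparison can be automated when there are many groups. The sketch below reads the table printed by kubectl get nodegroup and lists the groups that lag behind. The function name outdated_nodegroups is hypothetical. It finds the two columns by header name and assumes no cell to the left of them is empty.

```shell
# Hypothetical helper: read `kubectl get nodegroup` output on stdin and print
# the names of groups whose UPTODATE value is lower than NODES.
outdated_nodegroups() {
  awk '
    NR == 1 {
      # Remember the position of every column by its header name.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    ("NODES" in col) && ("UPTODATE" in col) && ($col["UPTODATE"] + 0 < $col["NODES"] + 0) {
      print $1
    }
  '
}
```

Typical use is kubectl get nodegroup | outdated_nodegroups. Each name it prints is a group whose nodes are worth checking with journalctl -fu bashible.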

How do I know what is running on a node while it is being created?

You can analyze the cloud-init logs to find out what is happening on a node during the bootstrapping process (for example, when the node takes a long time to be created):

Find the node that is currently bootstrapping:

```shell
kubectl get instances | grep Pending
```

An example:

```shell
$ kubectl get instances | grep Pending
dev-worker-2a6158ff-6764d-nrtbj   Pending   46s
```

Get information about connection parameters for viewing logs:

```shell
kubectl get instances dev-worker-2a6158ff-6764d-nrtbj -o yaml | grep 'bootstrapStatus' -B0 -A2
```

An example:

```shell
$ kubectl get instances dev-worker-2a6158ff-6764d-nrtbj -o yaml | grep 'bootstrapStatus' -B0 -A2
bootstrapStatus:
  description: Use 'nc 192.168.199.178 8000' to get bootstrap logs.
  logsEndpoint: 192.168.199.178:8000
```

Run the command you got (nc 192.168.199.178 8000 in the example above) to see the cloud-init logs and determine at which stage the node configuration stopped.

The logs of the initial node configuration are located at /var/log/cloud-init-output.log.
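Extracting the endpoint and building the nc command can be done in one step. The following sketch assumes the logsEndpoint value has the host:port form shown in the example output. The function name bootstrap_logs_command is hypothetical.

```shell
# Hypothetical helper: read Instance YAML on stdin and print the nc command
# for the first logsEndpoint found. Assumes a host:port value (no IPv6).
bootstrap_logs_command() {
  awk '
    $1 == "logsEndpoint:" {
      n = split($2, parts, ":")
      if (n == 2) {
        print "nc " parts[1] " " parts[2]
        exit
      }
    }
  '
}
```

For example, kubectl get instances dev-worker-2a6158ff-6764d-nrtbj -o yaml | bootstrap_logs_command prints the command to run. Nothing is printed if the Instance has no logsEndpoint field.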

How do I update the kernel on nodes?

Debian-based distros

Create a NodeGroupConfiguration resource, specifying the desired kernel version in the desired_version variable of the shell script (the resource's spec.content parameter):

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: install-kernel.sh
spec:
  bundles:
    - '*'
  nodeGroups:
    - '*'
  weight: 32
  content: |
    # Copyright 2022 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    desired_version="5.15.0-53-generic"

    bb-event-on 'bb-package-installed' 'post-install'
    post-install() {
      bb-log-info "Setting reboot flag due to kernel was updated"
      bb-flag-set reboot
    }

    version_in_use="$(uname -r)"

    if [[ "$version_in_use" == "$desired_version" ]]; then
      exit 0
    fi

    bb-deckhouse-get-disruptive-update-approval
    bb-apt-install "linux-image-${desired_version}"
```

CentOS-based distros

Create a NodeGroupConfiguration resource, specifying the desired kernel version in the desired_version variable of the shell script (the resource's spec.content parameter):

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: install-kernel.sh
spec:
  bundles:
    - '*'
  nodeGroups:
    - '*'
  weight: 32
  content: |
    # Copyright 2022 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    desired_version="3.10.0-1160.42.2.el7.x86_64"

    bb-event-on 'bb-package-installed' 'post-install'
    post-install() {
      bb-log-info "Setting reboot flag due to kernel was updated"
      bb-flag-set reboot
    }

    version_in_use="$(uname -r)"

    if [[ "$version_in_use" == "$desired_version" ]]; then
      exit 0
    fi

    bb-deckhouse-get-disruptive-update-approval
    bb-dnf-install "kernel-${desired_version}"
```

NodeGroup parameters and their result

The NodeGroup parameter	Disruption update	Node provisioning	Kubelet restart
chaos	-	-	-
cloudInstances.classReference	-	+	-
cloudInstances.maxSurgePerZone	-	-	-
cri.containerd.maxConcurrentDownloads	-	-	+
cri.type	- (NotManaged) / + (other)	-	-
disruptions	-	-	-
kubelet.maxPods	-	-	+
kubelet.rootDir	-	-	+
kubernetesVersion	-	-	+
nodeTemplate	-	-	-
static	-	-	+
update.maxConcurrent	-	-	-

Refer to the description of the NodeGroup custom resource for more information about the parameters.

Changing the InstanceClass or instancePrefix parameter in the Deckhouse configuration won't result in a RollingUpdate. Deckhouse will create new MachineDeployments and delete the old ones. The number of MachineDeployments ordered at the same time is determined by the cloudInstances.maxSurgePerZone parameter.

During a disruption update, pods are evicted from the node. If any pod fails to evict, the eviction is retried every 20 seconds until a global timeout of 5 minutes is reached. After that, the pods that failed to evict are removed forcibly.

How do I redeploy ephemeral machines in the cloud with a new configuration?

If the Deckhouse configuration is changed (both in the node-manager module and in any of the cloud providers), the VMs will not be redeployed. The redeployment is performed only in response to changing InstanceClass or NodeGroup objects.

To force the redeployment of all Machines, add or modify the manual-rollout-id annotation on the NodeGroup: kubectl annotate NodeGroup name_ng "manual-rollout-id=$(uuidgen)" --overwrite.

How do I allocate nodes to specific loads?

You cannot use the deckhouse.io domain in labels and taints keys of the NodeGroup. It is reserved for Deckhouse components. Use the dedicated or dedicated.client.com keys instead.

There are two ways to solve this problem:

You can set labels in the NodeGroup's spec.nodeTemplate.labels and use them in the Pod's spec.nodeSelector or spec.affinity.nodeAffinity parameters. In this case, you select the nodes that the scheduler will use for running the target application.
You can set taints in the NodeGroup's spec.nodeTemplate.taints and then tolerate them via the Pod's spec.tolerations parameter. In this case, you disallow running applications on these nodes unless those applications are explicitly allowed.

Deckhouse tolerates the dedicated key by default, so we recommend using the dedicated key with any value for taints on your dedicated nodes.

To use custom keys for taints (e.g., dedicated.client.com), you must add the key to the modules.placement.customTolerationKeys parameter. This way, Deckhouse can deploy system components (e.g., cni-flannel) to these dedicated nodes.

How do I allocate nodes to system components?

Frontend

For Ingress controllers, use the NodeGroup with the following configuration:

```yaml
nodeTemplate:
  labels:
    node-role.deckhouse.io/frontend: ""
  taints:
    - effect: NoExecute
      key: dedicated.deckhouse.io
      value: frontend
```

For details on allocating nodes to specific loads, see the article on Habr.

System components

The NodeGroup for components of Deckhouse subsystems looks as follows:

```yaml
nodeTemplate:
  labels:
    node-role.deckhouse.io/system: ""
  taints:
    - effect: NoExecute
      key: dedicated.deckhouse.io
      value: system
```

How do I speed up node provisioning on the cloud when scaling applications horizontally?

The most efficient way is to keep some extra nodes "ready". In this case, you can run new application replicas on them almost instantaneously. The obvious disadvantage of this approach is the additional cost of maintaining these nodes.

Here is how you should configure the target NodeGroup:

Specify the number of "ready" nodes (or a percentage of the maximum number of nodes in the group) using the cloudInstances.standby parameter.
If there are additional service components on nodes that are not handled by Deckhouse (e.g., the filebeat DaemonSet), you can specify the percentage of node resources they can consume via the standbyHolder.overprovisioningRate parameter.
This feature requires that at least one group node is already running in the cluster. In other words, there must be either a single replica of the application, or the cloudInstances.minPerZone parameter must be set to 1.

An example:

```yaml
cloudInstances:
  maxPerZone: 10
  minPerZone: 1
  standby: 10%
  standbyHolder:
    overprovisioningRate: 30%
```

How do I disable machine-controller-manager/CAPI in the case of potentially cluster-damaging changes?

Use this switch only if you know what you are doing and clearly understand the consequences.

To temporarily disable machine-controller-manager (MCM) and prevent its automatic actions that can affect the cluster infrastructure (for example, deleting or recreating nodes), set the mcmEmergencyBrake parameter to true:

yaml
mcmEmergencyBrake: true

To disable CAPI, set the capiEmergencyBrake parameter to true:

yaml
capiEmergencyBrake: true
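As a rough sketch, either switch can be expressed as a JSON merge patch. The helper below only prints the patch and touches nothing in the cluster. It assumes (this is not stated above) that both parameters live in the settings of the node-manager module's ModuleConfig; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON merge patch that toggles an emergency brake.
# Assumption: the parameters belong to spec.settings of ModuleConfig "node-manager".
build_brake_patch() {
  local controller="$1" value="$2" param
  case "$controller" in
    mcm)  param="mcmEmergencyBrake" ;;
    capi) param="capiEmergencyBrake" ;;
    *) echo "unknown controller: $controller" >&2; return 1 ;;
  esac
  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 1 ;;
  esac
  printf '{"spec":{"settings":{"%s":%s}}}\n' "$param" "$value"
}

# Print the patch for review; nothing is applied to the cluster here.
build_brake_patch mcm true
```

Under the assumption above, the printed patch could be reviewed and then passed to kubectl patch with --type merge. Check the module documentation for the actual location of these settings before applying anything.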

How do I restore the master node if kubelet cannot load the control plane components?

Such a situation may occur if images of the control plane components on the master were deleted in a cluster that has a single master node (e.g., the directory /var/lib/containerd was deleted). In this case, kubelet cannot pull images of the control plane components when restarted since the master node lacks authorization parameters required for accessing registry.deckhouse.io.

Follow the instructions below to restore the master node.

containerd

To restore the master node, run the following command in any working cluster managed by Deckhouse:

shell
kubectl -n d8-system get secrets deckhouse-registry -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq -r '.auths."registry.deckhouse.io".auth'

Copy the command’s output and use it for setting the AUTH variable on the corrupted master.

Next, you need to pull images of control plane components to the corrupted master:

shell
for image in $(grep "image:" /etc/kubernetes/manifests/* | awk '{print $3}'); do
  crictl pull --auth $AUTH $image
done

You need to restart kubelet after pulling the images.
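The image list in the loop above relies on grep prefixing each match with the file name, which it does only when more than one file is searched. A minimal sketch of a variant that does not depend on that is shown below: it uses grep -h and takes the last field of each line. The function name and the sample files are made up for illustration, and the demo runs against temporary files rather than /etc/kubernetes/manifests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list unique image references found in manifest files.
list_manifest_images() {
  local dir="$1"
  # -h drops the file-name prefix; the image reference is the last field.
  grep -h "image:" "$dir"/* | awk '{print $NF}' | sort -u
}

# Demo on throwaway files (not real control plane manifests).
demo_dir="$(mktemp -d)"
printf '    image: registry.example/kube-apiserver:v1\n' > "$demo_dir/a.yaml"
printf '    image: registry.example/etcd:v3\n    image: registry.example/kube-apiserver:v1\n' > "$demo_dir/b.yaml"
list_manifest_images "$demo_dir"
```

Each printed reference could then be passed to crictl pull with the --auth option, as in the loop above.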

How to change CRI for NodeGroup?

CRI can only be switched from Containerd to NotManaged and back (the cri.type parameter).

Set NodeGroup cri.type to Containerd or NotManaged.

NodeGroup YAML example:

yaml
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: worker
spec:
  nodeType: Static
  cri:
    type: Containerd

You can also do this with kubectl patch.

For Containerd:

shell
kubectl patch nodegroup <NodeGroup name> --type merge -p '{"spec":{"cri":{"type":"Containerd"}}}'

For NotManaged:

shell
kubectl patch nodegroup <NodeGroup name> --type merge -p '{"spec":{"cri":{"type":"NotManaged"}}}'
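A small sketch of how the two patch commands above could be generated by one helper that rejects any value other than the two supported ones. The function name is hypothetical, and the helper only prints the command for review instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the kubectl patch command for a NodeGroup CRI change.
cri_patch_command() {
  local nodegroup="$1" cri_type="$2"
  case "$cri_type" in
    Containerd|NotManaged) ;;
    *) echo "unsupported cri.type: $cri_type" >&2; return 1 ;;
  esac
  printf "kubectl patch nodegroup %s --type merge -p '{\"spec\":{\"cri\":{\"type\":\"%s\"}}}'\n" \
    "$nodegroup" "$cri_type"
}

# Example: print (not execute) the command for a NodeGroup named "worker".
cri_patch_command worker Containerd
```

Printing the command first makes it easy to check the NodeGroup name before anything is changed in the cluster.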

When changing cri.type for NodeGroups created with dhctl, you must change it both via dhctl config edit provider-cluster-configuration and in the NodeGroup object.

After setting up a new CRI for NodeGroup, the node-manager module drains nodes one by one and installs a new CRI on them. Node update is accompanied by downtime (disruption). Depending on the disruption setting for NodeGroup, the node-manager module either automatically allows node updates or requires manual confirmation.

How to change CRI for the whole cluster?

CRI can only be switched from Containerd to NotManaged and back (the cri.type parameter).

Use the dhctl utility to edit the defaultCRI parameter in the cluster-configuration config.

Also, this operation can be done with the following patch:

For Containerd:

shell
data="$(kubectl -n kube-system get secret d8-cluster-configuration -o json | jq -r '.data."cluster-configuration.yaml"' | base64 -d | sed "s/NotManaged/Containerd/" | base64 -w0)"
kubectl -n kube-system patch secret d8-cluster-configuration -p "{\"data\":{\"cluster-configuration.yaml\":\"$data\"}}"

For NotManaged:

shell
data="$(kubectl -n kube-system get secret d8-cluster-configuration -o json | jq -r '.data."cluster-configuration.yaml"' | base64 -d | sed "s/Containerd/NotManaged/" | base64 -w0)"
kubectl -n kube-system patch secret d8-cluster-configuration -p "{\"data\":{\"cluster-configuration.yaml\":\"$data\"}}"
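The sed expressions above replace the first match on every line of the configuration, wherever it appears. As a hedged alternative, the sketch below limits the substitution to the defaultCRI line. The function name and the sample input are illustrative only; the real input would be the decoded cluster-configuration.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical filter: rewrite only the defaultCRI value in YAML read from stdin.
set_default_cri() {
  local new_type="$1"
  case "$new_type" in
    Containerd|NotManaged) ;;
    *) echo "unsupported CRI type: $new_type" >&2; return 1 ;;
  esac
  sed -E "s/^([[:space:]]*defaultCRI:[[:space:]]*).*/\1\"${new_type}\"/"
}

# Demo on a made-up fragment; the "note" line is not a real field and stays untouched.
printf 'kind: ClusterConfiguration\ndefaultCRI: "NotManaged"\nnote: NotManaged stays here\n' \
  | set_default_cri Containerd
```

In the pipeline above, such a filter would take the place of the plain sed call between base64 -d and base64 -w0.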

If some NodeGroup must stay on a different CRI, set the CRI for that NodeGroup before changing defaultCRI, as described in “How to change CRI for NodeGroup?”.

Changing defaultCRI entails changing CRI on all nodes, including master nodes. If there is only one master node, this operation is dangerous and can lead to a complete failure of the cluster! The preferred option is to switch to a multi-master configuration first and only then change the CRI type.

When changing the CRI in the cluster, additional steps are required for the master nodes:

Deckhouse updates nodes in master NodeGroup one by one, so you need to discover which node is updating right now:

shell
kubectl get nodes -l node-role.kubernetes.io/control-plane="" -o json | jq '.items[] | select(.metadata.annotations."update.node.deckhouse.io/approved"=="") | .metadata.name' -r

Confirm the disruption of the master node that was discovered in the previous step:

shell
kubectl annotate node <master node name> update.node.deckhouse.io/disruption-approved=

Wait for the updated master node to switch to Ready state. Repeat these steps for the next master node.

How to add a node configuration step?

Additional node configuration steps are set via the NodeGroupConfiguration custom resource.

How to automatically put custom labels on the node?

On the node, create the directory /var/lib/node_labels.

In this directory, create one or more files containing the necessary labels. There can be any number of files, and they can be placed in subdirectories nested to any depth.

Add the necessary labels to the files in the key=value format. For example:

console
example-label=test

Save the files.

When adding a node to the cluster, the labels specified in the files will be automatically affixed to the node.

Please note that it is not possible to add labels used in DKP in this way. This method will only work with custom labels that do not overlap with those reserved for Deckhouse.
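The steps above could be scripted roughly as follows. This is a sketch: the function name, the file name custom.labels, and the simplified key=value check are assumptions made for illustration, and the demo writes to a temporary directory instead of /var/lib/node_labels so that it can be run safely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append key=value labels to a file under a labels directory.
write_node_labels() {
  local labels_dir="$1" file_name="$2"
  shift 2
  mkdir -p "$labels_dir" || return 1
  local label
  for label in "$@"; do
    # Accept only non-empty key=value pairs (a simplified check).
    case "$label" in
      ?*=?*) printf '%s\n' "$label" >> "$labels_dir/$file_name" ;;
      *) echo "skipping malformed label: $label" >&2 ;;
    esac
  done
}

# Demo: on a real node the directory would be /var/lib/node_labels.
demo_dir="$(mktemp -d)/node_labels"
write_node_labels "$demo_dir" custom.labels example-label=test "broken label"
cat "$demo_dir/custom.labels"
```

The check is deliberately strict and does not cover every label Kubernetes accepts; it only guards against obviously malformed lines.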

How to deploy custom containerd configuration?

The example of NodeGroupConfiguration uses functions of the script 032_configure_containerd.sh.

Adding custom settings causes a restart of the containerd service.

Bashible on nodes merges the main Deckhouse containerd config with configs from /etc/containerd/conf.d/*.toml.

You can override the values of the parameters that are specified in the file /etc/containerd/deckhouse.toml, but you will have to ensure their functionality on your own. Also, it is better not to change the configuration for the master nodes (nodeGroup master).

yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: containerd-option-config.sh
spec:
  bundles:
    - '*'
  content: |
    # Copyright 2024 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    mkdir -p /etc/containerd/conf.d
    bb-sync-file /etc/containerd/conf.d/additional_option.toml - << EOF
    oom_score = 500
    [metrics]
    address = "127.0.0.1"
    grpc_histogram = true
    EOF
  nodeGroups:
    - "worker"
  weight: 31

How to add configuration for an additional registry?

Containerd supports two methods for registry configuration: the old method and the new method.

To check for the presence of the old configuration method, run the following commands on the cluster nodes:

bash
cat /etc/containerd/config.toml | grep 'plugins."io.containerd.grpc.v1.cri".registry.mirrors'
cat /etc/containerd/config.toml | grep 'plugins."io.containerd.grpc.v1.cri".registry.configs'

Example output:

[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors.""]
[plugins."io.containerd.grpc.v1.cri".registry.configs]
[plugins."io.containerd.grpc.v1.cri".registry.configs."".auth]

To check for the presence of the new configuration method, run the following command on the cluster nodes:

bash
cat /etc/containerd/config.toml | grep '/etc/containerd/registry.d'

Example output:

config_path = "/etc/containerd/registry.d"

Old Method

This containerd configuration format is deprecated.

The configuration is described in the main containerd configuration file /etc/containerd/config.toml. Custom configuration is added through the toml merge mechanism: configuration files from the /etc/containerd/conf.d directory are merged with the main file /etc/containerd/config.toml. The merge takes place during the execution of the 032_configure_containerd.sh script, so the corresponding files must be added in advance.

Example configuration file for the /etc/containerd/conf.d/ directory:

toml
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."${REGISTRY_URL}"]
          endpoint = ["https://${REGISTRY_URL}"]
      [plugins."io.containerd.grpc.v1.cri".registry.configs]
        [plugins."io.containerd.grpc.v1.cri".registry.configs."${REGISTRY_URL}".auth]
          auth = "${BASE_64_AUTH}"
          username = "${USERNAME}"
          password = "${PASSWORD}"
        [plugins."io.containerd.grpc.v1.cri".registry.configs."${REGISTRY_URL}".tls]
          ca_file = "${CERT_DIR}/${CERT_NAME}.crt"
          insecure_skip_verify = true

Adding custom settings through the toml merge mechanism causes the containerd service to restart.

How to add additional registry auth (old method)?

Example of adding authorization to an additional registry when using the old configuration method:

yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: containerd-additional-config-auth.sh
spec:
  # To add a file before the '032_configure_containerd.sh' step
  weight: 31
  bundles:
    - '*'
  nodeGroups:
    - "*"
  content: |
    # Copyright 2023 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    REGISTRY_URL=private.registry.example
    mkdir -p /etc/containerd/conf.d
    bb-sync-file /etc/containerd/conf.d/additional_registry.toml - << EOF
    [plugins]
      [plugins."io.containerd.grpc.v1.cri"]
        [plugins."io.containerd.grpc.v1.cri".registry]
          [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
            [plugins."io.containerd.grpc.v1.cri".registry.mirrors."${REGISTRY_URL}"]
              endpoint = ["https://${REGISTRY_URL}"]
          [plugins."io.containerd.grpc.v1.cri".registry.configs]
            [plugins."io.containerd.grpc.v1.cri".registry.configs."${REGISTRY_URL}".auth]
              username = "username"
              password = "password"
              # OR
              auth = "dXNlcm5hbWU6cGFzc3dvcmQ="
    EOF

How to configure a certificate for an additional registry (old method)?

Example of configuring a certificate for an additional registry when using the old configuration method:

yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: containerd-additional-config-tls.sh
spec:
  # To add a file before the '032_configure_containerd.sh' step
  weight: 31
  bundles:
    - '*'
  nodeGroups:
    - "*"
  content: |
    # Copyright 2023 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    REGISTRY_URL=private.registry.example
    CERT_FILE_NAME=${REGISTRY_URL}
    CERTS_FOLDER="/var/lib/containerd/certs/"
    mkdir -p ${CERTS_FOLDER}
    bb-sync-file "${CERTS_FOLDER}/${CERT_FILE_NAME}.crt" - << EOF
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
    EOF

In addition to containerd, the certificate can also be added to the OS.

How to add TLS skip verify (old method)?

Example of adding TLS skip verify when using the old configuration method:

yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: containerd-additional-config-skip-tls.sh
spec:
  # To add a file before the '032_configure_containerd.sh' step
  weight: 31
  bundles:
    - '*'
  nodeGroups:
    - "*"
  content: |
    # Copyright 2023 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    REGISTRY_URL=private.registry.example

After applying the configuration file, verify access to the registry from the nodes using the command:

bash
# Via the CRI interface
crictl pull private.registry.example/image/repo:tag

New Method

Used in containerd v2.

The configuration is defined in the /etc/containerd/registry.d directory.
Configuration is specified by creating subdirectories named after the registry address:

bash
/etc/containerd/registry.d
├── private.registry.example:5001
│   ├── ca.crt
│   └── hosts.toml
└── registry.deckhouse.ru
    ├── ca.crt
    └── hosts.toml

Example contents of the hosts.toml file:

toml
[host]
# Mirror 1
[host."https://${REGISTRY_URL_1}"]
capabilities = ["pull", "resolve"]
ca = ["${CERT_DIR}/${CERT_NAME}.crt"]
[host."https://${REGISTRY_URL_1}".auth]
username = "${USERNAME}"
password = "${PASSWORD}"
# Mirror 2
[host."http://${REGISTRY_URL_2}"]
capabilities = ["pull", "resolve"]
skip_verify = true

Configuration changes do not cause the containerd service to restart.

How to add additional registry auth (new method)?

Example of adding authorization to an additional registry when using the new configuration method:

yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: containerd-additional-config-auth.sh
spec:
  # The step can be arbitrary, as restarting the containerd service is not required
  weight: 0
  bundles:
    - '*'
  nodeGroups:
    - "*"
  content: |
    # Copyright 2023 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    REGISTRY_URL=private.registry.example
    mkdir -p "/etc/containerd/registry.d/${REGISTRY_URL}"
    bb-sync-file "/etc/containerd/registry.d/${REGISTRY_URL}/hosts.toml" - << EOF
    [host]
      [host."https://${REGISTRY_URL}"]
        capabilities = ["pull", "resolve"]
        [host."https://${REGISTRY_URL}".auth]
          username = "username"
          password = "password"
    EOF
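Outside of bashible, the same hosts.toml layout can be produced with plain shell. The sketch below is illustrative only: write_hosts_toml is a hypothetical function, it uses cat instead of the bb-sync-file helper, and the demo writes to a temporary directory rather than /etc/containerd/registry.d.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create <root>/<registry>/hosts.toml with basic auth.
write_hosts_toml() {
  local root_dir="$1" registry="$2" username="$3" password="$4"
  mkdir -p "$root_dir/$registry" || return 1
  cat > "$root_dir/$registry/hosts.toml" <<EOF
[host]
  [host."https://${registry}"]
    capabilities = ["pull", "resolve"]
    [host."https://${registry}".auth]
      username = "${username}"
      password = "${password}"
EOF
}

# Demo with placeholder values; a real node would use /etc/containerd/registry.d.
demo_root="$(mktemp -d)"
write_hosts_toml "$demo_root" private.registry.example username password
cat "$demo_root/private.registry.example/hosts.toml"
```

Keeping the root directory as a parameter makes it possible to inspect the generated file before placing it on a node.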

How to configure a certificate for an additional registry (new method)?

Example of configuring a certificate for an additional registry when using the new configuration method:

yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: containerd-additional-config-tls.sh
spec:
  # The step can be arbitrary, as restarting the containerd service is not required
  weight: 0
  bundles:
    - '*'
  nodeGroups:
    - "*"
  content: |
    # Copyright 2023 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    REGISTRY_URL=private.registry.example
    mkdir -p "/etc/containerd/registry.d/${REGISTRY_URL}"
    bb-sync-file "/etc/containerd/registry.d/${REGISTRY_URL}/ca.crt" - << EOF
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
    EOF
    bb-sync-file "/etc/containerd/registry.d/${REGISTRY_URL}/hosts.toml" - << EOF
    [host]
      [host."https://${REGISTRY_URL}"]
        capabilities = ["pull", "resolve"]
        ca = ["/etc/containerd/registry.d/${REGISTRY_URL}/ca.crt"]
    EOF

In addition to containerd, the certificate can also be added to the OS.

How to add TLS skip verify (new method)?

Как настроить сертификат для дополнительного registry (новый способ)?

Example of adding TLS skip verify when using the new configuration method:

Пример настройки сертификата для дополнительного registry? при использовании нового способа конфигурации:

yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: containerd-additional-config-skip-tls.sh
spec:
  # The step can be arbitrary, as restarting the containerd service is not required
  weight: 0
  bundles:
  - '*'
  nodeGroups:
  - "*"
  content: |
    # Copyright 2023 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    REGISTRY_URL=private.registry.example

    mkdir -p "/etc/containerd/registry.d/${REGISTRY_URL}"

After applying the configuration file, check access to the registry from the nodes using the following commands:


bash
# Via the CRI interface
crictl pull private.registry.example/image/repo:tag

# Via ctr with the configuration directory specified
ctr -n k8s.io images pull --hosts-dir=/etc/containerd/registry.d/ private.registry.example/image/repo:tag

# Via ctr for an HTTP registry
ctr -n k8s.io images pull --hosts-dir=/etc/containerd/registry.d/ --plain-http private.registry.example/image/repo:tag

How to use NodeGroup’s priority feature


The priority field of the NodeGroup CustomResource allows you to define the order in which nodes will be provisioned in the cluster. For example, cluster-autoscaler can first provision spot-nodes and switch to regular ones when they run out. Or it can provision larger nodes when there are plenty of resources in the cluster and then switch to smaller nodes once cluster resources run out.

Here is an example of two NodeGroups, one of which uses spot nodes:

yaml
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: worker-spot
spec:
  cloudInstances:
    classReference:
      kind: AWSInstanceClass
      name: worker-spot
    maxPerZone: 5
    minPerZone: 0
    priority: 50
  nodeType: CloudEphemeral
---
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: worker
spec:
  cloudInstances:
    classReference:
      kind: AWSInstanceClass
      name: worker
    maxPerZone: 5
    minPerZone: 0
    priority: 30
  nodeType: CloudEphemeral

In the above example, cluster-autoscaler will first try to provision a spot-node. If it fails to add such a node to the cluster within 15 minutes, the worker-spot NodeGroup will be paused (for 20 minutes), and cluster-autoscaler will start provisioning nodes from the worker NodeGroup. If, after 30 minutes, another node needs to be deployed in the cluster, cluster-autoscaler will first attempt to provision a node from the worker-spot NodeGroup before provisioning one from the worker NodeGroup.


Once the worker-spot NodeGroup reaches its maximum (5 nodes in the example above), the nodes will be provisioned from the worker NodeGroup.


Note that node templates (labels/taints) for worker and worker-spot NodeGroups must be the same (or at least suitable for the load that triggers the cluster scaling process).
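The ordering rule above can be sketched in shell. This only illustrates that a NodeGroup with a higher priority value is tried first, as worker-spot (50) is tried before worker (30) in the example. The helper name provisioning_order and its input format (name and priority separated by a space) are assumptions made for this sketch and are not part of Deckhouse.

```shell
# provisioning_order reads "NAME PRIORITY" pairs from stdin and prints them
# in the order the example above describes: highest priority first.
# The function name and the input format are hypothetical.
provisioning_order() {
  sort -k2,2nr
}

# The two NodeGroups from the example above.
printf 'worker 30\nworker-spot 50\n' | provisioning_order
```

On a live cluster the input could, under the same assumptions, be produced with kubectl get and its custom-columns output reading .metadata.name and .spec.cloudInstances.priority.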


How to interpret Node Group states?


Ready — the node group contains the minimum required number of scheduled nodes with the status Ready for all zones.


Example 1. A group of nodes in the Ready state:

yaml
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: ng1
spec:
  nodeType: CloudEphemeral
  cloudInstances:
    maxPerZone: 5
    minPerZone: 1
status:
  conditions:
  - status: "True"
    type: Ready
---
apiVersion: v1
kind: Node
metadata:
  name: node1
  labels:
    node.deckhouse.io/group: ng1
status:
  conditions:
  - status: "True"
    type: Ready

Example 2. A group of nodes in the Not Ready state:

yaml
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: ng1
spec:
  nodeType: CloudEphemeral
  cloudInstances:
    maxPerZone: 5
    minPerZone: 2
status:
  conditions:
  - status: "False"
    type: Ready
---
apiVersion: v1
kind: Node
metadata:
  name: node1
  labels:
    node.deckhouse.io/group: ng1
status:
  conditions:
  - status: "True"
    type: Ready
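In Example 2 the group asks for at least two nodes per zone (minPerZone: 2) while only node1 is Ready, which is presumably why the group's Ready condition is False. A minimal sketch for reading that condition from the command line follows. The helper name nodegroup_ready is hypothetical, and the resource name nodegroups.deckhouse.io is an assumption derived from the apiVersion and kind shown above.

```shell
# nodegroup_ready prints the status of the Ready condition ("True" or "False")
# of the NodeGroup whose name is passed as $1.
# Assumption: the NodeGroup resource is addressable as nodegroups.deckhouse.io.
nodegroup_ready() {
  kubectl get nodegroups.deckhouse.io "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}
```

For the group from the examples this would be called as nodegroup_ready ng1.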


Updating — a node group contains at least one node in which there is an annotation with the prefix update.node.deckhouse.io (for example, update.node.deckhouse.io/waiting-for-approval).


WaitingForDisruptiveApproval - the node group contains at least one node that has the update.node.deckhouse.io/disruption-required annotation but no update.node.deckhouse.io/disruption-approved annotation.


Scaling — calculated only for node groups with the type CloudEphemeral. The state True can be in two cases:


When the number of nodes is less than the desired number of nodes in the group, i.e. when it is necessary to increase the number of nodes in the group.
When a node is marked for deletion or the number of nodes is greater than the desired number of nodes, i.e. when it is necessary to reduce the number of nodes in the group.


The desired number of nodes is the sum of all replicas in the node group.
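The two cases in which Scaling is True can be written down as a small shell function. This is a sketch of the rule as stated above, not the controller's actual code; the function name scaling_state and its three positional arguments (current node count, desired node count, number of nodes marked for deletion) are assumptions made for illustration.

```shell
# scaling_state NODES DESIRED [MARKED_FOR_DELETION]
# Prints the expected value of the Scaling condition according to the two
# documented cases. The name and the arguments are hypothetical.
scaling_state() {
  nodes=$1
  desired=$2
  marked=${3:-0}
  if [ "$nodes" -lt "$desired" ]; then
    # Fewer nodes than desired: the group has to grow.
    echo "True (scale up)"
  elif [ "$marked" -gt 0 ] || [ "$nodes" -gt "$desired" ]; then
    # A node is marked for deletion or there are too many nodes: the group has to shrink.
    echo "True (scale down)"
  else
    echo "False"
  fi
}

scaling_state 1 2
```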


Example. The desired number of nodes is 2:

yaml
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: ng1
spec:
  nodeType: CloudEphemeral
  cloudInstances:
    maxPerZone: 5
    minPerZone: 2
status:
  ...
  desired: 2
  ...

Error — contains the last error that occurred when creating a node in a node group.


How do I make werf ignore the Ready conditions in a node group?


werf checks the Ready status of resources and, if available, waits for the value to become True.


Creating or updating a NodeGroup resource can take a significant amount of time, because the required number of nodes has to be provisioned. When such a resource is deployed with werf (for example, as part of a CI/CD process), the deployment may fail once the resource readiness timeout is exceeded. To make werf ignore the NodeGroup status, add the following annotations to the NodeGroup:

yaml
metadata:
  annotations:
    werf.io/fail-mode: IgnoreAndContinueDeployProcess
    werf.io/track-termination-mode: NonBlocking

What is an Instance resource?


An Instance resource contains a description of an implementation-independent ephemeral machine resource. For example, machines created by MachineControllerManager or Cluster API Provider Static will have a corresponding Instance resource.


The object does not contain a specification. The status contains:


A link to the InstanceClass if it exists for this implementation;
A link to the Kubernetes Node object;
Current machine status;
Information on how to view machine creation logs (at the machine creation stage).


When a machine is created/deleted, the Instance object is created/deleted accordingly. You cannot create an Instance resource yourself, but you can delete it. In this case, the machine will be removed from the cluster (the removal process depends on implementation details).


When is a node reboot required?

Some node configuration changes require a reboot. For example, a reboot may be needed after changing certain sysctl settings, such as the kernel.yama.ptrace_scope parameter (it is changed, for instance, by the astra-ptrace-lock enable/disable command in Astra Linux).

How do I work with GPU nodes?


GPU-node management is available in Enterprise Edition only.


Step-by-step procedure for adding a GPU node to the cluster


Starting with Deckhouse 1.71, if a NodeGroup contains the spec.gpu section, the node-manager module automatically:


configures containerd with default_runtime = "nvidia";
applies the required system settings (including fixes for the NVIDIA Container Toolkit);
deploys system components: NFD, GFD, NVIDIA Device Plugin, DCGM Exporter, and, if needed, MIG Manager.


Always specify the desired mode in spec.gpu.sharing (Exclusive, TimeSlicing, or MIG).


Manual containerd configuration (via NodeGroupConfiguration, TOML, etc.) is not required and must not be combined with the automatic setup.
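Because spec.gpu.sharing accepts only the three modes named above, a value can be checked before it is written into a NodeGroup manifest. The sketch below is illustrative: the function name is_valid_gpu_sharing is hypothetical, and treating the mode names as case-sensitive is an assumption.

```shell
# is_valid_gpu_sharing succeeds only for the modes listed above.
# Assumption: the mode names are matched exactly, including letter case.
is_valid_gpu_sharing() {
  case "$1" in
    Exclusive|TimeSlicing|MIG) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: validate a value before putting it into a manifest.
mode="TimeSlicing"
if is_valid_gpu_sharing "$mode"; then
  echo "spec.gpu.sharing=$mode is one of the documented modes"
else
  echo "unknown sharing mode: $mode" >&2
fi
```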


To add a GPU node to the cluster, perform the following steps:


Create a NodeGroup for GPU nodes.


An example with TimeSlicing enabled (partitionCount: 4) and typical taint/label:


yaml
apiVersion: deckhouse.io/v1
kind: NodeGroup
metadata:
  name: gpu
spec:
  nodeType: CloudStatic   # or Static/CloudEphemeral, depending on your infrastructure
  gpu:
    sharing: TimeSlicing
    timeSlicing:
      partitionCount: 4
  nodeTemplate:
    labels:
      node-role/gpu: ""
    taints:
    - key: node-role
      value: gpu
      effect: NoSchedule

If you use custom taint keys, ensure they are allowed in global.modules.placement.customTolerationKeys so workloads can add the corresponding tolerations.


Full field schema: see NodeGroup CR documentation.


Install the NVIDIA driver and nvidia-container-toolkit.


Install the NVIDIA driver and NVIDIA Container Toolkit on the nodes, either manually or via a NodeGroupConfiguration. Below are NodeGroupConfiguration examples for the gpu NodeGroup.

Ubuntu

yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: install-cuda.sh
spec:
  bundles:
  - ubuntu-lts
  content: |
    # Copyright 2023 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    if [ ! -f "/etc/apt/sources.list.d/nvidia-container-toolkit.list" ]; then
      distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
      curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
      curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    fi
    bb-apt-install nvidia-container-toolkit nvidia-driver-535-server
    nvidia-ctk config --set nvidia-container-runtime.log-level=error --in-place
  nodeGroups:
  - gpu
  weight: 30

CentOS

yaml
apiVersion: deckhouse.io/v1alpha1
kind: NodeGroupConfiguration
metadata:
  name: install-cuda.sh
spec:
  bundles:
  - centos
  content: |
    # Copyright 2023 Flant JSC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    if [ ! -f "/etc/yum.repos.d/nvidia-container-toolkit.repo" ]; then
      distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
      curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    fi
    bb-dnf-install nvidia-container-toolkit nvidia-driver
    nvidia-ctk config --set nvidia-container-runtime.log-level=error --in-place
  nodeGroups:
  - gpu
  weight: 30

After these configurations are applied, bootstrap the nodes and reboot them so that the settings take effect and the drivers are installed.

Verify installation on the node using the command:


bash
nvidia-smi

Expected healthy output (example):


Verify infrastructure components in the cluster


NVIDIA Pods in d8-nvidia-gpu:


bash
kubectl -n d8-nvidia-gpu get pod

Expected healthy output (example):

console
NAME                                  READY   STATUS    RESTARTS   AGE
gpu-feature-discovery-80ceb7d-r842q   2/2     Running   0          2m53s
nvidia-dcgm-exporter-w9v9h            1/1     Running   0          2m53s
nvidia-dcgm-njqqb                     1/1     Running   0          2m53s
nvidia-device-plugin-80ceb7d-8xt8g    2/2     Running   0          2m53s

NFD Pods in d8-cloud-instance-manager:

bash
kubectl -n d8-cloud-instance-manager get pods | egrep '^(NAME|node-feature-discovery)'

Expected healthy output (example):

console
NAME                                             READY   STATUS    RESTARTS   AGE
node-feature-discovery-gc-6d845765df-45vpj       1/1     Running   0          3m6s
node-feature-discovery-master-74696fd9d5-wkjk4   1/1     Running   0          3m6s
node-feature-discovery-worker-5f4kv              1/1     Running   0          3m8s

Resource exposure on the node:

bash
kubectl describe node <node-name>

Output snippet (example):

console
Capacity:
  cpu:                40
  memory:             263566308Ki
  nvidia.com/gpu:     4
Allocatable:
  cpu:                39930m
  memory:             262648294441
  nvidia.com/gpu:     4
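To pick only the GPU lines out of that output, the text can be filtered with awk. This is a minimal sketch that assumes the layout shown in the snippet above (section headers Capacity: and Allocatable: followed by indented key: value lines); the function name gpu_resources is hypothetical.

```shell
# gpu_resources reads `kubectl describe node` output from stdin and prints
# the nvidia.com/gpu count found in the Capacity and Allocatable sections.
gpu_resources() {
  awk '
    /^Capacity:/    { section = "capacity" }
    /^Allocatable:/ { section = "allocatable" }
    $1 == "nvidia.com/gpu:" && section != "" { print section "=" $2 }
  '
}

# Example with a shortened copy of the snippet above.
printf 'Capacity:\n  cpu: 40\n  nvidia.com/gpu: 4\nAllocatable:\n  cpu: 39930m\n  nvidia.com/gpu: 4\n' | gpu_resources
```

If both values are present and non-zero, the device plugin has published the GPUs on the node.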


Run functional tests


Option A. Invoke nvidia-smi from inside a container.


Create a Job:

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nvidia-cuda-test
  namespace: default
spec:
  completions: 1
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        node.deckhouse.io/group: gpu
      containers:
      - name: nvidia-cuda-test
        image: nvidia/cuda:11.6.2-base-ubuntu20.04
        imagePullPolicy: "IfNotPresent"
        command:
        - nvidia-smi

Check the logs using the command:

bash
kubectl logs job/nvidia-cuda-test

Output example:


Option B. CUDA sample (vectoradd).


Create a Job:


yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-operator-test
  namespace: default
spec:
  completions: 1
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        node.deckhouse.io/group: gpu
      containers:
      - name: gpu-operator-test
        image: nvidia/samples:vectoradd-cuda10.2
        imagePullPolicy: "IfNotPresent"

Check the logs using the command:

bash
kubectl logs job/gpu-operator-test

Output example:

console
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

How to monitor GPUs?


Deckhouse Kubernetes Platform automatically deploys DCGM Exporter; GPU metrics are scraped by Prometheus and available in Grafana.


Which GPU modes are supported?


Exclusive — the node exposes the nvidia.com/gpu resource; each Pod receives an entire GPU.
TimeSlicing — time-sharing a single GPU among multiple Pods (default partitionCount: 4); Pods still request nvidia.com/gpu.
MIG (Multi-Instance GPU) — hardware partitioning of supported GPUs into independent instances; with the all-1g.5gb profile the cluster exposes resources like nvidia.com/mig-1g.5gb.


See examples in Examples → GPU nodes.


How to view available MIG profiles in the cluster?


Pre-defined profiles are stored in the mig-parted-config ConfigMap inside the d8-nvidia-gpu namespace and can be viewed with the command:


bash
kubectl -n d8-nvidia-gpu get cm mig-parted-config -o json | jq -r '.data["config.yaml"]'

The mig-configs: section lists the GPU models (by PCI ID) and the MIG profiles each card supports—for example all-1g.5gb, all-2g.10gb, all-balanced. Select the profile that matches your accelerator and set its name in spec.gpu.mig.partedConfig of the NodeGroup.


MIG profile does not activate — what to check?


GPU model: MIG is supported on H100/A100/A30; it is not supported on V100/T4. See the profile tables in the NVIDIA MIG guide.
NodeGroup configuration:

yaml
gpu:
  sharing: MIG
  mig:
    partedConfig: all-1g.5gb

Wait until nvidia-mig-manager completes the drain of the node and reconfigures the GPU.


This process can take several minutes.

While it is running, the node is tainted with mig-reconfigure. When the operation succeeds, that taint is removed.


Track the progress via the nvidia.com/mig.config.state label on the node:


pending, rebooting, success (or error if something goes wrong).
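Waiting for the label to reach a final state can be scripted. The sketch below polls the nvidia.com/mig.config.state label named above and stops on success or error. The function names mig_state and wait_for_mig, the default of 60 attempts and the 10-second delay are assumptions chosen for illustration.

```shell
# mig_state prints the current value of the nvidia.com/mig.config.state label
# on the node passed as $1 (dots in the label key are escaped for jsonpath).
mig_state() {
  kubectl get node "$1" \
    -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}'
}

# wait_for_mig NODE [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once the state is "success", 1 on "error" or when attempts run out.
wait_for_mig() {
  node=$1
  attempts=${2:-60}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state=$(mig_state "$node")
    echo "$node: ${state:-<no label>}"
    case "$state" in
      success) return 0 ;;
      error)   return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$node: timed out" >&2
  return 1
}
```

A call such as wait_for_mig followed by the node name keeps printing pending or rebooting until the reconfiguration finishes.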


If nvidia.com/mig-* resources are still missing, check:


bash
kubectl -n d8-nvidia-gpu logs daemonset/nvidia-mig-manager
nvidia-smi -L

Are AMD or Intel GPUs supported?


At this time, Deckhouse Kubernetes Platform automatically configures NVIDIA GPUs only. Support for AMD (ROCm) and Intel GPUs is being worked on and is planned for future releases.

