The Prometheus monitoring module: FAQ

How do I collect metrics from applications running outside of the cluster?

  1. Configure a Service similar to the one that collects metrics from your application (but do not set the spec.selector parameter).
  2. Create Endpoints for this Service and explicitly specify the IP:PORT pairs that your applications use to expose metrics.

     Note that the port names in the Endpoints must match the port names in the Service.

An example

Application metrics are available over plain HTTP (no TLS) at http://10.182.10.5:9114/metrics.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    prometheus.deckhouse.io/custom-target: my-app
spec:
  ports:
  - name: http-metrics
    port: 9114
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-app
  namespace: my-namespace
subsets:
- addresses:
  - ip: 10.182.10.5
  ports:
  - name: http-metrics
    port: 9114
```

How do I create custom Grafana dashboards?

Custom Grafana dashboards can be added to the project using the Infrastructure as Code approach. To add your dashboard to Grafana, create a dedicated GrafanaDashboardDefinition custom resource in the cluster.

An example:

```yaml
apiVersion: deckhouse.io/v1
kind: GrafanaDashboardDefinition
metadata:
  name: my-dashboard
spec:
  folder: My folder # The folder in which the dashboard will be displayed in Grafana.
  definition: |
    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "limit": 100,
    ...
```

Caution! System dashboards and dashboards added using GrafanaDashboardDefinition cannot be modified via the Grafana interface.

How do I add alerts and/or recording rules?

The CustomPrometheusRules resource allows you to add alerts.

Parameters:

  • groups — the only parameter; use it to define groups of alerts. The structure of the groups fully matches that of prometheus-operator.

An example:

```yaml
apiVersion: deckhouse.io/v1
kind: CustomPrometheusRules
metadata:
  name: my-rules
spec:
  groups:
  - name: cluster-state-alert.rules
    rules:
    - alert: CephClusterErrorState
      annotations:
        description: Storage cluster is in error state for more than 10m.
        summary: Storage cluster is in error state
        plk_markup_format: markdown
      expr: |
        ceph_health_status{job="rook-ceph-mgr"} > 1
```

How do I provision additional Grafana data sources?

The GrafanaAdditionalDatasource resource allows you to provision additional Grafana data sources.

A detailed description of the resource parameters is available in the Grafana documentation. See the datasource type in the documentation for the specific datasource.

An example:

```yaml
apiVersion: deckhouse.io/v1
kind: GrafanaAdditionalDatasource
metadata:
  name: another-prometheus
spec:
  type: prometheus
  access: Proxy
  url: https://another-prometheus.example.com/prometheus
  basicAuth: true
  basicAuthUser: foo
  jsonData:
    timeInterval: 30s
    httpMethod: POST
  secureJsonData:
    basicAuthPassword: bar
```

How do I enable secure access to metrics?

To enable secure access to metrics, we strongly recommend using kube-rbac-proxy.

An example of collecting metrics securely from an application inside a cluster

Do the following to protect application metrics with kube-rbac-proxy and then scrape them with Prometheus:

  1. Create a new ServiceAccount with the following permissions:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbac-proxy-test
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rbac-proxy-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: d8:rbac-proxy
subjects:
- kind: ServiceAccount
  name: rbac-proxy-test
  namespace: default
```

The example uses the d8:rbac-proxy built-in Deckhouse ClusterRole.

  2. Create a configuration for the kube-rbac-proxy:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rbac-proxy-config-test
  namespace: rbac-proxy-test
data:
  config-file.yaml: |+
    authorization:
      resourceAttributes:
        namespace: default
        apiVersion: v1
        resource: services
        subresource: proxy
        name: rbac-proxy-test
```

Get more information on authorization attributes in the Kubernetes documentation.

  3. Create Service and Deployment for your application with the kube-rbac-proxy as a sidecar container:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rbac-proxy-test
  labels:
    prometheus.deckhouse.io/custom-target: rbac-proxy-test
spec:
  ports:
  - name: https-metrics
    port: 8443
    targetPort: https-metrics
  selector:
    app: rbac-proxy-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbac-proxy-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rbac-proxy-test
  template:
    metadata:
      labels:
        app: rbac-proxy-test
    spec:
      securityContext:
        runAsUser: 65532
      serviceAccountName: rbac-proxy-test
      containers:
      - name: kube-rbac-proxy
        image: quay.io/brancz/kube-rbac-proxy:v0.14.0
        args:
        - "--secure-listen-address=0.0.0.0:8443"
        - "--upstream=http://127.0.0.1:8081/"
        - "--config-file=/kube-rbac-proxy/config-file.yaml"
        - "--logtostderr=true"
        - "--v=10"
        ports:
        - containerPort: 8443
          name: https-metrics
        volumeMounts:
        - name: config
          mountPath: /kube-rbac-proxy
      - name: prometheus-example-app
        image: quay.io/brancz/prometheus-example-app:v0.1.0
        args:
        - "--bind=127.0.0.1:8081"
      volumes:
      - name: config
        configMap:
          name: rbac-proxy-config-test
```

  4. Add the necessary resource permissions to Prometheus:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rbac-proxy-test-client
rules:
- apiGroups: [""]
  resources: ["services/proxy"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rbac-proxy-test-client
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rbac-proxy-test-client
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: d8-monitoring
```

After step 4, your application’s metrics should become available in Prometheus.
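
As a quick way to verify this from the CLI (a hedged sketch, not part of the original guide; it assumes the example above was deployed in the default namespace and that you are allowed to issue a token for the prometheus ServiceAccount):

```shell
# Issue a short-lived token for the ServiceAccount that is allowed to access services/proxy.
TOKEN=$(kubectl -n d8-monitoring create token prometheus)

# Call the application through kube-rbac-proxy; a request without a valid token should get HTTP 401.
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -sk -H "Authorization: Bearer ${TOKEN}" https://rbac-proxy-test.default.svc:8443/metrics
```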

An example of collecting metrics securely from an application outside a cluster

Suppose there is a server exposed to the Internet that runs node-exporter. By default, node-exporter listens on port 9100 on all interfaces. You need to restrict access to node-exporter so that metrics can be collected securely. Below is an example of such a setup.

Requirements:

  • There must be network access from the cluster to the kube-rbac-proxy service running on the remote server.
  • The remote server must have access to the Kubernetes API server.

Follow these steps:

  1. Create a new ServiceAccount with the following permissions:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-external-endpoint-server-01
  namespace: d8-service-accounts
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-external-endpoint
rules:
- apiGroups: ["authentication.k8s.io"]
  resources:
  - tokenreviews
  verbs: ["create"]
- apiGroups: ["authorization.k8s.io"]
  resources:
  - subjectaccessreviews
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-external-endpoint-server-01
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-external-endpoint
subjects:
- kind: ServiceAccount
  name: prometheus-external-endpoint-server-01
  namespace: d8-service-accounts
```

  2. Generate a kubeconfig file for the created ServiceAccount (refer to the example of generating a kubeconfig for a ServiceAccount).
  3. Copy the kubeconfig file to the remote server. You will also have to specify the kubeconfig path in the kube-rbac-proxy settings (our example uses ${PWD}/.kube/config).

  4. Configure node-exporter on the remote server so that it is only accessible on the local interface (listening on 127.0.0.1:9100).
  5. Run kube-rbac-proxy on the remote server:

```shell
docker run --network host -d -v ${PWD}/.kube/config:/config \
  quay.io/brancz/kube-rbac-proxy:v0.14.0 --secure-listen-address=0.0.0.0:8443 \
  --upstream=http://127.0.0.1:9100 --kubeconfig=/config --logtostderr=true --v=10
```

  6. Check that port 8443 is accessible at the remote server's external address.

  7. Create a Service and Endpoints in the cluster, specifying the external address of the remote server as <server_ip_address>:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-external-endpoint-server-01
  labels:
    prometheus.deckhouse.io/custom-target: prometheus-external-endpoint-server-01
spec:
  ports:
  - name: https-metrics
    port: 8443
---
apiVersion: v1
kind: Endpoints
metadata:
  name: prometheus-external-endpoint-server-01
subsets:
- addresses:
  - ip: <server_ip_address>
  ports:
  - name: https-metrics
    port: 8443
```

How do I add Alertmanager?

Create a custom resource CustomAlertmanager with type Internal.

Example:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: CustomAlertmanager
metadata:
  name: webhook
spec:
  type: Internal
  internal:
    route:
      groupBy: ['job']
      groupWait: 30s
      groupInterval: 5m
      repeatInterval: 12h
      receiver: 'webhook'
    receivers:
    - name: 'webhook'
      webhookConfigs:
      - url: 'http://webhookserver:8080/'
```

Refer to the description of the CustomAlertmanager custom resource for more information about the parameters.

How do I add an additional Alertmanager?

Create a CustomAlertmanager custom resource with the type External; it can point to an Alertmanager either by FQDN or via a service in the Kubernetes cluster.

FQDN Alertmanager example:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: CustomAlertmanager
metadata:
  name: my-fqdn-alertmanager
spec:
  external:
    address: https://alertmanager.mycompany.com/myprefix
  type: External
```

Alertmanager with a Kubernetes service:

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: CustomAlertmanager
metadata:
  name: my-service-alertmanager
spec:
  external:
    service:
      namespace: myns
      name: my-alertmanager
      path: /myprefix/
  type: External
```

Refer to the description of the CustomAlertmanager Custom Resource for more information about the parameters.

How do I ignore unnecessary alerts in Alertmanager?

The solution comes down to configuring alert routing in the Alertmanager.

You will need to:

  1. Create a parameterless receiver.
  2. Route unwanted alerts to this receiver.

Below are samples for configuring CustomAlertmanager.

To receive only alerts with the labels service: foo|bar|baz:

```yaml
receivers:
# The parameterless receiver is similar to "/dev/null".
- name: blackhole
# Your valid receiver.
- name: some-other-receiver
# ...
route:
  # The default receiver.
  receiver: blackhole
  routes:
  # The child route.
  - matchers:
    - matchType: =~
      name: service
      value: ^(foo|bar|baz)$
    receiver: some-other-receiver
```

To receive all alerts except DeadMansSwitch:

```yaml
receivers:
# The parameterless receiver is similar to "/dev/null".
- name: blackhole
# Your valid receiver.
- name: some-other-receiver
# ...
route:
  # The default receiver.
  receiver: some-other-receiver
  routes:
  # The child route.
  - matchers:
    - matchType: =
      name: alertname
      value: DeadMansSwitch
    receiver: blackhole
```

A detailed description of all parameters can be found in the official documentation.

Why can't different scrapeInterval values be set for individual targets?

Prometheus developer Brian Brazil probably gives the most comprehensive answer to this question. In short, different scrapeInterval values are likely to cause the following complications:

  • Increasing configuration complexity;
  • Problems with writing queries and creating graphs;
  • Short intervals are more like profiling an app, and Prometheus isn’t the best tool to do this in most cases.

The most appropriate value for scrapeInterval is in the range of 10-60s.

How do I limit Prometheus resource consumption?

To avoid situations when VPA requests more resources for Prometheus or Longterm Prometheus than those available on the corresponding node, you can explicitly limit VPA using module parameters:

  • vpa.longtermMaxCPU
  • vpa.longtermMaxMemory
  • vpa.maxCPU
  • vpa.maxMemory
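
For example, here is a minimal sketch of setting these limits through a ModuleConfig for the prometheus module (the resource values are placeholders, and the settings schema version may differ in your Deckhouse release — verify it before applying):

```yaml
apiVersion: deckhouse.io/v1alpha1
kind: ModuleConfig
metadata:
  name: prometheus
spec:
  version: 2              # Settings schema version; verify the current one for your release.
  settings:
    vpa:
      maxCPU: "4"         # Placeholder values; adjust to the capacity of your monitoring nodes.
      maxMemory: 16Gi
      longtermMaxCPU: "2"
      longtermMaxMemory: 8Gi
```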

How do I set up a ServiceMonitor or PodMonitor to work with Prometheus?

Add the prometheus: main label to the PodMonitor or ServiceMonitor. Add the label prometheus.deckhouse.io/monitor-watcher-enabled: "true" to the namespace where the PodMonitor or ServiceMonitor was created.

Example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend
  labels:
    prometheus.deckhouse.io/monitor-watcher-enabled: "true"
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: frontend
  labels:
    prometheus: main
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
```

How do I set up PrometheusRules to work with Prometheus?

Add the label prometheus.deckhouse.io/rules-watcher-enabled: "true" to the namespace where the PrometheusRules was created.

Example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend
  labels:
    prometheus.deckhouse.io/rules-watcher-enabled: "true"
```

How to expand disk size

  1. To request a larger volume for a PVC, edit the PersistentVolumeClaim object and specify a larger size in the spec.resources.requests.storage field (see the sketch after this list).
    • You can only expand a PVC if its StorageClass has the allowVolumeExpansion field set to true.
  2. If the storage does not support online resizing, the message Waiting for user to (re-)start a pod to finish file system resize of volume on node. will appear in the PersistentVolumeClaim status.
  3. Restart the Pod to complete the file system resizing.
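
A hedged sketch of step 1 using kubectl patch (the PVC name below is illustrative — list the PVCs in the d8-monitoring namespace to find the actual one):

```shell
# Find the Prometheus PVCs (the exact names depend on your setup).
kubectl -n d8-monitoring get pvc

# Request a larger volume, e.g. 100Gi (hypothetical PVC name below).
kubectl -n d8-monitoring patch pvc prometheus-main-db-prometheus-main-0 \
  --type merge -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
```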

How to get information about alerts in a cluster?

You can get information about active alerts not only in the Grafana/Prometheus web interface but also in the CLI. This can be useful if you only have access to the cluster API server and there is no way to open the Grafana/Prometheus web interface.

Run the following command to get cluster alerts:

```shell
kubectl get clusteralerts
```

Example:

```shell
kubectl get clusteralerts

NAME               ALERT                                      SEVERITY   AGE     LAST RECEIVED   STATUS
086551aeee5b5b24   ExtendedMonitoringDeprecatatedAnnotation   4          3h25m   38s             firing
226d35c886464d6e   ExtendedMonitoringDeprecatatedAnnotation   4          3h25m   38s             firing
235d4efba7df6af4   D8SnapshotControllerPodIsNotReady          8          5d4h    44s             firing
27464763f0aa857c   D8PrometheusOperatorPodIsNotReady          7          5d4h    43s             firing
ab17837fffa5e440   DeadMansSwitch                             4          5d4h    41s             firing
```

Run the following command to view a specific alert:

```shell
kubectl get clusteralerts <name> -o yaml
```

Example:

```shell
kubectl get clusteralerts 235d4efba7df6af4 -o yaml

alert:
  description: |
    The recommended course of action:
    1. Retrieve details of the Deployment: kubectl -n d8-snapshot-controller describe deploy snapshot-controller
    2. View the status of the Pod and try to figure out why it is not running: kubectl -n d8-snapshot-controller describe pod -l app=snapshot-controller
  labels:
    pod: snapshot-controller-75bd776d76-xhb2c
    prometheus: deckhouse
    tier: cluster
  name: D8SnapshotControllerPodIsNotReady
  severityLevel: "8"
  summary: The snapshot-controller Pod is NOT Ready.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterAlert
metadata:
  creationTimestamp: "2023-05-15T14:24:08Z"
  generation: 1
  labels:
    app: prometheus
    heritage: deckhouse
  name: 235d4efba7df6af4
  resourceVersion: "36262598"
  uid: 817f83e4-d01a-4572-8659-0c0a7b6ca9e7
status:
  alertStatus: firing
  lastUpdateTime: "2023-05-15T18:10:09Z"
  startsAt: "2023-05-10T13:43:09Z"
```

Remember the special alert DeadMansSwitch — its presence in the cluster indicates that Prometheus is working.
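
For a quick liveness check based on this alert (a trivial sketch):

```shell
# An empty result here means the alerting pipeline is likely not working.
kubectl get clusteralerts | grep DeadMansSwitch
```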

How do I add additional endpoints to a scrape config?

Add the label prometheus.deckhouse.io/scrape-configs-watcher-enabled: "true" to the namespace where the ScrapeConfig was created.

Example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: frontend
  labels:
    prometheus.deckhouse.io/scrape-configs-watcher-enabled: "true"
```

Add the ScrapeConfig with the required label prometheus: main:

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: example-scrape-config
  namespace: frontend
  labels:
    prometheus: main
spec:
  honorLabels: true
  staticConfigs:
  - targets: ['example-app.frontend.svc.{{ .Values.global.discovery.clusterDomain }}.:8080']
  relabelings:
  - regex: endpoint|namespace|pod|service
    action: labeldrop
  - targetLabel: scrape_endpoint
    replacement: main
  - targetLabel: job
    replacement: kube-state-metrics
  metricsPath: '/metrics'
```