From 06434464bbc87c684f4c3f8999b8cb0e459061e4 Mon Sep 17 00:00:00 2001
From: Stefan Prodan
Date: Mon, 19 Oct 2020 23:32:01 +0300
Subject: [PATCH] Add monitoring guide

Signed-off-by: Stefan Prodan
---
 docs/guides/installation.md | 53 ++-------------------
 docs/guides/monitoring.md   | 95 +++++++++++++++++++++++++++++++++++++
 mkdocs.yml                  |  1 +
 3 files changed, 99 insertions(+), 50 deletions(-)
 create mode 100644 docs/guides/monitoring.md

diff --git a/docs/guides/installation.md b/docs/guides/installation.md
index 19ea708f..8c858551 100644
--- a/docs/guides/installation.md
+++ b/docs/guides/installation.md
@@ -375,57 +375,10 @@ gotk create helmrelease sealed-secrets \
 
 ## Monitoring with Prometheus and Grafana
 
-The GitOps Toolkit comes with a monitoring stack composed of:
+The GitOps Toolkit comes with a monitoring stack composed of Prometheus and Grafana. The controllers expose
+metrics that can be used to track the readiness of the cluster reconciliation process.
 
-* **Prometheus** server - collects metrics from the toolkit controllers and stores them for 2h
-* **Grafana** dashboards - displays the control plane resource usage and reconciliation stats
-
-To install the monitoring stack with `gotk`, first register the toolkit Git repository on your cluster:
-
-```sh
-gotk create source git monitoring \
-  --interval=30m \
-  --url=https://github.com/fluxcd/toolkit \
-  --branch=main
-```
-
-Then apply the [manifests/monitoring](https://github.com/fluxcd/toolkit/tree/main/manifests/monitoring)
-kustomization:
-
-```sh
-gotk create kustomization monitoring \
-  --interval=1h \
-  --prune=true \
-  --source=monitoring \
-  --path="./manifests/monitoring" \
-  --health-check="Deployment/prometheus.gotk-system" \
-  --health-check="Deployment/grafana.gotk-system"
-```
-
-You can access Grafana using port forwarding:
-
-```sh
-kubectl -n gotk-system port-forward svc/grafana 3000:3000
-```
-
-Navigate to [http://localhost:3000/d/gitops-toolkit-control-plane](http://localhost:3000/d/gitops-toolkit-control-plane/gitops-toolkit-control-plane)
-for the control plane dashboard:
-
-![](../_files/cp-dashboard-p1.png)
-
-![](../_files/cp-dashboard-p2.png)
-
-Navigate to [http://localhost:3000/d/gitops-toolkit-cluster](http://localhost:3000/d/gitops-toolkit-cluster/gitops-toolkit-cluster-stats)
-for the cluster reconciliation stats dashboard:
-
-![](../_files/cluster-dashboard.png)
-
-If you wish to use your own Prometheus and Grafana instances, then you can import the dashboards from
-[GitHub](https://github.com/fluxcd/toolkit/tree/main/manifests/monitoring/grafana/dashboards).
-
-!!! hint
-    Note that the toolkit controllers expose the `/metrics` endpoint on port `8080`.
-    When using Prometheus Operator you should create `PodMonitor` objects to configure scraping.
+To install the monitoring stack, follow the [monitoring guide](monitoring.md).
 
 ## Uninstall
 
diff --git a/docs/guides/monitoring.md b/docs/guides/monitoring.md
new file mode 100644
index 00000000..46964f26
--- /dev/null
+++ b/docs/guides/monitoring.md
@@ -0,0 +1,95 @@
+# Monitoring
+
+This guide walks you through configuring monitoring for the GitOps Toolkit control plane.
+
+The toolkit comes with a monitoring stack composed of:
+
+* **Prometheus** server - collects metrics from the toolkit controllers and stores them for 2h
+* **Grafana** dashboards - displays the control plane resource usage and reconciliation stats
+
+## Install the monitoring stack
+
+To install the monitoring stack with `gotk`, first register the toolkit Git repository on your cluster:
+
+```sh
+gotk create source git monitoring \
+  --interval=30m \
+  --url=https://github.com/fluxcd/toolkit \
+  --branch=main
+```
+
+Then apply the [manifests/monitoring](https://github.com/fluxcd/toolkit/tree/main/manifests/monitoring)
+kustomization:
+
+```sh
+gotk create kustomization monitoring \
+  --interval=1h \
+  --prune=true \
+  --source=monitoring \
+  --path="./manifests/monitoring" \
+  --health-check="Deployment/prometheus.gotk-system" \
+  --health-check="Deployment/grafana.gotk-system"
+```
+
+You can access Grafana using port forwarding:
+
+```sh
+kubectl -n gotk-system port-forward svc/grafana 3000:3000
+```
+
+## Grafana dashboards
+
+Control plane dashboard [http://localhost:3000/d/gitops-toolkit-control-plane](http://localhost:3000/d/gitops-toolkit-control-plane/gitops-toolkit-control-plane):
+
+![](../_files/cp-dashboard-p1.png)
+
+![](../_files/cp-dashboard-p2.png)
+
+Cluster reconciliation dashboard [http://localhost:3000/d/gitops-toolkit-cluster](http://localhost:3000/d/gitops-toolkit-cluster/gitops-toolkit-cluster-stats):
+
+![](../_files/cluster-dashboard.png)
+
+If you wish to use your own Prometheus and Grafana instances, you can import the dashboards from
+[GitHub](https://github.com/fluxcd/toolkit/tree/main/manifests/monitoring/grafana/dashboards).
+
+!!! hint
+    Note that the toolkit controllers expose the `/metrics` endpoint on port `8080`.
+    When using Prometheus Operator, you should create `PodMonitor` objects to configure scraping.
+
+## Metrics
+
+For each `toolkit.fluxcd.io` kind,
+the controllers expose a gauge metric to track the Ready condition status,
+and a histogram with the reconciliation duration in seconds.
+
+Ready status metrics:
+
+```
+gotk_reconcile_condition{kind, name, namespace, type="Ready", status="True"}
+gotk_reconcile_condition{kind, name, namespace, type="Ready", status="False"}
+gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Unknown"}
+gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Deleted"}
+```
+
+Time spent reconciling:
+
+```
+gotk_reconcile_duration_bucket{kind, name, namespace, le}
+gotk_reconcile_duration_sum{kind, name, namespace}
+gotk_reconcile_duration_count{kind, name, namespace}
+```
+
+Prometheus alerting rule example:
+
+```yaml
+groups:
+- name: GitOpsToolkit
+  rules:
+  - alert: ReconciliationFailure
+    expr: gotk_reconcile_condition{type="Ready",status="False"} == 1
+    for: 10m
+    labels:
+      severity: page
+    annotations:
+      summary: '{{ $labels.kind }} {{ $labels.namespace }}/{{ $labels.name }} reconciliation has been failing for more than ten minutes.'
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index 88ca670b..2d17f64b 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -49,6 +49,7 @@ nav:
       - Manage Helm Releases: guides/helmreleases.md
       - Setup Notifications: guides/notifications.md
       - Setup Webhook Receivers: guides/webhook-receivers.md
+      - Monitoring with Prometheus: guides/monitoring.md
       - Sealed Secrets: guides/sealed-secrets.md
       - Mozilla SOPS: guides/mozilla-sops.md
   - Toolkit Components:
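
The `PodMonitor` hint in the monitoring guide can be made concrete with Prometheus Operator resources. The following is a minimal sketch, not part of the toolkit manifests. It assumes that the controller pods carry an `app` label set to the controller name, that the container port `8080` is named `http-prom`, and that your `Prometheus` resource's `podMonitorSelector` and `ruleSelector` pick up these objects. The object names and the recording rule name are hypothetical. The recording rule illustrates how the `gotk_reconcile_duration_bucket` histogram can be turned into a per-kind 99th percentile.

```yaml
# Sketch only: assumes pods are labelled app=<controller-name> and that the
# metrics containerPort 8080 is named "http-prom". Adjust both to your manifests.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: gotk-controllers            # hypothetical name
  namespace: gotk-system
spec:
  namespaceSelector:
    matchNames:
      - gotk-system
  selector:
    matchExpressions:
      # Scrape only the toolkit controllers you actually run
      - key: app
        operator: In
        values:
          - source-controller
          - kustomize-controller
          - helm-controller
          - notification-controller
  podMetricsEndpoints:
    - port: http-prom               # assumed name of containerPort 8080
      path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gotk-reconcile              # hypothetical name
  namespace: gotk-system
spec:
  groups:
    - name: GitOpsToolkitRecording
      rules:
        # 99th percentile reconciliation duration per kind over 5 minutes,
        # computed from the gotk_reconcile_duration histogram buckets
        - record: kind:gotk_reconcile_duration:p99_5m   # hypothetical rule name
          expr: histogram_quantile(0.99, sum by (kind, le) (rate(gotk_reconcile_duration_bucket[5m])))
```

Summing the bucket rates `by (kind, le)` before calling `histogram_quantile` keeps the `le` label that the function needs, while aggregating away the per-object `name` and `namespace` labels.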