# Monitoring

This guide walks you through configuring monitoring for the Flux control plane.

Flux comes with a monitoring stack composed of:

* **Prometheus** server - collects metrics from the toolkit controllers and stores them for 2h
* **Grafana** dashboards - displays the control plane resource usage and reconciliation stats

## Install the monitoring stack

To install the monitoring stack with `flux`, first register the toolkit Git repository on your cluster:

```sh
flux create source git monitoring \
  --interval=30m \
  --url=https://github.com/fluxcd/flux2 \
  --branch=main
```

Then apply the [manifests/monitoring](https://github.com/fluxcd/flux2/tree/main/manifests/monitoring)
kustomization:

```sh
flux create kustomization monitoring \
  --interval=1h \
  --prune=true \
  --source=monitoring \
  --path="./manifests/monitoring" \
  --health-check="Deployment/prometheus.flux-system" \
  --health-check="Deployment/grafana.flux-system"
```

You can access Grafana using port forwarding:

```sh
kubectl -n flux-system port-forward svc/grafana 3000:3000
```

## Grafana dashboards

Control plane dashboard [http://localhost:3000/d/gitops-toolkit-control-plane](http://localhost:3000/d/gitops-toolkit-control-plane/gitops-toolkit-control-plane):

![](../_files/cp-dashboard-p1.png)

![](../_files/cp-dashboard-p2.png)

Cluster reconciliation dashboard [http://localhost:3000/d/gitops-toolkit-cluster](http://localhost:3000/d/gitops-toolkit-cluster/gitops-toolkit-cluster-stats):

![](../_files/cluster-dashboard.png)

If you wish to use your own Prometheus and Grafana instances, then you can import the dashboards from
[GitHub](https://github.com/fluxcd/flux2/tree/main/manifests/monitoring/grafana/dashboards).

!!! hint
    Note that the toolkit controllers expose the `/metrics` endpoint on port `8080`.
    When using Prometheus Operator you should create a `PodMonitor` object for each controller to configure scraping.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: source-controller
  namespace: flux-system
spec:
  namespaceSelector:
    matchNames:
      - flux-system
  selector:
    matchLabels:
      app: source-controller
  podMetricsEndpoints:
  - port: http-prom
```

## Metrics

For each `toolkit.fluxcd.io` kind, the controllers expose a gauge metric to track the Ready condition status,
and a histogram with the reconciliation duration in seconds.

Ready status metrics:

```sh
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="True"}
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="False"}
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Unknown"}
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Deleted"}
```

Time spent reconciling:

```
gotk_reconcile_duration_seconds_bucket{kind, name, namespace, le}
gotk_reconcile_duration_seconds_sum{kind, name, namespace}
gotk_reconcile_duration_seconds_count{kind, name, namespace}
```

Alertmanager example:

```yaml
groups:
- name: GitOpsToolkit
  rules:
  - alert: ReconciliationFailure
    expr: max(gotk_reconcile_condition{status="False",type="Ready"}) by (namespace, name, kind) + on(namespace, name, kind) (max(gotk_reconcile_condition{status="Deleted"}) by (namespace, name, kind)) * 2 == 1
    for: 10m
    labels:
      severity: page
    annotations:
      summary: '{{ $labels.kind }} {{ $labels.namespace }}/{{ $labels.name }} reconciliation has been failing for more than ten minutes.'
```
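
To spot-check the Ready metric without Prometheus, one option is to read a controller's `/metrics` endpoint directly and filter the text output. The following is a minimal sketch, not part of Flux: the `not_ready` helper name is hypothetical, and it assumes the endpoint returns the standard Prometheus text format with one sample per line and a gauge value of `1` for the active status.

```sh
# Hypothetical helper: reads Prometheus text-format metrics on stdin and prints
# the gotk_reconcile_condition series whose Ready status is "False" with value 1.
not_ready() {
  grep '^gotk_reconcile_condition{' \
    | grep 'type="Ready"' \
    | grep 'status="False"' \
    | awk '$NF == 1 { print $1 }'
}

# Example usage (assumes the controller Deployment is named source-controller
# and that curl is installed):
#   kubectl -n flux-system port-forward deploy/source-controller 8080:8080 &
#   curl -s http://localhost:8080/metrics | not_ready
```

Each printed line carries the `kind`, `name` and `namespace` labels of an object that is currently failing to reconcile. An empty result means no object served by that controller reports `Ready=False`.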