Add monitoring guide

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
pull/369/head
Stefan Prodan 4 years ago
parent 3e7bfc72a2
commit 06434464bb
No known key found for this signature in database
GPG Key ID: 3299AEB0E4085BAF

@ -375,57 +375,10 @@ gotk create helmrelease sealed-secrets \
## Monitoring with Prometheus and Grafana ## Monitoring with Prometheus and Grafana
The GitOps Toolkit comes with a monitoring stack composed of: The GitOps Toolkit comes with a monitoring stack composed of Prometheus and Grafana. The controllers expose
metrics that can be used to track the readiness of the cluster reconciliation process.
* **Prometheus** server - collects metrics from the toolkit controllers and stores them for 2h To install the monitoring stack please follow this [guide](monitoring.md).
* **Grafana** dashboards - displays the control plane resource usage and reconciliation stats
To install the monitoring stack with `gotk`, first register the toolkit Git repository on your cluster:
```sh
gotk create source git monitoring \
  --interval=30m \
  --url=https://github.com/fluxcd/toolkit \
  --branch=main
```
Then apply the [manifests/monitoring](https://github.com/fluxcd/toolkit/tree/main/manifests/monitoring)
kustomization:
```sh
gotk create kustomization monitoring \
  --interval=1h \
  --prune=true \
  --source=monitoring \
  --path="./manifests/monitoring" \
  --health-check="Deployment/prometheus.gotk-system" \
  --health-check="Deployment/grafana.gotk-system"
```
You can access Grafana using port forwarding:
```sh
kubectl -n gotk-system port-forward svc/grafana 3000:3000
```
Navigate to [http://localhost:3000/d/gitops-toolkit-control-plane](http://localhost:3000/d/gitops-toolkit-control-plane/gitops-toolkit-control-plane)
for the control plane dashboard:
![](../_files/cp-dashboard-p1.png)
![](../_files/cp-dashboard-p2.png)
Navigate to [http://localhost:3000/d/gitops-toolkit-cluster](http://localhost:3000/d/gitops-toolkit-cluster/gitops-toolkit-cluster-stats)
for the cluster reconciliation stats dashboard:
![](../_files/cluster-dashboard.png)
If you wish to use your own Prometheus and Grafana instances, then you can import the dashboards from
[GitHub](https://github.com/fluxcd/toolkit/tree/main/manifests/monitoring/grafana/dashboards).
!!! hint
Note that the toolkit controllers expose the `/metrics` endpoint on port `8080`.
When using Prometheus Operator you should create `PodMonitor` objects to configure scraping.
## Uninstall ## Uninstall

@ -0,0 +1,95 @@
# Monitoring
This guide walks you through configuring monitoring for the GitOps Toolkit control plane.
The toolkit comes with a monitoring stack composed of:
* **Prometheus** server - collects metrics from the toolkit controllers and stores them for 2h
* **Grafana** dashboards - displays the control plane resource usage and reconciliation stats
## Install the monitoring stack
To install the monitoring stack with `gotk`, first register the toolkit Git repository on your cluster:
```sh
gotk create source git monitoring \
  --interval=30m \
  --url=https://github.com/fluxcd/toolkit \
  --branch=main
```
Then apply the [manifests/monitoring](https://github.com/fluxcd/toolkit/tree/main/manifests/monitoring)
kustomization:
```sh
gotk create kustomization monitoring \
  --interval=1h \
  --prune=true \
  --source=monitoring \
  --path="./manifests/monitoring" \
  --health-check="Deployment/prometheus.gotk-system" \
  --health-check="Deployment/grafana.gotk-system"
```
You can access Grafana using port forwarding:
```sh
kubectl -n gotk-system port-forward svc/grafana 3000:3000
```
## Grafana dashboards
Control plane dashboard [http://localhost:3000/d/gitops-toolkit-control-plane](http://localhost:3000/d/gitops-toolkit-control-plane/gitops-toolkit-control-plane):
![](../_files/cp-dashboard-p1.png)
![](../_files/cp-dashboard-p2.png)
Cluster reconciliation dashboard [http://localhost:3000/d/gitops-toolkit-cluster](http://localhost:3000/d/gitops-toolkit-cluster/gitops-toolkit-cluster-stats):
![](../_files/cluster-dashboard.png)
If you wish to use your own Prometheus and Grafana instances, then you can import the dashboards from
[GitHub](https://github.com/fluxcd/toolkit/tree/main/manifests/monitoring/grafana/dashboards).
!!! hint
Note that the toolkit controllers expose the `/metrics` endpoint on port `8080`.
When using Prometheus Operator you should create `PodMonitor` objects to configure scraping.
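As a rough sketch of the hint above, a `PodMonitor` for a single controller could look like the following. The `app: source-controller` label and the `http-prom` port name are assumptions about how the controller pods are labelled and how container port `8080` is named; check the pod spec in your cluster and adjust them. One such object would be needed per controller.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: source-controller
  namespace: gotk-system
spec:
  namespaceSelector:
    matchNames:
    - gotk-system
  selector:
    matchLabels:
      # Assumed pod label; verify with `kubectl -n gotk-system get pods --show-labels`.
      app: source-controller
  podMetricsEndpoints:
  # Assumed name of the container port that serves /metrics on 8080.
  - port: http-prom
    path: /metrics
```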
## Metrics
For each `toolkit.fluxcd.io` kind,
the controllers expose a gauge metric to track the Ready condition status,
and a histogram with the reconciliation duration in seconds.
Ready status metrics:
```
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="True"}
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="False"}
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Unknown"}
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Deleted"}
```
Time spent reconciling:
```
gotk_reconcile_duration_bucket{kind, name, namespace, le}
gotk_reconcile_duration_sum{kind, name, namespace}
gotk_reconcile_duration_count{kind, name, namespace}
```
Prometheus alerting rule example (alerts are delivered through Alertmanager):
```yaml
groups:
- name: GitOpsToolkit
  rules:
  - alert: ReconciliationFailure
    expr: gotk_reconcile_condition{type="Ready",status="False"} == 1
    for: 10m
    labels:
      severity: page
    annotations:
      summary: '{{ $labels.kind }} {{ $labels.namespace }}/{{ $labels.name }} reconciliation has been failing for more than ten minutes.'
```
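The duration histogram can drive alerts in a similar way. The rule below is a minimal sketch, not part of the toolkit manifests: it estimates the 99th percentile reconciliation time per kind from `gotk_reconcile_duration_bucket` and fires when that estimate stays above a threshold. The alert name, the 60 second threshold, the 5m rate window, and the `warning` severity are illustrative assumptions to tune for your cluster.

```yaml
groups:
- name: GitOpsToolkitLatency
  rules:
  # Hypothetical alert: name, threshold and severity are assumptions.
  - alert: ReconciliationSlow
    # p99 reconciliation duration per kind over the last 5 minutes, in seconds.
    expr: |
      histogram_quantile(
        0.99,
        sum by (kind, le) (rate(gotk_reconcile_duration_bucket[5m]))
      ) > 60
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: '{{ $labels.kind }} reconciliation p99 duration has been above 60s for more than ten minutes.'
```

Aggregating with `sum by (kind, le)` keeps the `le` label that `histogram_quantile` requires while collapsing per-object series, so the alert reports one result per kind rather than one per resource.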

@ -49,6 +49,7 @@ nav:
- Manage Helm Releases: guides/helmreleases.md - Manage Helm Releases: guides/helmreleases.md
- Setup Notifications: guides/notifications.md - Setup Notifications: guides/notifications.md
- Setup Webhook Receivers: guides/webhook-receivers.md - Setup Webhook Receivers: guides/webhook-receivers.md
- Monitoring with Prometheus: guides/monitoring.md
- Sealed Secrets: guides/sealed-secrets.md - Sealed Secrets: guides/sealed-secrets.md
- Mozilla SOPS: guides/mozilla-sops.md - Mozilla SOPS: guides/mozilla-sops.md
- Toolkit Components: - Toolkit Components:
