Commit Graph

32 Commits (ksm-dashboard)

Author SHA1 Message Date
Sunny 3dbe870455 monitoring: Configure KSM & cluster dashboard
Update kube-prometheus-stack helm release values to configure
kube-state-metrics and use kube-state-metrics to collect gotk resource
state metrics.

- Configure kube-state-metrics to run in custom resource state only
  mode. In this mode, it'll only watch custom resources. Also, pass
  empty collectors as extra args to prevent passing all the core
  resources to watch as an argument.
- Running kube-state-metrics in custom resource state only mode makes
  the default grafana dashboards of no use. Disable the default
  dashboards.
- Add kube-state-metrics configuration to provide RBAC permissions to it
  to allow listing and watching flux CRDs.
- Also, configure custom resource state for each of the flux custom
  resources using Info type metrics called `gotk_resource_info`. KSM
  issues a warning if an Info type object doesn't have `_info` suffix.
  These metrics have the value 1 always. This works well for the CRD
  state metrics as a zero value would mean that the resource doesn't
  exist, in which case, the resource is deleted.
- Update the cluster dashboard panels to use `gotk_resource_info` in the
  queries.
  - Only the following panels have been updated
    - Cluster Reconcilers
    - Failing Reconcilers
    - Cluster reconciliation readiness
    - Kubernetes Manifests Sources
    - Failing Sources
    - Source acquisition readiness
  - The panels have been updated such that it's work with static
  resources which don't have any status as well. By default, it assumes
  such static resources to be in a Ready state. Resources are seen as
  failed only when the ready value is false.
  - The queries have been updated to Instant type in order to show the
  current data, instead of the result of past 15 minutes. This shows
  more accurate resource data as the resource metrics change.
  - The Stat visualizers have been updated to have zero as the default
  value when there's no data. This is to prevent showing no data when
  there's no object. This was motivated by the behavior of the previous
  configuration which depended on stale metrics from controllers and
  deleted conditions to show zero value when objects get deleted. With
  the fixes in the controller metrics that removes stale metrics, this
  will no longer work. In order to show a zero value for these stats, a
  default is set.
  - The `$namespace` variable has been updated to refer to
  `exported_namespace` from `gotk_resource_info`.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
1 year ago
Hey 08859f1588
Outdated URL
The location of this URL was moved

Signed-off-by: Hey <18427051+Hey@users.noreply.github.com>
2 years ago
Stefan Prodan 06ed881e37
Disable drift detection for kube-prometheus-stack webhooks
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
2 years ago
Alex Howard 87f792915a Fix kube-prometheus-stack manifests
Updates the HelmRepository and HelmRelease to remove chart
verification and switch to using the official HTTPS repository
at https://prometheus-community.github.io/helm-charts.

OCI builds have temporarily been suspended for these charts due
to pipeline errors.

See: prometheus-community/helm-charts#2841

Signed-off-by: Alex Howard <thezanke@gmail.com>
2 years ago
Stefan Prodan 98e0774f56
Use kube-prometheus-stack signed OCI Helm chart
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
2 years ago
Santosh Kaluskar 6a1ba3c545 monitoring: use container_memory_working_set_bytes
Signed-off-by: Santosh Kaluskar <dtshbl@gmail.com>
2 years ago
Arcadie Condrat 82f847e21d
Filter out non-running pods in Prometheus
Prometheus job generated by the PodMonitor does not exclude non-running pods. All the "completed" Pods are still going to be  listed as targets in Prometheus and marked as down. This issue is related to PodMonitor implementation and is discussed in prometheus-operator/prometheus-operator#4816

Signed-off-by: Arcadie Condrat <arcadie.condrat@gmail.com>
2 years ago
bart-plasmeijer 5f35bd4e00 put the dashboard config map in the right namespace
Signed-off-by: Bart Plasmeijer <bart.plasmeijer@gmail.com>
3 years ago
Stefan Prodan 8576073b9d
monitoring: Add Grafana Loki HR and Flux logs dashboard
- add loki-stack HelmRelease to install Loki and Promtail in the monitoring namespace
- make the loki-stack HelmRelease depend on kube-prometheus-stack to install Loki's datasource and service monitors in the correct order
- add a Grafana dashboard for displaying and filtering the Flux controllers JSON logs

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
3 years ago
Stefan Prodan 4acef9d508
Add Flux events to dashboard annotations
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
3 years ago
Stefan Prodan 8128fc190d
Update kube-prometheus-stack chart to v35
- Automate kube-prometheus-stack helm release upgrades for the v35.x range
- Remove deprecated Grafana settings
- Set Prometheus retention to 24h
- Label Flux dashboards and PodMonitors with `app.kubernetes.io/component: monitoring`
- Change the `podMonitorSelector` to match the label `app.kubernetes.io/component: monitoring`

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
3 years ago
Stefan Prodan 2ba0c4435e
Remove deprecated monitoring stack
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
3 years ago
Cristian Chiru 38c62d46c7 [grafana dashboard] display exported namespace, slight resizing, default sorting by state
Signed-off-by: Cristian Chiru <cristi.chiru@gmail.com>
3 years ago
Sunny b44e4617e0
monitoring-config: grafana dashboards labelValue
Since kube-prometheus-stack helm chart v32.2.0, the `labelValue` has to
be set to "1" for the default grafana dashboard label selector to select
the flux dashboard configuration.

Also, update kube-prometheus-stack to v34.7.0, latest.

Refer: eba5b198f5

Signed-off-by: Sunny <darkowlzz@protonmail.com>
3 years ago
Johannes Graf ba5eea861e bump kube-prometheus-stack to 23.2.0
this release contains the prometheus operator in version 0.52.1

see https://github.com/fluxcd/flux2/issues/2192
https://github.com/fluxcd/flux2/pull/2193 for issues

Signed-off-by: Johannes Graf <graf@synyx.de>
3 years ago
Luke Mallon (Nalum) 6f0ea04ff3
[refactor] Update JSON from Grafana export
Signed-off-by: Luke Mallon (Nalum) <luke.mallon@weave.works>
3 years ago
Kingdon Barrett 1393e7a62b
pin monitoring release version at 19.3.0
Something in kube-prometheus-stack 20.0.0 has broken our example.
See https://github.com/fluxcd/flux2/pull/2193 for more information.

Signed-off-by: Kingdon Barrett <kingdon@weave.works>
3 years ago
Daniel AguadoAraujo 80cf5fa729 Add new variable to filter by exported namespace.
Edit definition of namespace variable to use grafana custom promql function `label_values`.
Rename variable namespace to operator_namespace.
Rename variable exported_namespace to namespace

Signed-off-by: Daniel AguadoAraujo <daniel.aguadoaraujo@gfk.com>
3 years ago
Paweł Krupa fcb73554c9
Update podmonitor.yaml
`targetPort` is deprecated since prometheus-operator 0.38.0 as per https://github.com/prometheus-operator/prometheus-operator/blob/master/CHANGELOG.md#0380--2020-03-20

Signed-off-by: paulfantom <pawel@krupa.net.pl>
4 years ago
Daniel-Andrei Minca c98cd10621
fix Control Plane dashboard legend
The legend was not showing the Pod name, instead the whole resource in
the dashboard

As a result, use the correct Prometheus label

Resolves:
Related:
Signed-off-by: Daniel-Andrei Minca <mandrei17@gmail.com>
4 years ago
Stefan Prodan c7d876eb8f
Enable CRDs upgrade for kube-prometheus-stack
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
4 years ago
Somtochi Onyekwere 3b91e14f6d Use kube-prometheus-stack for monitoring
Signed-off-by: Somtochi Onyekwere <somtochionyekwere@gmail.com>
4 years ago
Somtochi Onyekwere be65cf8052 Change labels in prometheus and grafana dashboard
Signed-off-by: Somtochi Onyekwere <somtochionyekwere@gmail.com>
4 years ago
Léopold Jacquot 344a909d19 Fix datasource for cluster Grafana dashboard
Signed-off-by: Léopold Jacquot <leopold.jacquot@infomaniak.com>
4 years ago
Hidde Beydals 345707e6cc Incorporate name and metric changes in Grafana cfg
Signed-off-by: Hidde Beydals <hello@hidde.co>

Signed-off-by: Hidde Beydals <hello@hidde.co>
4 years ago
Hidde Beydals 9916a53761 Rename `gotk-system` namespace to `flux-system`
Signed-off-by: Hidde Beydals <hello@hidde.co>
4 years ago
Stefan Prodan 4565165579
Add cluster stats dashboard
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
4 years ago
Stefan Prodan 8a96e32679
Update Prometheus and Grafana
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
4 years ago
Hidde Beydals ff6a1c14be Rename 'gitops-system' namespace to 'gotk-system'
To align with the project name, and the group introduced in #236.
4 years ago
stefanprodan 824de61579 Filter controllers in control plane dashboard 5 years ago
stefanprodan 87a299736e Add control plane Grafana dashboard 5 years ago
stefanprodan e86286722a Add Prom+Grafana monitoring stack 5 years ago