Update the kube-prometheus-stack helm release values to configure
kube-state-metrics and use it to collect gotk resource state metrics.
- Configure kube-state-metrics to run in custom resource state only
mode. In this mode, it'll only watch custom resources. Also, pass
empty collectors as extra args to prevent passing all the core
resources to watch as an argument.
- Running kube-state-metrics in custom resource state only mode makes
the default grafana dashboards useless, so disable the default
dashboards.
- Add kube-state-metrics configuration that grants it the RBAC
permissions needed to list and watch flux CRDs.
- Also, configure custom resource state for each of the flux custom
resources using an Info type metric called `gotk_resource_info`. KSM
issues a warning if an Info type object doesn't have the `_info`
suffix. These metrics always have the value 1. This works well for CRD
state metrics, as a zero value would mean that the resource doesn't
exist, which is only the case once the resource has been deleted.
- Update the cluster dashboard panels to use `gotk_resource_info` in the
queries.
- Only the following panels have been updated:
  - Cluster Reconcilers
  - Failing Reconcilers
  - Cluster reconciliation readiness
  - Kubernetes Manifests Sources
  - Failing Sources
  - Source acquisition readiness
- The panels have been updated so that they also work with static
resources which don't have any status. By default, such static
resources are assumed to be in a Ready state. Resources are seen as
failed only when the ready value is false.
- The queries have been updated to Instant type in order to show the
current data, instead of the result of the past 15 minutes. This shows
more accurate resource data as the resource metrics change.
- The Stat visualizers have been updated to have zero as the default
value when there's no data. This prevents them from showing no data
when there are no objects. The previous configuration depended on
stale metrics from controllers and deleted conditions to show a zero
value when objects got deleted. With the fixes in the controller
metrics that remove stale metrics, this no longer works, so a default
is set in order to show a zero value for these stats.
- The `$namespace` variable has been updated to refer to
`exported_namespace` from `gotk_resource_info`.
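For reference, the kube-state-metrics part of the helm release values
could look roughly like the sketch below. It covers only the
Kustomization kind. The API version, the help text and every label
other than `exported_namespace` and `ready` are illustrative
assumptions, not a copy of the actual values.

```yaml
kube-state-metrics:
  # Empty collectors: do not pass the core resources as an argument.
  collectors: []
  extraArgs:
    # Watch custom resources only.
    - --custom-resource-state-only=true
  rbac:
    extraRules:
      # Allow listing and watching the flux custom resources.
      - apiGroups:
          - kustomize.toolkit.fluxcd.io
        resources:
          - kustomizations
        verbs: ["list", "watch"]
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
          - groupVersionKind:
              group: kustomize.toolkit.fluxcd.io
              version: v1          # assumption: adjust to the installed CRD version
              kind: Kustomization
            metricNamePrefix: gotk
            metrics:
              # Prefix + name yields gotk_resource_info.
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [metadata, name]
                labelsFromPath:
                  exported_namespace: [metadata, namespace]
                  ready: [status, conditions, "[type=Ready]", status]
                  suspended: [spec, suspend]   # assumption: illustrative extra label
```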
Signed-off-by: Sunny <darkowlzz@protonmail.com>
Updates the HelmRepository and HelmRelease to remove chart
verification and switch to using the official HTTPS repository
at https://prometheus-community.github.io/helm-charts.
OCI builds have temporarily been suspended for these charts due
to pipeline errors.
See: prometheus-community/helm-charts#2841
Signed-off-by: Alex Howard <thezanke@gmail.com>
The Prometheus job generated by the PodMonitor does not exclude non-running Pods. All the "completed" Pods are still listed as targets in Prometheus and marked as down. This issue is related to the PodMonitor implementation and is discussed in prometheus-operator/prometheus-operator#4816.
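One common way to keep non-running Pods out of the target list is a `keep` relabeling on the Pod phase. The sketch below is an illustration only; the PodMonitor name, selector and port name are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-controllers        # hypothetical name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-controller      # hypothetical selector
  podMetricsEndpoints:
    - port: http-prom              # hypothetical port name
      relabelings:
        # Keep only targets whose Pod is in the Running phase.
        - sourceLabels: [__meta_kubernetes_pod_phase]
          action: keep
          regex: Running
```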
Signed-off-by: Arcadie Condrat <arcadie.condrat@gmail.com>
- add loki-stack HelmRelease to install Loki and Promtail in the monitoring namespace
- make the loki-stack HelmRelease depend on kube-prometheus-stack to install Loki's datasource and service monitors in the correct order
- add a Grafana dashboard for displaying and filtering the Flux controllers JSON logs
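The ordering between the two releases can be expressed with `dependsOn`. A minimal sketch follows; the API version, the interval and the HelmRepository name are assumptions.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1   # assumption: use the version served by the cluster
kind: HelmRelease
metadata:
  name: loki-stack
  namespace: monitoring
spec:
  interval: 5m                               # assumption
  # Reconcile only after kube-prometheus-stack is ready.
  dependsOn:
    - name: kube-prometheus-stack
  chart:
    spec:
      chart: loki-stack
      sourceRef:
        kind: HelmRepository
        name: grafana-charts                 # hypothetical repository name
```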
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
- Automate kube-prometheus-stack helm release upgrades for the v35.x range
- Remove deprecated Grafana settings
- Set Prometheus retention to 24h
- Label Flux dashboards and PodMonitors with `app.kubernetes.io/component: monitoring`
- Change the `podMonitorSelector` to match the label `app.kubernetes.io/component: monitoring`
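As a rough sketch, the relevant part of the kube-prometheus-stack HelmRelease spec could look like this. The `35.x` semver range is one assumed way of expressing the v35.x range.

```yaml
spec:
  chart:
    spec:
      chart: kube-prometheus-stack
      # Semver range: upgrade automatically within v35.
      version: "35.x"
  values:
    prometheus:
      prometheusSpec:
        retention: 24h
        # Select only PodMonitors carrying the monitoring label.
        podMonitorSelector:
          matchLabels:
            app.kubernetes.io/component: monitoring
```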
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
Since kube-prometheus-stack helm chart v32.2.0, the `labelValue` has to
be set to "1" for the default grafana dashboard label selector to select
the flux dashboard configuration.
Also, update kube-prometheus-stack to v34.7.0, the latest version.
Refer: eba5b198f5
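A minimal sketch of the label selector setting described above, assuming
the dashboard sidecar keeps the `grafana_dashboard` label key:

```yaml
values:
  grafana:
    sidecar:
      dashboards:
        label: grafana_dashboard   # assumption: label key used by the sidecar
        labelValue: "1"
```

With this in place, a dashboard ConfigMap would need the label
`grafana_dashboard: "1"` to be picked up.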
Signed-off-by: Sunny <darkowlzz@protonmail.com>
Something in kube-prometheus-stack 20.0.0 has broken our example.
See https://github.com/fluxcd/flux2/pull/2193 for more information.
Signed-off-by: Kingdon Barrett <kingdon@weave.works>
Edit the definition of the namespace variable to use the Grafana custom PromQL function `label_values`.
Rename the variable namespace to operator_namespace.
Rename the variable exported_namespace to namespace.
Signed-off-by: Daniel AguadoAraujo <daniel.aguadoaraujo@gfk.com>
The legend in the dashboard was showing the whole resource instead of
the Pod name.
To fix this, use the correct Prometheus label.
Resolves:
Related:
Signed-off-by: Daniel-Andrei Minca <mandrei17@gmail.com>