Compare commits

1 Commits

Author SHA1 Message Date
Sunny
3dbe870455 monitoring: Configure KSM & cluster dashboard
Update kube-prometheus-stack helm release values to configure
kube-state-metrics and use kube-state-metrics to collect gotk resource
state metrics.

- Configure kube-state-metrics to run in custom resource state only
  mode. In this mode, it'll only watch custom resources. Also, pass
  empty collectors as extra args to prevent passing all the core
  resources to watch as an argument.
- Running kube-state-metrics in custom resource state only mode makes
  the default grafana dashboards of no use. Disable the default
  dashboards.
- Add kube-state-metrics configuration to provide RBAC permissions to it
  to allow listing and watching flux CRDs.
- Also, configure custom resource state for each of the flux custom
  resources using Info type metrics called `gotk_resource_info`. KSM
  issues a warning if an Info type object doesn't have `_info` suffix.
  These metrics have the value 1 always. This works well for the CRD
  state metrics as a zero value would mean that the resource doesn't
  exist, in which case, the resource is deleted.
- Update the cluster dashboard panels to use `gotk_resource_info` in the
  queries.
  - Only the following panels have been updated
    - Cluster Reconcilers
    - Failing Reconcilers
    - Cluster reconciliation readiness
    - Kubernetes Manifests Sources
    - Failing Sources
    - Source acquisition readiness
  - The panels have been updated so that they also work with static
  resources that don't have any status. By default, such static
  resources are assumed to be in a Ready state. Resources are seen as
  failed only when the ready value is false.
  - The queries have been updated to Instant type in order to show the
  current data, instead of the result of the past 15 minutes. This shows
  more accurate resource data as the resource metrics change.
  - The Stat visualizers have been updated to have zero as the default
  value when there's no data. This is to prevent showing no data when
  there's no object. This was motivated by the behavior of the previous
  configuration which depended on stale metrics from controllers and
  deleted conditions to show zero value when objects get deleted. With
  the fixes in the controller metrics that remove stale metrics, this
  will no longer work. In order to show a zero value for these stats, a
  default is set.
  - The `$namespace` variable has been updated to refer to
  `exported_namespace` from `gotk_resource_info`.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
2023-08-07 19:18:32 +05:30
3 changed files with 477 additions and 345 deletions
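
To make the query change described in the commit message concrete, here is a minimal Python sketch (not part of the commit) that mimics how `count(gotk_resource_info{...})` selects series by label. The sample series and the `count_series` helper are hypothetical, and the sketch assumes that a resource without a Ready condition is exported without a `ready` label.

```python
import re

# Hypothetical sample of gotk_resource_info series, represented as label sets.
# Every Info series has the value 1, so only the labels matter for counting.
SERIES = [
    {"customresource_kind": "Kustomization", "exported_namespace": "flux-system",
     "name": "apps", "ready": "True"},
    {"customresource_kind": "HelmRelease", "exported_namespace": "flux-system",
     "name": "podinfo", "ready": "False"},
    # Assumption: a static resource without status carries no "ready" label.
    {"customresource_kind": "Provider", "exported_namespace": "flux-system",
     "name": "slack"},
    {"customresource_kind": "GitRepository", "exported_namespace": "flux-system",
     "name": "repo", "ready": "True"},
]


def count_series(series, **matchers):
    """Count series whose labels fully match every regex, like count(m{l=~"..."}).

    A missing label is treated as the empty string, as in PromQL matching.
    Unlike Prometheus, this returns 0 instead of an empty result when nothing
    matches; the dashboard covers that case with the noValue default of 0.
    """
    def matches(labels):
        return all(
            re.fullmatch(pattern, labels.get(label, "")) is not None
            for label, pattern in matchers.items()
        )

    return sum(1 for labels in series if matches(labels))
```

With this model, the "Failing Reconcilers" panel corresponds to `count_series(SERIES, customresource_kind="Kustomization|HelmRelease", ready="False")`, and a resource without a `ready` label is never counted as failing.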

View File

@@ -6,7 +6,7 @@ spec:
interval: 5m
chart:
spec:
version: "45.x"
version: "48.x"
chart: kube-prometheus-stack
sourceRef:
kind: HelmRepository
@@ -31,6 +31,249 @@ spec:
podMonitorSelector:
matchLabels:
app.kubernetes.io/component: monitoring
grafana:
defaultDashboardsEnabled: false
kube-state-metrics:
collectors: []
extraArgs:
- --custom-resource-state-only=true
rbac:
extraRules:
- apiGroups:
- source.toolkit.fluxcd.io
- kustomize.toolkit.fluxcd.io
- helm.toolkit.fluxcd.io
- image.toolkit.fluxcd.io
- notification.toolkit.fluxcd.io
resources:
- gitrepositories
- buckets
- helmrepositories
- helmcharts
- ocirepositories
- kustomizations
- helmreleases
- imagerepositories
- imagepolicies
- imageupdateautomations
- alerts
- providers
- receivers
verbs: ["list", "watch"]
customResourceState:
enabled: true
config:
spec:
resources:
- groupVersionKind:
group: source.toolkit.fluxcd.io
version: "v1"
kind: GitRepository
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: source.toolkit.fluxcd.io
version: "v1beta2"
kind: Bucket
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: source.toolkit.fluxcd.io
version: "v1beta2"
kind: HelmRepository
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
type: [spec, type]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: source.toolkit.fluxcd.io
version: "v1beta2"
kind: HelmChart
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: source.toolkit.fluxcd.io
version: "v1beta2"
kind: OCIRepository
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: kustomize.toolkit.fluxcd.io
version: "v1"
kind: Kustomization
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: helm.toolkit.fluxcd.io
version: "v2beta1"
kind: HelmRelease
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: image.toolkit.fluxcd.io
version: "v1beta2"
kind: ImageRepository
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: image.toolkit.fluxcd.io
version: "v1beta2"
kind: ImagePolicy
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: image.toolkit.fluxcd.io
version: "v1beta1"
kind: ImageUpdateAutomation
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: notification.toolkit.fluxcd.io
version: "v1beta2"
kind: Alert
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: notification.toolkit.fluxcd.io
version: "v1beta2"
kind: Provider
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
- groupVersionKind:
group: notification.toolkit.fluxcd.io
version: "v1"
kind: Receiver
metricNamePrefix: gotk
metrics:
- name: "resource_info"
help: "The current state of a GitOps Toolkit resource."
each:
type: Info
info:
labelsFromPath:
name: [metadata, name]
labelsFromPath:
exported_namespace: [metadata, namespace]
ready: [status, conditions, "[type=Ready]", status]
postRenderers:
- kustomize:
patches:

View File

@@ -30,18 +30,23 @@
]
},
"editable": true,
"gnetId": null,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"iteration": 1652337714814,
"id": 5,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": "${DS_PROMETHEUS}",
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"description": "",
"fieldConfig": {
"defaults": {
"decimals": 0,
"mappings": [],
"noValue": "0",
"thresholds": {
"mode": "absolute",
"steps": [
@@ -81,28 +86,37 @@
"text": {},
"textMode": "value"
},
"pluginVersion": "7.5.5",
"pluginVersion": "10.0.2",
"targets": [
{
"exemplar": true,
"expr": "count(gotk_reconcile_condition{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",type=\"Ready\",status=\"True\",kind=~\"Kustomization|HelmRelease\"})\n-\nsum(gotk_reconcile_condition{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",type=\"Ready\",status=\"Deleted\",kind=~\"Kustomization|HelmRelease\"})",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"exemplar": false,
"expr": "count(gotk_resource_info{exported_namespace=~\"$namespace\", customresource_kind=~\"Kustomization|HelmRelease\"})",
"instant": true,
"interval": "",
"legendFormat": "",
"range": false,
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Cluster Reconcilers",
"type": "stat"
},
{
"datasource": "${DS_PROMETHEUS}",
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"description": "",
"fieldConfig": {
"defaults": {
"decimals": 0,
"mappings": [],
"noValue": "0",
"thresholds": {
"mode": "absolute",
"steps": [
@@ -138,28 +152,37 @@
"text": {},
"textMode": "value"
},
"pluginVersion": "7.5.5",
"pluginVersion": "10.0.2",
"targets": [
{
"exemplar": true,
"expr": "sum(gotk_reconcile_condition{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",type=\"Ready\",status=\"False\",kind=~\"Kustomization|HelmRelease\"})",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"exemplar": false,
"expr": "count(gotk_resource_info{exported_namespace=~\"$namespace\", customresource_kind=~\"Kustomization|HelmRelease\", ready=\"False\"})",
"instant": true,
"interval": "",
"legendFormat": "",
"range": false,
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Failing Reconcilers",
"type": "stat"
},
{
"datasource": "${DS_PROMETHEUS}",
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"description": "",
"fieldConfig": {
"defaults": {
"decimals": 0,
"mappings": [],
"noValue": "0",
"thresholds": {
"mode": "absolute",
"steps": [
@@ -199,28 +222,37 @@
"text": {},
"textMode": "value"
},
"pluginVersion": "7.5.5",
"pluginVersion": "10.0.2",
"targets": [
{
"exemplar": true,
"expr": "count(gotk_reconcile_condition{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",type=\"Ready\",status=\"True\",kind=~\"GitRepository|HelmRepository|Bucket\"})\n-\nsum(gotk_reconcile_condition{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",type=\"Ready\",status=\"Deleted\",kind=~\"GitRepository|HelmRepository|Bucket\"})",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"exemplar": false,
"expr": "count(gotk_resource_info{exported_namespace=~\"$namespace\", customresource_kind=~\"GitRepository|HelmRepository|Bucket|OCIRepository\"})",
"instant": true,
"interval": "",
"legendFormat": "",
"range": false,
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Kubernetes Manifests Sources",
"type": "stat"
},
{
"datasource": "${DS_PROMETHEUS}",
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"description": "",
"fieldConfig": {
"defaults": {
"decimals": 0,
"mappings": [],
"noValue": "0",
"thresholds": {
"mode": "absolute",
"steps": [
@@ -256,18 +288,23 @@
"text": {},
"textMode": "value"
},
"pluginVersion": "7.5.5",
"pluginVersion": "10.0.2",
"targets": [
{
"exemplar": true,
"expr": "sum(gotk_reconcile_condition{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",type=\"Ready\",status=\"False\",kind=~\"GitRepository|HelmRepository|Bucket\"})",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"exemplar": false,
"expr": "count(gotk_resource_info{exported_namespace=~\"$namespace\", customresource_kind=~\"GitRepository|HelmRepository|Bucket|OCIRepository\", ready=\"False\"})",
"instant": true,
"interval": "",
"legendFormat": "",
"range": false,
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Failing Sources",
"type": "stat"
},
@@ -318,9 +355,10 @@
"values": false
},
"showUnfilled": true,
"text": {}
"text": {},
"valueMode": "color"
},
"pluginVersion": "7.5.5",
"pluginVersion": "10.0.2",
"targets": [
{
"exemplar": true,
@@ -330,8 +368,6 @@
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Reconciler ops avg. duration",
"type": "bargauge"
},
@@ -382,20 +418,19 @@
"values": false
},
"showUnfilled": true,
"text": {}
"text": {},
"valueMode": "color"
},
"pluginVersion": "7.5.5",
"pluginVersion": "10.0.2",
"targets": [
{
"exemplar": true,
"expr": " sum(rate(gotk_reconcile_duration_seconds_sum{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",kind=~\"GitRepository|HelmRepository|Bucket\"}[5m])) by (kind)\n/\n sum(rate(gotk_reconcile_duration_seconds_count{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",kind=~\"GitRepository|HelmRepository|Bucket\"}[5m])) by (kind)",
"expr": " sum(rate(gotk_reconcile_duration_seconds_sum{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",kind=~\"GitRepository|HelmRepository|Bucket|OCIRepository\"}[5m])) by (kind)\n/\n sum(rate(gotk_reconcile_duration_seconds_count{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",kind=~\"GitRepository|HelmRepository|Bucket|OCIRepository\"}[5m])) by (kind)",
"interval": "",
"legendFormat": "{{kind}}",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Source ops avg. duration",
"type": "bargauge"
},
@@ -414,23 +449,33 @@
"type": "row"
},
{
"datasource": "${DS_PROMETHEUS}",
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"description": "",
"fieldConfig": {
"defaults": {
"custom": {
"displayMode": "auto",
"align": "auto",
"cellOptions": {
"type": "auto"
},
"filterable": true,
"inspect": false
},
"mappings": [
{
"options": {
"0": {
"text": "Ready"
},
"1": {
"False": {
"color": "red",
"index": 1,
"text": "Not Ready"
},
"True": {
"color": "blue",
"index": 0,
"text": "Ready"
}
},
"type": "value"
@@ -440,16 +485,8 @@
"mode": "absolute",
"steps": [
{
"color": "blue",
"color": "transparent",
"value": null
},
{
"color": "blue",
"value": 0
},
{
"color": "red",
"value": 1
}
]
}
@@ -457,13 +494,16 @@
"overrides": [
{
"matcher": {
"id": "byName",
"options": "Status"
"id": "byType",
"options": "string"
},
"properties": [
{
"id": "custom.displayMode",
"value": "color-background"
"id": "custom.cellOptions",
"value": {
"mode": "basic",
"type": "color-background"
}
}
]
}
@@ -477,7 +517,9 @@
},
"id": 33,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
@@ -492,11 +534,16 @@
}
]
},
"pluginVersion": "7.5.5",
"pluginVersion": "10.0.2",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"exemplar": true,
"expr": "gotk_reconcile_condition{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",type=\"Ready\",status=\"False\",kind=~\"Kustomization|HelmRelease\"}",
"expr": "gotk_resource_info{exported_namespace=~\"$namespace\", customresource_kind=~\"Kustomization|HelmRelease\"}",
"format": "table",
"instant": true,
"interval": "",
@@ -504,8 +551,6 @@
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Cluster reconciliation readiness ",
"transformations": [
{
@@ -513,11 +558,16 @@
"options": {
"excludeByName": {
"Time": true,
"Value": true,
"__name__": true,
"app": true,
"container": true,
"customresource_group": true,
"customresource_kind": false,
"customresource_version": true,
"endpoint": true,
"exported_namespace": false,
"gotk_type": true,
"instance": true,
"job": true,
"kubernetes_namespace": true,
@@ -525,16 +575,36 @@
"namespace": true,
"pod": true,
"pod_template_hash": true,
"service": true,
"status": true,
"type": true
},
"indexByName": {},
"indexByName": {
"Time": 0,
"Value": 15,
"__name__": 1,
"container": 2,
"customresource_group": 4,
"customresource_kind": 5,
"customresource_version": 6,
"endpoint": 7,
"exported_namespace": 3,
"instance": 8,
"job": 9,
"name": 10,
"namespace": 11,
"pod": 12,
"ready": 13,
"service": 14
},
"renameByName": {
"Value": "Status",
"Value": "",
"customresource_kind": "Kind",
"exported_namespace": "Namespace",
"kind": "Kind",
"name": "Name",
"namespace": "Operator Namespace"
"namespace": "Operator Namespace",
"ready": "Status"
}
}
}
@@ -542,23 +612,36 @@
"type": "table"
},
{
"datasource": "${DS_PROMETHEUS}",
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"displayMode": "auto",
"align": "auto",
"cellOptions": {
"type": "auto"
},
"filterable": true,
"inspect": false
},
"mappings": [
{
"options": {
"0": {
"text": "Ready"
},
"1": {
"False": {
"color": "red",
"index": 1,
"text": "Not Ready"
},
"True": {
"color": "blue",
"index": 0,
"text": "Ready"
}
},
"type": "value"
@@ -568,21 +651,28 @@
"mode": "absolute",
"steps": [
{
"color": "blue",
"color": "transparent",
"value": null
},
{
"color": "blue",
"value": 0
},
{
"color": "red",
"value": 1
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byType",
"options": "string"
},
"properties": [
{
"id": "custom.cellOptions",
"value": {
"mode": "basic",
"type": "color-background"
}
}
]
},
{
"matcher": {
"id": "byName",
@@ -590,8 +680,15 @@
},
"properties": [
{
"id": "custom.displayMode",
"value": "color-background"
"id": "noValue",
"value": "Ready"
},
{
"id": "color",
"value": {
"fixedColor": "blue",
"mode": "fixed"
}
}
]
}
@@ -605,7 +702,9 @@
},
"id": 34,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
@@ -620,11 +719,16 @@
}
]
},
"pluginVersion": "7.5.5",
"pluginVersion": "10.0.2",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"exemplar": true,
"expr": "gotk_reconcile_condition{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",type=\"Ready\",status=\"False\",kind=~\"GitRepository|HelmRepository|Bucket\"}",
"expr": "gotk_resource_info{exported_namespace=~\"$namespace\", customresource_kind=~\"GitRepository|HelmRepository|Bucket|OCIRepository\"}",
"format": "table",
"instant": true,
"interval": "",
@@ -632,8 +736,6 @@
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Source acquisition readiness ",
"transformations": [
{
@@ -641,11 +743,16 @@
"options": {
"excludeByName": {
"Time": true,
"Value": true,
"__name__": true,
"app": true,
"container": true,
"customresource_group": true,
"customresource_kind": false,
"customresource_version": true,
"endpoint": true,
"exported_namespace": false,
"gotk_type": true,
"instance": true,
"job": true,
"kubernetes_namespace": true,
@@ -653,16 +760,37 @@
"namespace": true,
"pod": true,
"pod_template_hash": true,
"ready": false,
"service": true,
"status": true,
"type": true
},
"indexByName": {},
"indexByName": {
"Time": 0,
"Value": 15,
"__name__": 1,
"container": 2,
"customresource_group": 5,
"customresource_kind": 6,
"customresource_version": 7,
"endpoint": 8,
"exported_namespace": 4,
"instance": 9,
"job": 10,
"name": 11,
"namespace": 3,
"pod": 12,
"ready": 13,
"service": 14
},
"renameByName": {
"Value": "Status",
"Value": "",
"customresource_kind": "Kind",
"exported_namespace": "Namespace",
"kind": "Kind",
"name": "Name",
"namespace": "Operator Namespace"
"namespace": "Operator Namespace",
"ready": "Status"
}
}
}
@@ -690,10 +818,6 @@
"dashes": false,
"datasource": "${DS_PROMETHEUS}",
"description": "",
"fieldConfig": {
"defaults": {},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
@@ -724,7 +848,7 @@
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.5.5",
"pluginVersion": "10.0.2",
"pointradius": 2,
"points": false,
"renderer": "flot",
@@ -743,9 +867,7 @@
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Cluster reconciliation duration",
"tooltip": {
"shared": true,
@@ -754,33 +876,24 @@
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
"align": false
}
},
{
@@ -790,10 +903,6 @@
"dashes": false,
"datasource": "${DS_PROMETHEUS}",
"description": "",
"fieldConfig": {
"defaults": {},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
@@ -824,7 +933,7 @@
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.5.5",
"pluginVersion": "10.0.2",
"pointradius": 2,
"points": false,
"renderer": "flot",
@@ -835,7 +944,7 @@
"targets": [
{
"exemplar": true,
"expr": " sum(rate(gotk_reconcile_duration_seconds_sum{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",kind=~\"GitRepository|HelmRepository|Bucket\"}[5m])) by (kind, name)\n/\n sum(rate(gotk_reconcile_duration_seconds_count{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",kind=~\"GitRepository|HelmRepository|Bucket\"}[5m])) by (kind, name)",
"expr": " sum(rate(gotk_reconcile_duration_seconds_sum{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",kind=~\"GitRepository|HelmRepository|Bucket|OCIRepository\"}[5m])) by (kind, name)\n/\n sum(rate(gotk_reconcile_duration_seconds_count{namespace=~\"$operator_namespace\",exported_namespace=~\"$namespace\",kind=~\"GitRepository|HelmRepository|Bucket|OCIRepository\"}[5m])) by (kind, name)",
"hide": false,
"interval": "",
"legendFormat": "{{kind}}/{{name}}",
@@ -843,9 +952,7 @@
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Source acquisition duration",
"tooltip": {
"shared": true,
@@ -854,38 +961,29 @@
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
"align": false
}
}
],
"refresh": "30s",
"schemaVersion": 36,
"schemaVersion": 38,
"style": "light",
"tags": [
"flux"
@@ -903,13 +1001,13 @@
"$__all"
]
},
"datasource": "$DS_PROMETHEUS",
"datasource": {
"type": "prometheus",
"uid": "$DS_PROMETHEUS"
},
"definition": "label_values(gotk_reconcile_condition, namespace)",
"description": null,
"error": null,
"hide": 0,
"includeAll": true,
"label": null,
"multi": true,
"name": "operator_namespace",
"options": [],
@@ -928,10 +1026,8 @@
"useTags": false
},
{
"allValue": null,
"current": {
"selected": true,
"tags": [],
"text": [
"All"
],
@@ -939,19 +1035,19 @@
"$__all"
]
},
"datasource": "$DS_PROMETHEUS",
"definition": "label_values(gotk_reconcile_condition, exported_namespace)",
"description": null,
"error": null,
"datasource": {
"type": "prometheus",
"uid": "$DS_PROMETHEUS"
},
"definition": "label_values(gotk_resource_info,exported_namespace)",
"hide": 0,
"includeAll": true,
"label": null,
"multi": true,
"name": "namespace",
"options": [],
"query": {
"query": "label_values(gotk_reconcile_condition, exported_namespace)",
"refId": "StandardVariableQuery"
"query": "label_values(gotk_resource_info,exported_namespace)",
"refId": "PrometheusVariableQueryEditor-VariableQuery"
},
"refresh": 2,
"regex": "",
@@ -1000,7 +1096,9 @@
"1d"
]
},
"timezone": "",
"title": "Flux Cluster Stats",
"uid": "flux-cluster",
"version": 3
"version": 4,
"weekStart": ""
}

View File

@@ -1,209 +0,0 @@
# RFC-0006 Passwordless authentication for Git repositories
**Status:** provisional
**Creation date:** 2023-07-31
## Summary
Flux should provide a mechanism to authenticate against Git repositories without
the use of passwords. This RFC proposes the use of alternative authentication
methods like OIDC, OAuth2 and IAM to access Git repositories hosted on specific
Git SaaS platforms and cloud providers.
## Motivation
At the moment, Flux supports HTTP basic and bearer authentication. Users are
required to create a Secret containing the username and the password/bearer
token, which is then referred to in the GitRepository using `.spec.secretRef`.
While this works fine, it has a couple of drawbacks:
* Scalability: Each new GitRepository potentially warrants another credentials
pair, which doesn't scale well in big organizations with hundreds of
repositories with different owners, increasing the risk of mismanagement and
leaks.
* Identity: A username is associated with an actual human. But often, the
repository belongs to a team of 2 or more people. This leads to a problem where
teams have to decide whose credentials Flux should use for authentication.
These problems exist not due to flaws in Flux, but because of the inherent
nature of password based authentication.
With support for OIDC, OAuth2 and IAM based authentication, we can eliminate
these problems:
* Scalability: Since OIDC is fully handled by the cloud provider, it eliminates
any user involvement in managing credentials. For OAuth2 and IAM, users do need
to provide certain information like the ID of the resource, private key, etc.
but these are still a better alternative to passwords since the same resource
can be reused by multiple teams with different members.
* Identity: Since all the above authentication methods are associated with a
virtual resource independent of a user, it solves the problem of a single person
being tied to automation that several people are involved in.
### Goals
* Integrate with major cloud providers' OIDC and IAM offerings to provide a
seamless way of Git repository authentication.
* Integrate with major Git SaaS providers to support their app based OAuth2
mechanism.
### Non-Goals
* Replace the existing basic and bearer authentication API.
## Proposal
A new string field `.spec.provider` shall be added to the GitRepository API.
The field will be an enum with the following variants:
* `azure`
* `github`
* `gcp`
> AWS CodeCommit is not supported as it does not support authentication via IAM
Roles without the use of https://github.com/aws/git-remote-codecommit.
By default, it will be blank, which indicates that the user wants to
authenticate via HTTP basic/bearer auth or SSH.
### Azure
Git repositories hosted on Azure DevOps can be accessed by Flux using OIDC if
the cluster running Flux is hosted on AKS with [managed identity](https://learn.microsoft.com/en-us/azure/devops/integrate/get-started/authentication/service-principal-managed-identity?view=azure-devops)
enabled. The managed identity associated with the cluster must have sufficient
permissions to be able to access Azure DevOps resources. This enables Flux to
access the Git repository without the need for any credentials.
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: azure-devops
spec:
  interval: 1m
  url: https://dev.azure.com/<org>/<project>/_git/<repository>
  ref:
    branch: master
  # notice the lack of secretRef
  provider: azure
```
### GCP
Git repositories hosted on Google Cloud Source Repositories can be accessed by
Flux via a [GCP Service Account](https://cloud.google.com/iam/docs/service-account-overview).
The Service Account must have sufficient permissions to be able to access Google
Cloud Source Repositories and its credentials should be specified in the secret
referred to in `.spec.secretRef`.
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: gcp-repo
spec:
  interval: 1m
  url: https://source.developers.google.com/p/<project>/r/<repository>
  ref:
    branch: master
  provider: gcp
  secretRef:
    name: gcp-sa
---
apiVersion: v1
kind: Secret
metadata:
  name: gcp-sa
stringData:
  gcpServiceAccount: |
    {
      "type": "service_account",
      "project_id": "my-google-project",
      "private_key_id": "REDACTED",
      "private_key": "-----BEGIN PRIVATE KEY-----\nREDACTED\n-----END PRIVATE KEY-----\n",
      "client_email": "<service-account-id>@my-google-project.iam.gserviceaccount.com",
      "client_id": "REDACTED",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/<service-account-id>%40my-google-project.iam.gserviceaccount.com"
    }
```
### GitHub
Git repositories hosted on GitHub can be accessed via [GitHub Apps](https://docs.github.com/en/apps/overview).
This allows users to create a single resource from which they can access all
their GitHub repositories. The app must have sufficient permissions to be able
to access repositories. The app's ID, private key and installation ID should
be mentioned in the Secret referred to by `.spec.secretRef`. GitHub Enterprise
users will also need to mention their GitHub API URL in the Secret.
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: github-repo
spec:
  interval: 1m
  url: https://github.com/<org>/<repository>
  ref:
    branch: master
  provider: github
  secretRef:
    name: github-app
---
apiVersion: v1
kind: Secret
metadata:
  name: github-app
stringData:
  githubAppID: <app-id>
  githubInstallationID: <installation-id>
  githubPrivateKey: |
    <PEM-private-key>
  githubApiURl: <github-enterprise-api-url> #optional, required only for GitHub Enterprise users
```
## Design Details
### Azure
If `.spec.provider` is set to `azure`, Flux controllers will reach out to
[Azure IMDS (Instance Metadata Service)](https://learn.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-to-use-vm-token#get-a-token-using-go)
to get an access token. This [access token will then be used as a bearer token](https://learn.microsoft.com/en-us/azure/devops/integrate/get-started/authentication/service-principal-managed-identity?view=azure-devops#q-can-i-use-a-service-principal-to-do-git-operations-like-clone-a-repo)
to perform HTTP bearer authentication.
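The flow above could look roughly like the following Python sketch, which only builds the IMDS request and the resulting bearer header without sending anything. The endpoint, the `api-version` value and the Azure DevOps resource ID are assumptions based on Azure's public documentation rather than on this RFC, and both function names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumptions, not taken from the RFC: the well-known IMDS token endpoint,
# its api-version, and the Azure DevOps resource ID used as the token audience.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"
IMDS_API_VERSION = "2018-02-01"
AZURE_DEVOPS_RESOURCE = "499b84ac-1321-427f-aa17-267ca6975798"


def build_imds_request(resource=AZURE_DEVOPS_RESOURCE):
    """Build (but do not send) the HTTP request that asks IMDS for a token."""
    query = urllib.parse.urlencode(
        {"api-version": IMDS_API_VERSION, "resource": resource}
    )
    # IMDS rejects requests that do not carry the "Metadata: true" header.
    return urllib.request.Request(
        f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"}
    )


def bearer_header(access_token):
    """Return the header used for HTTP bearer authentication against Git."""
    return {"Authorization": f"Bearer {access_token}"}
```

The JSON response to such a request is expected to contain an `access_token` field, whose value would be passed to `bearer_header`.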
### GCP
If `.spec.provider` is set to `gcp`, Flux controllers will get the Service
Account credentials from the specified Secret and use
[`google.CredentialsFromJSON`](https://pkg.go.dev/golang.org/x/oauth2/google#CredentialsFromJSON)
to fetch the access token. This access token will then be used as the password
and the `client_email` as the username to perform HTTP basic authentication.
### GitHub
If `.spec.provider` is set to `github`, Flux controllers will get the app
details from the specified Secret and use them to [generate an app installation
token](https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/generating-an-installation-access-token-for-a-github-app).
This token is then used as the password and [`x-access-token` as the username](https://docs.github.com/en/apps/creating-github-apps/registering-a-github-app/choosing-permissions-for-a-github-app#choosing-permissions-for-git-access)
to perform HTTP basic authentication.
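As an illustration of that last step, the Python sketch below builds the HTTP basic authentication header from an installation token that has already been obtained. Generating the token itself is out of scope here, and both function names are hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Return an HTTP basic authentication header for the given credentials."""
    credentials = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


def github_app_auth_header(installation_token):
    """Use the fixed username `x-access-token` with the installation token as password."""
    return basic_auth_header("x-access-token", installation_token)
```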
### Token Caching and Refreshing
To avoid calling the upstream API for a token during every reconciliation, Flux
controllers shall cache the token after fetching it. Since GitHub tokens
self-expire, the cache shall automatically evict the token after it has expired,
triggering a fetch of a fresh token.
For GCP, the [`TokenSource`](https://pkg.go.dev/golang.org/x/oauth2@v0.10.0#TokenSource)
object will be cached, since it automatically handles refreshing an expired
token and always returns a valid token. Since a `TokenSource` never expires, it
need not be evicted from the cache.
While Azure's managed identities subsystem caches the token, it is
[recommended for the consumer application to implement their own caching](https://learn.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-to-use-vm-token#token-caching)
as well.
The caches for all three providers are separate, i.e. there shall exist a
dedicated cache for each provider.
Since the proposed authentication methods for GitHub and GCP involve some form
of credentials stored in a Kubernetes Secret, the cache key can be the Secret's
`<namespace/name>`. Since authentication for Azure is configured directly via
the source-controller Deployment, the token can just be stored in a global
variable, which is refreshed whenever the token expires.
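A minimal Python sketch of the expiring cache described for GitHub tokens, keyed by the Secret's `<namespace/name>`, might look as follows. The `TokenCache` class, its `get` method and the `fetch` callback returning a `(token, ttl_seconds)` pair are all hypothetical; the RFC does not prescribe an API.

```python
import time


class TokenCache:
    """Cache tokens per key and evict them once they have expired."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so that expiry can be exercised in tests.
        self._clock = clock
        self._entries = {}  # key -> (token, expires_at)

    def get(self, key, fetch):
        """Return the cached token for key, calling fetch() only when needed.

        fetch must return a (token, ttl_seconds) tuple.
        """
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]
        # Missing or expired: evict any stale entry and fetch a fresh token.
        self._entries.pop(key, None)
        token, ttl_seconds = fetch()
        self._entries[key] = (token, now + ttl_seconds)
        return token
```

A reconciler could then call `cache.get("flux-system/github-app", fetch_installation_token)` on every reconciliation, where `fetch_installation_token` is a placeholder for the upstream API call, and would only reach the upstream API once the previous token has expired.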