# RFC-0000 Custom Health Checks for Kustomization using Common Expression Language (CEL)

**Status:** provisional

**Creation date:** 2024-01-05

**Last update:** 2025-01-17

## Summary

This RFC proposes to extend the Flux `Kustomization` API with custom health checks for
custom resources using the Common Expression Language (CEL).

In order to provide flexibility, we propose to use CEL expressions for defining the
conditions that need to be met in order to determine the status of a custom resource.
We will introduce a new field called `healthCheckExprs` in the `Kustomization` CRD
which will be a list of CEL expressions for evaluating the status of a particular
Kubernetes resource kind.

## Motivation

Flux uses the `kstatus` library during the health check phase to compute the status
of owned resources. This works just fine for all the Kubernetes core resources
and custom resources that comply with the `kstatus` conventions.

There are cases where the status of a custom resource does not follow the
`kstatus` conventions. For example, we might want to compute the status of a custom
resource based on a condition other than `Ready`. This is the case for resources
that do intermediate patching like `Certificate` where you should look at the `Issuing`
condition to know whether the certificate is still being issued before looking at the
`Ready` condition.

In order to provide a generic solution for custom resources, that would not imply
writing a custom `kstatus` reader for each CRD, we need to provide a way for the user
to express the conditions that need to be met in order to determine the status.
It should be done in a way that is flexible enough to cover all possible use cases,
without having to change Flux source code for each new CRD.

### Goals

- Provide a generic solution for users to customise the health check evaluation of custom resources.
- Provide a space for the community to contribute custom health checks for popular custom resources.

### Non-Goals

- We do not plan to support custom health checks for Kubernetes core resources.

## Proposal

### Introduce a new field `HealthCheckExprs` in the `Kustomization` CRD

The `HealthCheckExprs` field will be a list of `CustomHealthCheck` objects.
The `CustomHealthCheck` object fields would be: `apiVersion`, `kind`, `inProgress`,
`failed` and `current`.

To give an example, consider the following `Certificate` resource, for which we want
to declare a custom health check:

```yaml
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-certificate
spec:
  commonName: cert-manager-tls
  dnsNames:
  - app.ns.svc.cluster.local
  ipAddresses:
  - x.x.x.x
  isCA: true
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: app-issuer
  secretName: app-tls-certs
  subject:
    organizations:
    - example.com
```

This `Certificate` resource will transition through the following `conditions`:
`Issuing` and `Ready`.

In order to compute the status of this resource, we need to look at both the `Issuing`
and `Ready` conditions.

The Flux `Kustomization` object used to apply the `Certificate` will look like this:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: certs
spec:
  interval: 5m
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./certs
  wait: true
  healthCheckExprs:
  - apiVersion: cert-manager.io/v1
    kind: Certificate
    inProgress: "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
    failed: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')"
    current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
```

The `.spec.healthCheckExprs` field contains an entry for the `Certificate` kind, its `apiVersion`,
and the CEL expressions that need to be met in order to determine the health status of all custom resources
of this kind reconciled by the Flux `Kustomization`.

Note that any entries for Kubernetes core resources are discarded from the `healthCheckExprs` list.

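To make the shape of these expressions more concrete, the following Go sketch mimics `status.conditions.filter(e, e.type == T).all(e, ...)` with plain functions instead of a CEL runtime. The `condition` type and the `filterByType`, `allMatch` and `conditionIs` helpers are hypothetical names chosen for illustration; they are not part of Flux or CEL. The sketch also shows that `all` over an empty list is true in CEL, so an expression of this shape holds when no condition of the given type is present yet.

```go
package main

import "fmt"

// condition is a hypothetical, simplified view of one entry in .status.conditions.
type condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// filterByType mimics the CEL macro conditions.filter(e, e.type == t).
func filterByType(conds []condition, t string) []condition {
	var out []condition
	for _, c := range conds {
		if c.Type == t {
			out = append(out, c)
		}
	}
	return out
}

// allMatch mimics the CEL macro list.all(e, pred(e)).
// Like CEL, it returns true for an empty list.
func allMatch(conds []condition, pred func(condition) bool) bool {
	for _, c := range conds {
		if !pred(c) {
			return false
		}
	}
	return true
}

// conditionIs mirrors the shape of the Certificate expressions: every condition
// of type t must be observed at generation gen and carry the given status.
func conditionIs(conds []condition, t, status string, gen int64) bool {
	return allMatch(filterByType(conds, t), func(c condition) bool {
		return c.ObservedGeneration == gen && c.Status == status
	})
}

func main() {
	issued := []condition{{Type: "Ready", Status: "True", ObservedGeneration: 2}}

	fmt.Println(conditionIs(issued, "Ready", "True", 2))  // "current" shape: true
	fmt.Println(conditionIs(issued, "Ready", "False", 2)) // "failed" shape: false

	// No Ready condition yet: the filter yields an empty list, so all() is true.
	fmt.Println(conditionIs(nil, "Ready", "False", 2)) // true
}
```

When writing expressions in this style, it may be worth checking how they behave while the relevant condition is still missing from the status.
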
### Custom Health Check Library

To help users define custom health checks, we will provide on the [fluxcd.io](https://fluxcd.io)
website a library of custom health checks for popular custom resources.

The Flux community will be able to contribute to this library by submitting pull requests
to the [fluxcd/website](https://github.com/fluxcd/website) repository.

### User Stories

#### Configure health checks for non-standard custom resources

> As a Flux user, I want to be able to specify health checks for
> custom resources that don't have a Ready condition, so that I can be notified
> when the status of my resources transitions to a failed state based on the evaluation
> of a different condition.

Using `.spec.healthCheckExprs`, Flux users have the ability to
specify the conditions that need to be met in order to determine the status of
a custom resource. This enables Flux to query any `.status` field,
besides the standard `Ready` condition, and evaluate it using a CEL expression.

Example for `SealedSecret` which has a `Synced` condition:

```yaml
  - apiVersion: bitnami.com/v1alpha1
    kind: SealedSecret
    inProgress: "metadata.generation != status.observedGeneration"
    failed: "status.conditions.filter(e, e.type == 'Synced').all(e, e.status == 'False')"
    current: "status.conditions.filter(e, e.type == 'Synced').all(e, e.status == 'True')"
```

#### Use Flux dependencies for Kubernetes ClusterAPI

> As a Flux user, I want to be able to use Flux dependencies based on the
> readiness of ClusterAPI resources, so that I can ensure that my applications
> are deployed only when the ClusterAPI resources are ready.

The ClusterAPI resources have a `Ready` condition, but it is only set in the status
after the cluster is first created. Given this behavior, at creation time, Flux
cannot find any condition to evaluate the status of the ClusterAPI resources,
thus it considers them as static resources which are always ready.

Using `.spec.healthCheckExprs`, Flux users can specify that the `Cluster`
kind is expected to have a `Ready` condition which will force Flux into waiting
for the ClusterAPI resources status to be populated.

Example for `Cluster`:

```yaml
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    inProgress: "metadata.generation != status.observedGeneration"
    failed: "status.conditions.filter(e, e.type == 'Ready').all(e, e.status == 'False')"
    current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.status == 'True')"
```

### Alternatives

We need an expression language that is flexible enough to cover all possible use
cases, without having to change Flux source code for each new use case.

An alternative that has been considered was to use `CUE` instead of `CEL`.
`CUE` is a more powerful expression language, but given the fact that
Kubernetes makes use of `CEL` for CRD validation and admission control,
we have decided to also use `CEL` in Flux in order to be consistent with
the Kubernetes ecosystem.

## Design Details

### Introduce a new field `HealthCheckExprs` in the `Kustomization` CRD

The `api/v1/kustomization_types.go` file will be updated to add the `HealthCheckExprs`
field to the `KustomizationSpec` struct.

```go
type KustomizationSpec struct {
	// HealthCheckExprs is a list of custom health checks expressed as
	// CEL expressions, one entry per custom resource kind.
	// +optional
	HealthCheckExprs []CustomHealthCheck `json:"healthCheckExprs,omitempty"`
}

type CustomHealthCheck struct {
	// APIVersion of the custom resource under evaluation.
	// +required
	APIVersion string `json:"apiVersion"`
	// Kind of the custom resource under evaluation.
	// +required
	Kind string `json:"kind"`
	// Current is the CEL expression that determines if the status
	// of the custom resource has reached the desired state.
	// +required
	Current string `json:"current"`
	// InProgress is the CEL expression that determines if the status
	// of the custom resource has not yet reached the desired state.
	// +optional
	InProgress string `json:"inProgress,omitempty"`
	// Failed is the CEL expression that determines if the status
	// of the custom resource has failed to reach the desired state.
	// +optional
	Failed string `json:"failed,omitempty"`
}
```

If a CEL expression evaluation results in an error, for example looking for a field that does not exist,
the health check will fail. Users will be encouraged to test their expressions
in the [CEL Playground](https://playcel.undistro.io/). Here is where the community-maintained
[library](#custom-health-check-library) will be particularly useful, as some of the expressions might be complex.

The `InProgress` expression is optional. When it is not specified, the controller will consider
the resource to be in progress if both `Failed` and `Current` evaluate to `false`.
Moreover, if the `InProgress` expression is not specified and the custom resource has a
`.status.observedGeneration` field, the controller will compare it with the `.metadata.generation`
field to determine if the resource is in progress.

The `Failed` expression is optional. When it is not specified, the controller will keep evaluating the
`Current` expression until it returns `true`, and will give up after the timeout is reached.
Users will be encouraged to provide a `Failed` expression to avoid stalling the reconciliation
loop until the timeout is reached.

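The rules above can be summarised as a small decision function. This is only a sketch in plain Go, without a CEL runtime: the `healthStatus` and `exprResult` types, the `resolve` function and the evaluation order (in-progress checks first, then `Failed`, then `Current`) are assumptions made for illustration and are not prescribed by this RFC.

```go
package main

import "fmt"

// healthStatus is a hypothetical stand-in for the kstatus status values.
type healthStatus string

const (
	inProgress healthStatus = "InProgress"
	failed     healthStatus = "Failed"
	current    healthStatus = "Current"
)

// exprResult holds the outcome of one optional expression.
// Set is false when the user did not provide the expression.
type exprResult struct {
	Set   bool
	Value bool
}

// resolve combines the expression outcomes for a single polling attempt.
// generationObserved is false only when the resource has a
// .status.observedGeneration that differs from .metadata.generation.
func resolve(inProg, fail exprResult, cur bool, generationObserved bool) healthStatus {
	switch {
	case inProg.Set && inProg.Value:
		return inProgress
	case !inProg.Set && !generationObserved:
		// Default in-progress check used when no InProgress expression is given.
		return inProgress
	case fail.Set && fail.Value:
		return failed
	case cur:
		return current
	default:
		// Neither Failed nor Current holds: keep polling until the timeout.
		return inProgress
	}
}

func main() {
	unset := exprResult{}

	// Only Current is provided and it is not true yet.
	fmt.Println(resolve(unset, unset, false, true)) // InProgress

	// A Failed expression lets the check give up before the timeout.
	fmt.Println(resolve(unset, exprResult{Set: true, Value: true}, false, true)) // Failed

	// Current is true, but the status has not caught up with the latest generation.
	fmt.Println(resolve(unset, unset, true, false)) // InProgress
}
```

The second call shows why providing a `Failed` expression is encouraged: without it, the same resource would stay in progress until the timeout is reached.
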
### Introduce a generic custom status reader

We will introduce a `StatusReader` that will be used to compute the status
of custom resources based on the `CEL` expressions provided in the `CustomHealthCheck`:

```go
import (
	"context"

	"github.com/fluxcd/cli-utils/pkg/kstatus/polling/engine"
	"github.com/fluxcd/cli-utils/pkg/kstatus/polling/event"
	kstatusreaders "github.com/fluxcd/cli-utils/pkg/kstatus/polling/statusreaders"
	"github.com/fluxcd/cli-utils/pkg/object"
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// CELStatusReader computes the status of a single custom resource kind
// by delegating to a generic status reader driven by CEL expressions.
type CELStatusReader struct {
	genericStatusReader engine.StatusReader
	gvk                 schema.GroupVersionKind
}

func NewCELStatusReader(mapper meta.RESTMapper, gvk schema.GroupVersionKind, exprs map[string]string) engine.StatusReader {
	genericStatusReader := kstatusreaders.NewGenericStatusReader(mapper, genericConditions(gvk.Kind, exprs))
	return &CELStatusReader{
		genericStatusReader: genericStatusReader,
		gvk:                 gvk,
	}
}

// Supports reports whether this reader handles the given group and kind.
func (g *CELStatusReader) Supports(gk schema.GroupKind) bool {
	return gk == g.gvk.GroupKind()
}

func (g *CELStatusReader) ReadStatus(ctx context.Context, reader engine.ClusterReader, resource object.ObjMetadata) (*event.ResourceStatus, error) {
	return g.genericStatusReader.ReadStatus(ctx, reader, resource)
}

func (g *CELStatusReader) ReadStatusForObject(ctx context.Context, reader engine.ClusterReader, resource *unstructured.Unstructured) (*event.ResourceStatus, error) {
	return g.genericStatusReader.ReadStatusForObject(ctx, reader, resource)
}
```

The `genericConditions` function takes a `kind` and a map of `CEL` expressions as parameters
and returns a function that takes an `Unstructured` object and returns a `status.Result` object.

```go
import (
	"fmt"

	"github.com/fluxcd/cli-utils/pkg/kstatus/status"
	"github.com/fluxcd/pkg/runtime/cel"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func genericConditions(kind string, exprs map[string]string) func(u *unstructured.Unstructured) (*status.Result, error) {
	return func(u *unstructured.Unstructured) (*status.Result, error) {
		obj := u.UnstructuredContent()

		for statusKey, expr := range exprs {
			// Use CEL to evaluate the expression.
			// Assumption: cel.ProcessExpr returns the boolean outcome of the expression.
			result, err := cel.ProcessExpr(expr, obj)
			if err != nil {
				// An evaluation error fails the health check.
				return nil, fmt.Errorf("failed to evaluate %s expression for %s: %w", statusKey, kind, err)
			}
			if !result {
				continue
			}
			switch statusKey {
			case status.CurrentStatus.String():
				// If the expression evaluates to true, we return the current status
				return &status.Result{Status: status.CurrentStatus, Message: kind + " is current"}, nil
			case status.FailedStatus.String():
				// If the expression evaluates to true, we return the failed status
				return &status.Result{Status: status.FailedStatus, Message: kind + " has failed"}, nil
			case status.InProgressStatus.String():
				// If the expression evaluates to true, we return the reconciling status
				return &status.Result{Status: status.InProgressStatus, Message: kind + " is in progress"}, nil
			}
		}

		// If none of the expressions evaluate to true, we return the reconciling status
		return &status.Result{Status: status.InProgressStatus, Message: kind + " is in progress"}, nil
	}
}
```

The CEL status reader will be used by the `statusPoller` provided to the kustomize-controller `reconciler`
to compute the status of resources whose GVKs have a registered custom health check.

We will implement a `CEL` environment that will use the Kubernetes CEL library to evaluate the `CEL` expressions.

## Implementation History
