mirror of https://github.com/fluxcd/flux2.git
				
				
				
Add RFC - Custom Health Checks for Kustomization using Common Expression Language(CEL)
Signed-off-by: Soule BA <bah.soule@gmail.com>
pull/4528/head
parent 20fbcfadac
commit cc0abd53c3
				| @ -0,0 +1,330 @@ | ||||
| # RFC-0000 Custom Health Checks for Kustomization using Common Expression Language (CEL) | ||||
| 
 | ||||
| **Status:** provisional | ||||
| 
 | ||||
| **Creation date:** 2024-01-05 | ||||
| 
 | ||||
| **Last update:** 2024-01-05 | ||||
| 
 | ||||
| ## Summary | ||||
| 
 | ||||
| This RFC proposes to support customization of the status readers used by `Kustomizations` | ||||
| during the `healthCheck` phase for custom resources. Users will be able to declare | ||||
| the `conditions` needed to compute the status of a custom resource. | ||||
| To provide flexibility, we propose to use `CEL` expressions to declare | ||||
| the expected conditions and their status. | ||||
| This will introduce a new field `customHealthChecksExprs` in the `Kustomization` CRD, | ||||
| which will be a list of `CustomHealthCheckExprs` objects. | ||||
| 
 | ||||
| ## Motivation | ||||
| 
 | ||||
| Flux uses the `Kstatus` library during the `healthCheck` phase to compute the status | ||||
| of owned resources. This works for all standard resources and for custom resources | ||||
| that comply with the `Kstatus` interfaces, but not for resources that report their status differently. | ||||
| 
 | ||||
| In the current Kustomization implementation, we have addressed this problem for | ||||
| Kubernetes Jobs. We have implemented a `customJobStatusReader` that computes the | ||||
| status of a Job based on a defined set of conditions. This is a good solution for | ||||
| Jobs, but it is not generic and thus not applicable to custom resources. | ||||
| 
 | ||||
| Another use case is relying on non-standard `conditions` to compute the status of | ||||
| a custom resource. For example, we might want to compute the status of a custom | ||||
| resource based on a condition other than `Ready`. This is the case for resources | ||||
| that do intermediate patching, like `Certificate`, where you should look at the `Issuing` | ||||
| condition to know whether the certificate is still being issued before looking at the | ||||
| `Ready` condition. | ||||
| 
 | ||||
| To provide a generic solution for custom resources that does not require | ||||
| writing a custom status reader for each new custom resource, we need to give | ||||
| users a way to express the `conditions` that must be met in order to compute | ||||
| the status of a given custom resource. This must be done in a way that is | ||||
| flexible enough to cover all possible use cases, without having to change `Flux` | ||||
| source code for each new use case. | ||||
| 
 | ||||
| ### Goals | ||||
| 
 | ||||
| - Provide a generic solution for users to customize the health check of custom resources. | ||||
| - Support non-standard resources in `kustomize-controller`. | ||||
| 
 | ||||
| ### Non-Goals | ||||
| 
 | ||||
| - We do not plan to support custom `healthChecks` for core resources. | ||||
| 
 | ||||
| ## Proposal | ||||
| 
 | ||||
| ### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD | ||||
| 
 | ||||
| The `CustomHealthChecksExprs` field will be a list of `CustomHealthCheckExprs` objects. | ||||
| Each `CustomHealthCheckExprs` object will have `apiVersion`, `kind`, `inProgress`, | ||||
| `failed` and `current` fields. | ||||
| 
 | ||||
| To give an example, here is how we would declare a custom health check for a `Certificate` | ||||
| resource: | ||||
| 
 | ||||
| ```yaml | ||||
| --- | ||||
| apiVersion: cert-manager.io/v1 | ||||
| kind: Certificate | ||||
| metadata: | ||||
|   name: app-certificate | ||||
|   namespace: cert-manager | ||||
| spec: | ||||
|   commonName: cert-manager-tls | ||||
|   dnsNames: | ||||
|   - app.ns.svc.cluster.local | ||||
|   ipAddresses: | ||||
|   - x.x.x.x | ||||
|   isCA: true | ||||
|   issuerRef: | ||||
|     group: cert-manager.io | ||||
|     kind: ClusterIssuer | ||||
|     name: app-issuer | ||||
|   privateKey: | ||||
|     algorithm: RSA | ||||
|     encoding: PKCS1 | ||||
|     size: 2048 | ||||
|   secretName: app-tls-certs | ||||
|   subject: | ||||
|     organizations: | ||||
|     - example.com | ||||
| ``` | ||||
| 
 | ||||
| This `Certificate` resource will transition through the following `conditions`:  | ||||
| `Issuing` and `Ready`. | ||||
| 
 | ||||
| In order to compute the status of this resource, we need to look at both the `Issuing` | ||||
| and `Ready` conditions. | ||||
| 
 | ||||
| The resulting `Kustomization` object will look like this: | ||||
| 
 | ||||
| ```yaml | ||||
| apiVersion: kustomize.toolkit.fluxcd.io/v1 | ||||
| kind: Kustomization | ||||
| metadata: | ||||
|   name: application-kustomization | ||||
| spec: | ||||
|   force: false | ||||
|   interval: 5m0s | ||||
|   path: ./overlays/application | ||||
|   prune: false | ||||
|   sourceRef: | ||||
|     kind: GitRepository | ||||
|     name: application-git | ||||
|   healthChecks: | ||||
|   - apiVersion: cert-manager.io/v1 | ||||
|     kind: Certificate | ||||
|     name: app-certificate | ||||
|     namespace: cert-manager | ||||
|   - apiVersion: apps/v1 | ||||
|     kind: Deployment | ||||
|     name: app | ||||
|     namespace: app | ||||
|   customHealthChecksExprs: | ||||
|   - apiVersion: cert-manager.io/v1 | ||||
|     kind: Certificate | ||||
|     inProgress: "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" | ||||
|     failed: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')" | ||||
|     current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" | ||||
| ``` | ||||
| 
 | ||||
| The `healthChecks` field still contains the objects that should be included in | ||||
| the health assessment. The `customHealthChecksExprs` field declares, per `apiVersion` and `kind`, | ||||
| the `conditions` that need to be met in order to compute the status of the custom resource. | ||||
| 
 | ||||
| Note that entries for core resources in the `customHealthChecksExprs` field are ignored. | ||||
| 
 | ||||
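To make the semantics of the three expressions above concrete, here is a minimal sketch in plain Go that mimics `filter(...).all(...)` for a single condition type. It is not the CEL evaluator: the `Condition` type and the `allOfType` helper are hypothetical names introduced for illustration only.

```go
package main

import "fmt"

// Condition mirrors the subset of a Kubernetes status condition that the
// example expressions read. Hypothetical type, for illustration only.
type Condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// allOfType is a plain-Go stand-in for the CEL expression
//
//	status.conditions.filter(e, e.type == T).all(e, e.observedGeneration == G && e.status == S)
//
// It only illustrates the semantics and does not call any CEL library.
func allOfType(conds []Condition, condType, wantStatus string, generation int64) bool {
	for _, c := range conds {
		if c.Type != condType {
			continue // filter(): drop conditions of other types
		}
		if c.ObservedGeneration != generation || c.Status != wantStatus {
			return false // all(): a single mismatch makes the result false
		}
	}
	// all() over an empty list is true, so a missing condition also ends up here.
	return true
}

func main() {
	// A Certificate that is still being issued.
	issuing := []Condition{
		{Type: "Issuing", Status: "True", ObservedGeneration: 1},
		{Type: "Ready", Status: "False", ObservedGeneration: 1},
	}
	fmt.Println("inProgress:", allOfType(issuing, "Issuing", "True", 1))
	fmt.Println("failed:", allOfType(issuing, "Ready", "False", 1))
	fmt.Println("current:", allOfType(issuing, "Ready", "True", 1))
	fmt.Println("current, no conditions yet:", allOfType(nil, "Ready", "True", 1))
}
```

Two things are worth noticing when writing such expressions. Several of them can be true for the same object, so the order in which they are evaluated matters. And because `all` over an empty list is true, an expression of this shape is also satisfied when the condition is absent; combining it with the CEL `exists` macro is one way to require that the condition is present.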
| 
 | ||||
| #### Provide an evaluator for `CEL` expressions for users | ||||
| 
 | ||||
| We will provide a CEL environment that can be used by the user to evaluate `CEL` | ||||
| expressions. Users will use it to test their expressions before applying them to | ||||
| their `Kustomization` object. | ||||
| 
 | ||||
| ```shell | ||||
| $ flux eval --api-version cert-manager.io/v1 --kind Certificate \ | ||||
|     --in-progress "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" \ | ||||
|     --failed "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')" \ | ||||
|     --current "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" \ | ||||
|     --file ./custom_resource.yaml | ||||
| ``` | ||||
| 
 | ||||
| ### User Stories | ||||
| 
 | ||||
| #### Configure custom health checks for a custom resource | ||||
| 
 | ||||
| > As a user of Flux, I want to be able to specify custom health checks for my | ||||
| > custom resources, so that I can have more control over the status of my | ||||
| > resources. | ||||
| 
 | ||||
| #### Enable health checks support in Flux for non-standard resources | ||||
| 
 | ||||
| > As a user of Flux, I want to be able to use the health check feature for | ||||
| > non-standard resources, so that I can have more control over the status of my | ||||
| > resources. | ||||
| 
 | ||||
| ### Alternatives | ||||
| 
 | ||||
| We need an expression language that is flexible enough to cover all possible use | ||||
| cases, without having to change `Flux` source code for each new use case. | ||||
| 
 | ||||
| One alternative that has been considered is to use `cuelang` instead of `CEL`. | ||||
| `cuelang` is a more powerful expression language, but it is also more complex and | ||||
| requires more work to integrate with `Flux`. It also does not have any support in | ||||
| `Kubernetes` yet, while `CEL` is already used in `Kubernetes` and libraries are | ||||
| available to use it. | ||||
| 
 | ||||
| ## Design Details | ||||
| 
 | ||||
| ### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD | ||||
| 
 | ||||
| The `api/v1/kustomization_types.go` file will be updated to add the `CustomHealthChecksExprs` | ||||
| field to the `KustomizationSpec` struct. | ||||
| 
 | ||||
| ```go | ||||
| type KustomizationSpec struct { | ||||
| ... | ||||
| 	// A list of resources to be included in the health assessment. | ||||
| 	// +optional | ||||
| 	HealthChecks []meta.NamespacedObjectKindReference `json:"healthChecks,omitempty"` | ||||
| 
 | ||||
| 	// A list of custom health checks expressed as CEL expressions. | ||||
| 	// The CEL expression must evaluate to a boolean value. | ||||
| 	// +optional | ||||
| 	CustomHealthChecksExprs []CustomHealthCheckExprs `json:"customHealthChecksExprs,omitempty"` | ||||
| ... | ||||
| } | ||||
| 
 | ||||
| // CustomHealthCheckExprs defines the CEL expressions for custom health checks. | ||||
| // The CEL expressions must evaluate to a boolean value. The expressions are used | ||||
| // to determine the status of the custom resource. | ||||
| type CustomHealthCheckExprs struct { | ||||
| 	// APIVersion of the custom resource under evaluation. | ||||
| 	// +required | ||||
| 	APIVersion string `json:"apiVersion"` | ||||
| 	// Kind of the custom resource under evaluation. | ||||
| 	// +required | ||||
| 	Kind string `json:"kind"` | ||||
| 	// InProgress is the CEL expression that verifies that the status | ||||
| 	// of the custom resource is in progress. | ||||
| 	// +optional | ||||
| 	InProgress string `json:"inProgress,omitempty"` | ||||
| 	// Failed is the CEL expression that verifies that the status | ||||
| 	// of the custom resource is failed. | ||||
| 	// +optional | ||||
| 	Failed string `json:"failed,omitempty"` | ||||
| 	// Current is the CEL expression that verifies that the status | ||||
| 	// of the custom resource is ready. | ||||
| 	// +optional | ||||
| 	Current string `json:"current,omitempty"` | ||||
| } | ||||
| ``` | ||||
| 
 | ||||
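As a rough illustration of how one `CustomHealthCheckExprs` entry could be turned into the inputs a status reader needs (a group, version and kind, plus expressions keyed by status name), here is a self-contained sketch. The local struct copy and the helpers `splitAPIVersion` and `exprsByStatus` are hypothetical and not part of the proposal; the key names assume the kstatus status strings `InProgress`, `Failed` and `Current`.

```go
package main

import (
	"fmt"
	"strings"
)

// CustomHealthCheckExprs is a local copy of the proposed API type so that the
// sketch compiles on its own.
type CustomHealthCheckExprs struct {
	APIVersion string
	Kind       string
	InProgress string
	Failed     string
	Current    string
}

// splitAPIVersion splits "group/version" into its parts. Core resources use a
// bare version such as "v1", which yields an empty group.
func splitAPIVersion(apiVersion string) (group, version string, err error) {
	parts := strings.Split(apiVersion, "/")
	switch len(parts) {
	case 1:
		return "", parts[0], nil
	case 2:
		return parts[0], parts[1], nil
	default:
		return "", "", fmt.Errorf("unexpected apiVersion %q", apiVersion)
	}
}

// exprsByStatus keys each non-empty expression by the status name it stands
// for, skipping the optional fields that were left unset.
func exprsByStatus(c CustomHealthCheckExprs) map[string]string {
	out := map[string]string{}
	for name, expr := range map[string]string{
		"InProgress": c.InProgress,
		"Failed":     c.Failed,
		"Current":    c.Current,
	} {
		if expr != "" {
			out[name] = expr
		}
	}
	return out
}

func main() {
	check := CustomHealthCheckExprs{
		APIVersion: "cert-manager.io/v1",
		Kind:       "Certificate",
		Current:    "status.conditions.filter(e, e.type == 'Ready').all(e, e.status == 'True')",
	}
	group, version, err := splitAPIVersion(check.APIVersion)
	if err != nil {
		panic(err)
	}
	fmt.Println(group, version, check.Kind, len(exprsByStatus(check)))
}
```

An empty group is also a cheap signal that an entry targets a core resource, although named groups such as `apps` would need a separate check.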
| ### Introduce a generic custom status reader | ||||
| 
 | ||||
| We will introduce a generic custom status reader that computes the status of | ||||
| a custom resource based on a list of `conditions` that need to be met. | ||||
| 
 | ||||
| ```go | ||||
| import ( | ||||
| 	"context" | ||||
| 
 | ||||
| 	"k8s.io/apimachinery/pkg/api/meta" | ||||
| 	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" | ||||
| 	"k8s.io/apimachinery/pkg/runtime/schema" | ||||
| 	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/engine" | ||||
| 	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/event" | ||||
| 	kstatusreaders "sigs.k8s.io/cli-utils/pkg/kstatus/polling/statusreaders" | ||||
| 	"sigs.k8s.io/cli-utils/pkg/object" | ||||
| ) | ||||
| 
 | ||||
| // customGenericStatusReader wraps a generic kstatus reader and restricts it | ||||
| // to a single GroupVersionKind. | ||||
| type customGenericStatusReader struct { | ||||
| 	genericStatusReader engine.StatusReader | ||||
| 	gvk                 schema.GroupVersionKind | ||||
| } | ||||
| 
 | ||||
| // NewCustomGenericStatusReader builds a status reader for gvk whose status is | ||||
| // derived from the given CEL expressions. | ||||
| func NewCustomGenericStatusReader(mapper meta.RESTMapper, gvk schema.GroupVersionKind, exprs map[string]string) engine.StatusReader { | ||||
| 	genericStatusReader := kstatusreaders.NewGenericStatusReader(mapper, genericConditions(gvk.Kind, exprs)) | ||||
| 	return &customGenericStatusReader{ | ||||
| 		genericStatusReader: genericStatusReader, | ||||
| 		gvk:                 gvk, | ||||
| 	} | ||||
| } | ||||
| 
 | ||||
| // Supports reports whether this reader handles the given GroupKind. | ||||
| func (g *customGenericStatusReader) Supports(gk schema.GroupKind) bool { | ||||
| 	return gk == g.gvk.GroupKind() | ||||
| } | ||||
| 
 | ||||
| func (g *customGenericStatusReader) ReadStatus(ctx context.Context, reader engine.ClusterReader, resource object.ObjMetadata) (*event.ResourceStatus, error) { | ||||
| 	return g.genericStatusReader.ReadStatus(ctx, reader, resource) | ||||
| } | ||||
| 
 | ||||
| func (g *customGenericStatusReader) ReadStatusForObject(ctx context.Context, reader engine.ClusterReader, resource *unstructured.Unstructured) (*event.ResourceStatus, error) { | ||||
| 	return g.genericStatusReader.ReadStatusForObject(ctx, reader, resource) | ||||
| } | ||||
| ``` | ||||
| 
 | ||||
| The `genericConditions` function takes a `kind` and a map of `CEL` expressions as parameters | ||||
| and returns a closure that takes an `Unstructured` object and returns a `status.Result` object. | ||||
| 
 | ||||
| ```go | ||||
| import ( | ||||
| 	"fmt" | ||||
| 
 | ||||
| 	"github.com/fluxcd/pkg/runtime/cel" | ||||
| 	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" | ||||
| 	"sigs.k8s.io/cli-utils/pkg/kstatus/status" | ||||
| ) | ||||
| 
 | ||||
| func genericConditions(kind string, exprs map[string]string) func(u *unstructured.Unstructured) (*status.Result, error) { | ||||
| 	return func(u *unstructured.Unstructured) (*status.Result, error) { | ||||
| 		obj := u.UnstructuredContent() | ||||
| 
 | ||||
| 		// Evaluate in a fixed order: ranging over the map directly would make the | ||||
| 		// result depend on Go's randomized map iteration order. The precedence | ||||
| 		// used here is an assumption; this RFC does not define one. | ||||
| 		for _, s := range []status.Status{status.InProgressStatus, status.FailedStatus, status.CurrentStatus} { | ||||
| 			expr, ok := exprs[s.String()] | ||||
| 			if !ok { | ||||
| 				continue | ||||
| 			} | ||||
| 			// Use CEL to evaluate the expression. cel.ProcessExpr is the proposed | ||||
| 			// helper and is assumed to return a bool. | ||||
| 			result, err := cel.ProcessExpr(expr, obj) | ||||
| 			if err != nil { | ||||
| 				return nil, err | ||||
| 			} | ||||
| 			// If the expression evaluates to true, we return the matching status. | ||||
| 			if result { | ||||
| 				return &status.Result{Status: s, Message: fmt.Sprintf("%s is %s", kind, s)}, nil | ||||
| 			} | ||||
| 		} | ||||
| 		// No expression matched, so the resource is reported as still reconciling. | ||||
| 		return &status.Result{Status: status.InProgressStatus, Message: fmt.Sprintf("%s is in progress", kind)}, nil | ||||
| 	} | ||||
| } | ||||
| ``` | ||||
| 
 | ||||
| The generic status reader will be used by the `statusPoller` provided to the `reconciler` | ||||
| to compute the status of resources whose `kind` has a registered custom health check. | ||||
| 
 | ||||
| We will provide a `CEL` environment that will use the Kubernetes CEL library to | ||||
| evaluate the `CEL` expressions. | ||||
| 
 | ||||
| ### StatusPoller configuration | ||||
| 
 | ||||
| The `reconciler` holds a `statusPoller` that is used to compute the status of the | ||||
| resources during the `healthCheck` phase of the reconciliation. The `statusPoller` | ||||
| is configured with a list of `statusReaders` that are used to compute the status | ||||
| of the resources. | ||||
| 
 | ||||
| The `statusPoller` is not configurable once instantiated. This means | ||||
| that we cannot add new `statusReaders` to the `statusPoller` once it is created. | ||||
| This is a problem for custom resources because we need to be able to add new | ||||
| `statusReaders` for each new custom resource that is declared in the `Kustomization` | ||||
| object's `customHealthChecksExprs` field. Fortunately, the `cli-utils` library has | ||||
| been forked in the `fluxcd` organization, so we can change the `statusPoller` | ||||
| to expose the `statusReaders` field and add new `statusReaders` to it. | ||||
| 
 | ||||
| 
 | ||||
| The `statusPoller` used by `kustomize-controller` will be updated for every reconciliation | ||||
| in order to add new polling options for custom resources that have a `CustomHealthChecksExprs` | ||||
| field defined in their `Kustomization` object. | ||||
| 
 | ||||
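The selection step this relies on can be sketched as follows, assuming the poller asks each custom reader whether it supports a `GroupKind` and falls back to a default reader otherwise. All types here (`GroupKind`, `StatusReader`, `kindReader`, `fallbackReader`) are simplified local stand-ins, not the `cli-utils` API.

```go
package main

import "fmt"

// GroupKind is a local stand-in for schema.GroupKind.
type GroupKind struct{ Group, Kind string }

// StatusReader is reduced to what matters for reader selection.
type StatusReader interface {
	Supports(gk GroupKind) bool
	Name() string
}

// kindReader handles exactly one GroupKind, like the custom generic reader.
type kindReader struct {
	gk   GroupKind
	name string
}

func (r kindReader) Supports(gk GroupKind) bool { return gk == r.gk }
func (r kindReader) Name() string { return r.name }

// fallbackReader accepts every kind, like a default reader would.
type fallbackReader struct{}

func (fallbackReader) Supports(GroupKind) bool { return true }
func (fallbackReader) Name() string { return "default" }

// readerFor returns the first custom reader that supports gk, or def if none does.
func readerFor(custom []StatusReader, def StatusReader, gk GroupKind) StatusReader {
	for _, r := range custom {
		if r.Supports(gk) {
			return r
		}
	}
	return def
}

func main() {
	// Rebuilt on every reconciliation from the kinds declared in
	// customHealthChecksExprs (here a single Certificate entry).
	custom := []StatusReader{
		kindReader{gk: GroupKind{Group: "cert-manager.io", Kind: "Certificate"}, name: "cel-certificate"},
	}
	for _, gk := range []GroupKind{
		{Group: "cert-manager.io", Kind: "Certificate"},
		{Group: "apps", Kind: "Deployment"},
	} {
		fmt.Printf("%s/%s -> %s\n", gk.Group, gk.Kind, readerFor(custom, fallbackReader{}, gk).Name())
	}
}
```

Building the custom reader list per reconciliation keeps one `Kustomization`'s expressions from leaking into another's health assessment, at the cost of recreating the readers each time.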
| ### K8s CEL Library | ||||
| 
 | ||||
| The `K8s CEL Library` is a library that provides `CEL` functions to help in evaluating | ||||
| `CEL` expressions on `Kubernetes` objects. | ||||
| 
 | ||||
| Unfortunately, depending on it means that we will need to follow the `K8s CEL Library` releases | ||||
| to make sure that we are using the same version of the `CEL` library as | ||||
| `Kubernetes`. At the time of writing this RFC, the `K8s CEL Library` uses | ||||
| `v0.16.1` of the `CEL` library, while the latest version of the `CEL` library | ||||
| is `v0.18.2`. We will therefore need to use `v0.16.1` of the `CEL` | ||||
| library in order to be able to use the `K8s CEL Library`. | ||||
| 
 | ||||
| 
 | ||||
| ## Implementation History | ||||
| 
 | ||||
| See current POC implementation under https://github.com/souleb/kustomize-controller/tree/cel-based-custom-health | ||||