You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
flux2/rfcs/0000-custom-health-checks
Soule BA f0e7f92ef1
Add RFC - Custom Health Checks for Kustomization using Common Expression Language(CEL)
Signed-off-by: Soule BA <bah.soule@gmail.com>
3 weeks ago
..
README.md Add RFC - Custom Health Checks for Kustomization using Common Expression Language(CEL) 3 weeks ago

README.md

RFC-0000 Custom Health Checks for Kustomization using Common Expression Language(CEL)

Status: provisional

Creation date: 2024-01-05

Last update: 2024-01-05

Summary

This RFC proposes to support customization of the status readers in Kustomizations during the healthCheck phase for custom resources. The user will be able to declare the needed conditions in order to compute a custom resource status. In order to provide flexibility, we propose to use CEL expressions to declare the expected conditions and their status. This will introduce a new field customHealthChecks in the Kustomization CRD which will be a list of CustomHealthCheck objects.

Motivation

Flux uses the Kstatus library during the healthCheck phase to compute owned resources status. This works just fine for all standard resources and custom resources that comply with Kstatus interfaces.

In the current Kustomization implementation, we have addressed such a problem for kubernetes Jobs. We have implemented a customJobStatusReader that computes the status of a Job based on a defined set of conditions. This is a good solution for Jobs, but it is not generic and thus not applicable to other custom resources.

Another use case is relying on non-standard conditions to compute the status of a custom resource. For example, we might want to compute the status of a custom resource based on a condtion other then Ready. This is the case for Resources that do intermediate patching like Certificate where you should look at the Issued condition to know if the certificate has been issued or not before looking at the Ready condition.

In order to provide a generic solution for custom resources, that would not imply writing a custom status reader for each new custom resource, we need to provide a way for the user to express the conditions that need to be met in order to compute the status of a given custom resource. And we need to do this in a way that is flexible enough to cover all possible use cases, without having to change Flux source code for each new use case.

Goals

  • provide a generic solution for user to customize the health check of custom resources
  • support non-standard resources in kustomize-controller

Non-Goals

  • We do not plan to support custom healthChecks for core resources.

Proposal

Introduce a new field CustomHealthChecksExprs in the Kustomization CRD

The CustomHealthChecksExprs field will be a list of CustomHealthCheck objects. Each CustomHealthChecksExprs object will have a apiVersion, kind, inProgress, failed and current fields.

To give an example, here is how we would declare a custom health check for a Certificate resource:

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-certificate
  namespace: cert-manager
spec:
  commonName: cert-manager-tls
  dnsNames:
  - app.ns.svc.cluster.local
  ipAddresses:
  - x.x.x.x
  isCA: true
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: app-issuer
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  secretName: app-tls-certs
  subject:
    organizations:
    - example.com

This Certificate resource will transition through the following conditions: Issuing and Ready.

In order to compute the status of this resource, we need to look at both the Issuing and Ready conditions.

The resulting Kustomization object will look like this:

apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
	name: application-kustomization
spec:
  force: false
  interval: 5m0s
  path: ./overlays/application
  prune: false
  sourceRef:
    kind: GitRepository
    name: application-git
  healthChecks:
  - apiVersion: cert-manager.io/v1
    kind: Certificate
    name: service-certificate
    namespace: cert-manager
  - apiVersion: apps/v1
    kind: Deployment
	  name: app
	  namespace: app
  customHealthChecksExprs:
  - apiVersion: cert-manager.io/v1
	  kind: Certificate
	  inProgress: "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
	  failed: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')"
	  current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"

The HealthChecks field still contains the objects that should be included in the health assessment. The CustomHealthChecksExprs field will be used to declare the conditions that need to be met in order to compute the status of the custom resource.

Note that all core resources are discarded from the CustomHealthChecksExprs field.

Provide an evaluator for CEL expressions for users

We will provide a CEL environment that can be used by the user to evaluate CEL expressions. Users will use it to test their expressions before applying them to their Kustomization object.

$ flux eval --api-version cert-manager.io/v1 --kind Certificate --in-progress "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" --failed "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')" --current "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" --file ./custom_resource.yaml

User Stories

Configure custom health checks for a custom resource

As a user of Flux, I want to be able to specify custom health checks for my custom resources, so that I can have more control over the status of my resources.

Enable health checks support in Flux for non-standard resources

As a user of Flux, I want to be able to use the health check feature for non-standard resources, so that I can have more control over the status of my resources.

Alternatives

We need an expression language that is flexible enough to cover all possible use cases, without having to change Flux source code for each new use case.

On alternative that have been considered is to use cuelang instead of CEL. cuelang is a more powerful expression language, but it is also more complex and requires more work to integrate with Flux. it also does not have any support in Kubernetes yet while CEL is already used in Kubernetes and libraries are available to use it.

Design Details

Introduce a new field CustomHealthChecksExprs in the Kustomization CRD

The api/v1/kustomization_types.go file will be updated to add the CustomHealthChecksExprs field to the KustomizationSpec struct.

type KustomizationSpec struct {
...
	// A list of resources to be included in the health assessment.
	// +optional
	HealthChecks []meta.NamespacedObjectKindReference `json:"healthChecks,omitempty"`

	// A list of custom health checks expressed as CEL expressions.
	// The CEL expression must evaluate to a boolean value.
	// +optional
	CustomHealthChecksExprs []CustomHealthCheckExprs `json:"customHealthChecksExprs,omitempty"`
...
}

// CustomHealthCheckExprs defines the CEL expressions for custom health checks.
// The CEL expressions must evaluate to a boolean value. The expressions are used
// to determine the status of the custom resource.
type CustomHealthCheckExprs struct {
	// apiVersion of the custom health check.
	// +required
	APIVersion string `json:"apiVersion"`
	// Kind of the custom health check.
	// +required
	Kind string `json:"kind"`
	// InProgress is the CEL expression that verifies that the status
	// of the custom resource is in progress.
	// +optional
	InProgress string `json:"inProgress"`
	// Failed is the CEL expression that verifies that the status
	// of the custom resource is failed.
	// +optional
	Failed string `json:"failed"`
	// Current is the CEL expression that verifies that the status
	// of the custom resource is ready.
	// +optional
	Current string `json:"current"`
}

Introduce a generic custom status reader

Introduce a generic custom status reader that will be able to compute the status of a custom resource based on a list of conditions that need to be met.

import (
  "k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/engine"
  "sigs.k8s.io/cli-utils/pkg/kstatus/polling/event"
	kstatusreaders "sigs.k8s.io/cli-utils/pkg/kstatus/polling/statusreaders"
)
type customGenericStatusReader struct {
	genericStatusReader engine.StatusReader
	gvk                 schema.GroupVersionKind
}

func NewCustomGenericStatusReader(mapper meta.RESTMapper, gvk schema.GroupVersionKind, exprs map[string]string) engine.StatusReader {
	genericStatusReader := kstatusreaders.NewGenericStatusReader(mapper, genericConditions(gvk.Kind, exprs))
	return &customJobStatusReader{
		genericStatusReader: genericStatusReader,
    gvk:                 gvk,
	}
}

func (g *customGenericStatusReader) Supports(gk schema.GroupKind) bool {
	return gk == g.gvk.GroupKind()
}

func (g *customGenericStatusReader) ReadStatus(ctx context.Context, reader engine.ClusterReader, resource object.ObjMetadata) (*event.ResourceStatus, error) {
	return g.genericStatusReader.ReadStatus(ctx, reader, resource)
}

func (g *customGenericStatusReader) ReadStatusForObject(ctx context.Context, reader engine.ClusterReader, resource *unstructured.Unstructured) (*event.ResourceStatus, error) {
	return g.genericStatusReader.ReadStatusForObject(ctx, reader, resource)
}

A genericConditions closure will takes a kind and a map of CEL expressions as parameters and returns a function that takes an Unstructured object and returns a status.Result object.

import (
  "sigs.k8s.io/cli-utils/pkg/kstatus/status"
  "github.com/fluxcd/pkg/runtime/cel"
  "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func genericConditions(kind string, exprs map[string]string) func(u *unstructured.Unstructured) (*status.Result, error) {
  return func(u *unstructured.Unstructured) (*status.Result, error) {
		obj := u.UnstructuredContent()

		for statusKey, expr := range exprs {
      // Use CEL to evaluate the expression
			result, err := cel.ProcessExpr(expr, obj)
			if err != nil {
				return nil, err
			}
			switch statusKey {
			case status.CurrentStatus.String():
      // If the expression evaluates to true, we return the current status
			case status.FailedStatus.String():
      // If the expression evaluates to true, we return the failed status
			case status.InProgressStatus.String():
      // If the expression evaluates to true, we return the reconciling status
			}
		}
	}
}

The generic status reader will be used by the statusPoller provided to the reconciler to compute the status of the resources for the registered custom resources kind.

We will provide a CEL environment that will use the Kubernetes CEL library to evaluate the CEL expressions.

StatusPoller configuration

The reconciler holds a statusPoller that is used to compute the status of the resources during the healthCheck phase of the reconciliation. The statusPoller is configured with a list of statusReaders that are used to compute the status of the resources.

The statusPoller is not configurable once instantiated. This means that we cannot add new statusReaders to the statusPoller once it is created. This is a problem for custom resources because we need to be able to add new statusReaders for each new custom resource that is declared in the Kustomization object's customHealthChecksExprs field. Fortunately, the cli-utils library has been forked in the fluxcd organization and we can make a change to the statusPoller exposed the statusReaders field so that we can add new statusReaders to it.

The statusPoller used by kustomize-controller will be updated for every reconciliation in order to add new polling options for custom resources that have a CustomHealthChecksExprs field defined in their Kustomization object.

K8s CEL Library

The K8s CEL Library is a library that provides CEL functions to help in evaluating CEL expressions on Kubernetes objects.

Unfortunately, this means that we will need to follow the K8s CEL Library releases in order to make sure that we are using the same version of the CEL library as Kubernetes. As of the time of writing this RFC, the K8s CEL Library is using the v0.16.1 version of the CEL library while the latest version of the CEL library is v0.18.2. This means that we will need to use the v0.16.1 version of the CEL library in order to be able to use the K8s CEL Library.

Implementation History

See current POC implementation under https://github.com/souleb/kustomize-controller/tree/cel-based-custom-health