Rework the custom health check spec

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
pull/5151/head
Stefan Prodan 3 weeks ago
parent f0e7f92ef1
commit 3346542917
GPG Key ID: 3299AEB0E4085BAF

@ -4,42 +4,37 @@
**Creation date:** 2024-01-05 **Creation date:** 2024-01-05
**Last update:** 2024-01-05 **Last update:** 2025-01-17
## Summary ## Summary
This RFC proposes to support customization of the status readers in `Kustomizations` This RFC proposes to extend the Flux `Kustomization` API with custom health checks for
during the `healthCheck` phase for custom resources. The user will be able to declare custom resources using the Common Expression Language (CEL).
the needed `conditions` in order to compute a custom resource status.
In order to provide flexibility, we propose to use `CEL` expressions to declare
the expected conditions and their status.
This will introduce a new field `customHealthChecks` in the `Kustomization` CRD
which will be a list of `CustomHealthCheck` objects.
## Motivation In order to provide flexibility, we propose to use CEL expressions for defining the
conditions that need to be met in order to determine the status of a custom resource.
We will introduce a new field called `healthCheckExprs` in the `Kustomization` CRD
which will be a list of CEL expressions for evaluating the status of a particular
Kubernetes resource kind.
Flux uses the `Kstatus` library during the `healthCheck` phase to compute owned ## Motivation
resources status. This works just fine for all standard resources and custom resources
that comply with `Kstatus` interfaces.
In the current Kustomization implementation, we have addressed such a problem for Flux uses the `kstatus` library during the health check phase to compute owned
kubernetes Jobs. We have implemented a `customJobStatusReader` that computes the resources status. This works just fine for all the Kubernetes core resources
status of a Job based on a defined set of conditions. This is a good solution for and custom resources that comply with the `kstatus` conventions.
Jobs, but it is not generic and thus not applicable to other custom resources.
Another use case is relying on non-standard `conditions` to compute the status of There are cases where the status of a custom resource does not follow the
a custom resource. For example, we might want to compute the status of a custom `kstatus` conventions. For example, we might want to compute the status of a custom
resource based on a condtion other then `Ready`. This is the case for `Resources` resource based on a condition other than `Ready`. This is the case for resources
that do intermediate patching like `Certificate` where you should look at the `Issuing` that do intermediate patching like `Certificate` where you should look at the `Issuing`
condition to know if the certificate has been issued or not before looking at the condition to know if the certificate has been issued or not before looking at the
`Ready` condition. `Ready` condition.
In order to provide a generic solution for custom resources, that would not imply In order to provide a generic solution for custom resources, that would not imply
writing a custom status reader for each new custom resource, we need to provide a writing a custom `kstatus` reader for each CRD, we need to provide a way for the user
way for the user to express the `conditions` that need to be met in order to compute to express the conditions that need to be met in order to determine the status.
the status of a given custom resource. And we need to do this in a way that is And we need to do this in a way that is flexible enough to cover all possible use cases,
flexible enough to cover all possible use cases, without having to change `Flux` without having to change Flux source code for each new use case.
source code for each new use case.
### Goals ### Goals
@ -48,15 +43,15 @@ source code for each new use case.
### Non-Goals ### Non-Goals
- We do not plan to support custom `healthChecks` for core resources. - We do not plan to support custom health checks for Kubernetes core resources.
## Proposal ## Proposal
### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD ### Introduce a new field `HealthCheckExprs` in the `Kustomization` CRD
The `CustomHealthChecksExprs` field will be a list of `CustomHealthCheck` objects. The `HealthCheckExprs` field will be a list of `CustomHealthCheck` objects.
Each `CustomHealthChecksExprs` object will have a `apiVersion`, `kind`, `inProgress`, The `CustomHealthCheck` object fields would be: `apiVersion`, `kind`, `inProgress`,
`failed` and `current` fields. `failed` and `current`.
To give an example, here is how we would declare a custom health check for a `Certificate` To give an example, here is how we would declare a custom health check for a `Certificate`
resource: resource:
@ -67,7 +62,6 @@ apiVersion: cert-manager.io/v1
kind: Certificate kind: Certificate
metadata: metadata:
name: app-certificate name: app-certificate
namespace: cert-manager
spec: spec:
commonName: cert-manager-tls commonName: cert-manager-tls
dnsNames: dnsNames:
@ -79,10 +73,6 @@ spec:
group: cert-manager.io group: cert-manager.io
kind: ClusterIssuer kind: ClusterIssuer
name: app-issuer name: app-issuer
privateKey:
algorithm: RSA
encoding: PKCS1
size: 2048
secretName: app-tls-certs secretName: app-tls-certs
subject: subject:
organizations: organizations:
@ -95,31 +85,22 @@ This `Certificate` resource will transition through the following `conditions`:
In order to compute the status of this resource, we need to look at both the `Issuing` In order to compute the status of this resource, we need to look at both the `Issuing`
and `Ready` conditions. and `Ready` conditions.
The resulting `Kustomization` object will look like this: The Flux `Kustomization` object used to apply the `Certificate` will look like this:
```yaml ```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1 apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization kind: Kustomization
metadata: metadata:
name: application-kustomization name: certs
spec: spec:
force: false interval: 5m
interval: 5m0s prune: true
path: ./overlays/application
prune: false
sourceRef: sourceRef:
kind: GitRepository kind: GitRepository
name: application-git name: flux-system
healthChecks: path: ./certs
- apiVersion: cert-manager.io/v1 wait: true
kind: Certificate healthCheckExprs:
name: service-certificate
namespace: cert-manager
- apiVersion: apps/v1
kind: Deployment
name: app
namespace: app
customHealthChecksExprs:
- apiVersion: cert-manager.io/v1 - apiVersion: cert-manager.io/v1
kind: Certificate kind: Certificate
inProgress: "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" inProgress: "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
@ -127,138 +108,132 @@ spec:
current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
``` ```
The `HealthChecks` field still contains the objects that should be included in The `.spec.healthCheckExprs` field contains an entry for the `Certificate` kind, its `apiVersion`,
the health assessment. The `CustomHealthChecksExprs` field will be used to declare and the CEL expressions that need to be met in order to determine the health status of all custom resources
the `conditions` that need to be met in order to compute the status of the custom resource. of this kind reconciled by the Flux `Kustomization`.
Note that all core resources are discarded from the `CustomHealthChecksExprs` field.
Note that all the Kubernetes core resources are discarded from the `healthCheckExprs` list.
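
To make the expression semantics concrete, here is a rough sketch in plain Go of what an expression shaped like `status.conditions.filter(e, e.type == 'Ready').all(e, ...)` computes. It does not use a CEL library; the `condition` type and the `allOfType` helper are hypothetical names introduced only for illustration. One consequence worth noting: in CEL, `all` over an empty list yields `true`, so an expression of this shape holds vacuously when no condition of the given type is present.

```go
package main

import "fmt"

// condition is a hypothetical, simplified stand-in for an entry of
// .status.conditions; it is not a Flux or Kubernetes type.
type condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// allOfType mirrors, as a plain Go loop, the shape of the CEL expression
//
//	status.conditions.filter(e, e.type == T).all(e,
//	    e.observedGeneration == metadata.generation && e.status == S)
func allOfType(conds []condition, typ, wantStatus string, generation int64) bool {
	for _, c := range conds {
		if c.Type != typ {
			continue // filter: skip conditions of other types
		}
		if c.ObservedGeneration != generation || c.Status != wantStatus {
			return false // all: a single mismatch is enough to fail
		}
	}
	// No mismatch found. This is also reached when nothing passed the filter.
	return true
}

func main() {
	conds := []condition{
		{Type: "Issuing", Status: "False", ObservedGeneration: 2},
		{Type: "Ready", Status: "True", ObservedGeneration: 2},
	}
	fmt.Println("current:", allOfType(conds, "Ready", "True", 2))
	fmt.Println("failed:", allOfType(conds, "Ready", "False", 2))
	fmt.Println("no conditions:", allOfType(nil, "Ready", "True", 2))
}
```

The last call returns `true` even though there is no `Ready` condition at all. If a resource must be treated as not ready until the condition shows up, an expression that requires the condition to exist (for example one built on the CEL `exists` macro) may be a better fit than `filter` followed by `all`.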
#### Provide an evaluator for `CEL` expressions for users ### User Stories
We will provide a CEL environment that can be used by the user to evaluate `CEL` #### Configure health checks for non-standard custom resources
expressions. Users will use it to test their expressions before applying them to
their `Kustomization` object.
```shell > As a Flux user, I want to be able to specify health checks for
$ flux eval --api-version cert-manager.io/v1 --kind Certificate --in-progress "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" --failed "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')" --current "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" --file ./custom_resource.yaml > custom resources that don't have a Ready condition, so that I can be notified
``` > when the status of my resources transitions to a failed state based on the evaluation
> of a different condition.
### User Stories Using `.spec.healthCheckExprs`, Flux users have the ability to
specify the conditions that need to be met in order to determine the status of
a custom resource. This enables Flux to query any `.status` field,
besides the standard `Ready` condition, and evaluate it using a CEL expression.
#### Configure custom health checks for a custom resource #### Use Flux dependencies for Kubernetes ClusterAPI
> As a user of Flux, I want to be able to specify custom health checks for my > As a Flux user, I want to be able to use Flux dependencies based on the
> custom resources, so that I can have more control over the status of my > readiness of ClusterAPI resources, so that I can ensure that my applications
> resources. > are deployed only when the ClusterAPI resources are ready.
#### Enable health checks support in Flux for non-standard resources The ClusterAPI resources have a `Ready` condition, but this is set in the status
after the cluster is first created. Given this behavior, at creation time, Flux
cannot find any condition to evaluate the status of the ClusterAPI resources,
thus it considers them as static resources which are always ready.
> As a user of Flux, I want to be able to use the health check feature for Using `.spec.healthCheckExprs`, Flux users can specify that the `Cluster`
> non-standard resources, so that I can have more control over the status of my kind is expected to have a `Ready` condition which will force Flux into waiting
> resources. for the ClusterAPI resources status to be populated.
### Alternatives ### Alternatives
We need an expression language that is flexible enough to cover all possible use We need an expression language that is flexible enough to cover all possible use
cases, without having to change `Flux` source code for each new use case. cases, without having to change Flux source code for each new use case.
On alternative that have been considered is to use `cuelang` instead of `CEL`. An alternative that was considered is to use `CUE` instead of `CEL`.
`cuelang` is a more powerful expression language, but it is also more complex and `CUE` is a more powerful expression language, but given that
requires more work to integrate with `Flux`. it also does not have any support in Kubernetes already uses `CEL` for CRD validation and admission control,
`Kubernetes` yet while `CEL` is already used in `Kubernetes` and libraries are we have decided to also use `CEL` in Flux in order to be consistent with
available to use it. the Kubernetes ecosystem.
## Design Details ## Design Details
### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD ### Introduce a new field `HealthCheckExprs` in the `Kustomization` CRD
The `api/v1/kustomization_types.go` file will be updated to add the `CustomHealthChecksExprs` The `api/v1/kustomization_types.go` file will be updated to add the `HealthCheckExprs`
field to the `KustomizationSpec` struct. field to the `KustomizationSpec` struct.
```go ```go
type KustomizationSpec struct { type KustomizationSpec struct {
...
// A list of resources to be included in the health assessment.
// +optional
HealthChecks []meta.NamespacedObjectKindReference `json:"healthChecks,omitempty"`
// A list of custom health checks expressed as CEL expressions.
// The CEL expression must evaluate to a boolean value.
// +optional // +optional
CustomHealthChecksExprs []CustomHealthCheckExprs `json:"customHealthChecksExprs,omitempty"` HealthCheckExprs []CustomHealthCheck `json:"healthCheckExprs,omitempty"`
...
} }
// CustomHealthCheckExprs defines the CEL expressions for custom health checks. type CustomHealthCheck struct {
// The CEL expressions must evaluate to a boolean value. The expressions are used // APIVersion of the custom resource under evaluation.
// to determine the status of the custom resource.
type CustomHealthCheckExprs struct {
// apiVersion of the custom health check.
// +required // +required
APIVersion string `json:"apiVersion"` APIVersion string `json:"apiVersion"`
// Kind of the custom health check. // Kind of the custom resource under evaluation.
// +required // +required
Kind string `json:"kind"` Kind string `json:"kind"`
// InProgress is the CEL expression that verifies that the status // Current is the CEL expression that determines if the status
// of the custom resource is in progress. // of the custom resource has reached the desired state.
// +optional // +required
InProgress string `json:"inProgress"` Current string `json:"current"`
// Failed is the CEL expression that verifies that the status // InProgress is the CEL expression that determines if the status
// of the custom resource is failed. // of the custom resource has not yet reached the desired state.
// +optional // +optional
Failed string `json:"failed"` InProgress string `json:"inProgress,omitempty"`
// Current is the CEL expression that verifies that the status // Failed is the CEL expression that determines if the status
// of the custom resource is ready. // of the custom resource has failed to reach the desired state.
// +optional // +optional
Current string `json:"current"` Failed string `json:"failed,omitempty"`
} }
``` ```
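
As an illustration of how an entry of this type could be turned into the inputs that a status reader needs, here is a minimal sketch using only the standard library. The `gvk` struct, the `parse` function and the map keys (`Current`, `InProgress`, `Failed`) are assumptions made for this sketch, not names taken from the proposal; a real implementation would more likely rely on the `k8s.io/apimachinery` schema helpers.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CustomHealthCheck mirrors the fields of the proposed API type.
type CustomHealthCheck struct {
	APIVersion string
	Kind       string
	Current    string
	InProgress string
	Failed     string
}

// gvk is a hypothetical stand-in for schema.GroupVersionKind.
type gvk struct{ Group, Version, Kind string }

// parse checks the required fields and returns the group/version/kind
// together with the non-empty expressions keyed by status name.
func parse(c CustomHealthCheck) (gvk, map[string]string, error) {
	if c.APIVersion == "" || c.Kind == "" {
		return gvk{}, nil, errors.New("apiVersion and kind are required")
	}
	if c.Current == "" {
		return gvk{}, nil, errors.New("current expression is required")
	}

	// "group/version" splits at the last slash; a bare "v1" has no group.
	id := gvk{Version: c.APIVersion, Kind: c.Kind}
	if i := strings.LastIndex(c.APIVersion, "/"); i >= 0 {
		id.Group, id.Version = c.APIVersion[:i], c.APIVersion[i+1:]
	}

	// Optional expressions are only added when they are set.
	exprs := map[string]string{"Current": c.Current}
	if c.InProgress != "" {
		exprs["InProgress"] = c.InProgress
	}
	if c.Failed != "" {
		exprs["Failed"] = c.Failed
	}
	return id, exprs, nil
}

func main() {
	id, exprs, err := parse(CustomHealthCheck{
		APIVersion: "cert-manager.io/v1",
		Kind:       "Certificate",
		Current:    "status.conditions.filter(e, e.type == 'Ready').all(e, e.status == 'True')",
	})
	fmt.Println(id, len(exprs), err)
}
```

Rejecting an empty `current` expression up front matches the `+required` marker on that field, while `inProgress` and `failed` stay optional.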
### Introduce a generic custom status reader ### Introduce a generic custom status reader
Introduce a generic custom status reader that will be able to compute the status of We'll introduce a `StatusReader` that will be used to compute the status
a custom resource based on a list of `conditions` that need to be met. of custom resources based on the `CEL` expressions provided in the `CustomHealthCheck`:
```go ```go
import ( import (
"k8s.io/apimachinery/pkg/runtime/schema" "k8s.io/apimachinery/pkg/runtime/schema"
"sigs.k8s.io/cli-utils/pkg/kstatus/polling/engine" "github.com/fluxcd/cli-utils/pkg/kstatus/polling/engine"
"sigs.k8s.io/cli-utils/pkg/kstatus/polling/event" "github.com/fluxcd/cli-utils/pkg/kstatus/polling/event"
kstatusreaders "sigs.k8s.io/cli-utils/pkg/kstatus/polling/statusreaders" kstatusreaders "github.com/fluxcd/cli-utils/pkg/kstatus/polling/statusreaders"
) )
type customGenericStatusReader struct {
type CELStatusReader struct {
genericStatusReader engine.StatusReader genericStatusReader engine.StatusReader
gvk schema.GroupVersionKind gvk schema.GroupVersionKind
} }
func NewCustomGenericStatusReader(mapper meta.RESTMapper, gvk schema.GroupVersionKind, exprs map[string]string) engine.StatusReader { func NewCELStatusReader(mapper meta.RESTMapper, gvk schema.GroupVersionKind, exprs map[string]string) engine.StatusReader {
genericStatusReader := kstatusreaders.NewGenericStatusReader(mapper, genericConditions(gvk.Kind, exprs)) genericStatusReader := kstatusreaders.NewGenericStatusReader(mapper, genericConditions(gvk.Kind, exprs))
return &customJobStatusReader{ return &CELStatusReader{
genericStatusReader: genericStatusReader, genericStatusReader: genericStatusReader,
gvk: gvk, gvk: gvk,
} }
} }
func (g *customGenericStatusReader) Supports(gk schema.GroupKind) bool { func (g *CELStatusReader) Supports(gk schema.GroupKind) bool {
return gk == g.gvk.GroupKind() return gk == g.gvk.GroupKind()
} }
func (g *customGenericStatusReader) ReadStatus(ctx context.Context, reader engine.ClusterReader, resource object.ObjMetadata) (*event.ResourceStatus, error) { func (g *CELStatusReader) ReadStatus(ctx context.Context, reader engine.ClusterReader, resource object.ObjMetadata) (*event.ResourceStatus, error) {
return g.genericStatusReader.ReadStatus(ctx, reader, resource) return g.genericStatusReader.ReadStatus(ctx, reader, resource)
} }
func (g *customGenericStatusReader) ReadStatusForObject(ctx context.Context, reader engine.ClusterReader, resource *unstructured.Unstructured) (*event.ResourceStatus, error) { func (g *CELStatusReader) ReadStatusForObject(ctx context.Context, reader engine.ClusterReader, resource *unstructured.Unstructured) (*event.ResourceStatus, error) {
return g.genericStatusReader.ReadStatusForObject(ctx, reader, resource) return g.genericStatusReader.ReadStatusForObject(ctx, reader, resource)
} }
``` ```
A `genericConditions` closure will takes a `kind` and a map of `CEL` expressions as parameters The `genericConditions` function will take a `kind` and a map of `CEL` expressions as parameters
and returns a function that takes an `Unstructured` object and returns a `status.Result` object. and returns a function that takes an `Unstructured` object and returns a `status.Result` object.
````go ````go
import ( import (
"sigs.k8s.io/cli-utils/pkg/kstatus/status" "github.com/fluxcd/cli-utils/pkg/kstatus/status"
"github.com/fluxcd/pkg/runtime/cel" "github.com/fluxcd/pkg/runtime/cel"
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
) )
@ -286,45 +261,10 @@ func genericConditions(kind string, exprs map[string]string) func(u *unstructure
} }
```` ````
The generic status reader will be used by the `statusPoller` provided to the `reconciler` The CEL status reader will be used by the `statusPoller` provided to the kustomize-controller `reconciler`
to compute the status of the resources for the registered custom resources `kind`. to compute the status of the resources for the registered custom resources GVKs.
We will provide a `CEL` environment that will use the Kubernetes CEL library to
evaluate the `CEL` expressions.
### StatusPoller configuration
The `reconciler` holds a `statusPoller` that is used to compute the status of the
resources during the `healthCheck` phase of the reconciliation. The `statusPoller`
is configured with a list of `statusReaders` that are used to compute the status
of the resources.
The `statusPoller` is not configurable once instantiated. This means
that we cannot add new `statusReaders` to the `statusPoller` once it is created.
This is a problem for custom resources because we need to be able to add new
`statusReaders` for each new custom resource that is declared in the `Kustomization`
object's `customHealthChecksExprs` field. Fortunately, the `cli-utils` library has
been forked in the `fluxcd` organization and we can make a change to the `statusPoller`
exposed the `statusReaders` field so that we can add new `statusReaders` to it.
The `statusPoller` used by `kustomize-controller` will be updated for every reconciliation
in order to add new polling options for custom resources that have a `CustomHealthChecksExprs`
field defined in their `Kustomization` object.
### K8s CEL Library
The `K8s CEL Library` is a library that provides `CEL` functions to help in evaluating
`CEL` expressions on `Kubernetes` objects.
Unfortunately, this means that we will need to follow the `K8s CEL Library` releases
in order to make sure that we are using the same version of the `CEL` library as
`Kubernetes`. As of the time of writing this RFC, the `K8s CEL Library` is using the
`v0.16.1` version of the `CEL` library while the latest version of the `CEL` library
is `v0.18.2`. This means that we will need to use the `v0.16.1` version of the `CEL`
library in order to be able to use the `K8s CEL Library`.
We will implement a `CEL` environment that will use the Kubernetes CEL library to evaluate the `CEL` expressions.
## Implementation History ## Implementation History
See current POC implementation under https://github.com/souleb/kustomize-controller/tree/cel-based-custom-health
