Add RFC - Custom Health Checks for Kustomization using Common Expression Language(CEL)

Signed-off-by: Soule BA <bah.soule@gmail.com>
pull/4528/head
Soule BA 1 year ago
parent 20fbcfadac
commit cc0abd53c3
No known key found for this signature in database
GPG Key ID: 4D40965192802994

@ -0,0 +1,330 @@
# RFC-0000 Custom Health Checks for Kustomization using Common Expression Language(CEL)
**Status:** provisional
**Creation date:** 2024-01-05
**Last update:** 2024-01-05
## Summary
This RFC proposes to support customization of the status readers in `Kustomizations`
during the `healthCheck` phase for custom resources. The user will be able to declare
the needed `conditions` in order to compute a custom resource status.
In order to provide flexibility, we propose to use `CEL` expressions to declare
the expected conditions and their status.
This will introduce a new field `customHealthChecks` in the `Kustomization` CRD
which will be a list of `CustomHealthCheck` objects.
## Motivation
Flux uses the `Kstatus` library during the `healthCheck` phase to compute owned
resources status. This works just fine for all standard resources and custom resources
that comply with `Kstatus` interfaces.
In the current Kustomization implementation, we have addressed such a problem for
kubernetes Jobs. We have implemented a `customJobStatusReader` that computes the
status of a Job based on a defined set of conditions. This is a good solution for
Jobs, but it is not generic and thus not applicable to other custom resources.
Another use case is relying on non-standard `conditions` to compute the status of
a custom resource. For example, we might want to compute the status of a custom
resource based on a condtion other then `Ready`. This is the case for `Resources`
that do intermediate patching like `Certificate` where you should look at the `Issued`
condition to know if the certificate has been issued or not before looking at the
`Ready` condition.
In order to provide a generic solution for custom resources, that would not imply
writing a custom status reader for each new custom resource, we need to provide a
way for the user to express the `conditions` that need to be met in order to compute
the status of a given custom resource. And we need to do this in a way that is
flexible enough to cover all possible use cases, without having to change `Flux`
source code for each new use case.
### Goals
- provide a generic solution for user to customize the health check of custom resources
- support non-standard resources in `kustomize-controller`
### Non-Goals
- We do not plan to support custom `healthChecks` for core resources.
## Proposal
### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD
The `CustomHealthChecksExprs` field will be a list of `CustomHealthCheck` objects.
Each `CustomHealthChecksExprs` object will have a `apiVersion`, `kind`, `inProgress`,
`failed` and `current` fields.
To give an example, here is how we would declare a custom health check for a `Certificate`
resource:
```yaml
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: app-certificate
namespace: cert-manager
spec:
commonName: cert-manager-tls
dnsNames:
- app.ns.svc.cluster.local
ipAddresses:
- x.x.x.x
isCA: true
issuerRef:
group: cert-manager.io
kind: ClusterIssuer
name: app-issuer
privateKey:
algorithm: RSA
encoding: PKCS1
size: 2048
secretName: app-tls-certs
subject:
organizations:
- example.com
```
This `Certificate` resource will transition through the following `conditions`:
`Issuing` and `Ready`.
In order to compute the status of this resource, we need to look at both the `Issuing`
and `Ready` conditions.
The resulting `Kustomization` object will look like this:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
name: application-kustomization
spec:
force: false
interval: 5m0s
path: ./overlays/application
prune: false
sourceRef:
kind: GitRepository
name: application-git
healthChecks:
- apiVersion: cert-manager.io/v1
kind: Certificate
name: service-certificate
namespace: cert-manager
- apiVersion: apps/v1
kind: Deployment
name: app
namespace: app
customHealthChecksExprs:
- apiVersion: cert-manager.io/v1
kind: Certificate
inProgress: "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
failed: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')"
current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
```
The `HealthChecks` field still contains the objects that should be included in
the health assessment. The `CustomHealthChecksExprs` field will be used to declare
the `conditions` that need to be met in order to compute the status of the custom resource.
Note that all core resources are discarded from the `CustomHealthChecksExprs` field.
#### Provide an evaluator for `CEL` expressions for users
We will provide a CEL environment that can be used by the user to evaluate `CEL`
expressions. Users will use it to test their expressions before applying them to
their `Kustomization` object.
```shell
$ flux eval --api-version cert-manager.io/v1 --kind Certificate --in-progress "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" --failed "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')" --current "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" --file ./custom_resource.yaml
```
### User Stories
#### Configure custom health checks for a custom resource
> As a user of Flux, I want to be able to specify custom health checks for my
> custom resources, so that I can have more control over the status of my
> resources.
#### Enable health checks support in Flux for non-standard resources
> As a user of Flux, I want to be able to use the health check feature for
> non-standard resources, so that I can have more control over the status of my
> resources.
### Alternatives
We need an expression language that is flexible enough to cover all possible use
cases, without having to change `Flux` source code for each new use case.
On alternative that have been considered is to use `cuelang` instead of `CEL`.
`cuelang` is a more powerful expression language, but it is also more complex and
requires more work to integrate with `Flux`. it also does not have any support in
`Kubernetes` yet while `CEL` is already used in `Kubernetes` and libraries are
available to use it.
## Design Details
### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD
The `api/v1/kustomization_types.go` file will be updated to add the `CustomHealthChecksExprs`
field to the `KustomizationSpec` struct.
```go
type KustomizationSpec struct {
...
// A list of resources to be included in the health assessment.
// +optional
HealthChecks []meta.NamespacedObjectKindReference `json:"healthChecks,omitempty"`
// A list of custom health checks expressed as CEL expressions.
// The CEL expression must evaluate to a boolean value.
// +optional
CustomHealthChecksExprs []CustomHealthCheckExprs `json:"customHealthChecksExprs,omitempty"`
...
}
// CustomHealthCheckExprs defines the CEL expressions for custom health checks.
// The CEL expressions must evaluate to a boolean value. The expressions are used
// to determine the status of the custom resource.
type CustomHealthCheckExprs struct {
// apiVersion of the custom health check.
// +required
APIVersion string `json:"apiVersion"`
// Kind of the custom health check.
// +required
Kind string `json:"kind"`
// InProgress is the CEL expression that verifies that the status
// of the custom resource is in progress.
// +optional
InProgress string `json:"inProgress"`
// Failed is the CEL expression that verifies that the status
// of the custom resource is failed.
// +optional
Failed string `json:"failed"`
// Current is the CEL expression that verifies that the status
// of the custom resource is ready.
// +optional
Current string `json:"current"`
}
```
### Introduce a generic custom status reader
Introduce a generic custom status reader that will be able to compute the status of
a custom resource based on a list of `conditions` that need to be met.
```go
import (
"k8s.io/apimachinery/pkg/runtime/schema"
"sigs.k8s.io/cli-utils/pkg/kstatus/polling/engine"
"sigs.k8s.io/cli-utils/pkg/kstatus/polling/event"
kstatusreaders "sigs.k8s.io/cli-utils/pkg/kstatus/polling/statusreaders"
)
type customGenericStatusReader struct {
genericStatusReader engine.StatusReader
gvk schema.GroupVersionKind
}
func NewCustomGenericStatusReader(mapper meta.RESTMapper, gvk schema.GroupVersionKind, exprs map[string]string) engine.StatusReader {
genericStatusReader := kstatusreaders.NewGenericStatusReader(mapper, genericConditions(gvk.Kind, exprs))
return &customJobStatusReader{
genericStatusReader: genericStatusReader,
gvk: gvk,
}
}
func (g *customGenericStatusReader) Supports(gk schema.GroupKind) bool {
return gk == g.gvk.GroupKind()
}
func (g *customGenericStatusReader) ReadStatus(ctx context.Context, reader engine.ClusterReader, resource object.ObjMetadata) (*event.ResourceStatus, error) {
return g.genericStatusReader.ReadStatus(ctx, reader, resource)
}
func (g *customGenericStatusReader) ReadStatusForObject(ctx context.Context, reader engine.ClusterReader, resource *unstructured.Unstructured) (*event.ResourceStatus, error) {
return g.genericStatusReader.ReadStatusForObject(ctx, reader, resource)
}
```
A `genericConditions` closure will takes a `kind` and a map of `CEL` expressions as parameters
and returns a function that takes an `Unstructured` object and returns a `status.Result` object.
````go
import (
"sigs.k8s.io/cli-utils/pkg/kstatus/status"
"github.com/fluxcd/pkg/runtime/cel"
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)
func genericConditions(kind string, exprs map[string]string) func(u *unstructured.Unstructured) (*status.Result, error) {
return func(u *unstructured.Unstructured) (*status.Result, error) {
obj := u.UnstructuredContent()
for statusKey, expr := range exprs {
// Use CEL to evaluate the expression
result, err := cel.ProcessExpr(expr, obj)
if err != nil {
return nil, err
}
switch statusKey {
case status.CurrentStatus.String():
// If the expression evaluates to true, we return the current status
case status.FailedStatus.String():
// If the expression evaluates to true, we return the failed status
case status.InProgressStatus.String():
// If the expression evaluates to true, we return the reconciling status
}
}
}
}
````
The generic status reader will be used by the `statusPoller` provided to the `reconciler`
to compute the status of the resources for the registered custom resources `kind`.
We will provide a `CEL` environment that will use the Kubernetes CEL library to
evaluate the `CEL` expressions.
### StatusPoller configuration
The `reconciler` holds a `statusPoller` that is used to compute the status of the
resources during the `healthCheck` phase of the reconciliation. The `statusPoller`
is configured with a list of `statusReaders` that are used to compute the status
of the resources.
The `statusPoller` is not configurable once instantiated. This means
that we cannot add new `statusReaders` to the `statusPoller` once it is created.
This is a problem for custom resources because we need to be able to add new
`statusReaders` for each new custom resource that is declared in the `Kustomization`
object's `customHealthChecksExprs` field. Fortunately, the `cli-utils` library has
been forked in the `fluxcd` organization and we can make a change to the `statusPoller`
exposed the `statusReaders` field so that we can add new `statusReaders` to it.
The `statusPoller` used by `kustomize-controller` will be updated for every reconciliation
in order to add new polling options for custom resources that have a `CustomHealthChecksExprs`
field defined in their `Kustomization` object.
### K8s CEL Library
The `K8s CEL Library` is a library that provides `CEL` functions to help in evaluating
`CEL` expressions on `Kubernetes` objects.
Unfortunately, this means that we will need to follow the `K8s CEL Library` releases
in order to make sure that we are using the same version of the `CEL` library as
`Kubernetes`. As of the time of writing this RFC, the `K8s CEL Library` is using the
`v0.16.1` version of the `CEL` library while the latest version of the `CEL` library
is `v0.18.2`. This means that we will need to use the `v0.16.1` version of the `CEL`
library in order to be able to use the `K8s CEL Library`.
## Implementation History
See current POC implementation under https://github.com/souleb/kustomize-controller/tree/cel-based-custom-health
Loading…
Cancel
Save