mirror of https://github.com/fluxcd/flux2.git
Add RFC - Custom Health Checks for Kustomization using Common Expression Language(CEL)
Signed-off-by: Soule BA <bah.soule@gmail.com>pull/4528/head
parent
20fbcfadac
commit
cc0abd53c3
@ -0,0 +1,330 @@
|
||||
# RFC-0000 Custom Health Checks for Kustomization using Common Expression Language(CEL)
|
||||
|
||||
**Status:** provisional
|
||||
|
||||
**Creation date:** 2024-01-05
|
||||
|
||||
**Last update:** 2024-01-05
|
||||
|
||||
## Summary
|
||||
|
||||
This RFC proposes to support customization of the status readers in `Kustomizations`
|
||||
during the `healthCheck` phase for custom resources. The user will be able to declare
|
||||
the needed `conditions` in order to compute a custom resource status.
|
||||
In order to provide flexibility, we propose to use `CEL` expressions to declare
|
||||
the expected conditions and their status.
|
||||
This will introduce a new field `customHealthChecks` in the `Kustomization` CRD
|
||||
which will be a list of `CustomHealthCheck` objects.
|
||||
|
||||
## Motivation
|
||||
|
||||
Flux uses the `Kstatus` library during the `healthCheck` phase to compute owned
|
||||
resources status. This works just fine for all standard resources and custom resources
|
||||
that comply with `Kstatus` interfaces.
|
||||
|
||||
In the current Kustomization implementation, we have addressed such a problem for
|
||||
kubernetes Jobs. We have implemented a `customJobStatusReader` that computes the
|
||||
status of a Job based on a defined set of conditions. This is a good solution for
|
||||
Jobs, but it is not generic and thus not applicable to other custom resources.
|
||||
|
||||
Another use case is relying on non-standard `conditions` to compute the status of
|
||||
a custom resource. For example, we might want to compute the status of a custom
|
||||
resource based on a condtion other then `Ready`. This is the case for `Resources`
|
||||
that do intermediate patching like `Certificate` where you should look at the `Issued`
|
||||
condition to know if the certificate has been issued or not before looking at the
|
||||
`Ready` condition.
|
||||
|
||||
In order to provide a generic solution for custom resources, that would not imply
|
||||
writing a custom status reader for each new custom resource, we need to provide a
|
||||
way for the user to express the `conditions` that need to be met in order to compute
|
||||
the status of a given custom resource. And we need to do this in a way that is
|
||||
flexible enough to cover all possible use cases, without having to change `Flux`
|
||||
source code for each new use case.
|
||||
|
||||
### Goals
|
||||
|
||||
- provide a generic solution for user to customize the health check of custom resources
|
||||
- support non-standard resources in `kustomize-controller`
|
||||
|
||||
### Non-Goals
|
||||
|
||||
- We do not plan to support custom `healthChecks` for core resources.
|
||||
|
||||
## Proposal
|
||||
|
||||
### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD
|
||||
|
||||
The `CustomHealthChecksExprs` field will be a list of `CustomHealthCheck` objects.
|
||||
Each `CustomHealthChecksExprs` object will have a `apiVersion`, `kind`, `inProgress`,
|
||||
`failed` and `current` fields.
|
||||
|
||||
To give an example, here is how we would declare a custom health check for a `Certificate`
|
||||
resource:
|
||||
|
||||
```yaml
|
||||
---
|
||||
apiVersion: cert-manager.io/v1
|
||||
kind: Certificate
|
||||
metadata:
|
||||
name: app-certificate
|
||||
namespace: cert-manager
|
||||
spec:
|
||||
commonName: cert-manager-tls
|
||||
dnsNames:
|
||||
- app.ns.svc.cluster.local
|
||||
ipAddresses:
|
||||
- x.x.x.x
|
||||
isCA: true
|
||||
issuerRef:
|
||||
group: cert-manager.io
|
||||
kind: ClusterIssuer
|
||||
name: app-issuer
|
||||
privateKey:
|
||||
algorithm: RSA
|
||||
encoding: PKCS1
|
||||
size: 2048
|
||||
secretName: app-tls-certs
|
||||
subject:
|
||||
organizations:
|
||||
- example.com
|
||||
```
|
||||
|
||||
This `Certificate` resource will transition through the following `conditions`:
|
||||
`Issuing` and `Ready`.
|
||||
|
||||
In order to compute the status of this resource, we need to look at both the `Issuing`
|
||||
and `Ready` conditions.
|
||||
|
||||
The resulting `Kustomization` object will look like this:
|
||||
|
||||
```yaml
|
||||
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
|
||||
kind: Kustomization
|
||||
metadata:
|
||||
name: application-kustomization
|
||||
spec:
|
||||
force: false
|
||||
interval: 5m0s
|
||||
path: ./overlays/application
|
||||
prune: false
|
||||
sourceRef:
|
||||
kind: GitRepository
|
||||
name: application-git
|
||||
healthChecks:
|
||||
- apiVersion: cert-manager.io/v1
|
||||
kind: Certificate
|
||||
name: service-certificate
|
||||
namespace: cert-manager
|
||||
- apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
name: app
|
||||
namespace: app
|
||||
customHealthChecksExprs:
|
||||
- apiVersion: cert-manager.io/v1
|
||||
kind: Certificate
|
||||
inProgress: "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
|
||||
failed: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')"
|
||||
current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
|
||||
```
|
||||
|
||||
The `HealthChecks` field still contains the objects that should be included in
|
||||
the health assessment. The `CustomHealthChecksExprs` field will be used to declare
|
||||
the `conditions` that need to be met in order to compute the status of the custom resource.
|
||||
|
||||
Note that all core resources are discarded from the `CustomHealthChecksExprs` field.
|
||||
|
||||
|
||||
#### Provide an evaluator for `CEL` expressions for users
|
||||
|
||||
We will provide a CEL environment that can be used by the user to evaluate `CEL`
|
||||
expressions. Users will use it to test their expressions before applying them to
|
||||
their `Kustomization` object.
|
||||
|
||||
```shell
|
||||
$ flux eval --api-version cert-manager.io/v1 --kind Certificate --in-progress "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" --failed "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')" --current "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" --file ./custom_resource.yaml
|
||||
```
|
||||
|
||||
### User Stories
|
||||
|
||||
#### Configure custom health checks for a custom resource
|
||||
|
||||
> As a user of Flux, I want to be able to specify custom health checks for my
|
||||
> custom resources, so that I can have more control over the status of my
|
||||
> resources.
|
||||
|
||||
#### Enable health checks support in Flux for non-standard resources
|
||||
|
||||
> As a user of Flux, I want to be able to use the health check feature for
|
||||
> non-standard resources, so that I can have more control over the status of my
|
||||
> resources.
|
||||
|
||||
### Alternatives
|
||||
|
||||
We need an expression language that is flexible enough to cover all possible use
|
||||
cases, without having to change `Flux` source code for each new use case.
|
||||
|
||||
On alternative that have been considered is to use `cuelang` instead of `CEL`.
|
||||
`cuelang` is a more powerful expression language, but it is also more complex and
|
||||
requires more work to integrate with `Flux`. it also does not have any support in
|
||||
`Kubernetes` yet while `CEL` is already used in `Kubernetes` and libraries are
|
||||
available to use it.
|
||||
|
||||
## Design Details
|
||||
|
||||
### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD
|
||||
|
||||
The `api/v1/kustomization_types.go` file will be updated to add the `CustomHealthChecksExprs`
|
||||
field to the `KustomizationSpec` struct.
|
||||
|
||||
```go
|
||||
type KustomizationSpec struct {
|
||||
...
|
||||
// A list of resources to be included in the health assessment.
|
||||
// +optional
|
||||
HealthChecks []meta.NamespacedObjectKindReference `json:"healthChecks,omitempty"`
|
||||
|
||||
// A list of custom health checks expressed as CEL expressions.
|
||||
// The CEL expression must evaluate to a boolean value.
|
||||
// +optional
|
||||
CustomHealthChecksExprs []CustomHealthCheckExprs `json:"customHealthChecksExprs,omitempty"`
|
||||
...
|
||||
}
|
||||
|
||||
// CustomHealthCheckExprs defines the CEL expressions for custom health checks.
|
||||
// The CEL expressions must evaluate to a boolean value. The expressions are used
|
||||
// to determine the status of the custom resource.
|
||||
type CustomHealthCheckExprs struct {
|
||||
// apiVersion of the custom health check.
|
||||
// +required
|
||||
APIVersion string `json:"apiVersion"`
|
||||
// Kind of the custom health check.
|
||||
// +required
|
||||
Kind string `json:"kind"`
|
||||
// InProgress is the CEL expression that verifies that the status
|
||||
// of the custom resource is in progress.
|
||||
// +optional
|
||||
InProgress string `json:"inProgress"`
|
||||
// Failed is the CEL expression that verifies that the status
|
||||
// of the custom resource is failed.
|
||||
// +optional
|
||||
Failed string `json:"failed"`
|
||||
// Current is the CEL expression that verifies that the status
|
||||
// of the custom resource is ready.
|
||||
// +optional
|
||||
Current string `json:"current"`
|
||||
}
|
||||
```
|
||||
|
||||
### Introduce a generic custom status reader
|
||||
|
||||
Introduce a generic custom status reader that will be able to compute the status of
|
||||
a custom resource based on a list of `conditions` that need to be met.
|
||||
|
||||
```go
|
||||
import (
|
||||
"k8s.io/apimachinery/pkg/runtime/schema"
|
||||
"sigs.k8s.io/cli-utils/pkg/kstatus/polling/engine"
|
||||
"sigs.k8s.io/cli-utils/pkg/kstatus/polling/event"
|
||||
kstatusreaders "sigs.k8s.io/cli-utils/pkg/kstatus/polling/statusreaders"
|
||||
)
|
||||
type customGenericStatusReader struct {
|
||||
genericStatusReader engine.StatusReader
|
||||
gvk schema.GroupVersionKind
|
||||
}
|
||||
|
||||
func NewCustomGenericStatusReader(mapper meta.RESTMapper, gvk schema.GroupVersionKind, exprs map[string]string) engine.StatusReader {
|
||||
genericStatusReader := kstatusreaders.NewGenericStatusReader(mapper, genericConditions(gvk.Kind, exprs))
|
||||
return &customJobStatusReader{
|
||||
genericStatusReader: genericStatusReader,
|
||||
gvk: gvk,
|
||||
}
|
||||
}
|
||||
|
||||
func (g *customGenericStatusReader) Supports(gk schema.GroupKind) bool {
|
||||
return gk == g.gvk.GroupKind()
|
||||
}
|
||||
|
||||
func (g *customGenericStatusReader) ReadStatus(ctx context.Context, reader engine.ClusterReader, resource object.ObjMetadata) (*event.ResourceStatus, error) {
|
||||
return g.genericStatusReader.ReadStatus(ctx, reader, resource)
|
||||
}
|
||||
|
||||
func (g *customGenericStatusReader) ReadStatusForObject(ctx context.Context, reader engine.ClusterReader, resource *unstructured.Unstructured) (*event.ResourceStatus, error) {
|
||||
return g.genericStatusReader.ReadStatusForObject(ctx, reader, resource)
|
||||
}
|
||||
```
|
||||
|
||||
A `genericConditions` closure will takes a `kind` and a map of `CEL` expressions as parameters
|
||||
and returns a function that takes an `Unstructured` object and returns a `status.Result` object.
|
||||
|
||||
````go
|
||||
import (
|
||||
"sigs.k8s.io/cli-utils/pkg/kstatus/status"
|
||||
"github.com/fluxcd/pkg/runtime/cel"
|
||||
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
|
||||
)
|
||||
|
||||
func genericConditions(kind string, exprs map[string]string) func(u *unstructured.Unstructured) (*status.Result, error) {
|
||||
return func(u *unstructured.Unstructured) (*status.Result, error) {
|
||||
obj := u.UnstructuredContent()
|
||||
|
||||
for statusKey, expr := range exprs {
|
||||
// Use CEL to evaluate the expression
|
||||
result, err := cel.ProcessExpr(expr, obj)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
switch statusKey {
|
||||
case status.CurrentStatus.String():
|
||||
// If the expression evaluates to true, we return the current status
|
||||
case status.FailedStatus.String():
|
||||
// If the expression evaluates to true, we return the failed status
|
||||
case status.InProgressStatus.String():
|
||||
// If the expression evaluates to true, we return the reconciling status
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
````
|
||||
|
||||
The generic status reader will be used by the `statusPoller` provided to the `reconciler`
|
||||
to compute the status of the resources for the registered custom resources `kind`.
|
||||
|
||||
We will provide a `CEL` environment that will use the Kubernetes CEL library to
|
||||
evaluate the `CEL` expressions.
|
||||
|
||||
### StatusPoller configuration
|
||||
|
||||
The `reconciler` holds a `statusPoller` that is used to compute the status of the
|
||||
resources during the `healthCheck` phase of the reconciliation. The `statusPoller`
|
||||
is configured with a list of `statusReaders` that are used to compute the status
|
||||
of the resources.
|
||||
|
||||
The `statusPoller` is not configurable once instantiated. This means
|
||||
that we cannot add new `statusReaders` to the `statusPoller` once it is created.
|
||||
This is a problem for custom resources because we need to be able to add new
|
||||
`statusReaders` for each new custom resource that is declared in the `Kustomization`
|
||||
object's `customHealthChecksExprs` field. Fortunately, the `cli-utils` library has
|
||||
been forked in the `fluxcd` organization and we can make a change to the `statusPoller`
|
||||
exposed the `statusReaders` field so that we can add new `statusReaders` to it.
|
||||
|
||||
|
||||
The `statusPoller` used by `kustomize-controller` will be updated for every reconciliation
|
||||
in order to add new polling options for custom resources that have a `CustomHealthChecksExprs`
|
||||
field defined in their `Kustomization` object.
|
||||
|
||||
### K8s CEL Library
|
||||
|
||||
The `K8s CEL Library` is a library that provides `CEL` functions to help in evaluating
|
||||
`CEL` expressions on `Kubernetes` objects.
|
||||
|
||||
Unfortunately, this means that we will need to follow the `K8s CEL Library` releases
|
||||
in order to make sure that we are using the same version of the `CEL` library as
|
||||
`Kubernetes`. As of the time of writing this RFC, the `K8s CEL Library` is using the
|
||||
`v0.16.1` version of the `CEL` library while the latest version of the `CEL` library
|
||||
is `v0.18.2`. This means that we will need to use the `v0.16.1` version of the `CEL`
|
||||
library in order to be able to use the `K8s CEL Library`.
|
||||
|
||||
|
||||
## Implementation History
|
||||
|
||||
See current POC implementation under https://github.com/souleb/kustomize-controller/tree/cel-based-custom-health
|
Loading…
Reference in New Issue