Add RFC - Custom Health Checks for Kustomization using Common Expression Language(CEL)
Signed-off-by: Soule BA <bah.soule@gmail.com>
pull/4528/head
# RFC-0000 Custom Health Checks for Kustomization using Common Expression Language (CEL)

**Status:** provisional

**Creation date:** 2024-01-05

**Last update:** 2024-01-05

## Summary

This RFC proposes to support customization of the status readers in `Kustomizations`
during the `healthCheck` phase for custom resources. The user will be able to declare
the `conditions` needed to compute a custom resource's status.
In order to provide flexibility, we propose to use `CEL` expressions to declare
the expected conditions and their status.
This will introduce a new field `customHealthChecksExprs` in the `Kustomization` CRD,
which will be a list of `CustomHealthCheckExprs` objects.

## Motivation

Flux uses the `kstatus` library during the `healthCheck` phase to compute the status
of owned resources. This works well for all standard resources and for custom resources
that comply with the `kstatus` conventions, but not for resources that report their
status in a different way.

In the current Kustomization implementation, we have addressed this problem for
Kubernetes Jobs. We have implemented a `customJobStatusReader` that computes the
status of a Job based on a defined set of conditions. This is a good solution for
Jobs, but it is not generic and thus not applicable to custom resources.

Another use case is relying on non-standard `conditions` to compute the status of
a custom resource. For example, we might want to compute the status of a custom
resource based on a condition other than `Ready`. This is the case for resources
that go through intermediate states, like a `Certificate`, where you should look at the
`Issuing` condition to know whether the certificate is still being issued before
looking at the `Ready` condition.

In order to provide a generic solution for custom resources that does not require
writing a custom status reader for each new custom resource, we need to provide a
way for users to express the `conditions` that need to be met to compute
the status of a given custom resource. This must be flexible enough to cover all
possible use cases without having to change the `Flux` source code for each new use case.

### Goals

- provide a generic solution for users to customize the health check of custom resources
- support non-standard resources in `kustomize-controller`

### Non-Goals

- We do not plan to support custom `healthChecks` for core resources.

## Proposal

### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD

The `CustomHealthChecksExprs` field will be a list of `CustomHealthCheckExprs` objects.
Each `CustomHealthCheckExprs` object will have `apiVersion`, `kind`, `inProgress`,
`failed` and `current` fields.

To give an example, consider the following `Certificate` resource:

```yaml
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-certificate
  namespace: cert-manager
spec:
  commonName: cert-manager-tls
  dnsNames:
    - app.ns.svc.cluster.local
  ipAddresses:
    - x.x.x.x
  isCA: true
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: app-issuer
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  secretName: app-tls-certs
  subject:
    organizations:
      - example.com
```

This `Certificate` resource will transition through the following `conditions`:
`Issuing` and `Ready`.

In order to compute the status of this resource, we need to look at both the `Issuing`
and `Ready` conditions.

The resulting `Kustomization` object will look like this:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: application-kustomization
spec:
  force: false
  interval: 5m0s
  path: ./overlays/application
  prune: false
  sourceRef:
    kind: GitRepository
    name: application-git
  healthChecks:
    - apiVersion: cert-manager.io/v1
      kind: Certificate
      name: app-certificate
      namespace: cert-manager
    - apiVersion: apps/v1
      kind: Deployment
      name: app
      namespace: app
  customHealthChecksExprs:
    - apiVersion: cert-manager.io/v1
      kind: Certificate
      inProgress: "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
      failed: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')"
      current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
```

The `HealthChecks` field still lists the objects that should be included in
the health assessment. The `CustomHealthChecksExprs` field will be used to declare
the `conditions` that need to be met in order to compute the status of those custom resources.

Note that entries targeting core resources are ignored in the `CustomHealthChecksExprs` field.

#### Provide an evaluator for `CEL` expressions for users

We will provide a CEL environment that users can use to evaluate `CEL`
expressions, so that they can test their expressions before adding them to
their `Kustomization` object, for example with the proposed `flux eval` command:

```shell
$ flux eval --api-version cert-manager.io/v1 --kind Certificate \
    --in-progress "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" \
    --failed "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')" \
    --current "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" \
    --file ./custom_resource.yaml
```

### User Stories

#### Configure custom health checks for a custom resource

> As a user of Flux, I want to be able to specify custom health checks for my
> custom resources, so that I can have more control over the status of my
> resources.

#### Enable health checks support in Flux for non-standard resources

> As a user of Flux, I want to be able to use the health check feature for
> non-standard resources, so that I can have more control over the status of my
> resources.

### Alternatives

We need an expression language that is flexible enough to cover all possible use
cases, without having to change `Flux` source code for each new use case.

One alternative that has been considered is to use `cuelang` instead of `CEL`.
`cuelang` is a more powerful language, but it is also more complex and
requires more work to integrate with `Flux`. It also does not have any support in
`Kubernetes` yet, while `CEL` is already used in `Kubernetes` (for example, in CRD
validation rules) and libraries are available to use it.

## Design Details

### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD

The `api/v1/kustomization_types.go` file will be updated to add the `CustomHealthChecksExprs`
field to the `KustomizationSpec` struct.

```go
import "github.com/fluxcd/pkg/apis/meta"

type KustomizationSpec struct {
	// ... existing fields ...

	// A list of resources to be included in the health assessment.
	// +optional
	HealthChecks []meta.NamespacedObjectKindReference `json:"healthChecks,omitempty"`

	// A list of custom health checks expressed as CEL expressions.
	// The CEL expressions must evaluate to a boolean value.
	// +optional
	CustomHealthChecksExprs []CustomHealthCheckExprs `json:"customHealthChecksExprs,omitempty"`

	// ... existing fields ...
}

// CustomHealthCheckExprs defines the CEL expressions for custom health checks.
// The CEL expressions must evaluate to a boolean value. The expressions are used
// to determine the status of the custom resource.
type CustomHealthCheckExprs struct {
	// APIVersion of the custom resource the expressions apply to.
	// +required
	APIVersion string `json:"apiVersion"`

	// Kind of the custom resource the expressions apply to.
	// +required
	Kind string `json:"kind"`

	// InProgress is the CEL expression that verifies that the status
	// of the custom resource is in progress.
	// +optional
	InProgress string `json:"inProgress,omitempty"`

	// Failed is the CEL expression that verifies that the status
	// of the custom resource is failed.
	// +optional
	Failed string `json:"failed,omitempty"`

	// Current is the CEL expression that verifies that the status
	// of the custom resource is ready.
	// +optional
	Current string `json:"current,omitempty"`
}
```

### Introduce a generic custom status reader

Introduce a generic custom status reader that will be able to compute the status of
a custom resource based on a list of `conditions` that need to be met.

```go
import (
	"context"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/engine"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/event"
	kstatusreaders "sigs.k8s.io/cli-utils/pkg/kstatus/polling/statusreaders"
	"sigs.k8s.io/cli-utils/pkg/object"
)

// customGenericStatusReader computes the status of a single custom resource
// kind by delegating to a kstatus generic status reader configured with the
// user-provided CEL expressions.
type customGenericStatusReader struct {
	genericStatusReader engine.StatusReader
	gvk                 schema.GroupVersionKind
}

// NewCustomGenericStatusReader returns a status reader for the given kind.
// exprs maps a kstatus status name (e.g. "Current") to a CEL expression.
func NewCustomGenericStatusReader(mapper meta.RESTMapper, gvk schema.GroupVersionKind, exprs map[string]string) engine.StatusReader {
	genericStatusReader := kstatusreaders.NewGenericStatusReader(mapper, genericConditions(gvk.Kind, exprs))
	return &customGenericStatusReader{
		genericStatusReader: genericStatusReader,
		gvk:                 gvk,
	}
}

// Supports reports whether this reader handles the given GroupKind.
func (g *customGenericStatusReader) Supports(gk schema.GroupKind) bool {
	return gk == g.gvk.GroupKind()
}

func (g *customGenericStatusReader) ReadStatus(ctx context.Context, reader engine.ClusterReader, resource object.ObjMetadata) (*event.ResourceStatus, error) {
	return g.genericStatusReader.ReadStatus(ctx, reader, resource)
}

func (g *customGenericStatusReader) ReadStatusForObject(ctx context.Context, reader engine.ClusterReader, resource *unstructured.Unstructured) (*event.ResourceStatus, error) {
	return g.genericStatusReader.ReadStatusForObject(ctx, reader, resource)
}
```

The `genericConditions` function takes a `kind` and a map of `CEL` expressions as parameters
and returns a closure that takes an `Unstructured` object and returns a `status.Result` object.

````go
import (
	"fmt"

	"github.com/fluxcd/pkg/runtime/cel"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
)

// genericConditions returns a kstatus status function for resources of the given kind.
// exprs maps a status name (status.InProgressStatus.String(), etc.) to a CEL expression.
func genericConditions(kind string, exprs map[string]string) func(u *unstructured.Unstructured) (*status.Result, error) {
	return func(u *unstructured.Unstructured) (*status.Result, error) {
		obj := u.UnstructuredContent()

		// Evaluate the expressions in a fixed order: ranging over a map is
		// randomized in Go, which would make the computed status
		// nondeterministic when more than one expression matches.
		for _, s := range []status.Status{status.InProgressStatus, status.FailedStatus, status.CurrentStatus} {
			expr, ok := exprs[s.String()]
			if !ok || expr == "" {
				continue
			}

			// Use CEL to evaluate the expression against the object.
			// ProcessExpr is assumed here to return (bool, error).
			matched, err := cel.ProcessExpr(expr, obj)
			if err != nil {
				return nil, err
			}
			if matched {
				return &status.Result{
					Status:  s,
					Message: fmt.Sprintf("%s %s: %s health check expression matched", kind, u.GetName(), s),
				}, nil
			}
		}

		// No expression matched yet: report the resource as still in progress.
		return &status.Result{
			Status:  status.InProgressStatus,
			Message: fmt.Sprintf("%s %s: waiting for a health check expression to match", kind, u.GetName()),
		}, nil
	}
}
````

The generic status reader will be used by the `statusPoller` provided to the `reconciler`
to compute the status of resources whose `kind` is registered in `CustomHealthChecksExprs`.

We will provide a `CEL` environment that will use the Kubernetes CEL library to
evaluate the `CEL` expressions.

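Before a reader can be created, each `CustomHealthChecksExprs` entry has to be turned into the `exprs map[string]string` expected by `NewCustomGenericStatusReader`, keyed by the kstatus status names (`InProgress`, `Failed`, `Current`). The dependency-free sketch below shows one way to do that while skipping core resources. The helper `exprsByGroupKind` and the local `GroupKind` type (a stand-in for `schema.GroupKind`) are hypothetical, and the rule used to detect built-in resources is a simplifying assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// CustomHealthCheckExprs mirrors the API type proposed above (JSON tags omitted).
type CustomHealthCheckExprs struct {
	APIVersion string
	Kind       string
	InProgress string
	Failed     string
	Current    string
}

// GroupKind is a local stand-in for k8s.io/apimachinery's schema.GroupKind.
type GroupKind struct {
	Group string
	Kind  string
}

// exprsByGroupKind (hypothetical) indexes the user's expressions by GroupKind,
// keyed by kstatus status names. Entries without any expression are dropped.
//
// Simplifying assumption: a resource is treated as built-in when its API group
// contains no dot ("" for core v1, "apps", "batch", ...). Kubernetes requires
// CRD groups to contain a dot, so this never drops a custom resource, but it
// does not catch built-in groups that contain one, such as networking.k8s.io.
func exprsByGroupKind(checks []CustomHealthCheckExprs) map[GroupKind]map[string]string {
	out := make(map[GroupKind]map[string]string)
	for _, c := range checks {
		group, _, found := strings.Cut(c.APIVersion, "/")
		if !found {
			group = "" // e.g. "v1": the legacy core group
		}
		if !strings.Contains(group, ".") {
			continue // core or other built-in resource: not supported
		}

		exprs := make(map[string]string)
		for status, expr := range map[string]string{
			"InProgress": c.InProgress,
			"Failed":     c.Failed,
			"Current":    c.Current,
		} {
			if expr != "" {
				exprs[status] = expr
			}
		}
		if len(exprs) > 0 {
			out[GroupKind{Group: group, Kind: c.Kind}] = exprs
		}
	}
	return out
}

func main() {
	checks := []CustomHealthCheckExprs{
		{APIVersion: "cert-manager.io/v1", Kind: "Certificate", Current: "status.conditions.exists(e, e.type == 'Ready')"},
		{APIVersion: "apps/v1", Kind: "Deployment", Current: "true"},
	}
	for gk, exprs := range exprsByGroupKind(checks) {
		fmt.Println(gk.Group, gk.Kind, len(exprs))
	}
}
```

Keying by `GroupKind` matches what the readers' `Supports` method compares against, so the version part of `apiVersion` does not affect which reader is selected.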
### StatusPoller configuration

The `reconciler` holds a `statusPoller` that is used to compute the status of the
resources during the `healthCheck` phase of the reconciliation. The `statusPoller`
is configured with a list of `statusReaders` that are used to compute the status
of the resources.

The `statusPoller` is not configurable once instantiated. This means
that we cannot add new `statusReaders` to the `statusPoller` once it is created.
This is a problem for custom resources because we need to be able to add new
`statusReaders` for each new custom resource that is declared in the `Kustomization`
object's `customHealthChecksExprs` field. Fortunately, the `cli-utils` library has
been forked in the `fluxcd` organization, so we can change the `statusPoller` to
expose the `statusReaders` field and add new `statusReaders` to it.

The `statusPoller` used by `kustomize-controller` will be updated for every reconciliation
in order to add new polling options for the custom resource kinds declared in the
`customHealthChecksExprs` field of the `Kustomization` object being reconciled.

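A per-reconciliation reader list could be assembled as in the sketch below. It assumes, as we understand the `cli-utils` polling engine to behave, that the poller uses the first status reader whose `Supports` method accepts a resource's `GroupKind` and falls back to a default reader otherwise. The `StatusReader` interface here is a trimmed, hypothetical stand-in for `engine.StatusReader`, and `readersFor`, `pickReader` and `kindReader` are illustrative names.

```go
package main

import "fmt"

// GroupKind is a local stand-in for schema.GroupKind.
type GroupKind struct {
	Group string
	Kind  string
}

// StatusReader is a trimmed-down, hypothetical version of engine.StatusReader.
type StatusReader interface {
	Supports(gk GroupKind) bool
	Name() string
}

// kindReader supports exactly one GroupKind.
type kindReader struct {
	name string
	gk   GroupKind
}

func (r kindReader) Supports(gk GroupKind) bool { return gk == r.gk }
func (r kindReader) Name() string               { return r.name }

// readersFor builds a fresh reader list for a single reconciliation. Custom
// readers come first so they take precedence; the shared built-in slice is
// never modified, so readers from one Kustomization do not leak into another.
func readersFor(builtin []StatusReader, custom []GroupKind) []StatusReader {
	readers := make([]StatusReader, 0, len(custom)+len(builtin))
	for _, gk := range custom {
		readers = append(readers, kindReader{name: "custom/" + gk.Kind, gk: gk})
	}
	return append(readers, builtin...)
}

// pickReader returns the first reader supporting gk, or def if none does.
func pickReader(readers []StatusReader, def StatusReader, gk GroupKind) StatusReader {
	for _, r := range readers {
		if r.Supports(gk) {
			return r
		}
	}
	return def
}

func main() {
	builtin := []StatusReader{kindReader{name: "job", gk: GroupKind{"batch", "Job"}}}
	def := kindReader{name: "default"}
	readers := readersFor(builtin, []GroupKind{{"cert-manager.io", "Certificate"}})

	for _, gk := range []GroupKind{{"cert-manager.io", "Certificate"}, {"batch", "Job"}, {"apps", "Deployment"}} {
		fmt.Println(gk.Kind, "->", pickReader(readers, def, gk).Name())
	}
}
```

Building a new slice on every reconciliation, rather than appending to a long-lived one, keeps each `Kustomization`'s custom readers scoped to its own health assessment.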
### K8s CEL Library

The `K8s CEL Library` is a library that provides `CEL` functions to help in evaluating
`CEL` expressions on `Kubernetes` objects.

Unfortunately, depending on it means that we will need to follow the `K8s CEL Library`
releases in order to make sure that we are using the same version of the `CEL` library as
`Kubernetes`. At the time of writing this RFC, the `K8s CEL Library` is using
`v0.16.1` of the `CEL` library, while the latest version of the `CEL` library
is `v0.18.2`. This means that we will need to use `v0.16.1` of the `CEL`
library in order to be able to use the `K8s CEL Library`.

## Implementation History

See the current POC implementation at https://github.com/souleb/kustomize-controller/tree/cel-based-custom-health