RFC-0000 Custom Health Checks for Kustomization using Common Expression Language (CEL)
Status: provisional
Creation date: 2024-01-05
Last update: 2024-01-05
Summary
This RFC proposes to support customization of the status readers in Kustomizations
during the healthCheck phase for custom resources. The user will be able to declare
the conditions needed to compute the status of a custom resource.
In order to provide flexibility, we propose to use CEL expressions to declare
the expected conditions and their status.
This will introduce a new field, customHealthChecksExprs, in the Kustomization CRD,
which will be a list of CustomHealthCheckExprs objects.
Motivation
Flux uses the Kstatus library during the healthCheck phase to compute the status of
owned resources. This works well for all standard resources and for custom resources
that comply with the Kstatus interfaces. It does not work for resources that report
their status in other ways.
In the current Kustomization implementation, we have addressed this problem for
Kubernetes Jobs. We have implemented a customJobStatusReader that computes the
status of a Job based on a defined set of conditions. This is a good solution for
Jobs, but it is not generic and thus not applicable to custom resources.
Another use case is relying on non-standard conditions to compute the status of
a custom resource. For example, we might want to compute the status of a custom
resource based on a condition other than Ready. This is the case for resources
that do intermediate patching, like Certificate, where you should look at the Issuing
condition to know whether the certificate is still being issued before looking at the
Ready condition.
In order to provide a generic solution for custom resources that does not require
writing a custom status reader for each new custom resource, we need to provide a
way for the user to express the conditions that need to be met in order to compute
the status of a given custom resource. We also need to do this in a way that is
flexible enough to cover all possible use cases, without having to change Flux
source code for each new use case.
Goals
- Provide a generic solution for users to customize the health check of custom resources.
- Support non-standard resources in kustomize-controller.
Non-Goals
- We do not plan to support custom healthChecks for core resources.
Proposal
Introduce a new field CustomHealthChecksExprs in the Kustomization CRD
The CustomHealthChecksExprs field will be a list of CustomHealthCheckExprs objects.
Each CustomHealthCheckExprs object will have apiVersion, kind, inProgress,
failed and current fields.
To give an example, here is how we would declare a custom health check for a
Certificate resource. Consider the following Certificate:
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-certificate
  namespace: cert-manager
spec:
  commonName: cert-manager-tls
  dnsNames:
    - app.ns.svc.cluster.local
  ipAddresses:
    - x.x.x.x
  isCA: true
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: app-issuer
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  secretName: app-tls-certs
  subject:
    organizations:
      - example.com
This Certificate resource will transition through the following conditions:
Issuing and Ready.
In order to compute the status of this resource, we need to look at both the Issuing
and Ready conditions.
The resulting Kustomization object will look like this:
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: application-kustomization
spec:
  force: false
  interval: 5m0s
  path: ./overlays/application
  prune: false
  sourceRef:
    kind: GitRepository
    name: application-git
  healthChecks:
    - apiVersion: cert-manager.io/v1
      kind: Certificate
      name: service-certificate
      namespace: cert-manager
    - apiVersion: apps/v1
      kind: Deployment
      name: app
      namespace: app
  customHealthChecksExprs:
    - apiVersion: cert-manager.io/v1
      kind: Certificate
      inProgress: "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
      failed: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')"
      current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')"
The HealthChecks field still contains the objects that should be included in
the health assessment. The CustomHealthChecksExprs field will be used to declare
the conditions that need to be met in order to compute the status of the custom resource.
Note that entries for core resources in the CustomHealthChecksExprs field are ignored.
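To make the semantics of these expressions concrete, the sketch below restates the filter(...).all(...) pattern in plain Go. It is not the proposed CEL evaluation; the type and function names are hypothetical and exist only for illustration. It highlights one property worth knowing when writing expressions: CEL's all() returns true on an empty list, so an expression of this shape also holds when the resource has no condition of the given type yet. The CEL exists() macro, mirrored by existsOfType, is one possible way to require that the condition is present.

```go
package main

import "fmt"

// condition holds the fields of a status condition that the expressions read.
type condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// allOfType mirrors conditions.filter(e, e.type == T).all(e, pred(e)).
// Like CEL's all() on an empty list, it returns true when no condition of
// the given type exists.
func allOfType(conds []condition, condType string, pred func(condition) bool) bool {
	for _, c := range conds {
		if c.Type == condType && !pred(c) {
			return false
		}
	}
	return true
}

// existsOfType mirrors conditions.exists(e, e.type == T && pred(e)), which
// needs at least one matching condition to return true.
func existsOfType(conds []condition, condType string, pred func(condition) bool) bool {
	for _, c := range conds {
		if c.Type == condType && pred(c) {
			return true
		}
	}
	return false
}

// hasStatus builds the predicate
// e.observedGeneration == metadata.generation && e.status == want.
func hasStatus(generation int64, want string) func(condition) bool {
	return func(c condition) bool {
		return c.ObservedGeneration == generation && c.Status == want
	}
}

func main() {
	isTrue := hasStatus(2, "True")
	ready := []condition{{Type: "Ready", Status: "True", ObservedGeneration: 2}}
	stale := []condition{{Type: "Ready", Status: "True", ObservedGeneration: 1}}

	fmt.Println(allOfType(ready, "Ready", isTrue))  // true
	fmt.Println(allOfType(stale, "Ready", isTrue))  // false: older generation
	fmt.Println(allOfType(nil, "Ready", isTrue))    // true: no Ready condition yet
	fmt.Println(existsOfType(nil, "Ready", isTrue)) // false
}
```

With the expressions from the example above, a freshly created Certificate without any conditions would therefore satisfy inProgress, failed and current at the same time, so the order in which the three expressions are evaluated matters.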
Provide an evaluator for CEL expressions for users
We will provide a CEL environment that can be used by the user to evaluate CEL
expressions. Users will use it to test their expressions before applying them to
their Kustomization object.
$ flux eval --api-version cert-manager.io/v1 --kind Certificate \
    --in-progress "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" \
    --failed "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')" \
    --current "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" \
    --file ./custom_resource.yaml
User Stories
Configure custom health checks for a custom resource
As a user of Flux, I want to be able to specify custom health checks for my custom resources, so that I can have more control over the status of my resources.
Enable health checks support in Flux for non-standard resources
As a user of Flux, I want to be able to use the health check feature for non-standard resources, so that I can have more control over the status of my resources.
Alternatives
We need an expression language that is flexible enough to cover all possible use
cases, without having to change Flux source code for each new use case.
One alternative that has been considered is to use cuelang instead of CEL.
cuelang is a more powerful expression language, but it is also more complex and
requires more work to integrate with Flux. It also does not have any support in
Kubernetes yet, while CEL is already used in Kubernetes and libraries are
available to use it.
Design Details
Introduce a new field CustomHealthChecksExprs in the Kustomization CRD
The api/v1/kustomization_types.go file will be updated to add the CustomHealthChecksExprs
field to the KustomizationSpec struct.
type KustomizationSpec struct {
	...
	// A list of resources to be included in the health assessment.
	// +optional
	HealthChecks []meta.NamespacedObjectKindReference `json:"healthChecks,omitempty"`
	// A list of custom health checks expressed as CEL expressions.
	// The CEL expression must evaluate to a boolean value.
	// +optional
	CustomHealthChecksExprs []CustomHealthCheckExprs `json:"customHealthChecksExprs,omitempty"`
	...
}

// CustomHealthCheckExprs defines the CEL expressions for custom health checks.
// The CEL expressions must evaluate to a boolean value. The expressions are used
// to determine the status of the custom resource.
type CustomHealthCheckExprs struct {
	// APIVersion of the custom resource the health check applies to.
	// +required
	APIVersion string `json:"apiVersion"`
	// Kind of the custom resource the health check applies to.
	// +required
	Kind string `json:"kind"`
	// InProgress is the CEL expression that verifies that the status
	// of the custom resource is in progress.
	// +optional
	InProgress string `json:"inProgress,omitempty"`
	// Failed is the CEL expression that verifies that the status
	// of the custom resource is failed.
	// +optional
	Failed string `json:"failed,omitempty"`
	// Current is the CEL expression that verifies that the status
	// of the custom resource is ready.
	// +optional
	Current string `json:"current,omitempty"`
}
Introduce a generic custom status reader
Introduce a generic custom status reader that will be able to compute the status of
a custom resource based on a list of conditions that need to be met.
import (
	"context"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/engine"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/event"
	kstatusreaders "sigs.k8s.io/cli-utils/pkg/kstatus/polling/statusreaders"
	"sigs.k8s.io/cli-utils/pkg/object"
)

// customGenericStatusReader computes the status of a single custom resource
// kind by delegating to a kstatus generic status reader.
type customGenericStatusReader struct {
	genericStatusReader engine.StatusReader
	gvk                 schema.GroupVersionKind
}

func NewCustomGenericStatusReader(mapper meta.RESTMapper, gvk schema.GroupVersionKind, exprs map[string]string) engine.StatusReader {
	genericStatusReader := kstatusreaders.NewGenericStatusReader(mapper, genericConditions(gvk.Kind, exprs))
	return &customGenericStatusReader{
		genericStatusReader: genericStatusReader,
		gvk:                 gvk,
	}
}

// Supports reports whether this reader handles the given group and kind.
func (g *customGenericStatusReader) Supports(gk schema.GroupKind) bool {
	return gk == g.gvk.GroupKind()
}

func (g *customGenericStatusReader) ReadStatus(ctx context.Context, reader engine.ClusterReader, resource object.ObjMetadata) (*event.ResourceStatus, error) {
	return g.genericStatusReader.ReadStatus(ctx, reader, resource)
}

func (g *customGenericStatusReader) ReadStatusForObject(ctx context.Context, reader engine.ClusterReader, resource *unstructured.Unstructured) (*event.ResourceStatus, error) {
	return g.genericStatusReader.ReadStatusForObject(ctx, reader, resource)
}
The genericConditions function takes a kind and a map of CEL expressions as parameters
and returns a closure that takes an Unstructured object and returns a status.Result
object.
import (
	"fmt"

	"github.com/fluxcd/pkg/runtime/cel"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
)

// statusOrder fixes the evaluation order, because Go map iteration order is
// random. The precedence chosen here is an assumption.
var statusOrder = []status.Status{status.FailedStatus, status.InProgressStatus, status.CurrentStatus}

func genericConditions(kind string, exprs map[string]string) func(u *unstructured.Unstructured) (*status.Result, error) {
	return func(u *unstructured.Unstructured) (*status.Result, error) {
		obj := u.UnstructuredContent()
		for _, s := range statusOrder {
			expr, ok := exprs[s.String()]
			if !ok {
				continue
			}
			// Use CEL to evaluate the expression. cel.ProcessExpr is the
			// proposed helper, assumed here to return (bool, error).
			matched, err := cel.ProcessExpr(expr, obj)
			if err != nil {
				return nil, err
			}
			if matched {
				// The expression evaluated to true: return the matching status.
				return &status.Result{Status: s, Message: fmt.Sprintf("%s is %s", kind, s)}, nil
			}
		}
		// No expression matched. Reporting Unknown here is an assumption.
		return &status.Result{
			Status:  status.UnknownStatus,
			Message: fmt.Sprintf("%s status could not be determined", kind),
		}, nil
	}
}
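genericConditions receives the expressions as a map keyed by status name, while the Kustomization spec carries them as three separate fields. The sketch below shows one way the conversion could look. The helper name exprsByStatus is hypothetical, the struct is repeated without JSON tags to keep the example self-contained, and the keys are assumed to match the String() values of the kstatus status constants.

```go
package main

import "fmt"

// CustomHealthCheckExprs repeats the fields of the struct proposed in this RFC.
type CustomHealthCheckExprs struct {
	APIVersion string
	Kind       string
	InProgress string
	Failed     string
	Current    string
}

// Map keys, assumed to equal the String() values of the kstatus constants
// status.InProgressStatus, status.FailedStatus and status.CurrentStatus.
const (
	keyInProgress = "InProgress"
	keyFailed     = "Failed"
	keyCurrent    = "Current"
)

// exprsByStatus is a hypothetical helper: it returns the non-empty
// expressions of c keyed by status name.
func exprsByStatus(c CustomHealthCheckExprs) map[string]string {
	exprs := make(map[string]string, 3)
	for key, expr := range map[string]string{
		keyInProgress: c.InProgress,
		keyFailed:     c.Failed,
		keyCurrent:    c.Current,
	} {
		if expr != "" {
			exprs[key] = expr
		}
	}
	return exprs
}

func main() {
	exprs := exprsByStatus(CustomHealthCheckExprs{
		APIVersion: "cert-manager.io/v1",
		Kind:       "Certificate",
		Current:    "status.conditions.filter(e, e.type == 'Ready').all(e, e.status == 'True')",
	})
	fmt.Println(len(exprs)) // 1
	_, hasFailed := exprs[keyFailed]
	fmt.Println(hasFailed) // false
}
```

Dropping empty expressions keeps the optional fields optional: a status with no expression is simply never reported by the custom reader.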
The generic status reader will be used by the statusPoller provided to the reconciler
to compute the status of resources of the registered custom resource kinds.
We will provide a CEL environment that will use the Kubernetes CEL library to
evaluate the CEL expressions.
StatusPoller configuration
The reconciler holds a statusPoller that is used to compute the status of the
resources during the healthCheck phase of the reconciliation. The statusPoller
is configured with a list of statusReaders that are used to compute the status
of the resources.
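As a rough illustration of how a list of statusReaders is consulted, the sketch below picks the first reader whose Supports method accepts a given group and kind, and falls back to a default reader otherwise. It uses simplified stand-in types rather than the cli-utils ones, and the first-match selection is an assumption about the poller's behavior, not a copy of its code. All names are hypothetical.

```go
package main

import "fmt"

// groupKind stands in for schema.GroupKind.
type groupKind struct {
	Group string
	Kind  string
}

// statusReader keeps only the part of the status reader contract needed here.
type statusReader interface {
	Supports(gk groupKind) bool
	Name() string
}

// kindReader supports exactly one group and kind.
type kindReader struct {
	gk   groupKind
	name string
}

func (r kindReader) Supports(gk groupKind) bool { return gk == r.gk }
func (r kindReader) Name() string               { return r.name }

// defaultReader accepts every kind.
type defaultReader struct{}

func (defaultReader) Supports(groupKind) bool { return true }
func (defaultReader) Name() string            { return "default" }

// selectReader returns the first reader that supports gk, or fallback when
// none does.
func selectReader(readers []statusReader, fallback statusReader, gk groupKind) statusReader {
	for _, r := range readers {
		if r.Supports(gk) {
			return r
		}
	}
	return fallback
}

func main() {
	cert := groupKind{Group: "cert-manager.io", Kind: "Certificate"}
	readers := []statusReader{
		kindReader{gk: groupKind{Group: "batch", Kind: "Job"}, name: "job"},
		kindReader{gk: cert, name: "certificate-cel"},
	}
	fmt.Println(selectReader(readers, defaultReader{}, cert).Name())                                        // certificate-cel
	fmt.Println(selectReader(readers, defaultReader{}, groupKind{Group: "apps", Kind: "Deployment"}).Name()) // default
}
```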
The statusPoller is not configurable once instantiated. This means
that we cannot add new statusReaders to the statusPoller once it is created.
This is a problem for custom resources because we need to be able to add new
statusReaders for each new custom resource that is declared in the Kustomization
object's customHealthChecksExprs field. Fortunately, the cli-utils library has
been forked in the fluxcd organization, so we can change the statusPoller to
expose the statusReaders field and add new statusReaders to it.
The statusPoller used by kustomize-controller will be updated on every reconciliation
in order to add new polling options for custom resources that have an entry in the
customHealthChecksExprs field of their Kustomization object.
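A minimal sketch of the per-reconciliation step, under the assumption that one status reader is registered per distinct group and kind: it derives the group from each entry's apiVersion and returns the de-duplicated keys in declaration order. The type and function names are hypothetical, and the struct keeps only the identifying fields.

```go
package main

import (
	"fmt"
	"strings"
)

// groupKind stands in for schema.GroupKind.
type groupKind struct {
	Group string
	Kind  string
}

// healthCheckExprs keeps only the identifying fields of an entry.
type healthCheckExprs struct {
	APIVersion string
	Kind       string
}

// groupOf extracts the API group from an apiVersion such as
// "cert-manager.io/v1". A bare version such as "v1" maps to the core group "".
func groupOf(apiVersion string) (string, error) {
	parts := strings.Split(apiVersion, "/")
	for _, p := range parts {
		if p == "" {
			return "", fmt.Errorf("invalid apiVersion %q", apiVersion)
		}
	}
	switch len(parts) {
	case 1:
		return "", nil
	case 2:
		return parts[0], nil
	default:
		return "", fmt.Errorf("invalid apiVersion %q", apiVersion)
	}
}

// readerKeys returns one groupKind per distinct entry, in declaration order.
// It is meant to be called with the current spec on each reconciliation.
func readerKeys(checks []healthCheckExprs) ([]groupKind, error) {
	seen := make(map[groupKind]bool, len(checks))
	keys := make([]groupKind, 0, len(checks))
	for _, c := range checks {
		group, err := groupOf(c.APIVersion)
		if err != nil {
			return nil, err
		}
		gk := groupKind{Group: group, Kind: c.Kind}
		if !seen[gk] {
			seen[gk] = true
			keys = append(keys, gk)
		}
	}
	return keys, nil
}

func main() {
	keys, err := readerKeys([]healthCheckExprs{
		{APIVersion: "cert-manager.io/v1", Kind: "Certificate"},
		{APIVersion: "cert-manager.io/v1", Kind: "Certificate"},
	})
	fmt.Println(keys, err) // [{cert-manager.io Certificate}] <nil>
}
```

Rebuilding the keys from the spec on each reconciliation, instead of mutating shared state, means that removing an entry from the Kustomization also removes its reader on the next run.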
K8s CEL Library
The K8s CEL Library is a library that provides CEL functions to help in evaluating
CEL expressions on Kubernetes objects.
Relying on it means that we will need to follow the K8s CEL Library releases
in order to make sure that we are using the same version of the CEL library as
Kubernetes. As of the time of writing this RFC, the K8s CEL Library is using
version v0.16.1 of the CEL library, while the latest version of the CEL library
is v0.18.2. This means that we will need to use version v0.16.1 of the CEL
library in order to be able to use the K8s CEL Library.
Implementation History
See current POC implementation under https://github.com/souleb/kustomize-controller/tree/cel-based-custom-health