mirror of https://github.com/fluxcd/flux2.git
# RFC-0006 Alternative Suspend Control

**Status:** provisional

**Creation date:** 2023-09-20

**Last update:** 2023-10-18

## Summary

This RFC proposes an alternative method to indicate the suspended state of
suspendable resources to Flux controllers through object metadata. It presents
an annotation key that can be used to suspend a resource from reconciliation as
an alternative to the `.spec.suspend` field. It does not address the
deprecation of this field from the resource APIs. The annotation value can
optionally act as a vehicle for communicating contextual information about the
suspended resource to users.

## Motivation

The current implementation of suspending a resource from reconciliation uses
the `.spec.suspend` field. A change to this field increments the object's
generation number, which can be confusing when diffing.

Teams may wish to communicate information about the suspended resource, such as
the reason for the suspension, in the object itself.

### Goals

The Flux reconciliation loop will recognize a resource's suspend status from
either the API field or the designated metadata annotation key. The Flux CLI
will similarly recognize this state in `get` commands, but the `suspend`
command will alter only the metadata. The `resume` command will still alter
the API field and will additionally remove the annotation. The Flux CLI will
optionally support setting the suspend annotation value to a user-supplied
string as a contextual message.

### Non-Goals

The deprecation plan for the `.spec.suspend` field is out of scope for this
RFC.

## Proposal

Register a Flux resource metadata key `reconcile.fluxcd.io/suspended` with a
suspend semantic to be interpreted by controllers and manipulated by the CLI.
The presence of the annotation key is an alternative to the `.spec.suspend` API
field when determining whether a resource is suspended. The annotation key is
set by a `flux suspend` command and removed by a `flux resume` command. The
annotation value is free-form and can communicate a message or reason for the
object's suspension. The value can be set using a `--message` flag on the
`suspend` command.

### User Stories

#### Suspend/Resume without Generation Roll

Currently, when a resource is suspended or resumed, the `.spec.suspend` field
is mutated, which increments the `.metadata.generation` field and, after
successful reconciliation, the `.status.observedGeneration` number. The
community believes that a generation change for this reason is not in
alignment with GitOps principles. In more detail, upon suspension the
generation increments but the observed generation lags, since reconciliation
does not complete successfully.

The Flux controllers should recognize that a resource is suspended or
unsuspended from the presence of a special metadata key. This key can be
added, removed, or changed without patching the object in a way that
increments the generation number.
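The difference between the two suspend paths can be illustrated with a toy model. This is only a sketch: the `object` type and both setter functions are hypothetical, and the model assumes that the API server bumps `.metadata.generation` on spec changes but not on metadata-only changes.

```go
package main

import "fmt"

// object is a hypothetical, heavily reduced stand-in for a Flux resource.
type object struct {
	Generation  int64
	SpecSuspend bool
	Annotations map[string]string
}

// setSpecSuspend models a spec mutation. Under the assumption stated above,
// a real change to the spec increments the generation.
func setSpecSuspend(o *object, v bool) {
	if o.SpecSuspend != v {
		o.SpecSuspend = v
		o.Generation++
	}
}

// setAnnotation models a metadata-only mutation, which leaves the
// generation untouched in this model.
func setAnnotation(o *object, key, value string) {
	if o.Annotations == nil {
		o.Annotations = map[string]string{}
	}
	o.Annotations[key] = value
}

func main() {
	viaSpec := object{Generation: 1}
	setSpecSuspend(&viaSpec, true)
	fmt.Println("generation after spec suspend:", viaSpec.Generation)

	viaMetadata := object{Generation: 1}
	setAnnotation(&viaMetadata, "reconcile.fluxcd.io/suspended", "maintenance")
	fmt.Println("generation after annotation suspend:", viaMetadata.Generation)
}
```

In this model only the spec path rolls the generation, which is the behavior the user story wants to avoid.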

#### Seeing Suspend State

Users should be able to see the effective suspend state of the resource with a
`flux get` command. The display should mirror what the controllers interpret
the suspend state to be. This story is included to capture current
functionality that should be preserved.

#### Suspend with a Reason

Often there is a purpose behind suspending a resource with the Flux CLI,
whether it be during incident response, source manifest cutovers, or various
other scenarios. The `flux diff` command provides an illustrative UX for
determining what will change if a suspended resource is resumed, but neither it
nor `flux get` helps explain _why_ something is paused or when it would be OK
to resume reconciliation. On distributed teams this can become a point of
friction, as it needs to be communicated among group stakeholders.

Flux users should have a way to succinctly signal to other users why a resource
is suspended on the resource itself.

#### Suspend without Cluster Access

How do users without cluster access ensure the application is suspended?

* A validated spec field `.spec.suspend` is typesafe and can be trusted to
  suspend a resource from reconciliation.

* Logs and metrics can reveal the suspend status for confirmation. Logs are
  not ideal for this use case. Metrics may be the only safe way to
  confirm an object is suspended without cluster access.

What other options are there?

* The existence of the `reconcile.fluxcd.io/suspended` metadata annotation is
  not typesafe and not a trustworthy way to suspend. It becomes more
  trustworthy when reported by the CLI, by a controller metric/log/event, or
  by object status.

* The emission of an event from a controller upon suspend or resume transition.

* The update of the object status with indication of suspended status.

### Alternatives

#### More `.spec`

The existing `.spec.suspend` could be expanded with fields for the above
semantics. This would drive more generation number changes and would require a
change to the APIs.

## Design Details

Implementing this RFC would involve the controllers and the CLI.

This feature would create an alternate path to suspending an object and would
not violate the current APIs.

### Common

The `reconcile.fluxcd.io/suspended` annotation key string and a getter function
would be made available for controllers and the CLI to recognize and manipulate
the suspend object metadata.
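A minimal sketch of what such a shared constant and getter might look like follows. Only the annotation key string comes from this RFC. The names `SuspendedAnnotation` and `GetSuspendedReason` are assumptions made for illustration, not an existing Flux API.

```go
package main

import "fmt"

// SuspendedAnnotation is the annotation key proposed by this RFC.
const SuspendedAnnotation = "reconcile.fluxcd.io/suspended"

// GetSuspendedReason is a hypothetical getter. It returns the annotation
// value and whether the key is present. Presence alone marks the object as
// suspended, so an empty value still counts.
func GetSuspendedReason(annotations map[string]string) (string, bool) {
	reason, ok := annotations[SuspendedAnnotation]
	return reason, ok
}

func main() {
	annotations := map[string]string{SuspendedAnnotation: "incident response"}
	if reason, ok := GetSuspendedReason(annotations); ok {
		fmt.Printf("suspended: %q\n", reason)
	}
}
```

Returning the presence flag separately from the value keeps "suspended with an empty message" distinct from "not suspended".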

### Controllers

Flux controllers would skip reconciling a resource based on an `OR` of (1) the
API `.spec.suspend` and (2) the existence of the suspend metadata annotation
key. This would be implemented in the controller predicates to completely skip
any reconciliation cycle of suspended objects.
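The `OR` rule can be sketched in plain Go as below. The `object` type and the function names are hypothetical, and `shouldReconcile` only plays the role that a controller predicate would play. It is not written against the real controller-runtime predicate interfaces.

```go
package main

import "fmt"

const suspendedAnnotation = "reconcile.fluxcd.io/suspended"

// object is a hypothetical stand-in for a Flux resource, reduced to the two
// inputs the rule looks at.
type object struct {
	SpecSuspend bool
	Annotations map[string]string
}

// isSuspended implements the OR: either the spec field is true or the
// annotation key exists, regardless of its value.
func isSuspended(o object) bool {
	_, annotated := o.Annotations[suspendedAnnotation]
	return o.SpecSuspend || annotated
}

// shouldReconcile stands in for a predicate that filters out suspended
// objects before any reconciliation cycle starts.
func shouldReconcile(o object) bool {
	return !isSuspended(o)
}

func main() {
	cases := []struct {
		name string
		obj  object
	}{
		{"active", object{}},
		{"spec only", object{SpecSuspend: true}},
		{"annotation only", object{Annotations: map[string]string{suspendedAnnotation: ""}}},
	}
	for _, c := range cases {
		fmt.Printf("%-16s reconcile=%t\n", c.name, shouldReconcile(c.obj))
	}
}
```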

### CLI

The `get` command would recognize the suspend state from the union of the
`.spec.suspend` and the presence of the suspended annotation.

The `suspend` command would add the suspend annotation but forgo modifying the
`.spec.suspend` field.

The `resume` command would remove the suspend annotation and modify the
`.spec.suspend` field to `false`.

The suspend annotation would by default be set to a generic value. An optional
CLI flag (e.g. `--message`) would support setting the suspended annotation value
to a user-specified string.
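The command semantics above can be sketched as operations on an in-memory object. Everything here is illustrative: the `object` type, the function names, and the `defaultMessage` placeholder are assumptions, since the RFC only says the default is "a generic value".

```go
package main

import "fmt"

const suspendedAnnotation = "reconcile.fluxcd.io/suspended"

// defaultMessage is a placeholder for the unspecified generic default value.
const defaultMessage = "suspended"

// object is a hypothetical stand-in for a Flux resource.
type object struct {
	SpecSuspend bool
	Annotations map[string]string
}

// suspend adds the annotation and leaves .spec.suspend untouched.
func suspend(o *object, message string) {
	if message == "" {
		message = defaultMessage
	}
	if o.Annotations == nil {
		o.Annotations = map[string]string{}
	}
	o.Annotations[suspendedAnnotation] = message
}

// resume removes the annotation and sets .spec.suspend to false.
func resume(o *object) {
	delete(o.Annotations, suspendedAnnotation) // no-op if the map is nil
	o.SpecSuspend = false
}

// effectiveSuspended is what `get` would display: the union of both signals.
func effectiveSuspended(o object) bool {
	_, annotated := o.Annotations[suspendedAnnotation]
	return o.SpecSuspend || annotated
}

func main() {
	o := object{}
	suspend(&o, "source cutover in progress")
	fmt.Println(effectiveSuspended(o), o.Annotations[suspendedAnnotation])
	resume(&o)
	fmt.Println(effectiveSuspended(o))
}
```

Because `resume` clears both signals, it also resumes objects that were suspended through `.spec.suspend` by an older CLI or by a manifest.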

## Breaking Changes - Version Skew and Suspend Honoring

An edge case exists under these proposed changes with regard to suspending
objects using a new version of the CLI while the controllers are running older
versions. Specifically, the user suspends the object with the CLI, which adds
the suspend annotation but leaves the `.spec.suspend` field unmodified. The
user sees in the CLI output that the object is suspended. The controllers,
however, do not recognize that the object is suspended.

A potential scenario where this case becomes very damaging is during Git repo
refactoring, where users suspend objects, relocate the manifest sources and
related references, and resume. The operation is meant to be a no-op. However,
with such a version skew and `Kustomizations` set with `.spec.prune` enabled,
major workload disruption could occur.
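The skew can be made concrete with a small model. It is a sketch under stated assumptions: `oldControllerSuspended` models a controller released before this RFC, `newViewSuspended` models the new CLI, and both functions and the `object` type are hypothetical.

```go
package main

import "fmt"

const suspendedAnnotation = "reconcile.fluxcd.io/suspended"

// object is a hypothetical stand-in for a Flux resource.
type object struct {
	SpecSuspend bool
	Annotations map[string]string
}

// oldControllerSuspended models a controller that predates this RFC and
// only knows about the spec field.
func oldControllerSuspended(o object) bool {
	return o.SpecSuspend
}

// newViewSuspended models the new CLI (or an upgraded controller), which
// also honors the annotation.
func newViewSuspended(o object) bool {
	_, annotated := o.Annotations[suspendedAnnotation]
	return o.SpecSuspend || annotated
}

func main() {
	// State left behind by a new CLI `suspend`: annotation set, spec untouched.
	o := object{Annotations: map[string]string{suspendedAnnotation: "repo refactoring"}}

	fmt.Println("CLI reports suspended:", newViewSuspended(o))            // true
	fmt.Println("old controller honors suspend:", oldControllerSuspended(o)) // false
}
```

The two views disagree on the same object, so an old controller would keep reconciling, and pruning, while the user believes reconciliation is paused.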

## Implementation History

TBD