From a7e41df1e3ce16b4943f12021ea01e804b6f2c38 Mon Sep 17 00:00:00 2001 From: Matheus Pimenta Date: Sat, 22 Feb 2025 20:31:14 +0000 Subject: [PATCH] [RFC-0010] Multi-Tenant Workload Identity Signed-off-by: Matheus Pimenta --- .../README.md | 1256 +++++++++++++++++ 1 file changed, 1256 insertions(+) create mode 100644 rfcs/0010-multi-tenant-workload-identity/README.md diff --git a/rfcs/0010-multi-tenant-workload-identity/README.md b/rfcs/0010-multi-tenant-workload-identity/README.md new file mode 100644 index 00000000..152e8cf6 --- /dev/null +++ b/rfcs/0010-multi-tenant-workload-identity/README.md @@ -0,0 +1,1256 @@ +# RFC-0010 Multi-Tenant Workload Identity + +**Status:** implementable + + + +**Creation date:** 2025-02-22 + +**Last update:** 2025-04-14 + +## Summary + +In this RFC we aim to add support for multi-tenant workload identity in Flux, +i.e. the ability to specify at the object-level which set of cloud provider +permissions must be used for interacting with the respective cloud provider +on behalf of the reconciliation of the object. In this process, credentials +must be obtained automatically, i.e. this feature must not involve the use +of secrets. This would be useful in a number of Flux APIs that need to +interact with cloud providers, spanning all the Flux controllers except +for helm-controller. + +### Multi-Tenancy Model + +In the context of this RFC, multi-tenancy refers to the ability of a single +Flux instance running inside a Kubernetes cluster to manage Flux objects +belonging to all the tenants in the cluster while still ensuring that each +tenant has access only to their own resources according to the Least Privilege +Principle. In this scenario a tenant is often a team inside an organization, +so the reader can consider the +[multi-team tenancy model](https://kubernetes.io/docs/concepts/security/multi-tenancy/#multiple-teams). +Each team has their own namespaces, which are not shared with other teams. + +## Motivation + +Flux has strong multi-tenancy features. For example, the `Kustomization` and +`HelmRelease` APIs support the field `spec.serviceAccountName` for specifying +the Kubernetes `ServiceAccount` to impersonate when interacting with the +Kubernetes API on behalf of a tenant, e.g. when applying resources. This +allows tenants to be constrained under the Kubernetes RBAC permissions +granted to this `ServiceAccount`, and therefore have access only to the +specific subset of resources they should be allowed to use. + +Besides the Kubernetes API, Flux also interacts with cloud providers, e.g. +container registries, object storage, pub/sub services, etc. In these cases, +Flux currently supports basically two modes of authentication: + +- *Secret-based multi-tenant authentication*: Objects have the field + `spec.secretRef` for specifying the Kubernetes `Secret` containing the + credentials to use when interacting with the cloud provider. This is + similar to the `spec.serviceAccountName` field, but for cloud providers. + The problem with this approach is that secrets are a security risk and + operational burden, as they must be managed and rotated. +- *Workload-identity-based single-tenant authentication*: Flux offers + single-tenant workload identity support by configuring the `ServiceAccount` + of the Flux controllers to impersonate a cloud identity. 
This eliminates + the need for secrets, as the credentials are obtained automatically by + the cloud provider Go libraries used by the Flux controllers when they + are running inside the respective cloud environment. The problem with + this approach is that it is single-tenant, i.e. all objects are reconciled + using the same cloud identity, the one associated with the respective controller. + +For delivering the high level of security and multi-tenancy support that +Flux aims for, it is necessary to extend the workload identity support to +be multi-tenant. This means that each object must be able to specify which +cloud identity must be impersonated when interacting with the cloud provider +on behalf of the reconciliation of the object. This would allow tenants to +be constrained under the cloud provider permissions granted to this identity, +and therefore have access only to the specific subset of resources they are +allowed to manage. + +### Goals + +Provide multi-tenant workload identity support in Flux, i.e. the ability to +specify at the object-level which cloud identity must be impersonated to +interact with the respective cloud provider on behalf of the reconciliation +of the object, without the need for secrets. + +### Non-Goals + +It's not a goal to provide multi-tenant workload identity *federation* support. +The (small) difference between workload identity and workload identity federation +is that the former assumes that the workloads are running inside the cloud +environment, while the latter assumes that the workloads are running outside +the cloud environment. All the major cloud providers support both, as the majority +of the underlying technology is the same, but the configuration is slightly +different. Because the differences are small we may consider workload identity +federation support in the future, but it's not a goal for this RFC. + +## Proposal + +For supporting multi-tenant workload identity at the object-level for the Flux APIs +we propose associating the Flux objects with Kubernetes `ServiceAccounts`. The +controller would need to create a token for the `ServiceAccount` associated with +the object in the Kubernetes API, and then exchange it for a short-lived access +token for the cloud provider. This would require the controller `ServiceAccount` +to have RBAC permission to create tokens for any `ServiceAccounts` in the cluster. + +### User Stories + +#### Story 1 + +> As a cluster administrator, I want to allow tenant A to pull OCI artifacts +> from the Amazon ECR repository belonging to tenant A, but only from this +> repository. At the same time, I want to allow tenant B to pull OCI artifacts +> from the Amazon ECR repository belonging to tenant B, but only from this +> repository. + +For example, I would like to have the following configuration: + +```yaml +apiVersion: source.toolkit.fluxcd.io/v1beta2 +kind: OCIRepository +metadata: + name: tenant-a-repo + namespace: tenant-a +spec: + ... + provider: aws + serviceAccountName: tenant-a-ecr-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-a-ecr-sa + namespace: tenant-a + annotations: + eks.amazonaws.com/role-arn: arn:aws:iam::123456789123:role/tenant-a-ecr +--- +apiVersion: source.toolkit.fluxcd.io/v1beta2 +kind: OCIRepository +metadata: + name: tenant-b-repo + namespace: tenant-b +spec: + ... 
+ provider: aws + serviceAccountName: tenant-b-ecr-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-b-ecr-sa + namespace: tenant-b + annotations: + eks.amazonaws.com/role-arn: arn:aws:iam::123456789123:role/tenant-b-ecr +``` + +#### Story 2 + +> As a cluster administrator, I want to allow tenant A to pull and push to the Git +> repository in Azure DevOps belonging to tenant A, but only this repository. At +> the same time, I want to allow tenant B to pull and push to the Git repository +> in Azure DevOps belonging to tenant B, but only this repository. + +For example, I would like to have the following configuration: + +```yaml +apiVersion: source.toolkit.fluxcd.io/v1 +kind: GitRepository +metadata: + name: tenant-a-repo + namespace: tenant-a +spec: + ... + provider: azure + serviceAccountName: tenant-a-azure-devops-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-a-azure-devops-sa + namespace: tenant-a + annotations: + azure.workload.identity/client-id: d6e4fc00-c5b2-4a72-9f84-6a92e3f06b08 # client ID for my tenant A + azure.workload.identity/tenant-id: 72f988bf-86f1-41af-91ab-2d7cd011db47 # azure tenant for the cluster (optional, defaults to the env var AZURE_TENANT_ID set in the controller) +--- +apiVersion: image.toolkit.fluxcd.io/v1beta2 +kind: ImageUpdateAutomation +metadata: + name: tenant-a-image-update + namespace: tenant-a +spec: + ... + sourceRef: + kind: GitRepository + name: tenant-a-repo +--- +apiVersion: source.toolkit.fluxcd.io/v1 +kind: GitRepository +metadata: + name: tenant-b-repo + namespace: tenant-b +spec: + ... + provider: azure + serviceAccountName: tenant-b-azure-devops-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-b-azure-devops-sa + namespace: tenant-b + annotations: + azure.workload.identity/client-id: 4a7272f9-f186-41af-9f84-6a92e32d7cd0 # client ID for my tenant B + azure.workload.identity/tenant-id: 72f988bf-86f1-41af-91ab-2d7cd011db47 # azure tenant for the cluster (optional, defaults to the env var AZURE_TENANT_ID set in the controller) +--- +apiVersion: image.toolkit.fluxcd.io/v1beta2 +kind: ImageUpdateAutomation +metadata: + name: tenant-b-image-update + namespace: tenant-b +spec: + ... + sourceRef: + kind: GitRepository + name: tenant-b-repo +``` + +#### Story 3 + +> As a cluster administrator, I want to allow tenant A to pull manifests from +> the GCS bucket belonging to tenant A, but only from this bucket. At the same +> time, I want to allow tenant B to pull manifests from the GCS bucket +> belonging to tenant B, but only from this bucket. + +For example, I would like to have the following configuration: + +```yaml +apiVersion: source.toolkit.fluxcd.io/v1 +kind: Bucket +metadata: + name: tenant-a-bucket + namespace: tenant-a +spec: + ... + provider: gcp + serviceAccountName: tenant-a-gcs-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-a-gcs-sa + namespace: tenant-a + annotations: + iam.gke.io/gcp-service-account: tenant-a-bucket@my-org-project.iam.gserviceaccount.com +--- +apiVersion: source.toolkit.fluxcd.io/v1 +kind: Bucket +metadata: + name: tenant-b-bucket + namespace: tenant-b +spec: + ... 
+ provider: gcp + serviceAccountName: tenant-b-gcs-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-b-gcs-sa + namespace: tenant-b + annotations: + iam.gke.io/gcp-service-account: tenant-b-bucket@my-org-project.iam.gserviceaccount.com +``` + +#### Story 4 + +> As a cluster administrator, I want to allow tenant A to decrypt secrets using +> the AWS KMS key belonging to tenant A, but only this key. At the same time, +> I want to allow tenant B to decrypt secrets using the AWS KMS key belonging +> to tenant B, but only this key. + +For example, I would like to have the following configuration: + +```yaml +apiVersion: kustomize.toolkit.fluxcd.io/v1 +kind: Kustomization +metadata: + name: tenant-a-aws-kms + namespace: tenant-a +spec: + ... + decryption: + provider: sops + serviceAccountName: tenant-a-aws-kms-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-a-aws-kms-sa + namespace: tenant-a + annotations: + eks.amazonaws.com/role-arn: arn:aws:iam::123456789123:role/tenant-a-kms +--- +apiVersion: kustomize.toolkit.fluxcd.io/v1 +kind: Kustomization +metadata: + name: tenant-b-aws-kms + namespace: tenant-b +spec: + ... + decryption: + provider: sops + serviceAccountName: tenant-b-aws-kms-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-b-aws-kms-sa + namespace: tenant-b + annotations: + eks.amazonaws.com/role-arn: arn:aws:iam::123456789123:role/tenant-b-kms +``` + +#### Story 5 + +> As a cluster administrator, I want to allow tenant A to publish notifications +> to the `tenant-a` topic in Google Cloud Pub/Sub, but only to this topic. At +> the same time, I want to allow tenant B to publish notifications to the +> `tenant-b` topic in Google Cloud Pub/Sub, but only to this topic. I want +> to do so without creating any GCP IAM Service Accounts. + +For example, I would like to have the following configuration: + +```yaml +apiVersion: notification.toolkit.fluxcd.io/v1beta3 +kind: Provider +metadata: + name: tenant-a-google-pubsub + namespace: tenant-a +spec: + ... + type: googlepubsub + serviceAccountName: tenant-a-google-pubsub-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-a-google-pubsub-sa + namespace: tenant-a +--- +apiVersion: notification.toolkit.fluxcd.io/v1beta3 +kind: Provider +metadata: + name: tenant-b-google-pubsub + namespace: tenant-b +spec: + ... + type: googlepubsub + serviceAccountName: tenant-b-google-pubsub-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-b-google-pubsub-sa + namespace: tenant-b +``` + +### Alternatives + +#### An alternative for identifying Flux resources in cloud providers + +Instead of issuing `ServiceAccount` tokens in the Kubernetes API we could +come up with a username naming scheme for Flux resources and issue tokens +for these usernames instead, e.g. `flux:::`. +This would make each Flux object have its own identity instead of using +`ServiceAccounts` for this purpose. This choice would then prevent cases +of other Flux objects from malicious actors in the same namespace from +abusing the permissions granted to the `ServiceAccount` of the object. +This choice, however, would provide a worse user experience, as Flux and +Kubernetes users are already used to the `ServiceAccount` resource being +the identity for resources in the cluster, not only in the context of plain +RBAC but also in the context of workload identity. 
+This choice would also require the introduction of new APIs for configuring +the respective cloud identities in the Flux objects, when such APIs already +exist as defined by the cloud providers themselves as annotations in the +`ServiceAccount` resources. We therefore choose to stick with the well-known +pattern of using `ServiceAccounts` for configuring the identities of the +Flux resources. Furthermore, as mentioned in the +[Multi-Tenancy Model](#multi-tenancy-model) section, the tenant trust domains +are namespaces, so a tenant is expected to control and have access to all +the resources `ServiceAccounts` in their namespaces are allowed to access. + +#### Alternatives for modifying controller RBAC to create `ServiceAccount` tokens + +In this section we discuss alternatives for changing the RBAC of controllers for +creating `ServiceAccount` tokens cluster-wide, as it has a potential impact on +the security posture of Flux. + +1. We grant RBAC permissions to the `ServiceAccounts` of the Flux controllers + (that would implement multi-tenant workload identity) for creating tokens + for any other `ServiceAccounts` in the cluster. +2. We require users to grant "self-impersonation" to the `ServiceAccounts` so they + can create tokens for themselves. The controller would then impersonate the + `ServiceAccount` when creating a token for it. This operation would then only + succeed if the `ServiceAccount` has been correctly granted permission to create + a token for itself. + +In both alternatives the controller `ServiceAccount` would require some form +of cluster-wide impersonation permission. Alternative 2 requires impersonation +permission to be granted directly to the controller `ServiceAccount`, while +in alternative 1, impersonation permission would be indirectly granted by the +process of creating a token for another `ServiceAccount`. By creating a token +for another `ServiceAccount`, the controller `ServiceAccount` effectively has +the same permissions as the `ServiceAccount` it is creating the token for, as +it could simply use the token to impersonate the `ServiceAccount`. Therefore +it is reasonable to affirm that both alternatives are equivalent in terms of +security. + +To break the tie between the two alternatives we introduce the fact that +alternative 1 eliminates operational burden on users. In fact, native +workload identity for pods does not require users to grant this +self-impersonation permission to the `ServiceAccounts` of the pods. + +We therefore choose alternative 1. + +## Design Details + +For detailing the proposal we need to first introduce the technical +background on how workload identity is implemented by the managed +Kubernetes services from the cloud providers. + +### Technical Background + +Workload identity in Kubernetes is based on +[OpenID Connect Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) +(OIDC). +The *Kubernetes `ServiceAccount` token issuer*, included as the `iss` JWT claim in the +issued tokens, and represented by the default URL `https://kubernetes.default.svc.cluster.local`, +implements the OIDC discovery protocol. 
Essentially, this means that the Kubernetes API will respond to requests to the URL
`https://kubernetes.default.svc.cluster.local/.well-known/openid-configuration`
with a JSON document similar to the one below:

```json
{
  "issuer": "https://kubernetes.default.svc.cluster.local",
  "jwks_uri": "https://172.18.0.2:6443/openid/v1/jwks",
  "response_types_supported": [
    "id_token"
  ],
  "subject_types_supported": [
    "public"
  ],
  "id_token_signing_alg_values_supported": [
    "RS256"
  ]
}
```

And to the URL `https://172.18.0.2:6443/openid/v1/jwks`, *discovered* through the field
`.jwks_uri` in the JSON response above, the Kubernetes API will respond with a JSON
document similar to the following:

```json
{
  "keys": [
    {
      "use": "sig",
      "kty": "RSA",
      "kid": "NWm3YKmazJPVP7tttzkmSxUn0w8LGGp7yS2CanEF-A8",
      "alg": "RS256",
      "n": "lV2tbw9hnz1mseah2kMQNe5sRju4mPLlK0F7np97lLNC49G8yc5TMjyciLF3qsDNFCfWyYmsuGlcRg2BIBBX_jkpIUUjlsktdHhuqO2RnOqyRtNuljlT_b0QJgpgxCqq0DHI31EBc0JALOVd6EjjlhsVvVzZOw_b9KBXVS3D3RENuT0_FWauDq5NYbyYnjlvk-vUXCRMNDQSDNwx6X6bktwsmeDRXtM_bP3DokmnMYc4n0asTEg14L6VKky0ByF88Wi1-y0Pm0BHdobDGt1cIeUDeThk4E79JCHxkT5urAyYHcNwcfU4q-tnD6bTpNkFVsk3cqqK2nF7R_7ac5arSQ",
      "e": "AQAB"
    }
  ]
}
```

This JSON document contains the public keys for verifying the signature of the issued tokens.

By querying these two URLs in sequence, cloud providers are able to fetch the information
required for verifying and trusting the tokens issued by the Kubernetes API. More
specifically, for trusting the `sub` JWT claim, which contains the reference (name and
namespace) of the Kubernetes `ServiceAccount` for which the token was issued, i.e. the
`ServiceAccount` itself.

By allowing permissions to be granted to `ServiceAccounts` in the cloud provider,
the cloud provider is then able to allow Kubernetes `ServiceAccounts` to access its
resources. This is usually done by a *Security Token Service* (STS) that exchanges the
Kubernetes token for a short-lived cloud provider access token, which is then used to
access the cloud provider resources.

It's important to mention that the Kubernetes `ServiceAccount` token issuer URL must be
trusted by the cloud provider, i.e. users must configure this URL as a trusted identity
provider.

This process forms the basis for workload identity in Kubernetes. As long as the issuer
URL can be reached by the cloud provider, this process can take place successfully.

The reachability of the issuer URL by the cloud provider is where the implementation
of workload identity starts to differ between cloud providers. For example, in GCP
one can configure the content of the JWKS document directly in the GCP IAM console,
which eliminates the need for network calls to the Kubernetes API. In AWS, on the
other hand, this is not possible: the process has to be followed strictly, i.e. the
issuer URL must be reachable by the AWS STS service.

Furthermore, GKE automatically creates the necessary trust relationship between the
Kubernetes issuer and the GCP STS service (i.e. it automatically injects the JWKS
document of the GKE cluster into the STS database), while in EKS this must be done
manually by users (an OIDC provider must be created for each EKS cluster).

Another difference is that the issuer URL remains the default/private one in GKE,
while in EKS it is automatically set to a public one.
This is done through the `--service-account-issuer` flag in the `kube-apiserver` command
line arguments
([docs](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-issuer-discovery)).
This is a nice feature, as it allows external systems to federate access for workloads
running in EKS clusters, e.g. EKS workloads can have federated access to GCP resources.

Yet another difference between cloud providers that sheds light on our proposal is
how applications running inside pods from the managed Kubernetes services obtain
the short-lived cloud provider access tokens. In GCP, the GCP libraries used by
the applications attempt to retrieve tokens from the *metadata server*, which is
reachable by all pods running in GKE. This server creates a token for the
`ServiceAccount` of the calling pod in the Kubernetes API, exchanges it for a
short-lived GCP access token, and returns it to the application. In AKS, on the
other hand, pods are mutated to include a
[*token volume projection*](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#serviceaccount-token-volume-projection).
The kubelet mounts and automatically rotates a volume with a token file inside the pod.
The Azure libraries used by the applications then read this file periodically to perform
the token exchange with the Azure STS service.

Another aspect of workload identity that is important for this RFC is how the cloud
identities are associated with the Kubernetes `ServiceAccounts`. In most cases, an
identity from the IAM service of the cloud provider (e.g. a GCP IAM Service Account,
or an AWS IAM Role) is associated with a Kubernetes `ServiceAccount` by the process
of *impersonation*. Permission to impersonate the cloud identity is granted to the
`ServiceAccount` through a configuration that points to the fully qualified name of
the Kubernetes `ServiceAccount`, i.e. the name and namespace of the `ServiceAccount`
and which cluster it belongs to in the name/address system of the cloud provider.

Because the cloud provider needs to support this impersonation permission, some
cloud providers go further and remove the impersonation requirement altogether by
allowing permissions to be granted directly to `ServiceAccounts` (if a provider can
support granting the impersonation permission, it can probably also support granting
any other permissions, depending on the implementation). GCP, for example, has
implemented this feature
[recently](https://cloud.google.com/blog/products/identity-security/make-iam-for-gke-easier-to-use-with-workload-identity-federation):
a GCP IAM Service Account is no longer required for workload identity, i.e. GCP IAM
permissions can now be granted directly to Kubernetes `ServiceAccounts`. This is a
significant improvement in the user experience, as it reduces the required
configuration steps. AWS implemented a similar feature called *EKS Pod Identity*,
but it still requires an IAM Role to be associated with the `ServiceAccount`. The
minor improvement from the user experience perspective is that this association is
implemented entirely in the AWS EKS/IAM APIs; no annotations are required in the
Kubernetes `ServiceAccount`. Another improvement of this EKS feature compared to
*IAM Roles for Service Accounts* is that users no longer need to create an
*OIDC Provider* for the EKS cluster in the IAM API.
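To make the Kubernetes side of this flow concrete, below is a minimal sketch using
client-go; the helper name, the audience string and the expiration are illustrative
assumptions, not part of this proposal:

```go
package example

import (
	"context"

	authenticationv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// requestServiceAccountToken is a hypothetical helper: it issues a short-lived
// OIDC token for the given ServiceAccount with a provider-specific audience,
// e.g. "sts.amazonaws.com" for AWS IAM Roles for Service Accounts.
func requestServiceAccountToken(ctx context.Context, clientset kubernetes.Interface,
	namespace, name, audience string) (string, error) {

	expiration := int64(600) // 10 minutes; short-lived by design
	tr := &authenticationv1.TokenRequest{
		Spec: authenticationv1.TokenRequestSpec{
			Audiences:         []string{audience},
			ExpirationSeconds: &expiration,
		},
	}

	resp, err := clientset.CoreV1().ServiceAccounts(namespace).
		CreateToken(ctx, name, tr, metav1.CreateOptions{})
	if err != nil {
		return "", err
	}

	// The returned JWT has `iss` set to the cluster issuer and `sub` set to
	// system:serviceaccount:<namespace>:<name>. The cloud provider STS verifies
	// it against the issuer's OIDC discovery documents shown above before
	// exchanging it for a short-lived cloud access token.
	return resp.Status.Token, nil
}
```

Creating tokens for arbitrary `ServiceAccounts` is the operation that requires the
controllers to hold cluster-wide `create` permission on the `serviceaccounts/token`
subresource, as discussed in the alternatives above.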
In light of the technical background presented above, our proposal becomes simpler.
The only solution to support multi-tenant workload identity at the object-level for
the Flux APIs is to associate the Flux objects with Kubernetes `ServiceAccounts`.
We propose to build the `ServiceAccount` token creation and exchange logic into
the Flux controllers through a library in the `github.com/fluxcd/pkg` repository.

### API Changes

For all the Flux APIs interacting with cloud providers (except `Kustomization`,
see the paragraph below), we propose introducing the field `spec.serviceAccountName`
(if not already present) for specifying the Kubernetes `ServiceAccount` in the same
namespace as the object that must be used for getting access to the respective cloud
resources. This field would be optional, and when not present the original behavior
would be observed, i.e. the feature only activates when the field is present and a
cloud provider among `aws`, `azure` or `gcp` is specified in the `spec.provider`
field. So if only the `spec.provider` field is present and set to a cloud provider,
then the controller would use single-tenant workload identity as it would prior to
the implementation of this RFC, i.e. it would use its own identity for the operation.

Note that this RFC does not seek to change the behavior when `spec.provider` is set
to `generic` (or left empty, in which case it defaults to `generic`), where the field
`spec.secretRef` can be used for specifying the Kubernetes `Secret` containing the
credentials (or `spec.serviceAccountName` in the case of the APIs dealing with
container registries, through the `imagePullSecrets` field of the `ServiceAccount`).

The `Kustomization` API uses Key Management Services (KMS) for decrypting
SOPS-encrypted secrets. We propose adding the dedicated optional field
`spec.decryption.serviceAccountName` for multi-tenant workload identity
when interacting with the KMS service. We choose a dedicated field
for the `Kustomization` API because the field `spec.serviceAccountName`
already exists and is used for a major part of the functionality, namely
authenticating with the Kubernetes API when applying resources. If
we used the same field for both purposes, users would be forced to use
multi-tenancy for both cloud and Kubernetes API interactions. Furthermore,
the cloud provider in the `Kustomization` API is detected by the SOPS SDK
itself while decrypting the secrets, so we don't need to introduce a new
field for this purpose.

### Workload Identity Library

We propose using the Go package `github.com/fluxcd/pkg/auth`
for implementing a workload identity library that can be
used by all the Flux controllers that need to interact
with cloud providers. This library would be responsible
for creating the `ServiceAccount` tokens in the Kubernetes
API and exchanging them for short-lived access tokens
for the cloud provider. The library would also be responsible
for caching the tokens when configured by users.

The library should support both single-tenant and multi-tenant workload
identity because single-tenant implementations are already supported in
GA APIs and hence must remain available for backwards compatibility.
Furthermore, it would be easier to support both use cases in a single
library rather than mixing a new library into the currently existing
ones, so this new library becomes the definitive unified solution for
workload identity in Flux.
+ +The library should automatically detect whether the workload identity +is single-tenant or multi-tenant by checking if a `ServiceAccount` was +configured for the operation. If a `ServiceAccount` was configured, then +the operation is multi-tenant, otherwise it is single-tenant and the +granted access token must represent the identity associated with the +controller. + +The directory structure would look like this: + +```shell +. +└── auth + ├── aws + │ └── aws.go + ├── azure + │ └── azure.go + ├── gcp + │ └── gcp.go + ├── get_token.go + ├── options.go + ├── provider.go + └── token.go +``` + +The file `auth/get_token.go` would contain the main algorithm: + +```go +package auth + +// GetToken returns an access token for accessing resources in the given cloud provider. +func GetToken(ctx context.Context, provider Provider, opts ...Option) (Token, error) { + // 1. Check if a ServiceAccount is configured and return the controller access token if not (single-tenant WI). + // 2. Get the provider audience for creating the OIDC token for the ServiceAccount in the Kubernetes API. + // 3. Get the ServiceAccount using the configured controller-runtime client. + // 4. Get the provider identity from the ServiceAccount annotations and add it to the options. + // 5. Build the cache key using the configured options. + // 6. Get the token from the cache. If present, return it, otherwise continue. + // 7. Create an OIDC token for the ServiceAccount in the Kubernetes API using the provider audience. + // 8. Exchange the OIDC token for an access token through the Security Token Service of the provider. + // 9. If an image repository is configured, exchange the access token for a registry token. + // 10. Add the final token to the cache and return it. +} +``` + +The file `auth/token.go` would contain the token abstractions: + +```go +package auth + +// Token is an interface that represents an access token that can be used to +// authenticate with a cloud provider. The only common method is for getting the +// duration of the token, because different providers have different ways of +// representing the token. For example, Azure and GCP use a single string, +// while AWS uses three strings: access key ID, secret access key and token. +// Consumers of this interface should know what type to cast it to. +type Token interface { + // GetDuration returns the duration for which the token is valid relative to + // approximately time.Now(). This is used to determine when the token should + // be refreshed. + GetDuration() time.Duration +} + +// RegistryCredentials is a particular type implementing the Token interface +// for credentials that can be used to authenticate with a container registry +// from a cloud provider. This type is compatible with all the cloud providers +// and should be returned when the image repository is configured in the options. +type RegistryCredentials struct { + Username string + Password string + ExpiresAt time.Time +} + +func (r *RegistryCredentials) GetDuration() time.Duration { + return time.Until(r.ExpiresAt) +} +``` + +The file `auth/provider.go` would contain the `Provider` interface: + +```go +package auth + +// Provider contains the logic to retrieve an access token for a cloud +// provider from a ServiceAccount (OIDC/JWT) token. +type Provider interface { + // GetName returns the name of the provider. + GetName() string + + // NewDefaultToken returns a token that can be used to authenticate with the + // cloud provider retrieved from the default source, i.e. 
from the pod's + // environment, e.g. files mounted in the pod, environment variables, + // local metadata services, etc. In this case the method would implicitly + // use the ServiceAccount associated with the controller pod, and not one + // specified in the options. + NewDefaultToken(ctx context.Context, opts ...Option) (Token, error) + + // GetAudience returns the audience the OIDC tokens issued representing + // ServiceAccounts should have. This is usually a string that represents + // the cloud provider's STS service, or some entity in the provider for + // which the OIDC tokens are targeted to. + GetAudience(ctx context.Context, sa corev1.ServiceAccount) (string, error) + + // GetIdentity takes a ServiceAccount and returns the identity which the + // ServiceAccount wants to impersonate, by looking at annotations. + GetIdentity(sa corev1.ServiceAccount) (string, error) + + // NewToken takes a ServiceAccount and its OIDC token and returns a token + // that can be used to authenticate with the cloud provider. The OIDC token is + // the JWT token that was issued for the ServiceAccount by the Kubernetes API. + // The implementation should exchange this token for a cloud provider access + // token through the provider's STS service. + NewTokenForServiceAccount(ctx context.Context, oidcToken string, + sa corev1.ServiceAccount, opts ...Option) (Token, error) + + // GetImageCacheKey extracts the part of the image repository that must be + // included in cache keys when caching registry credentials for the provider. + GetImageCacheKey(imageRepository string) string + + // NewRegistryToken takes an image repository and a Token and returns a token + // that can be used to authenticate with the container registry of the image. + NewRegistryToken(ctx context.Context, imageRepository string, + token Token, opts ...Option) (Token, error) +} +``` + +The file `auth/options.go` would contain the following options: + +```go +package auth + +// Options contains options for configuring the behavior of the provider methods. +// Not all providers/methods support all options. +type Options struct { + ServiceAccount *client.ObjectKey + Client client.Client + Cache *cache.TokenCache + InvolvedObject *cache.InvolvedObject + Scopes []string + ImageRepository string + STSEndpoint string + ProxyURL *url.URL +} + +// WithServiceAccount sets the ServiceAccount reference for the token +// and a controller-runtime client to fetch the ServiceAccount and +// create an OIDC token for it in the Kubernetes API. +func WithServiceAccount(saRef client.ObjectKey, client client.Client) Option { + // ... +} + +// WithCache sets the token cache and the involved object for recording events. +func WithCache(cache cache.TokenCache, involvedObject cache.InvolvedObject) Option { + // ... +} + +// WithScopes sets the scopes for the token. +func WithScopes(scopes ...string) Option { + // ... +} + +// WithImageRepository sets the image repository the token will be used for. +// In most cases container registry credentials require an additional +// token exchange at the end. This option allows the library to implement +// this exchange and cache the final token. +func WithImageRepository(imageRepository string) Option { + // ... +} + +// WithSTSEndpoint sets the endpoint for the STS service. +func WithSTSEndpoint(stsEndpoint string) Option { + // ... +} + +// WithProxyURL sets a *url.URL for an HTTP/S proxy for acquiring the token. +func WithProxyURL(proxyURL url.URL) Option { + // ... 
+} +``` + +The `auth/aws/aws.go`, `auth/azure/azure.go` and +`auth/gcp/gcp.go` files would contain the implementations for +the respective cloud providers: + +```go +package aws + +import ( + "github.com/aws/aws-sdk-go-v2/aws" + "github.com/aws/aws-sdk-go-v2/credentials" + "github.com/aws/aws-sdk-go-v2/service/sts/types" +) + +const ProviderName = "aws" + +type Provider struct{} + +type Token struct{ types.Credentials } + +// GetDuration implements auth.Token. +func (t *Token) GetDuration() time.Duration { + return time.Until(*t.Expiration) +} + +type credentialsProvider struct { + opts []auth.Option +} + +// NewCredentialsProvider creates an aws.CredentialsProvider for the aws provider. +func NewCredentialsProvider(opts ...auth.Option) aws.CredentialsProvider { + return &credentialsProvider{opts} +} + +// Retrieve implements aws.CredentialsProvider. +func (c *credentialsProvider) Retrieve(ctx context.Context) (aws.Credentials, error) { + // Use auth.GetToken() to get the token. +} +``` + +```go +package azure + +import ( + "github.com/Azure/azure-sdk-for-go/sdk/azcore" + "github.com/Azure/azure-sdk-for-go/sdk/azcore/policy" +) + +const ProviderName = "azure" + +type Provider struct{} + +type Token struct{ azcore.AccessToken } + +// GetDuration implements auth.Token. +func (t *Token) GetDuration() time.Duration { + return time.Until(t.ExpiresOn) +} + +type tokenCredential struct { + opts []auth.Option +} + +// NewTokenCredential creates an azcore.TokenCredential for the azure provider. +func NewTokenCredential(opts ...auth.Option) azcore.TokenCredential { + return &tokenCredential{opts} +} + +// GetToken implements azcore.TokenCredential. +// The options argument is ignored, any options should be +// specified in the constructor. +func (t *tokenCredential) GetToken(ctx context.Context, _ policy.TokenRequestOptions) (azcore.AccessToken, error) { + // Use auth.GetToken() to get the token. +} +``` + +```go +package gcp + +import ( + "golang.org/x/oauth2" +) + +const ProviderName = "gcp" + +type Provider struct {} + +type Token struct{ oauth2.Token } + +// GetDuration implements auth.Token. +func (t *Token) GetDuration() time.Duration { + return time.Until(t.Expiry) +} + +type tokenSource struct { + ctx context.Context + opts []auth.Option +} + +// NewTokenSource creates an oauth2.TokenSource for the gcp provider. +func NewTokenSource(ctx context.Context, opts ...auth.Option) oauth2.TokenSource { + return &tokenSource{ctx, opts} +} + +// Token implements oauth2.TokenSource. +func (t *tokenSource) Token() (*oauth2.Token, error) { + // Use auth.GetToken() to get the token. +} + +var gkeMetadata struct { + projectID string + location string + name string + mu sync.Mutex + loaded bool +} +``` + +As detailed above, each cloud provider implementation defines a simple wrapper +around the cloud provider access token type. This wrapper implements the +`auth.Token` interface, which is essentially the method `GetDuration()` +for the cache library to manage the token lifetime. The wrappers also contain +a helper function to create a token source for the respective cloud provider +SDKs. These methods have different names and signatures because the cloud provider +SDKs are different and have different types, but they all implement the same +concept of a token source. + +The `aws` provider needs to read the environment variable `AWS_REGION` for +configuring the STS client. Even though a specific STS endpoint may be +configured, the AWS SDKs require the region to be set regardless. 
This +variable is usually set automatically in EKS pods, and can be manually set +by users otherwise (e.g. in Fargate pods). + +An important detail to take into account in the `azure` provider implementation +is using our custom implementation of `azidentity.NewDefaultAzureCredential()` +found in kustomize-controller for SOPS decryption. This custom implementation +avoids shelling out to the Azure CLI, which is something we strive to avoid in +the Flux codebase. This is important because today we are doing this in a few +APIs but not others, so it will be a significant improvement to implement this +in a single place and use it everywhere. + +The `gcp` provider needs to load the cluster metadata from the `gke-metadata-server` +in order to create tokens. This must be done lazily when the first token is +requested, and there's a very important reason for this: if this was done on +the controller startup, the controller would crash when running outside GKE and +enter `CrashLoopBackOff` because the `gke-metadata-server` would never be +available. This is a very important detail that must be taken into account when +implementing the `gcp` provider. The cluster metadata doesn't change during the +lifetime of the controller pod, so we use a `sync.Mutex` and `bool` to load it +only once into a package variable. + +#### Cache Key + +The cache key must include the following components: + +* The cloud provider name. +* The provider audience used for issuing the Kubernetes `ServiceAccount` token. +* The optional `ServiceAccount` reference and cloud provider identity. + The identity is the string representing the identity which the `ServiceAccount` + is impersonating, e.g. for `gcp` this would be a GCP IAM Service Account email, + for `aws` this would be an AWS IAM Role ARN, etc. When there is no identity + configured for impersonation, only the `ServiceAccount` reference is included. +* The optional scopes added to the token. +* The cache key extracted from the optional image repository. +* The optional STS endpoint used for issuing the token. +* The optional proxy URL when the STS endpoint is present. + +##### Justification + +When single-tenant workload identity is being used, the identity associated with +the controller is the one represented by the token, so there is no identity or +`ServiceAccount` to identify in the cache key besides the implicit ones associated +with the controller. In this case, including only the cloud provider name in the +cache key is enough. + +The provider audience used for issuing the `ServiceAccount` token is included +in the cache key because it may depend on the `ServiceAccount` annotations. +For example, in AWS if an IAM Role ARN is not specified we assume that users +are attempting to use EKS Pod Identity instead of IAM Roles for Service +Accounts. Each feature has its own audience string and its own way of issuing +tokens, so the audience string must be included in the cache key. + +In multi-tenant workload identity, the reason for including both the `ServiceAccount` +and the identity in the cache key is to establish the fact that the `ServiceAccount` +had permission to impersonate the identity at the time when the token was issued. +This is very important. For the sake of the argument, suppose we include only the +identity. Then a malicious actor could specify any identity in their `ServiceAccount` +and get a token cached for that identity even if their `ServiceAccount` did not have +permission to impersonate that identity. 
We also need to include the identity in the +cache key because, otherwise, if including only the `ServiceAccount`, changes to the +`ServiceAccount` annotations to impersonate a different identity would not cause a +new token impersonating the new identity to be created since the cache key did not +change. + +In most cases container registry credentials require an additional token exchange +at the end. In order to benefit from caching the final token and freeing the +library consumers from this responsibility, we allow an image repository to +be included in the options and implement the exchange. Depending on the cloud +provider, a part of the image repository string is extracted and used to issue +the token, e.g. for ECR the region is extracted and used to configure the client, +and in the case of ACR the registry host is included in the resulting token. +Those parts of the image repository must be included in the cache key. This is +accomplished by the `Provider.GetImageCacheKey()` method. In the case of GCP +container registries the image repository does not influence how the token is +issued. + +The scopes are included in the cache key because they delimit the permissions that +the token has. They don't *grant* the permissions, they just set an upper bound for +the permissions that the token can have. Providers requiring scopes unfortunately +benefit less from caching, e.g. a token issued for an Azure identity can't be +seamlessly used for both Azure DevOps and the Azure Container Registry, because the +respective scopes are different, so the issued tokens are different. + +The STS endpoint and proxy URL are included in the cache key because they could +influence how the token is fetched and ultimately issued. The proxy URL is included +only when the STS endpoint is present, because all the default STS endpoints are +HTTPS and belong to cloud providers, so they are all well-known, unique, and the +proxy is guaranteed not to tamper with the issuance of the token since it only +sees an opaque TLS session passing through. + +##### Format + +The cache key would be the SHA256 hash of the following string (breaking lines +after commas for readability): + +Single-tenant/controller-level: + +``` +provider=, +scopes=, +imageRepositoryKey=<'gcp'-for-gcp|registry-region-for-aws|registry-host-for-azure>, +stsEndpoint=, +proxyURL= +``` + +Multi-tenant/object-level: + +``` +provider=, +providerAudience=, +serviceAccountName=, +serviceAccountNamespace=, +cloudProviderIdentity=, +scopes=, +imageRepositoryKey=<'gcp'-for-gcp|registry-region-for-aws|registry-host-for-azure>, +stsEndpoint=, +proxyURL= +``` + +##### Security Considerations and Controls + +As mentioned previously, a `ServiceAccount` must have permission to impersonate the +identity it is configured to impersonate. Once a token for the impersonated identity +is issued, that token would be valid for a while even if immediately after issuing it +the `ServiceAccount` loses permission to impersonate that identity. In our cache key +design, the token would remain available for the `ServiceAccount` to use until it +expires. If the impersonation permission was revoked to mitigate an attack, the +attacker could still get a valid token from the cache for a while after the +revocation, and hence still exercise the permissions they had prior to the revocation. 
There are a few mitigations for this scenario:

* Users that revoke impersonation permissions for a `ServiceAccount` must also
  change the annotations of the `ServiceAccount` to impersonate a different identity,
  or delete the `ServiceAccount` altogether, or restart the Flux controllers so the
  cache is purged. Any of these actions would effectively prevent the attack, but
  they represent an additional step after revoking the impersonation permission.

* In the Flux controllers users can specify the `--token-cache-max-duration` flag,
  which can be used to limit the maximum duration for which a token can be cached.
  By reducing the default maximum duration of one hour to a smaller value, users can
  limit the time window during which a token would be available for a `ServiceAccount`
  to use after losing permission to impersonate the identity.

* Users can disable the cache entirely by setting the flag `--token-cache-max-size=0`,
  or by removing this flag altogether, since the default is already zero, i.e. no
  tokens are cached in the Flux controller. This mitigation is intended for strict
  security requirements where any risk of such an attack must be avoided. It is the
  most effective mitigation, but it comes at the cost of many API calls to issue
  tokens in the cloud provider, which could result in a performance bottleneck and/or
  throttling/rate-limiting, as tokens would have to be issued for every
  reconciliation.

A similar situation could occur in the single-tenant scenario, when the permission
to impersonate the configured identity is revoked from the controller `ServiceAccount`.
In this case, the attacker would have access to the cloud provider resources that
the controller had access to prior to the revocation of the impersonation permission.
Most of the mitigations mentioned above apply to this scenario as well, except for
the one that involves changing the annotations of the `ServiceAccount` to impersonate
a different identity or deleting the `ServiceAccount` altogether, as the controller
`ServiceAccount` should not be deleted. The best mitigation in this case is to restart
the Flux controllers so the cache is purged.

**EKS Pod Identity**: In EKS Pod Identity the association between a `ServiceAccount`
and an IAM Role is not configured in the `ServiceAccount` annotations, nor anywhere
else inside the Kubernetes cluster. The association is established entirely through
the EKS/IAM APIs. In this case, all the mitigations mentioned above apply, except
for the one that involves changing the annotations of the `ServiceAccount`, as there
are no annotations to change.

### Library Integration

When reconciling an object, the controller must use the `auth.GetToken()`
function, passing a `controller-runtime` client that has permission to create
`ServiceAccount` tokens in the Kubernetes API, the desired cloud provider by name,
and all the remaining options according to the configuration of the controller and
of the object. The provider names match the ones used for `spec.provider` in the Flux
APIs, i.e. `aws`, `azure` and `gcp`.

Because different cloud providers have different ways of representing their access
tokens (e.g. Azure and GCP tokens are a single opaque string while AWS has three
strings: access key ID, secret access key and token), consumers of the
`auth.Token` interface would need to cast it to the concrete `Token` type of the
respective provider package, e.g. `*aws.Token`.

The sketch below illustrates the general calling pattern, and the following
subsections show what the integration would look like for each API.
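A minimal sketch, written against the proposed `auth` package API from the previous
sections; the import paths, function signature and reconciler wiring are illustrative
assumptions rather than a definitive implementation:

```go
package example

import (
	"context"
	"fmt"

	"github.com/fluxcd/pkg/auth"
	"github.com/fluxcd/pkg/auth/aws"
	"github.com/fluxcd/pkg/auth/azure"
	"github.com/fluxcd/pkg/auth/gcp"
	"github.com/fluxcd/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// getRegistryCredentials sketches how a reconciler could resolve container
// registry credentials for an object with spec.provider and spec.serviceAccountName.
func getRegistryCredentials(ctx context.Context, kubeClient client.Client,
	tokenCache *cache.TokenCache, involved cache.InvolvedObject,
	providerName, imageRepo, objNamespace, serviceAccountName string) (*auth.RegistryCredentials, error) {

	var opts []auth.Option

	// Multi-tenant workload identity: only when a ServiceAccount is configured.
	if serviceAccountName != "" {
		saRef := client.ObjectKey{Namespace: objNamespace, Name: serviceAccountName}
		opts = append(opts, auth.WithServiceAccount(saRef, kubeClient))
	}

	// Ask the library to perform the final registry token exchange and to
	// cache the resulting credentials under the involved object.
	opts = append(opts,
		auth.WithImageRepository(imageRepo),
		auth.WithCache(*tokenCache, involved),
	)

	var provider auth.Provider
	switch providerName {
	case aws.ProviderName:
		provider = aws.Provider{}
	case azure.ProviderName:
		provider = azure.Provider{}
	case gcp.ProviderName:
		provider = gcp.Provider{}
	default:
		return nil, fmt.Errorf("unsupported provider: %s", providerName)
	}

	token, err := auth.GetToken(ctx, provider, opts...)
	if err != nil {
		return nil, err
	}

	// With an image repository configured, the library returns registry credentials.
	return token.(*auth.RegistryCredentials), nil
}
```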
+ +#### `GitRepository` and `ImageUpdateAutomation` APIs + +For these APIs the only provider we have so far that supports workload identity +is `azure`. In this case we would simply replace `AzureOpts []azure.OptFunc` in +the `fluxcd/pkg/git.ProviderOptions` struct with `[]fluxcd/pkg/auth.Option` +and would modify `fluxcd/pkg/git.GetCredentials()` to use `auth.GetToken()`. +The token interface would be cast to `*azure.Token` and the token string would be +assigned to `fluxcd/pkg/git.Credentials.BearerToken`. A `GitRepository` object +configured with the `azure` provider and a `ServiceAccount` would then go through +this code path. + +#### `OCIRepository`, `ImageRepository`, `HelmRepository` and `HelmChart` APIs + +The `HelmRepository` API only supports a cloud provider for OCI repositories, so +for all these APIs we would only need to support OCI authentication. + +All these APIs currently use `*fluxcd/pkg/oci/auth/login.Manager` to get the +container registry credentials. The new library would replace this library +entirely, as it mostly handles single-tenant workload identity. The new library +covers both single-tenant and multi-tenant workload identity, so it would be +a drop-in replacement for the `login.Manager`. + +In the case of the source-controller APIs, all of them use the function `OIDCAuth()` +from the internal package `internal/oci`. We would replace the use of `login.Manager` +with `auth.GetToken()` in this function. The token interface would +be cast to `*auth.RegistryCredentials` and then fed to `authn.FromConfig()` +from the package `github.com/google/go-containerregistry/pkg/authn`. + +In the case of `ImageRepository`, we would replace `login.Manager` with +`auth.GetToken()` in the `setAuthOptions()` method of the +`ImageRepositoryReconciler`, cast the token to `*auth.RegistryCredentials` +and then feed it to `authn.FromConfig()`. + +The beauty of this particular integration is that here we no longer require +branching code paths for each cloud provider, we would just need to configure +the options for the `auth.GetToken()` function and the library would take +care of the rest. + +#### `Bucket` API + +##### Provider `aws` + +A `Bucket` object configured with the `aws` provider and a `ServiceAccount` would +cause the internal `minio.MinioClient` of source-controller to be created with the +following new options: + +* `minio.WithTokenClient(controller-runtime/pkg/client.Client)` +* `minio.WithTokenCache(*fluxcd/pkg/cache.TokenCache)` + +The constructor would then use `auth.GetToken()` to get the +cloud provider access token. When doing so, the `minio.MinioClient` would +cast the token interface to `*aws.Token` and feed it to `credentials.NewStatic()` +from the package `github.com/minio/minio-go/v7/pkg/credentials`. + +##### Provider `azure` + +A `Bucket` object configured with the `azure` provider and a `ServiceAccount` +would cause the internal `azure.BlobClient` of source-controller to be created +with the following new options: + +* `azure.WithTokenClient(controller-runtime/pkg/client.Client)` +* `azure.WithTokenCache(*fluxcd/pkg/cache.TokenCache)` +* `azure.WithServiceAccount(controller-runtime/pkg/client.ObjectKey)` +* `azure.WithInvolvedObject(*fluxcd/pkg/cache.InvolvedObject)` + +The constructor would then use `azure.NewTokenCredential()` to feed this +token credential to `azblob.NewClient()`. 
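As an illustration, here is a minimal sketch of such a constructor, assuming the
proposed `auth` options and the `azure.NewTokenCredential()` helper; the function
signature, the option plumbing and the storage scope value are illustrative
assumptions:

```go
package example

import (
	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
	"github.com/fluxcd/pkg/auth"
	"github.com/fluxcd/pkg/auth/azure"
	"github.com/fluxcd/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newBlobClient sketches how source-controller could build the Azure Blob
// client for a Bucket object configured with the azure provider and a
// ServiceAccount.
func newBlobClient(endpoint string, kubeClient client.Client, tokenCache *cache.TokenCache,
	saRef client.ObjectKey, involved cache.InvolvedObject) (*azblob.Client, error) {

	authOpts := []auth.Option{
		auth.WithServiceAccount(saRef, kubeClient),
		auth.WithCache(*tokenCache, involved),
		// Azure Blob Storage requires its own scope on the access token.
		auth.WithScopes("https://storage.azure.com/.default"),
	}

	// azure.NewTokenCredential returns an azcore.TokenCredential backed by
	// auth.GetToken(), so the SDK can refresh tokens transparently.
	cred := azure.NewTokenCredential(authOpts...)

	return azblob.NewClient(endpoint, cred, nil)
}
```

The `gcp` provider follows the same pattern with `gcp.NewTokenSource()` and
`option.WithTokenSource()`, as described in the next subsection.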
+ +##### Provider `gcp` + +A `Bucket` object configured with the `gcp` provider and a `ServiceAccount` +would cause the internal `gcp.GCSClient` of source-controller to be created +with the following new options: + +* `gcp.WithTokenClient(controller-runtime/pkg/client.Client)` +* `gcp.WithTokenCache(*fluxcd/pkg/cache.TokenCache)` +* `gcp.WithServiceAccount(controller-runtime/pkg/client.ObjectKey)` +* `gcp.WithInvolvedObject(*fluxcd/pkg/cache.InvolvedObject)` + +The constructor would then use `gcp.NewTokenSource()` to feed this token +source to the `option.WithTokenSource()` and pass it to +`cloud.google.com/go/storage.NewClient()`. + +#### `Kustomization` API + +The `Kustomization` API uses Key Management Services (KMS) for decrypting +SOPS secrets. The internal packages `internal/decryptor` and `internal/sops` +of kustomize-controller already use interfaces compatible with the new +library in the case of `aws` and `azure`, i.e. `*awskms.CredentialsProvider` +and `*azkv.TokenCredential` respectively, so we could easily use the helper +functions for creating the respective token sources to configure the KMS +credentials for SOPS. This is thanks to the respective SOPS libraries +`github.com/getsops/sops/v3/kms` and `github.com/getsops/sops/v3/azkv`. +For GCP we can introduce the equivalent interface that was recently added +in [this](https://github.com/getsops/sops/pull/1794/files) pull request. +This new interface introduced in SOPS upstream can also be used for the +current JSON credentials method that we use via +`google.CredentialsFromJSON().TokenSource`. This would allow us to use only +the respective token source interfaces for all three providers when using +either workload identity or secrets. + +#### `Provider` API + +The constructor of the internal `notifier.Factory` of notification-controller +would now accept the following new options: + +* `notifier.WithTokenClient(controller-runtime/pkg/client.Client)` +* `notifier.WithTokenCache(*fluxcd/pkg/cache.TokenCache)` +* `notifier.WithServiceAccount(controller-runtime/pkg/client.ObjectKey)` +* `notifier.WithInvolvedObject(*fluxcd/pkg/cache.InvolvedObject)` + +The cloud provider types that support workload identity would then use these +options. See the following subsections for details. + +##### Type `azuredevops` + +The `notifier.NewAzureDevOps()` constructor would use the existing and new +options to call `auth.GetToken()` and use it to get the cloud +provider access token. When doing so, the `notifier.AzureDevOps` would cast +the token interface to `*azure.Token` and feed the token string to +`NewPatConnection()` from the package +`github.com/microsoft/azure-devops-go-api/azuredevops/v6`. + +##### Type `azureeventhub` + +The `notifier.NewAzureEventHub()` constructor would use the existing and new +options to call `auth.GetToken()` and use it to get the cloud +provider access token. When doing so, the `notifier.AzureEventHub` would cast +the token interface to `*azure.Token` and feed the token string to `newJWTHub()`. + +##### Type `googlepubsub` + +The `notifier.NewGooglePubSub()` constructor would use the existing and new +options to call `gcp.NewTokenSource()` and feed this token source to the +`option.WithTokenSource()` and pass it to `cloud.google.com/go/pubsub.NewClient()`. + +## Implementation History + +A realistic estimate for implementing this proposal would be from two to +three Flux minor releases. This is so we can work on more pressing priorities +while still making progress towards this milestone. 
The implementation of
the core library would be done in the first release, and the integration
with the Flux APIs would be spread across all these releases. All three
cloud providers should be implemented for each API that gets this feature in
any given release. Our first priority should be `Kustomization`, as it deals
with secrets and is therefore where security is most important.