From a7e41df1e3ce16b4943f12021ea01e804b6f2c38 Mon Sep 17 00:00:00 2001 From: Matheus Pimenta Date: Sat, 22 Feb 2025 20:31:14 +0000 Subject: [PATCH] [RFC-0010] Multi-Tenant Workload Identity Signed-off-by: Matheus Pimenta --- .../README.md | 1256 +++++++++++++++++ 1 file changed, 1256 insertions(+) create mode 100644 rfcs/0010-multi-tenant-workload-identity/README.md diff --git a/rfcs/0010-multi-tenant-workload-identity/README.md b/rfcs/0010-multi-tenant-workload-identity/README.md new file mode 100644 index 00000000..152e8cf6 --- /dev/null +++ b/rfcs/0010-multi-tenant-workload-identity/README.md @@ -0,0 +1,1256 @@ +# RFC-0010 Multi-Tenant Workload Identity + +**Status:** implementable + + + +**Creation date:** 2025-02-22 + +**Last update:** 2025-04-14 + +## Summary + +In this RFC we aim to add support for multi-tenant workload identity in Flux, +i.e. the ability to specify at the object-level which set of cloud provider +permissions must be used for interacting with the respective cloud provider +on behalf of the reconciliation of the object. In this process, credentials +must be obtained automatically, i.e. this feature must not involve the use +of secrets. This would be useful in a number of Flux APIs that need to +interact with cloud providers, spanning all the Flux controllers except +for helm-controller. + +### Multi-Tenancy Model + +In the context of this RFC, multi-tenancy refers to the ability of a single +Flux instance running inside a Kubernetes cluster to manage Flux objects +belonging to all the tenants in the cluster while still ensuring that each +tenant has access only to their own resources according to the Least Privilege +Principle. In this scenario a tenant is often a team inside an organization, +so the reader can consider the +[multi-team tenancy model](https://kubernetes.io/docs/concepts/security/multi-tenancy/#multiple-teams). +Each team has their own namespaces, which are not shared with other teams. + +## Motivation + +Flux has strong multi-tenancy features. For example, the `Kustomization` and +`HelmRelease` APIs support the field `spec.serviceAccountName` for specifying +the Kubernetes `ServiceAccount` to impersonate when interacting with the +Kubernetes API on behalf of a tenant, e.g. when applying resources. This +allows tenants to be constrained under the Kubernetes RBAC permissions +granted to this `ServiceAccount`, and therefore have access only to the +specific subset of resources they should be allowed to use. + +Besides the Kubernetes API, Flux also interacts with cloud providers, e.g. +container registries, object storage, pub/sub services, etc. In these cases, +Flux currently supports basically two modes of authentication: + +- *Secret-based multi-tenant authentication*: Objects have the field + `spec.secretRef` for specifying the Kubernetes `Secret` containing the + credentials to use when interacting with the cloud provider. This is + similar to the `spec.serviceAccountName` field, but for cloud providers. + The problem with this approach is that secrets are a security risk and + operational burden, as they must be managed and rotated. +- *Workload-identity-based single-tenant authentication*: Flux offers + single-tenant workload identity support by configuring the `ServiceAccount` + of the Flux controllers to impersonate a cloud identity. 
This eliminates + the need for secrets, as the credentials are obtained automatically by + the cloud provider Go libraries used by the Flux controllers when they + are running inside the respective cloud environment. The problem with + this approach is that it is single-tenant, i.e. all objects are reconciled + using the same cloud identity, the one associated with the respective controller. + +For delivering the high level of security and multi-tenancy support that +Flux aims for, it is necessary to extend the workload identity support to +be multi-tenant. This means that each object must be able to specify which +cloud identity must be impersonated when interacting with the cloud provider +on behalf of the reconciliation of the object. This would allow tenants to +be constrained under the cloud provider permissions granted to this identity, +and therefore have access only to the specific subset of resources they are +allowed to manage. + +### Goals + +Provide multi-tenant workload identity support in Flux, i.e. the ability to +specify at the object-level which cloud identity must be impersonated to +interact with the respective cloud provider on behalf of the reconciliation +of the object, without the need for secrets. + +### Non-Goals + +It's not a goal to provide multi-tenant workload identity *federation* support. +The (small) difference between workload identity and workload identity federation +is that the former assumes that the workloads are running inside the cloud +environment, while the latter assumes that the workloads are running outside +the cloud environment. All the major cloud providers support both, as the majority +of the underlying technology is the same, but the configuration is slightly +different. Because the differences are small we may consider workload identity +federation support in the future, but it's not a goal for this RFC. + +## Proposal + +For supporting multi-tenant workload identity at the object-level for the Flux APIs +we propose associating the Flux objects with Kubernetes `ServiceAccounts`. The +controller would need to create a token for the `ServiceAccount` associated with +the object in the Kubernetes API, and then exchange it for a short-lived access +token for the cloud provider. This would require the controller `ServiceAccount` +to have RBAC permission to create tokens for any `ServiceAccounts` in the cluster. + +### User Stories + +#### Story 1 + +> As a cluster administrator, I want to allow tenant A to pull OCI artifacts +> from the Amazon ECR repository belonging to tenant A, but only from this +> repository. At the same time, I want to allow tenant B to pull OCI artifacts +> from the Amazon ECR repository belonging to tenant B, but only from this +> repository. + +For example, I would like to have the following configuration: + +```yaml +apiVersion: source.toolkit.fluxcd.io/v1beta2 +kind: OCIRepository +metadata: + name: tenant-a-repo + namespace: tenant-a +spec: + ... + provider: aws + serviceAccountName: tenant-a-ecr-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-a-ecr-sa + namespace: tenant-a + annotations: + eks.amazonaws.com/role-arn: arn:aws:iam::123456789123:role/tenant-a-ecr +--- +apiVersion: source.toolkit.fluxcd.io/v1beta2 +kind: OCIRepository +metadata: + name: tenant-b-repo + namespace: tenant-b +spec: + ... 
+ provider: aws + serviceAccountName: tenant-b-ecr-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-b-ecr-sa + namespace: tenant-b + annotations: + eks.amazonaws.com/role-arn: arn:aws:iam::123456789123:role/tenant-b-ecr +``` + +#### Story 2 + +> As a cluster administrator, I want to allow tenant A to pull and push to the Git +> repository in Azure DevOps belonging to tenant A, but only this repository. At +> the same time, I want to allow tenant B to pull and push to the Git repository +> in Azure DevOps belonging to tenant B, but only this repository. + +For example, I would like to have the following configuration: + +```yaml +apiVersion: source.toolkit.fluxcd.io/v1 +kind: GitRepository +metadata: + name: tenant-a-repo + namespace: tenant-a +spec: + ... + provider: azure + serviceAccountName: tenant-a-azure-devops-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-a-azure-devops-sa + namespace: tenant-a + annotations: + azure.workload.identity/client-id: d6e4fc00-c5b2-4a72-9f84-6a92e3f06b08 # client ID for my tenant A + azure.workload.identity/tenant-id: 72f988bf-86f1-41af-91ab-2d7cd011db47 # azure tenant for the cluster (optional, defaults to the env var AZURE_TENANT_ID set in the controller) +--- +apiVersion: image.toolkit.fluxcd.io/v1beta2 +kind: ImageUpdateAutomation +metadata: + name: tenant-a-image-update + namespace: tenant-a +spec: + ... + sourceRef: + kind: GitRepository + name: tenant-a-repo +--- +apiVersion: source.toolkit.fluxcd.io/v1 +kind: GitRepository +metadata: + name: tenant-b-repo + namespace: tenant-b +spec: + ... + provider: azure + serviceAccountName: tenant-b-azure-devops-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-b-azure-devops-sa + namespace: tenant-b + annotations: + azure.workload.identity/client-id: 4a7272f9-f186-41af-9f84-6a92e32d7cd0 # client ID for my tenant B + azure.workload.identity/tenant-id: 72f988bf-86f1-41af-91ab-2d7cd011db47 # azure tenant for the cluster (optional, defaults to the env var AZURE_TENANT_ID set in the controller) +--- +apiVersion: image.toolkit.fluxcd.io/v1beta2 +kind: ImageUpdateAutomation +metadata: + name: tenant-b-image-update + namespace: tenant-b +spec: + ... + sourceRef: + kind: GitRepository + name: tenant-b-repo +``` + +#### Story 3 + +> As a cluster administrator, I want to allow tenant A to pull manifests from +> the GCS bucket belonging to tenant A, but only from this bucket. At the same +> time, I want to allow tenant B to pull manifests from the GCS bucket +> belonging to tenant B, but only from this bucket. + +For example, I would like to have the following configuration: + +```yaml +apiVersion: source.toolkit.fluxcd.io/v1 +kind: Bucket +metadata: + name: tenant-a-bucket + namespace: tenant-a +spec: + ... + provider: gcp + serviceAccountName: tenant-a-gcs-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-a-gcs-sa + namespace: tenant-a + annotations: + iam.gke.io/gcp-service-account: tenant-a-bucket@my-org-project.iam.gserviceaccount.com +--- +apiVersion: source.toolkit.fluxcd.io/v1 +kind: Bucket +metadata: + name: tenant-b-bucket + namespace: tenant-b +spec: + ... 
+ provider: gcp + serviceAccountName: tenant-b-gcs-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-b-gcs-sa + namespace: tenant-b + annotations: + iam.gke.io/gcp-service-account: tenant-b-bucket@my-org-project.iam.gserviceaccount.com +``` + +#### Story 4 + +> As a cluster administrator, I want to allow tenant A to decrypt secrets using +> the AWS KMS key belonging to tenant A, but only this key. At the same time, +> I want to allow tenant B to decrypt secrets using the AWS KMS key belonging +> to tenant B, but only this key. + +For example, I would like to have the following configuration: + +```yaml +apiVersion: kustomize.toolkit.fluxcd.io/v1 +kind: Kustomization +metadata: + name: tenant-a-aws-kms + namespace: tenant-a +spec: + ... + decryption: + provider: sops + serviceAccountName: tenant-a-aws-kms-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-a-aws-kms-sa + namespace: tenant-a + annotations: + eks.amazonaws.com/role-arn: arn:aws:iam::123456789123:role/tenant-a-kms +--- +apiVersion: kustomize.toolkit.fluxcd.io/v1 +kind: Kustomization +metadata: + name: tenant-b-aws-kms + namespace: tenant-b +spec: + ... + decryption: + provider: sops + serviceAccountName: tenant-b-aws-kms-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-b-aws-kms-sa + namespace: tenant-b + annotations: + eks.amazonaws.com/role-arn: arn:aws:iam::123456789123:role/tenant-b-kms +``` + +#### Story 5 + +> As a cluster administrator, I want to allow tenant A to publish notifications +> to the `tenant-a` topic in Google Cloud Pub/Sub, but only to this topic. At +> the same time, I want to allow tenant B to publish notifications to the +> `tenant-b` topic in Google Cloud Pub/Sub, but only to this topic. I want +> to do so without creating any GCP IAM Service Accounts. + +For example, I would like to have the following configuration: + +```yaml +apiVersion: notification.toolkit.fluxcd.io/v1beta3 +kind: Provider +metadata: + name: tenant-a-google-pubsub + namespace: tenant-a +spec: + ... + type: googlepubsub + serviceAccountName: tenant-a-google-pubsub-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-a-google-pubsub-sa + namespace: tenant-a +--- +apiVersion: notification.toolkit.fluxcd.io/v1beta3 +kind: Provider +metadata: + name: tenant-b-google-pubsub + namespace: tenant-b +spec: + ... + type: googlepubsub + serviceAccountName: tenant-b-google-pubsub-sa +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: tenant-b-google-pubsub-sa + namespace: tenant-b +``` + +### Alternatives + +#### An alternative for identifying Flux resources in cloud providers + +Instead of issuing `ServiceAccount` tokens in the Kubernetes API we could +come up with a username naming scheme for Flux resources and issue tokens +for these usernames instead, e.g. `flux:::`. +This would make each Flux object have its own identity instead of using +`ServiceAccounts` for this purpose. This choice would then prevent cases +of other Flux objects from malicious actors in the same namespace from +abusing the permissions granted to the `ServiceAccount` of the object. +This choice, however, would provide a worse user experience, as Flux and +Kubernetes users are already used to the `ServiceAccount` resource being +the identity for resources in the cluster, not only in the context of plain +RBAC but also in the context of workload identity. 
+This choice would also require the introduction of new APIs for configuring +the respective cloud identities in the Flux objects, when such APIs already +exist as defined by the cloud providers themselves as annotations in the +`ServiceAccount` resources. We therefore choose to stick with the well-known +pattern of using `ServiceAccounts` for configuring the identities of the +Flux resources. Furthermore, as mentioned in the +[Multi-Tenancy Model](#multi-tenancy-model) section, the tenant trust domains +are namespaces, so a tenant is expected to control and have access to all +the resources `ServiceAccounts` in their namespaces are allowed to access. + +#### Alternatives for modifying controller RBAC to create `ServiceAccount` tokens + +In this section we discuss alternatives for changing the RBAC of controllers for +creating `ServiceAccount` tokens cluster-wide, as it has a potential impact on +the security posture of Flux. + +1. We grant RBAC permissions to the `ServiceAccounts` of the Flux controllers + (that would implement multi-tenant workload identity) for creating tokens + for any other `ServiceAccounts` in the cluster. +2. We require users to grant "self-impersonation" to the `ServiceAccounts` so they + can create tokens for themselves. The controller would then impersonate the + `ServiceAccount` when creating a token for it. This operation would then only + succeed if the `ServiceAccount` has been correctly granted permission to create + a token for itself. + +In both alternatives the controller `ServiceAccount` would require some form +of cluster-wide impersonation permission. Alternative 2 requires impersonation +permission to be granted directly to the controller `ServiceAccount`, while +in alternative 1, impersonation permission would be indirectly granted by the +process of creating a token for another `ServiceAccount`. By creating a token +for another `ServiceAccount`, the controller `ServiceAccount` effectively has +the same permissions as the `ServiceAccount` it is creating the token for, as +it could simply use the token to impersonate the `ServiceAccount`. Therefore +it is reasonable to affirm that both alternatives are equivalent in terms of +security. + +To break the tie between the two alternatives we introduce the fact that +alternative 1 eliminates operational burden on users. In fact, native +workload identity for pods does not require users to grant this +self-impersonation permission to the `ServiceAccounts` of the pods. + +We therefore choose alternative 1. + +## Design Details + +For detailing the proposal we need to first introduce the technical +background on how workload identity is implemented by the managed +Kubernetes services from the cloud providers. + +### Technical Background + +Workload identity in Kubernetes is based on +[OpenID Connect Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) +(OIDC). +The *Kubernetes `ServiceAccount` token issuer*, included as the `iss` JWT claim in the +issued tokens, and represented by the default URL `https://kubernetes.default.svc.cluster.local`, +implements the OIDC discovery protocol. 
Essentially, this means that the Kubernetes API will respond to requests to the URL
`https://kubernetes.default.svc.cluster.local/.well-known/openid-configuration`
with a JSON document similar to the one below:

```json
{
  "issuer": "https://kubernetes.default.svc.cluster.local",
  "jwks_uri": "https://172.18.0.2:6443/openid/v1/jwks",
  "response_types_supported": [
    "id_token"
  ],
  "subject_types_supported": [
    "public"
  ],
  "id_token_signing_alg_values_supported": [
    "RS256"
  ]
}
```

And to the URL `https://172.18.0.2:6443/openid/v1/jwks`, *discovered* through the field
`.jwks_uri` in the JSON response above, the Kubernetes API will respond with a JSON
document similar to the following:

```json
{
  "keys": [
    {
      "use": "sig",
      "kty": "RSA",
      "kid": "NWm3YKmazJPVP7tttzkmSxUn0w8LGGp7yS2CanEF-A8",
      "alg": "RS256",
      "n": "lV2tbw9hnz1mseah2kMQNe5sRju4mPLlK0F7np97lLNC49G8yc5TMjyciLF3qsDNFCfWyYmsuGlcRg2BIBBX_jkpIUUjlsktdHhuqO2RnOqyRtNuljlT_b0QJgpgxCqq0DHI31EBc0JALOVd6EjjlhsVvVzZOw_b9KBXVS3D3RENuT0_FWauDq5NYbyYnjlvk-vUXCRMNDQSDNwx6X6bktwsmeDRXtM_bP3DokmnMYc4n0asTEg14L6VKky0ByF88Wi1-y0Pm0BHdobDGt1cIeUDeThk4E79JCHxkT5urAyYHcNwcfU4q-tnD6bTpNkFVsk3cqqK2nF7R_7ac5arSQ",
      "e": "AQAB"
    }
  ]
}
```

This JSON document contains the public keys for verifying the signature of the issued tokens.

By querying these two URLs in sequence, cloud providers are able to fetch the information
required for verifying and trusting the tokens issued by the Kubernetes API. More
specifically, for trusting the `sub` JWT claim, which contains the reference (name and
namespace) of the Kubernetes `ServiceAccount` for which the token was issued, i.e. the
`ServiceAccount` itself.

By allowing permissions to be granted to `ServiceAccounts` in the cloud provider,
the cloud provider is then able to allow Kubernetes `ServiceAccounts` to access its
resources. This is usually done by a *Security Token Service* (STS) that exchanges the
Kubernetes token for a short-lived cloud provider access token, which is then used to
access the cloud provider resources.

It's important to mention that the Kubernetes `ServiceAccount` token issuer URL must be
trusted by the cloud provider, i.e. users must configure this URL as a trusted identity
provider.

This process forms the basis for workload identity in Kubernetes. As long as the issuer
URL can be reached by the cloud provider, this process can take place successfully.

The reachability of the issuer URL by the cloud provider is where the implementation
of workload identity starts to differ between cloud providers. For example, in GCP
one can configure the content of the JWKS document directly in the GCP IAM console,
which eliminates the need for network calls to the Kubernetes API. In AWS, on the
other hand, this is not possible: the process has to be followed strictly, i.e. the
issuer URL must be reachable by the AWS STS service.

Furthermore, GKE automatically creates the necessary trust relationship between the
Kubernetes issuer and the GCP STS service (i.e. it automatically injects the JWKS
document of the GKE cluster into the STS database), while in EKS this must be done
manually by users (an OIDC provider must be created for each EKS cluster).

Another difference is that the issuer URL remains the default/private one in GKE,
while in EKS it is automatically set to a public one.
This is done through the `--service-account-issuer` flag in the `kube-apiserver` command
line arguments
([docs](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-issuer-discovery)).
This is a nice feature, as it allows external systems to federate access for workloads
running in EKS clusters, e.g. EKS workloads can have federated access to GCP resources.

Yet another difference between cloud providers that sheds light on our proposal is
how applications running inside pods from the managed Kubernetes services obtain
the short-lived cloud provider access tokens. In GCP, the GCP libraries used by
the applications attempt to retrieve tokens from the *metadata server*, which is
reachable by all pods running in GKE. This server creates a token for the
`ServiceAccount` of the calling pod in the Kubernetes API, exchanges it for a
short-lived GCP access token, and returns it to the application. In AKS, on the
other hand, pods are mutated to include a
[*token volume projection*](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#serviceaccount-token-volume-projection).
The kubelet mounts and automatically rotates a volume with a token file inside the pod.
The Azure libraries used by the applications then read this file periodically to perform
the token exchange with the Azure STS service.

Another aspect of workload identity that is important for this RFC is how the cloud
identities are associated with the Kubernetes `ServiceAccounts`. In most cases, an
identity from the IAM service of the cloud provider (e.g. a GCP IAM Service Account,
or an AWS IAM Role) is associated with a Kubernetes `ServiceAccount` by the process
of *impersonation*. Permission to impersonate the cloud identity is granted to the
`ServiceAccount` through a configuration that points to the fully qualified name of
the Kubernetes `ServiceAccount`, i.e. the name and namespace of the `ServiceAccount`
and which cluster it belongs to in the name/address system of the cloud provider.

Because the cloud provider needs to support this impersonation permission, some
cloud providers go further and remove the impersonation requirement altogether by
allowing permissions to be granted directly to `ServiceAccounts` (if a provider can
support granting the impersonation permission, it can probably also support granting
any other permissions, depending on the implementation). GCP, for example, has
implemented this feature
[recently](https://cloud.google.com/blog/products/identity-security/make-iam-for-gke-easier-to-use-with-workload-identity-federation):
a GCP IAM Service Account is no longer required for workload identity, i.e. GCP IAM
permissions can now be granted directly to Kubernetes `ServiceAccounts`. This is a
significant improvement in the user experience, as it reduces the required
configuration steps. AWS implemented a similar feature called *EKS Pod Identity*,
but it still requires an IAM Role to be associated with the `ServiceAccount`. The
minor improvement from the user experience perspective is that this association is
implemented entirely in the AWS EKS/IAM APIs; no annotations are required in the
Kubernetes `ServiceAccount`. Another improvement of this EKS feature compared to
*IAM Roles for Service Accounts* is that users no longer need to create an
*OIDC Provider* for the EKS cluster in the IAM API.
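To make the Kubernetes side of this flow concrete, below is a minimal sketch using
client-go; the helper name, the audience string and the expiration are illustrative
assumptions, not part of this proposal:

```go
package example

import (
	"context"

	authenticationv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// requestServiceAccountToken is a hypothetical helper: it issues a short-lived
// OIDC token for the given ServiceAccount with a provider-specific audience,
// e.g. "sts.amazonaws.com" for AWS IAM Roles for Service Accounts.
func requestServiceAccountToken(ctx context.Context, clientset kubernetes.Interface,
	namespace, name, audience string) (string, error) {

	expiration := int64(600) // 10 minutes; short-lived by design
	tr := &authenticationv1.TokenRequest{
		Spec: authenticationv1.TokenRequestSpec{
			Audiences:         []string{audience},
			ExpirationSeconds: &expiration,
		},
	}

	resp, err := clientset.CoreV1().ServiceAccounts(namespace).
		CreateToken(ctx, name, tr, metav1.CreateOptions{})
	if err != nil {
		return "", err
	}

	// The returned JWT has `iss` set to the cluster issuer and `sub` set to
	// system:serviceaccount:<namespace>:<name>. The cloud provider STS verifies
	// it against the issuer's OIDC discovery documents shown above before
	// exchanging it for a short-lived cloud access token.
	return resp.Status.Token, nil
}
```

Creating tokens for arbitrary `ServiceAccounts` is the operation that requires the
controllers to hold cluster-wide `create` permission on the `serviceaccounts/token`
subresource, as discussed in the alternatives above.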
In light of the technical background presented above, our proposal becomes simpler.
The only solution to support multi-tenant workload identity at the object-level for
the Flux APIs is to associate the Flux objects with Kubernetes `ServiceAccounts`.
We propose to build the `ServiceAccount` token creation and exchange logic into
the Flux controllers through a library in the `github.com/fluxcd/pkg` repository.

### API Changes

For all the Flux APIs interacting with cloud providers (except `Kustomization`,
see the paragraph below), we propose introducing the field `spec.serviceAccountName`
(if not already present) for specifying the Kubernetes `ServiceAccount` in the same
namespace as the object that must be used for getting access to the respective cloud
resources. This field would be optional, and when not present the original behavior
would be observed, i.e. the feature only activates when the field is present and a
cloud provider among `aws`, `azure` or `gcp` is specified in the `spec.provider`
field. So if only the `spec.provider` field is present and set to a cloud provider,
then the controller would use single-tenant workload identity as it would prior to
the implementation of this RFC, i.e. it would use its own identity for the operation.

Note that this RFC does not seek to change the behavior when `spec.provider` is set
to `generic` (or left empty, in which case it defaults to `generic`), where the field
`spec.secretRef` can be used for specifying the Kubernetes `Secret` containing the
credentials (or `spec.serviceAccountName` in the case of the APIs dealing with
container registries, through the `imagePullSecrets` field of the `ServiceAccount`).

The `Kustomization` API uses Key Management Services (KMS) for decrypting
SOPS-encrypted secrets. We propose adding the dedicated optional field
`spec.decryption.serviceAccountName` for multi-tenant workload identity
when interacting with the KMS service. We choose a dedicated field
for the `Kustomization` API because the field `spec.serviceAccountName`
already exists and is used for a major part of the functionality, namely
authenticating with the Kubernetes API when applying resources. If
we used the same field for both purposes, users would be forced to use
multi-tenancy for both cloud and Kubernetes API interactions. Furthermore,
the cloud provider in the `Kustomization` API is detected by the SOPS SDK
itself while decrypting the secrets, so we don't need to introduce a new
field for this purpose.

### Workload Identity Library

We propose using the Go package `github.com/fluxcd/pkg/auth`
for implementing a workload identity library that can be
used by all the Flux controllers that need to interact
with cloud providers. This library would be responsible
for creating the `ServiceAccount` tokens in the Kubernetes
API and exchanging them for short-lived access tokens
for the cloud provider. The library would also be responsible
for caching the tokens when configured by users.

The library should support both single-tenant and multi-tenant workload
identity because single-tenant implementations are already supported in
GA APIs and hence must remain available for backwards compatibility.
Furthermore, it would be easier to support both use cases in a single
library rather than mixing a new library into the currently existing
ones, so this new library becomes the definitive unified solution for
workload identity in Flux.
+ +The library should automatically detect whether the workload identity +is single-tenant or multi-tenant by checking if a `ServiceAccount` was +configured for the operation. If a `ServiceAccount` was configured, then +the operation is multi-tenant, otherwise it is single-tenant and the +granted access token must represent the identity associated with the +controller. + +The directory structure would look like this: + +```shell +. +└── auth + ├── aws + │ └── aws.go + ├── azure + │ └── azure.go + ├── gcp + │ └── gcp.go + ├── get_token.go + ├── options.go + ├── provider.go + └── token.go +``` + +The file `auth/get_token.go` would contain the main algorithm: + +```go +package auth + +// GetToken returns an access token for accessing resources in the given cloud provider. +func GetToken(ctx context.Context, provider Provider, opts ...Option) (Token, error) { + // 1. Check if a ServiceAccount is configured and return the controller access token if not (single-tenant WI). + // 2. Get the provider audience for creating the OIDC token for the ServiceAccount in the Kubernetes API. + // 3. Get the ServiceAccount using the configured controller-runtime client. + // 4. Get the provider identity from the ServiceAccount annotations and add it to the options. + // 5. Build the cache key using the configured options. + // 6. Get the token from the cache. If present, return it, otherwise continue. + // 7. Create an OIDC token for the ServiceAccount in the Kubernetes API using the provider audience. + // 8. Exchange the OIDC token for an access token through the Security Token Service of the provider. + // 9. If an image repository is configured, exchange the access token for a registry token. + // 10. Add the final token to the cache and return it. +} +``` + +The file `auth/token.go` would contain the token abstractions: + +```go +package auth + +// Token is an interface that represents an access token that can be used to +// authenticate with a cloud provider. The only common method is for getting the +// duration of the token, because different providers have different ways of +// representing the token. For example, Azure and GCP use a single string, +// while AWS uses three strings: access key ID, secret access key and token. +// Consumers of this interface should know what type to cast it to. +type Token interface { + // GetDuration returns the duration for which the token is valid relative to + // approximately time.Now(). This is used to determine when the token should + // be refreshed. + GetDuration() time.Duration +} + +// RegistryCredentials is a particular type implementing the Token interface +// for credentials that can be used to authenticate with a container registry +// from a cloud provider. This type is compatible with all the cloud providers +// and should be returned when the image repository is configured in the options. +type RegistryCredentials struct { + Username string + Password string + ExpiresAt time.Time +} + +func (r *RegistryCredentials) GetDuration() time.Duration { + return time.Until(r.ExpiresAt) +} +``` + +The file `auth/provider.go` would contain the `Provider` interface: + +```go +package auth + +// Provider contains the logic to retrieve an access token for a cloud +// provider from a ServiceAccount (OIDC/JWT) token. +type Provider interface { + // GetName returns the name of the provider. + GetName() string + + // NewDefaultToken returns a token that can be used to authenticate with the + // cloud provider retrieved from the default source, i.e. 
from the pod's + // environment, e.g. files mounted in the pod, environment variables, + // local metadata services, etc. In this case the method would implicitly + // use the ServiceAccount associated with the controller pod, and not one + // specified in the options. + NewDefaultToken(ctx context.Context, opts ...Option) (Token, error) + + // GetAudience returns the audience the OIDC tokens issued representing + // ServiceAccounts should have. This is usually a string that represents + // the cloud provider's STS service, or some entity in the provider for + // which the OIDC tokens are targeted to. + GetAudience(ctx context.Context, sa corev1.ServiceAccount) (string, error) + + // GetIdentity takes a ServiceAccount and returns the identity which the + // ServiceAccount wants to impersonate, by looking at annotations. + GetIdentity(sa corev1.ServiceAccount) (string, error) + + // NewToken takes a ServiceAccount and its OIDC token and returns a token + // that can be used to authenticate with the cloud provider. The OIDC token is + // the JWT token that was issued for the ServiceAccount by the Kubernetes API. + // The implementation should exchange this token for a cloud provider access + // token through the provider's STS service. + NewTokenForServiceAccount(ctx context.Context, oidcToken string, + sa corev1.ServiceAccount, opts ...Option) (Token, error) + + // GetImageCacheKey extracts the part of the image repository that must be + // included in cache keys when caching registry credentials for the provider. + GetImageCacheKey(imageRepository string) string + + // NewRegistryToken takes an image repository and a Token and returns a token + // that can be used to authenticate with the container registry of the image. + NewRegistryToken(ctx context.Context, imageRepository string, + token Token, opts ...Option) (Token, error) +} +``` + +The file `auth/options.go` would contain the following options: + +```go +package auth + +// Options contains options for configuring the behavior of the provider methods. +// Not all providers/methods support all options. +type Options struct { + ServiceAccount *client.ObjectKey + Client client.Client + Cache *cache.TokenCache + InvolvedObject *cache.InvolvedObject + Scopes []string + ImageRepository string + STSEndpoint string + ProxyURL *url.URL +} + +// WithServiceAccount sets the ServiceAccount reference for the token +// and a controller-runtime client to fetch the ServiceAccount and +// create an OIDC token for it in the Kubernetes API. +func WithServiceAccount(saRef client.ObjectKey, client client.Client) Option { + // ... +} + +// WithCache sets the token cache and the involved object for recording events. +func WithCache(cache cache.TokenCache, involvedObject cache.InvolvedObject) Option { + // ... +} + +// WithScopes sets the scopes for the token. +func WithScopes(scopes ...string) Option { + // ... +} + +// WithImageRepository sets the image repository the token will be used for. +// In most cases container registry credentials require an additional +// token exchange at the end. This option allows the library to implement +// this exchange and cache the final token. +func WithImageRepository(imageRepository string) Option { + // ... +} + +// WithSTSEndpoint sets the endpoint for the STS service. +func WithSTSEndpoint(stsEndpoint string) Option { + // ... +} + +// WithProxyURL sets a *url.URL for an HTTP/S proxy for acquiring the token. +func WithProxyURL(proxyURL url.URL) Option { + // ... 
+} +``` + +The `auth/aws/aws.go`, `auth/azure/azure.go` and +`auth/gcp/gcp.go` files would contain the implementations for +the respective cloud providers: + +```go +package aws + +import ( + "github.com/aws/aws-sdk-go-v2/aws" + "github.com/aws/aws-sdk-go-v2/credentials" + "github.com/aws/aws-sdk-go-v2/service/sts/types" +) + +const ProviderName = "aws" + +type Provider struct{} + +type Token struct{ types.Credentials } + +// GetDuration implements auth.Token. +func (t *Token) GetDuration() time.Duration { + return time.Until(*t.Expiration) +} + +type credentialsProvider struct { + opts []auth.Option +} + +// NewCredentialsProvider creates an aws.CredentialsProvider for the aws provider. +func NewCredentialsProvider(opts ...auth.Option) aws.CredentialsProvider { + return &credentialsProvider{opts} +} + +// Retrieve implements aws.CredentialsProvider. +func (c *credentialsProvider) Retrieve(ctx context.Context) (aws.Credentials, error) { + // Use auth.GetToken() to get the token. +} +``` + +```go +package azure + +import ( + "github.com/Azure/azure-sdk-for-go/sdk/azcore" + "github.com/Azure/azure-sdk-for-go/sdk/azcore/policy" +) + +const ProviderName = "azure" + +type Provider struct{} + +type Token struct{ azcore.AccessToken } + +// GetDuration implements auth.Token. +func (t *Token) GetDuration() time.Duration { + return time.Until(t.ExpiresOn) +} + +type tokenCredential struct { + opts []auth.Option +} + +// NewTokenCredential creates an azcore.TokenCredential for the azure provider. +func NewTokenCredential(opts ...auth.Option) azcore.TokenCredential { + return &tokenCredential{opts} +} + +// GetToken implements azcore.TokenCredential. +// The options argument is ignored, any options should be +// specified in the constructor. +func (t *tokenCredential) GetToken(ctx context.Context, _ policy.TokenRequestOptions) (azcore.AccessToken, error) { + // Use auth.GetToken() to get the token. +} +``` + +```go +package gcp + +import ( + "golang.org/x/oauth2" +) + +const ProviderName = "gcp" + +type Provider struct {} + +type Token struct{ oauth2.Token } + +// GetDuration implements auth.Token. +func (t *Token) GetDuration() time.Duration { + return time.Until(t.Expiry) +} + +type tokenSource struct { + ctx context.Context + opts []auth.Option +} + +// NewTokenSource creates an oauth2.TokenSource for the gcp provider. +func NewTokenSource(ctx context.Context, opts ...auth.Option) oauth2.TokenSource { + return &tokenSource{ctx, opts} +} + +// Token implements oauth2.TokenSource. +func (t *tokenSource) Token() (*oauth2.Token, error) { + // Use auth.GetToken() to get the token. +} + +var gkeMetadata struct { + projectID string + location string + name string + mu sync.Mutex + loaded bool +} +``` + +As detailed above, each cloud provider implementation defines a simple wrapper +around the cloud provider access token type. This wrapper implements the +`auth.Token` interface, which is essentially the method `GetDuration()` +for the cache library to manage the token lifetime. The wrappers also contain +a helper function to create a token source for the respective cloud provider +SDKs. These methods have different names and signatures because the cloud provider +SDKs are different and have different types, but they all implement the same +concept of a token source. + +The `aws` provider needs to read the environment variable `AWS_REGION` for +configuring the STS client. Even though a specific STS endpoint may be +configured, the AWS SDKs require the region to be set regardless. 
This +variable is usually set automatically in EKS pods, and can be manually set +by users otherwise (e.g. in Fargate pods). + +An important detail to take into account in the `azure` provider implementation +is using our custom implementation of `azidentity.NewDefaultAzureCredential()` +found in kustomize-controller for SOPS decryption. This custom implementation +avoids shelling out to the Azure CLI, which is something we strive to avoid in +the Flux codebase. This is important because today we are doing this in a few +APIs but not others, so it will be a significant improvement to implement this +in a single place and use it everywhere. + +The `gcp` provider needs to load the cluster metadata from the `gke-metadata-server` +in order to create tokens. This must be done lazily when the first token is +requested, and there's a very important reason for this: if this was done on +the controller startup, the controller would crash when running outside GKE and +enter `CrashLoopBackOff` because the `gke-metadata-server` would never be +available. This is a very important detail that must be taken into account when +implementing the `gcp` provider. The cluster metadata doesn't change during the +lifetime of the controller pod, so we use a `sync.Mutex` and `bool` to load it +only once into a package variable. + +#### Cache Key + +The cache key must include the following components: + +* The cloud provider name. +* The provider audience used for issuing the Kubernetes `ServiceAccount` token. +* The optional `ServiceAccount` reference and cloud provider identity. + The identity is the string representing the identity which the `ServiceAccount` + is impersonating, e.g. for `gcp` this would be a GCP IAM Service Account email, + for `aws` this would be an AWS IAM Role ARN, etc. When there is no identity + configured for impersonation, only the `ServiceAccount` reference is included. +* The optional scopes added to the token. +* The cache key extracted from the optional image repository. +* The optional STS endpoint used for issuing the token. +* The optional proxy URL when the STS endpoint is present. + +##### Justification + +When single-tenant workload identity is being used, the identity associated with +the controller is the one represented by the token, so there is no identity or +`ServiceAccount` to identify in the cache key besides the implicit ones associated +with the controller. In this case, including only the cloud provider name in the +cache key is enough. + +The provider audience used for issuing the `ServiceAccount` token is included +in the cache key because it may depend on the `ServiceAccount` annotations. +For example, in AWS if an IAM Role ARN is not specified we assume that users +are attempting to use EKS Pod Identity instead of IAM Roles for Service +Accounts. Each feature has its own audience string and its own way of issuing +tokens, so the audience string must be included in the cache key. + +In multi-tenant workload identity, the reason for including both the `ServiceAccount` +and the identity in the cache key is to establish the fact that the `ServiceAccount` +had permission to impersonate the identity at the time when the token was issued. +This is very important. For the sake of the argument, suppose we include only the +identity. Then a malicious actor could specify any identity in their `ServiceAccount` +and get a token cached for that identity even if their `ServiceAccount` did not have +permission to impersonate that identity. 
We also need to include the identity in the +cache key because, otherwise, if including only the `ServiceAccount`, changes to the +`ServiceAccount` annotations to impersonate a different identity would not cause a +new token impersonating the new identity to be created since the cache key did not +change. + +In most cases container registry credentials require an additional token exchange +at the end. In order to benefit from caching the final token and freeing the +library consumers from this responsibility, we allow an image repository to +be included in the options and implement the exchange. Depending on the cloud +provider, a part of the image repository string is extracted and used to issue +the token, e.g. for ECR the region is extracted and used to configure the client, +and in the case of ACR the registry host is included in the resulting token. +Those parts of the image repository must be included in the cache key. This is +accomplished by the `Provider.GetImageCacheKey()` method. In the case of GCP +container registries the image repository does not influence how the token is +issued. + +The scopes are included in the cache key because they delimit the permissions that +the token has. They don't *grant* the permissions, they just set an upper bound for +the permissions that the token can have. Providers requiring scopes unfortunately +benefit less from caching, e.g. a token issued for an Azure identity can't be +seamlessly used for both Azure DevOps and the Azure Container Registry, because the +respective scopes are different, so the issued tokens are different. + +The STS endpoint and proxy URL are included in the cache key because they could +influence how the token is fetched and ultimately issued. The proxy URL is included +only when the STS endpoint is present, because all the default STS endpoints are +HTTPS and belong to cloud providers, so they are all well-known, unique, and the +proxy is guaranteed not to tamper with the issuance of the token since it only +sees an opaque TLS session passing through. + +##### Format + +The cache key would be the SHA256 hash of the following string (breaking lines +after commas for readability): + +Single-tenant/controller-level: + +``` +provider=, +scopes=, +imageRepositoryKey=<'gcp'-for-gcp|registry-region-for-aws|registry-host-for-azure>, +stsEndpoint=, +proxyURL= +``` + +Multi-tenant/object-level: + +``` +provider=, +providerAudience=, +serviceAccountName=, +serviceAccountNamespace=, +cloudProviderIdentity=, +scopes=, +imageRepositoryKey=<'gcp'-for-gcp|registry-region-for-aws|registry-host-for-azure>, +stsEndpoint=, +proxyURL= +``` + +##### Security Considerations and Controls + +As mentioned previously, a `ServiceAccount` must have permission to impersonate the +identity it is configured to impersonate. Once a token for the impersonated identity +is issued, that token would be valid for a while even if immediately after issuing it +the `ServiceAccount` loses permission to impersonate that identity. In our cache key +design, the token would remain available for the `ServiceAccount` to use until it +expires. If the impersonation permission was revoked to mitigate an attack, the +attacker could still get a valid token from the cache for a while after the +revocation, and hence still exercise the permissions they had prior to the revocation. 
There are a few mitigations for this scenario:

* Users that revoke impersonation permissions for a `ServiceAccount` must also
  change the annotations of the `ServiceAccount` to impersonate a different identity,
  or delete the `ServiceAccount` altogether, or restart the Flux controllers so the
  cache is purged. Any of these actions would effectively prevent the attack, but
  they represent an additional step after revoking the impersonation permission.

* In the Flux controllers users can specify the `--token-cache-max-duration` flag,
  which can be used to limit the maximum duration for which a token can be cached.
  By reducing the default maximum duration of one hour to a smaller value, users can
  limit the time window during which a token would be available for a `ServiceAccount`
  to use after losing permission to impersonate the identity.

* Users can disable the cache entirely by setting the flag `--token-cache-max-size=0`,
  or by removing this flag altogether, since the default is already zero, i.e. no
  tokens are cached in the Flux controller. This mitigation is intended for strict
  security requirements where any risk of such an attack must be avoided. It is the
  most effective mitigation, but it comes at the cost of many API calls to issue
  tokens in the cloud provider, which could result in a performance bottleneck and/or
  throttling/rate-limiting, as tokens would have to be issued for every
  reconciliation.

A similar situation could occur in the single-tenant scenario, when the permission
to impersonate the configured identity is revoked from the controller `ServiceAccount`.
In this case, the attacker would have access to the cloud provider resources that
the controller had access to prior to the revocation of the impersonation permission.
Most of the mitigations mentioned above apply to this scenario as well, except for
the one that involves changing the annotations of the `ServiceAccount` to impersonate
a different identity or deleting the `ServiceAccount` altogether, as the controller
`ServiceAccount` should not be deleted. The best mitigation in this case is to restart
the Flux controllers so the cache is purged.

**EKS Pod Identity**: In EKS Pod Identity the association between a `ServiceAccount`
and an IAM Role is not configured in the `ServiceAccount` annotations, nor anywhere
else inside the Kubernetes cluster. The association is established entirely through
the EKS/IAM APIs. In this case, all the mitigations mentioned above apply, except
for the one that involves changing the annotations of the `ServiceAccount`, as there
are no annotations to change.

### Library Integration

When reconciling an object, the controller must use the `auth.GetToken()`
function, passing a `controller-runtime` client that has permission to create
`ServiceAccount` tokens in the Kubernetes API, the desired cloud provider by name,
and all the remaining options according to the configuration of the controller and
of the object. The provider names match the ones used for `spec.provider` in the Flux
APIs, i.e. `aws`, `azure` and `gcp`.

Because different cloud providers have different ways of representing their access
tokens (e.g. Azure and GCP tokens are a single opaque string while AWS has three
strings: access key ID, secret access key and token), consumers of the
`auth.Token` interface would need to cast it to the concrete `Token` type of the
respective provider package, e.g. `*aws.Token`.

The sketch below illustrates the general calling pattern, and the following
subsections show what the integration would look like for each API.
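A minimal sketch, written against the proposed `auth` package API from the previous
sections; the import paths, function signature and reconciler wiring are illustrative
assumptions rather than a definitive implementation:

```go
package example

import (
	"context"
	"fmt"

	"github.com/fluxcd/pkg/auth"
	"github.com/fluxcd/pkg/auth/aws"
	"github.com/fluxcd/pkg/auth/azure"
	"github.com/fluxcd/pkg/auth/gcp"
	"github.com/fluxcd/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// getRegistryCredentials sketches how a reconciler could resolve container
// registry credentials for an object with spec.provider and spec.serviceAccountName.
func getRegistryCredentials(ctx context.Context, kubeClient client.Client,
	tokenCache *cache.TokenCache, involved cache.InvolvedObject,
	providerName, imageRepo, objNamespace, serviceAccountName string) (*auth.RegistryCredentials, error) {

	var opts []auth.Option

	// Multi-tenant workload identity: only when a ServiceAccount is configured.
	if serviceAccountName != "" {
		saRef := client.ObjectKey{Namespace: objNamespace, Name: serviceAccountName}
		opts = append(opts, auth.WithServiceAccount(saRef, kubeClient))
	}

	// Ask the library to perform the final registry token exchange and to
	// cache the resulting credentials under the involved object.
	opts = append(opts,
		auth.WithImageRepository(imageRepo),
		auth.WithCache(*tokenCache, involved),
	)

	var provider auth.Provider
	switch providerName {
	case aws.ProviderName:
		provider = aws.Provider{}
	case azure.ProviderName:
		provider = azure.Provider{}
	case gcp.ProviderName:
		provider = gcp.Provider{}
	default:
		return nil, fmt.Errorf("unsupported provider: %s", providerName)
	}

	token, err := auth.GetToken(ctx, provider, opts...)
	if err != nil {
		return nil, err
	}

	// With an image repository configured, the library returns registry credentials.
	return token.(*auth.RegistryCredentials), nil
}
```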
+ +#### `GitRepository` and `ImageUpdateAutomation` APIs + +For these APIs the only provider we have so far that supports workload identity +is `azure`. In this case we would simply replace `AzureOpts []azure.OptFunc` in +the `fluxcd/pkg/git.ProviderOptions` struct with `[]fluxcd/pkg/auth.Option` +and would modify `fluxcd/pkg/git.GetCredentials()` to use `auth.GetToken()`. +The token interface would be cast to `*azure.Token` and the token string would be +assigned to `fluxcd/pkg/git.Credentials.BearerToken`. A `GitRepository` object +configured with the `azure` provider and a `ServiceAccount` would then go through +this code path. + +#### `OCIRepository`, `ImageRepository`, `HelmRepository` and `HelmChart` APIs + +The `HelmRepository` API only supports a cloud provider for OCI repositories, so +for all these APIs we would only need to support OCI authentication. + +All these APIs currently use `*fluxcd/pkg/oci/auth/login.Manager` to get the +container registry credentials. The new library would replace this library +entirely, as it mostly handles single-tenant workload identity. The new library +covers both single-tenant and multi-tenant workload identity, so it would be +a drop-in replacement for the `login.Manager`. + +In the case of the source-controller APIs, all of them use the function `OIDCAuth()` +from the internal package `internal/oci`. We would replace the use of `login.Manager` +with `auth.GetToken()` in this function. The token interface would +be cast to `*auth.RegistryCredentials` and then fed to `authn.FromConfig()` +from the package `github.com/google/go-containerregistry/pkg/authn`. + +In the case of `ImageRepository`, we would replace `login.Manager` with +`auth.GetToken()` in the `setAuthOptions()` method of the +`ImageRepositoryReconciler`, cast the token to `*auth.RegistryCredentials` +and then feed it to `authn.FromConfig()`. + +The beauty of this particular integration is that here we no longer require +branching code paths for each cloud provider, we would just need to configure +the options for the `auth.GetToken()` function and the library would take +care of the rest. + +#### `Bucket` API + +##### Provider `aws` + +A `Bucket` object configured with the `aws` provider and a `ServiceAccount` would +cause the internal `minio.MinioClient` of source-controller to be created with the +following new options: + +* `minio.WithTokenClient(controller-runtime/pkg/client.Client)` +* `minio.WithTokenCache(*fluxcd/pkg/cache.TokenCache)` + +The constructor would then use `auth.GetToken()` to get the +cloud provider access token. When doing so, the `minio.MinioClient` would +cast the token interface to `*aws.Token` and feed it to `credentials.NewStatic()` +from the package `github.com/minio/minio-go/v7/pkg/credentials`. + +##### Provider `azure` + +A `Bucket` object configured with the `azure` provider and a `ServiceAccount` +would cause the internal `azure.BlobClient` of source-controller to be created +with the following new options: + +* `azure.WithTokenClient(controller-runtime/pkg/client.Client)` +* `azure.WithTokenCache(*fluxcd/pkg/cache.TokenCache)` +* `azure.WithServiceAccount(controller-runtime/pkg/client.ObjectKey)` +* `azure.WithInvolvedObject(*fluxcd/pkg/cache.InvolvedObject)` + +The constructor would then use `azure.NewTokenCredential()` to feed this +token credential to `azblob.NewClient()`. 
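As an illustration, here is a minimal sketch of such a constructor, assuming the
proposed `auth` options and the `azure.NewTokenCredential()` helper; the function
signature, the option plumbing and the storage scope value are illustrative
assumptions:

```go
package example

import (
	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
	"github.com/fluxcd/pkg/auth"
	"github.com/fluxcd/pkg/auth/azure"
	"github.com/fluxcd/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newBlobClient sketches how source-controller could build the Azure Blob
// client for a Bucket object configured with the azure provider and a
// ServiceAccount.
func newBlobClient(endpoint string, kubeClient client.Client, tokenCache *cache.TokenCache,
	saRef client.ObjectKey, involved cache.InvolvedObject) (*azblob.Client, error) {

	authOpts := []auth.Option{
		auth.WithServiceAccount(saRef, kubeClient),
		auth.WithCache(*tokenCache, involved),
		// Azure Blob Storage requires its own scope on the access token.
		auth.WithScopes("https://storage.azure.com/.default"),
	}

	// azure.NewTokenCredential returns an azcore.TokenCredential backed by
	// auth.GetToken(), so the SDK can refresh tokens transparently.
	cred := azure.NewTokenCredential(authOpts...)

	return azblob.NewClient(endpoint, cred, nil)
}
```

The `gcp` provider follows the same pattern with `gcp.NewTokenSource()` and
`option.WithTokenSource()`, as described in the next subsection.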
+ +##### Provider `gcp` + +A `Bucket` object configured with the `gcp` provider and a `ServiceAccount` +would cause the internal `gcp.GCSClient` of source-controller to be created +with the following new options: + +* `gcp.WithTokenClient(controller-runtime/pkg/client.Client)` +* `gcp.WithTokenCache(*fluxcd/pkg/cache.TokenCache)` +* `gcp.WithServiceAccount(controller-runtime/pkg/client.ObjectKey)` +* `gcp.WithInvolvedObject(*fluxcd/pkg/cache.InvolvedObject)` + +The constructor would then use `gcp.NewTokenSource()` to feed this token +source to the `option.WithTokenSource()` and pass it to +`cloud.google.com/go/storage.NewClient()`. + +#### `Kustomization` API + +The `Kustomization` API uses Key Management Services (KMS) for decrypting +SOPS secrets. The internal packages `internal/decryptor` and `internal/sops` +of kustomize-controller already use interfaces compatible with the new +library in the case of `aws` and `azure`, i.e. `*awskms.CredentialsProvider` +and `*azkv.TokenCredential` respectively, so we could easily use the helper +functions for creating the respective token sources to configure the KMS +credentials for SOPS. This is thanks to the respective SOPS libraries +`github.com/getsops/sops/v3/kms` and `github.com/getsops/sops/v3/azkv`. +For GCP we can introduce the equivalent interface that was recently added +in [this](https://github.com/getsops/sops/pull/1794/files) pull request. +This new interface introduced in SOPS upstream can also be used for the +current JSON credentials method that we use via +`google.CredentialsFromJSON().TokenSource`. This would allow us to use only +the respective token source interfaces for all three providers when using +either workload identity or secrets. + +#### `Provider` API + +The constructor of the internal `notifier.Factory` of notification-controller +would now accept the following new options: + +* `notifier.WithTokenClient(controller-runtime/pkg/client.Client)` +* `notifier.WithTokenCache(*fluxcd/pkg/cache.TokenCache)` +* `notifier.WithServiceAccount(controller-runtime/pkg/client.ObjectKey)` +* `notifier.WithInvolvedObject(*fluxcd/pkg/cache.InvolvedObject)` + +The cloud provider types that support workload identity would then use these +options. See the following subsections for details. + +##### Type `azuredevops` + +The `notifier.NewAzureDevOps()` constructor would use the existing and new +options to call `auth.GetToken()` and use it to get the cloud +provider access token. When doing so, the `notifier.AzureDevOps` would cast +the token interface to `*azure.Token` and feed the token string to +`NewPatConnection()` from the package +`github.com/microsoft/azure-devops-go-api/azuredevops/v6`. + +##### Type `azureeventhub` + +The `notifier.NewAzureEventHub()` constructor would use the existing and new +options to call `auth.GetToken()` and use it to get the cloud +provider access token. When doing so, the `notifier.AzureEventHub` would cast +the token interface to `*azure.Token` and feed the token string to `newJWTHub()`. + +##### Type `googlepubsub` + +The `notifier.NewGooglePubSub()` constructor would use the existing and new +options to call `gcp.NewTokenSource()` and feed this token source to the +`option.WithTokenSource()` and pass it to `cloud.google.com/go/pubsub.NewClient()`. + +## Implementation History + +A realistic estimate for implementing this proposal would be from two to +three Flux minor releases. This is so we can work on more pressing priorities +while still making progress towards this milestone. 
The implementation of
the core library would be done in the first release, and the integration
with the Flux APIs would be spread across all these releases. All three
cloud providers should be implemented for each API that gets this feature in
any given release. Our first priority should be `Kustomization`, as it deals
with secrets and is therefore where security is most important.