E2E Tests

The goal is to verify that Flux integrations with cloud providers work now and keep working in the future. Currently, tests exist only for Azure and GCP.

General requirements

These CLI tools need to be installed for each of the tests to run successfully.

  • Docker CLI for registry login.
  • SOPS CLI for encrypting files.
  • kubectl for applying certain install manifests.
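
Before provisioning any infrastructure, it can save time to confirm that the tools listed above are on PATH. The following is a minimal sketch, not something shipped in this repository; the function name check_tools is hypothetical.

```shell
# check_tools is a hypothetical helper name, not part of this repository.
# It prints every tool that is missing from PATH and returns non-zero
# if at least one of them is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_tools docker sops kubectl covers the general requirements; provider CLIs such as az or gcloud can be appended to the argument list.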

Azure

Architecture

The azure Terraform configuration creates the AKS cluster and the related resources needed to run the tests. It creates:

  • An Azure Container Registry
  • An Azure Kubernetes Cluster
  • Two Azure DevOps repositories
  • Azure EventHub for sending notifications
  • An Azure Key Vault

Requirements

  • Azure account with an active subscription to be able to create AKS and ACR, and permission to assign roles. Role assignment is required for allowing AKS workloads to access ACR.
  • Azure CLI, logged in with az login as a User or as a Service Principal.
  • An Azure DevOps organization, a personal access token and SSH keys for accessing repositories within the organization. The scopes required for the personal access token are:
    • Project and Team - read, write and manage access
    • Code - Full
    • Please take a look at the terraform provider for more explanation.
    • Azure DevOps only supports RSA keys. Please see documentation for how to set up SSH key authentication.
    • When using in CI, create a test user and use the test user's PAT and SSH key for all Azure DevOps interactions. To grant the test user access in Azure DevOps:
      • Go to Organization Settings on the sidebar of the organization page.
      • Under General > Users, click on Add User and input the user's email, select Access Level of Basic.
      • Go to Security > Permissions, click on the User tab.
      • For the invited user, set the following permissions to Allow:
        • General: Create new project.
      • The user will get an email invitation and will need to create a Microsoft account if they don't have one yet.

NOTE: To use a Service Principal (for example in a CI environment), set the ARM_* variables in .env, source it and authenticate the Azure CLI with:

$ az login --service-principal -u $ARM_CLIENT_ID -p $ARM_CLIENT_SECRET --tenant $ARM_TENANT_ID
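
The "source it" step in the note above can be sketched as follows. This is an illustrative helper, assuming .env contains shell-compatible VAR=value lines (with or without an export prefix); the function name load_env is hypothetical and not part of this repository.

```shell
# load_env is a hypothetical helper name, not part of this repository.
# It sources the given file with auto-export enabled, so plain VAR=value
# lines become visible to child processes such as az and terraform.
load_env() {
  set -a   # export every variable assigned while the file is sourced
  . "$1"
  set +a   # switch auto-export off again
}
```

With this in place, load_env ./.env followed by the az login command above picks up ARM_CLIENT_ID, ARM_CLIENT_SECRET and ARM_TENANT_ID from the file.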

Permissions

The following permissions are needed for provisioning the infrastructure and running the tests:

  • Microsoft.Kubernetes/*
  • Microsoft.Resources/*
  • Microsoft.Authorization/roleAssignments/{Read,Write,Delete}
  • Microsoft.ContainerRegistry/*
  • Microsoft.ContainerService/*
  • Microsoft.KeyVault/*
  • Microsoft.EventHub/*

IAM and CI setup

To create the necessary IAM role with all the permissions and to set up the CI secrets and variables with azure-gh-actions, use the Terraform configuration below. Make sure all the requirements of azure-gh-actions are met before running it.

NOTE: When running the following for a repo under an organization, set the environment variable GITHUB_ORGANIZATION if setting the owner in the github provider doesn't work.

provider "github" {
  owner = "fluxcd"
}

resource "tls_private_key" "privatekey" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

module "azure_gh_actions" {
  source = "git::https://github.com/fluxcd/test-infra.git//tf-modules/azure/github-actions"

  azure_owners          = ["owner-id-1", "owner-id-2"]
  azure_app_name        = "flux2-e2e"
  azure_app_description = "flux2 e2e"
  azure_app_secret_name = "flux2-e2e"
  azure_permissions = [
    "Microsoft.Kubernetes/*",
    "Microsoft.Resources/*",
    "Microsoft.Authorization/roleAssignments/Read",
    "Microsoft.Authorization/roleAssignments/Write",
    "Microsoft.Authorization/roleAssignments/Delete",
    "Microsoft.ContainerRegistry/*",
    "Microsoft.ContainerService/*",
    "Microsoft.KeyVault/*",
    "Microsoft.EventHub/*"
  ]
  azure_location = "eastus"

  github_project = "flux2"

  github_secret_client_id_name       = "AZ_ARM_CLIENT_ID"
  github_secret_client_secret_name   = "AZ_ARM_CLIENT_SECRET"
  github_secret_subscription_id_name = "AZ_ARM_SUBSCRIPTION_ID"
  github_secret_tenant_id_name       = "AZ_ARM_TENANT_ID"

  github_secret_custom = {
    "TF_VAR_azuredevops_org"         = "<azuredevops-org-name>",
    "TF_VAR_azuredevops_pat"         = "<azuredevops-pat>",
    "AZURE_GITREPO_SSH_CONTENTS"     = base64encode(tls_private_key.privatekey.private_key_openssh),
    "AZURE_GITREPO_SSH_PUB_CONTENTS" = base64encode(tls_private_key.privatekey.public_key_openssh)
  }
}

output "publickey" {
  value = tls_private_key.privatekey.public_key_openssh
}

Copy the publickey output printed after applying, or run terraform output to print it again, and add it to the Azure DevOps SSH public keys of the user account that Flux will use in the tests.

NOTE: The environment variables used above are for the GitHub workflow that runs the tests. Change the variable names accordingly if needed.

GCP

Architecture

The gcp Terraform files create the GKE cluster and the related resources needed to run the tests. They create:

  • A Google Container Registry and Artifact Registry
  • A Google Kubernetes Cluster
  • Two Google Cloud Source Repositories
  • A Google Pub/Sub Topic and a subscription to the service that would be used in the tests

Note: The configuration doesn't create Google KMS keyrings and crypto keys because these cannot be destroyed. Instead, you have to pass in the keyring and crypto key that will be used to test SOPS encryption in Flux. Please see .env.sample for the Terraform variables.

Requirements

  • GCP account with an active project to be able to create GKE and GCR, and permission to assign roles.

  • Existing GCP KMS keyring and crypto key.

  • gcloud CLI, logged in with gcloud auth login as a User (not a Service Account), with application default credentials configured using gcloud auth application-default login and the Docker credential helper configured using gcloud auth configure-docker.

    NOTE: To use a Service Account (for example in a CI environment), set the GOOGLE_APPLICATION_CREDENTIALS variable in .env to the path of the JSON key file, source it and authenticate the gcloud CLI with:

    $ gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
    

    Depending on the Container/Artifact Registry host used in the test, authenticate Docker accordingly:

    $ gcloud auth print-access-token | docker login -u oauth2accesstoken --password-stdin https://us-central1-docker.pkg.dev
    

    In this case, the GCP client in Terraform uses the Service Account to authenticate, and the gcloud CLI is used only to authenticate with Google Container Registry and Google Artifact Registry.

    NOTE FOR CI USAGE: When saving the JSON key file as a CI secret, compress the file content with

    $ cat key.json | jq -r tostring
    

    to prevent aggressive masking in the logs. Refer to aggressive replacement in logs for more details.

  • Register SSH Keys with Google Cloud

    • Google Cloud supports these three SSH key types: RSA (only for keys with more than 2048 bits), ECDSA and ED25519.
    • The SSH user doesn't have to be a member of the GCP project. The Terraform setup will grant the user permissions to the repository. Visit https://source.cloud.google.com, log in or create a GCP account with the SSH user's email address and add the SSH keys to the account. Set this email as the value of the environment variable TF_VAR_gcp_email in the .env file so that it is used as a Terraform variable.

    Note: Google doesn't allow an SSH key to be associated with a service account email address. Therefore, there has to be an actual user that the SSH key is registered to.

Permissions

The following roles are needed for provisioning the infrastructure and running the tests:

  • Compute Instance Admin (v1) - roles/compute.instanceAdmin.v1
  • Kubernetes Engine Admin - roles/container.admin
  • Service Account User - roles/iam.serviceAccountUser
  • Service Account Token Creator - roles/iam.serviceAccountTokenCreator
  • Artifact Registry Administrator - roles/artifactregistry.admin
  • Artifact Registry Repository Administrator - roles/artifactregistry.repoAdmin
  • Cloud KMS Admin - roles/cloudkms.admin
  • Cloud KMS CryptoKey Encrypter - roles/cloudkms.cryptoKeyEncrypter
  • Source Repository Administrator - roles/source.admin
  • Pub/Sub Admin - roles/pubsub.admin

IAM and CI setup

To create the necessary IAM role with all the permissions and to set up the CI secrets and variables with gcp-gh-actions, use the Terraform configuration below. Make sure all the requirements of gcp-gh-actions are met before running it.

NOTE: When running the following for a repo under an organization, set the environment variable GITHUB_ORGANIZATION if setting the owner in the github provider doesn't work.

provider "google" {}

provider "github" {
  owner = "fluxcd"
}

resource "tls_private_key" "privatekey" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

module "gcp_gh_actions" {
  source = "git::https://github.com/fluxcd/test-infra.git//tf-modules/gcp/github-actions"

  gcp_service_account_id          = "flux2-e2e-test"
  gcp_service_account_name        = "flux2-e2e-test"
  gcp_service_account_description = "For running fluxcd/flux2 e2e tests."
  gcp_roles = [
    "roles/compute.instanceAdmin.v1",
    "roles/container.admin",
    "roles/iam.serviceAccountUser",
    "roles/iam.serviceAccountTokenCreator",
    "roles/artifactregistry.admin",
    "roles/artifactregistry.repoAdmin",
    "roles/cloudkms.admin",
    "roles/cloudkms.cryptoKeyEncrypter",
    "roles/source.admin",
    "roles/pubsub.admin"
  ]

  github_project = "flux2"

  github_secret_credentials_name = "FLUX2_E2E_GOOGLE_CREDENTIALS"

  github_secret_custom = {
    "TF_VAR_gcp_keyring"           = "<keyring-name>",
    "TF_VAR_gcp_crypto_key"        = "<key-name>",
    "TF_VAR_gcp_email"             = "<email>",
    "GCP_GITREPO_SSH_CONTENTS"     = base64encode(tls_private_key.privatekey.private_key_openssh),
    "GCP_GITREPO_SSH_PUB_CONTENTS" = base64encode(tls_private_key.privatekey.public_key_openssh)
  }
}

output "publickey" {
  value = tls_private_key.privatekey.public_key_openssh
}

Copy the publickey output printed after applying, or run terraform output to print it again, and add it to the Google Source Repository SSH public keys of the user account whose email address is set in TF_VAR_gcp_email above.

NOTE: The environment variables used above are for the GitHub workflow that runs the tests. Change the variable names accordingly if needed.

Tests

Each test run starts by running terraform apply in the provider's Terraform directory, using the tftestenv package from the fluxcd/test-infra repository. It then reads the Terraform output to get the information needed for the tests, such as the kubernetes client ID, the cloud repository URLs and the key vault ID. This means that much of the communication with the cloud provider API is offloaded to Terraform instead of having to be implemented in the tests.

The following tests are currently implemented:

  • Flux can be successfully installed on the cluster using the Flux CLI
  • source-controller can clone cloud provider repositories (Azure DevOps, Google Cloud Source Repositories) (https+ssh)
  • image-reflector-controller can list tags from provider container Registry image repositories
  • kustomize-controller can decrypt secrets using SOPS and provider key vault
  • image-automation-controller can create branches and push to cloud repositories (https+ssh)
  • source-controller can pull charts from cloud provider container registry Helm repositories
  • notification-controller can forward events to cloud event services (EventHub for Azure and Pub/Sub for Google)

The following test is run only for Azure, since Azure DevOps is the provider for which the notification-controller supports it:

  • notification-controller can send commit status to Azure DevOps

Running tests locally

  1. Ensure that the Flux CLI binary to be tested is built and ready. You can build it by running make build at the root of this repository. The binary is placed in the ./bin directory at the root, which is where the Makefile copies the binary for the tests from by default. If you have it in a different location, set that location with the FLUX_BINARY variable.
  2. Copy .env.sample to .env and fill in the values of the variables for the provider that you are running the tests for.
  3. Run make test-<provider>, setting the location of the Flux binary with the FLUX_BINARY variable if it is not in the default location:
$ make test-azure
make test PROVIDER_ARG="-provider azure"
# These two versions of podinfo are pushed to the cloud registry and used in tests for ImageUpdateAutomation
mkdir -p build
cp ../../bin/flux build/flux
docker pull ghcr.io/stefanprodan/podinfo:6.0.0
6.0.0: Pulling from stefanprodan/podinfo
Digest: sha256:e7eeab287181791d36c82c904206a845e30557c3a4a66a8143fa1a15655dae97
Status: Image is up to date for ghcr.io/stefanprodan/podinfo:6.0.0
ghcr.io/stefanprodan/podinfo:6.0.0
docker pull ghcr.io/stefanprodan/podinfo:6.0.1
6.0.1: Pulling from stefanprodan/podinfo
Digest: sha256:1169f220a670cf640e45e1a7ac42dc381a441e9d4b7396432cadb75beb5b5d68
Status: Image is up to date for ghcr.io/stefanprodan/podinfo:6.0.1
ghcr.io/stefanprodan/podinfo:6.0.1
go test -timeout 60m -v ./ -existing -provider azure --tags=integration
2023/03/24 02:32:25 Setting up azure e2e test infrastructure
2023/03/24 02:32:25 Terraform binary:  /usr/local/bin/terraform
2023/03/24 02:32:25 Init Terraform
....[some output has been cut out]
2023/03/24 02:39:33 helm repository condition not ready
--- PASS: TestACRHelmRelease (15.31s)
=== RUN   TestKeyVaultSops
--- PASS: TestKeyVaultSops (15.98s)
PASS
2023/03/24 02:40:12 Destroying environment...
ok      github.com/fluxcd/flux2/tests/integration       947.341s

In the above, the test created a build directory build/ and copied the Flux CLI binary to build/flux. This binary is used to bootstrap Flux on the cluster. You can configure the location of the Flux CLI binary by setting the FLUX_BINARY variable. The run also pulls two versions of the ghcr.io/stefanprodan/podinfo image. These images are pushed to the cloud provider's Container Registry and used to test ImageRepository and ImageUpdateAutomation. The Terraform resources are then created and the tests are run.

Unless explicitly configured to retain it, the test infrastructure is deleted at the end of the test. If a failure prevents the resources from being deleted, run the make destroy-* command for the respective provider. This runs terraform destroy in that provider's Terraform configuration directory, and can be used to quickly destroy the infrastructure without going through the provision-test-destroy steps.

Debugging the tests

To debug environment provisioning, enable verbose output with the -verbose test flag.

make test-azure GO_TEST_ARGS="-verbose"

The test environment is destroyed at the end by default. Run the tests with the -retain flag to keep the created test infrastructure.

make test-azure GO_TEST_ARGS="-retain"

By default, the tests require the infrastructure state to be clean. To re-run the tests against retained infrastructure, set the -existing flag.

make test-azure GO_TEST_ARGS="-retain -existing"

To delete existing infrastructure created with the -retain flag:

make test-azure GO_TEST_ARGS="-existing"

To debug issues on the cluster created by the test (provided you passed in the -retain flag):

export KUBECONFIG=./build/kubeconfig
kubectl get pods