mirror of https://github.com/fluxcd/flux2.git
You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
356 lines
12 KiB
Markdown
356 lines
12 KiB
Markdown
# Setup Notifications
|
|
|
|
When operating a cluster, different teams may wish to receive notifications about
|
|
the status of their GitOps pipelines.
|
|
For example, the on-call team would receive alerts about reconciliation
|
|
failures in the cluster, while the dev team may wish to be alerted when a new version
|
|
of an app was deployed and if the deployment is healthy.
|
|
|
|
## Prerequisites
|
|
|
|
To follow this guide you'll need a Kubernetes cluster with the GitOps
|
|
toolkit controllers installed on it.
|
|
Please see the [get started guide](../get-started/index.md)
|
|
or the [installation guide](installation.md).
|
|
|
|
The GitOps toolkit controllers emit Kubernetes events whenever a resource status changes.
|
|
You can use the [notification-controller](../components/notification/controller.md)
|
|
to forward these events to Slack, Microsoft Teams, Discord or Rocket chart.
|
|
The notification controller is part of the default toolkit installation.
|
|
|
|
## Define a provider
|
|
|
|
First create a secret with your Slack incoming webhook:
|
|
|
|
```sh
|
|
kubectl -n flux-system create secret generic slack-url \
|
|
--from-literal=address=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
|
|
```
|
|
|
|
Note that the secret must contain an `address` field,
|
|
it can be a Slack, Microsoft Teams, Discord or Rocket webhook URL.
|
|
|
|
Create a notification provider for Slack by referencing the above secret:
|
|
|
|
```yaml
|
|
apiVersion: notification.toolkit.fluxcd.io/v1beta1
|
|
kind: Provider
|
|
metadata:
|
|
name: slack
|
|
namespace: flux-system
|
|
spec:
|
|
type: slack
|
|
channel: general
|
|
secretRef:
|
|
name: slack-url
|
|
```
|
|
|
|
The provider type can be `slack`, `msteams`, `discord`, `rocket`, `github`, `gitlab` or `generic`.
|
|
|
|
When type `generic` is specified, the notification controller will post the incoming
|
|
[event](../components/notification/event.md) in JSON format to the webhook address.
|
|
This way you can create custom handlers that can store the events in
|
|
Elasticsearch, CloudWatch, Stackdriver, etc.
|
|
|
|
## Define an alert
|
|
|
|
Create an alert definition for all repositories and kustomizations:
|
|
|
|
```yaml
|
|
apiVersion: notification.toolkit.fluxcd.io/v1beta1
|
|
kind: Alert
|
|
metadata:
|
|
name: on-call-webapp
|
|
namespace: flux-system
|
|
spec:
|
|
providerRef:
|
|
name: slack
|
|
eventSeverity: info
|
|
eventSources:
|
|
- kind: GitRepository
|
|
name: '*'
|
|
- kind: Kustomization
|
|
name: '*'
|
|
```
|
|
|
|
Apply the above files or commit them to the `fleet-infra` repository.
|
|
|
|
To verify that the alert has been acknowledge by the notification controller do:
|
|
|
|
```console
|
|
$ kubectl -n flux-system get alerts
|
|
|
|
NAME READY STATUS AGE
|
|
on-call-webapp True Initialized 1m
|
|
```
|
|
|
|
Multiple alerts can be used to send notifications to different channels or Slack organizations.
|
|
|
|
The event severity can be set to `info` or `error`.
|
|
When the severity is set to `error`, the kustomize controller will alert on any error
|
|
encountered during the reconciliation process.
|
|
This includes kustomize build and validation errors,
|
|
apply errors and health check failures.
|
|
|
|
![error alert](../_files/slack-error-alert.png)
|
|
|
|
When the verbosity is set to `info`, the controller will alert if:
|
|
|
|
* a Kubernetes object was created, updated or deleted
|
|
* heath checks are passing
|
|
* a dependency is delaying the execution
|
|
* an error occurs
|
|
|
|
![info alert](../_files/slack-info-alert.png)
|
|
|
|
## Git commit status
|
|
|
|
The GitHub, GitLab, Bitbucket, and Azure DevOps providers are slightly different to the other providers. Instead of
|
|
a stateless stream of events, the git notification providers will link the event with accompanying git commit which
|
|
triggered the event. The linking is done by updating the commit status of a specific commit.
|
|
|
|
- [GitHub](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-status-checks)
|
|
- [GitLab](https://docs.gitlab.com/ee/api/commits.html)
|
|
- [Bitbucket](https://developer.atlassian.com/server/bitbucket/how-tos/updating-build-status-for-commits/)
|
|
- [Azure DevOps](https://docs.microsoft.com/en-us/rest/api/azure/devops/git/statuses?view=azure-devops-rest-6.0)
|
|
|
|
In GitHub the commit status set by notification-controller will result in a green checkmark or red cross next to the commit hash.
|
|
Clicking the icon will show more detailed information about the status.
|
|
![commit status GitHub overview](../_files/commit-status-github-overview.png)
|
|
|
|
Receiving an event in the form of a commit status rather than a message in a chat conversation has the benefit
|
|
that it closes the deployment loop giving quick and visible feedback if a commit has reconciled and if it succeeded.
|
|
This means that a deployment will work in a similar manner that people are used to with "traditional" push based CD pipelines.
|
|
Additionally the status can be fetched from the git providers API for a specific commit. Allowing for custom automation tools
|
|
that can automatically promote, commit to a new directory, after receiving a successful commit status. This can all be
|
|
done without requiring any access to the Kubernetes cluster.
|
|
|
|
As stated before the provider works by referencing the same git repository as the Kustomization controller does.
|
|
When a new commit is pushed to the repository, source-controller will sync the commit, triggering the kustomize-controller
|
|
to reconcile the new commit. After this is done the kustomize-controller sends an event to the notification-controller
|
|
with the result and the commit hash it reconciled. Then notification-controller can update the correct commit and repository
|
|
when receiving the event.
|
|
![commit status flow](../_files/commit-status-flow.png)
|
|
|
|
!!! hint "Limitations"
|
|
The git notification providers require that a commit hash present in the meta data
|
|
of the event. There for the the providers will only work with `Kustomization` as an
|
|
event source, as it is the only resource which includes this data.
|
|
|
|
First follow the [get started guide](../../get-started) if you do not have a Kubernetes cluster with Flux installed in it.
|
|
You will need a authentication token to communicate with the API. The authentication method depends on
|
|
the git provider used, refer to the [Provider CRD](../../components/notification/provider/#git-commit-status)
|
|
for details about how to get the correct token. The guide will use GitHub, but the other providers will work in a very similar manner.
|
|
The token will need to have write access to the repository it is going to update the commit status in.
|
|
Store the generated token in a Secret with the following data format in the cluster.
|
|
```yaml
|
|
apiVersion: v1
|
|
kind: Secret
|
|
metadata:
|
|
name: github
|
|
namespace: flux-system
|
|
data:
|
|
token: <token>
|
|
```
|
|
|
|
When sending notification events the kustomization-controller will include the commit hash related to the event.
|
|
Note that the commit hash in the event does not come from the git repository the `Kustomization` resource
|
|
comes from but rather the kustomization source ref. This mean that commit status notifications will not work
|
|
if the manifests comes from a repository which the API token is not allowed to write to.
|
|
|
|
Copy the manifest content in the "[kustomize](https://github.com/stefanprodan/podinfo/tree/master/kustomize)" directory
|
|
into the directory "staging-cluster/flux-system/podinfo" in your fleet-infra repository. Make sure that you also add the
|
|
namepsace podinfo.
|
|
```yaml
|
|
apiVersion: v1
|
|
kind: Namespace
|
|
metadata:
|
|
name: podinfo
|
|
```
|
|
|
|
Then create a Kustomization to deploy podinfo.
|
|
```yaml
|
|
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
|
|
kind: Kustomization
|
|
metadata:
|
|
name: podinfo
|
|
namespace: flux-system
|
|
spec:
|
|
interval: 5m
|
|
targetNamespace: podinfo
|
|
path: ./staging-cluster/podinfo
|
|
prune: true
|
|
sourceRef:
|
|
kind: GitRepository
|
|
name: flux-system
|
|
healthChecks:
|
|
- apiVersion: apps/v1
|
|
kind: Deployment
|
|
name: podinfo
|
|
namespace: podinfo
|
|
timeout: 1m
|
|
```
|
|
|
|
Creating a git provider is very similar to creating other types of providers.
|
|
The only caveat being that the provider address needs to point to the same
|
|
git repository as the event source originates from.
|
|
```yaml
|
|
apiVersion: notification.toolkit.fluxcd.io/v1beta1
|
|
kind: Provider
|
|
metadata:
|
|
name: flux-system
|
|
namespace: flux-system
|
|
spec:
|
|
type: github
|
|
address: https://github.com/<username>/fleet-infra
|
|
secretRef:
|
|
name: github
|
|
---
|
|
apiVersion: notification.toolkit.fluxcd.io/v1beta1
|
|
kind: Alert
|
|
metadata:
|
|
name: podinfo
|
|
namespace: flux-system
|
|
spec:
|
|
providerRef:
|
|
name: flux-system
|
|
eventSeverity: info
|
|
eventSources:
|
|
- kind: Kustomization
|
|
name: podinfo
|
|
namespace: flux-system
|
|
```
|
|
|
|
By now the fleet-infra repository should have a similar directory structure.
|
|
```
|
|
fleet-infra
|
|
└── staging-cluster/
|
|
├── flux-system/
|
|
│ ├── gotk-components.yaml
|
|
│ ├── gotk-sync.yaml
|
|
│ └── kustomization.yaml
|
|
├── podinfo/
|
|
│ ├── namespace.yaml
|
|
│ ├── deployment.yaml
|
|
│ ├── hpa.yaml
|
|
│ ├── service.yaml
|
|
│ └── kustomization.yaml
|
|
├── podinfo-kustomization.yaml
|
|
└── podinfo-notification.yaml
|
|
```
|
|
|
|
If podinfo is deployed and the health checks pass you should get a successful status in
|
|
your forked podinfo repository.
|
|
|
|
If everything is setup correctly there should now be a green checkmark next to the lastest commit.
|
|
Clicking the checkmark should show a detailed view.
|
|
|
|
![commit status github successful](../_files/commit-status-github-success.png)
|
|
|
|
Generate error
|
|
|
|
A deployment failure can be fored by setting an invalid image tag in the podinfo deployment.
|
|
```yaml
|
|
apiVersion: apps/v1
|
|
kind: Deployment
|
|
metadata:
|
|
name: podinfo
|
|
spec:
|
|
minReadySeconds: 3
|
|
revisionHistoryLimit: 5
|
|
progressDeadlineSeconds: 60
|
|
strategy:
|
|
rollingUpdate:
|
|
maxUnavailable: 0
|
|
type: RollingUpdate
|
|
selector:
|
|
matchLabels:
|
|
app: podinfo
|
|
template:
|
|
metadata:
|
|
annotations:
|
|
prometheus.io/scrape: "true"
|
|
prometheus.io/port: "9797"
|
|
labels:
|
|
app: podinfo
|
|
spec:
|
|
containers:
|
|
- name: podinfod
|
|
image: ghcr.io/stefanprodan/podinfo:fake
|
|
imagePullPolicy: IfNotPresent
|
|
ports:
|
|
- name: http
|
|
containerPort: 9898
|
|
protocol: TCP
|
|
- name: http-metrics
|
|
containerPort: 9797
|
|
protocol: TCP
|
|
- name: grpc
|
|
containerPort: 9999
|
|
protocol: TCP
|
|
command:
|
|
- ./podinfo
|
|
- --port=9898
|
|
- --port-metrics=9797
|
|
- --grpc-port=9999
|
|
- --grpc-service-name=podinfo
|
|
- --level=info
|
|
- --random-delay=false
|
|
- --random-error=false
|
|
env:
|
|
- name: PODINFO_UI_COLOR
|
|
value: "#34577c"
|
|
livenessProbe:
|
|
exec:
|
|
command:
|
|
- podcli
|
|
- check
|
|
- http
|
|
- localhost:9898/healthz
|
|
initialDelaySeconds: 5
|
|
timeoutSeconds: 5
|
|
readinessProbe:
|
|
exec:
|
|
command:
|
|
- podcli
|
|
- check
|
|
- http
|
|
- localhost:9898/readyz
|
|
initialDelaySeconds: 5
|
|
timeoutSeconds: 5
|
|
resources:
|
|
limits:
|
|
cpu: 2000m
|
|
memory: 512Mi
|
|
requests:
|
|
cpu: 100m
|
|
memory: 64Mi
|
|
```
|
|
|
|
After the commit has been reconciled it should return a failed commit status.
|
|
This is where the health check in the Kustomization comes into play together
|
|
with the timeout. The health check is used to asses the health of the Kustomization.
|
|
A failed commit status will not be sent until the health check timesout. Setting
|
|
a lower timeout will give feedback faster, but may sometimes not allow enough time
|
|
for a new application to deploy.
|
|
|
|
![commit status github failure](../_files/commit-status-github-failure.png)
|
|
|
|
### Status changes
|
|
|
|
The provider will continuously receive events as they happen, and multiple events may
|
|
be received for the same commit hash. The git providers are configured to only update
|
|
the status if the status has changed. This is to avoid spamming the commit status
|
|
history with the same status over and over again.
|
|
|
|
There is an aspect of state fullness that needs to be considered, compared to the other
|
|
notification providers, as the events are stored by the git provider. This means that
|
|
the status of a commit can change over time. Initially a deployment may be healthy, resulting
|
|
in a successful status. Down the line the application, and the health check, may start failing
|
|
due to the amount of traffic it receives or external dependencies no longer being available.
|
|
The change in the health check would cause the status to go from successful to failed.
|
|
It is important to keep this in mind when building any automation tools that deals with the
|
|
status, and consider the fact that receiving a successful status once does not mean it will
|
|
always be successful.
|
|
|