SRE: GitOps with ArgoCD
Introduction
Throughout this SRE series we have covered SLIs and SLOs, incident management, observability, chaos engineering, and capacity planning. All of those practices assume that when you change something, the change is tracked, reviewed, auditable, and easy to roll back. That is exactly what GitOps gives you.
If you have been deploying to Kubernetes with kubectl apply or CI pipelines that push directly to the cluster,
you probably know the pain: someone applies a hotfix manually, another person runs a different version of a
manifest, and before you know it the cluster state has drifted from what is in your repository. Nobody knows
what is actually running. GitOps solves this by making Git the single source of truth and using a controller
to continuously reconcile the cluster state with what is declared in your repository.
Let’s get into it.
What is GitOps?
GitOps is an operational model where the desired state of your infrastructure and applications is declared in Git. A controller running in your cluster watches the Git repository and ensures the live state matches the declared state. If something drifts, the controller corrects it automatically.
The core principles are:
- Declarative configuration: Everything is described as YAML or JSON manifests in Git. No imperative scripts, no manual steps.
- Git as the single source of truth: The Git repository is the only place where changes are made. What is in Git is what runs in the cluster.
- Pull-based reconciliation: Instead of CI pushing to the cluster, a controller inside the cluster pulls the desired state from Git. This is more secure because the cluster credentials never leave the cluster.
- Continuous reconciliation: The controller does not just apply changes once. It continuously compares the live state with the desired state and corrects any drift.
This is fundamentally different from traditional push-based CI/CD where a pipeline runs kubectl apply after
a build. With push-based CD, if someone changes something in the cluster manually, your CI does not know about
it. With GitOps, the controller detects the drift and fixes it.
# Push-based CI/CD (traditional):
# Developer → Git push → CI builds → CI runs kubectl apply → Cluster
# (CI needs cluster credentials)
# (drift goes undetected)
#
# Pull-based GitOps:
# Developer → Git push → Controller detects change → Controller applies → Cluster
# (controller lives in cluster, watches Git continuously)
# (drift is detected and corrected automatically)
ArgoCD architecture
ArgoCD is the most popular GitOps controller for Kubernetes. It is a CNCF graduated project with a well-defined architecture.
- API Server: The gRPC/REST server that powers the web UI, CLI, and external integrations. Handles authentication, RBAC, and serves the application state.
- Repository Server: Clones Git repositories and generates Kubernetes manifests. Supports plain YAML, Kustomize, Helm, Jsonnet, and custom plugins.
- Application Controller: The brain of ArgoCD. Watches Application resources, compares desired state (from Git) with live state (from the cluster), and performs sync operations when they differ.
- Redis: Caching layer for the repository server and application controller.
- ApplicationSet Controller: Manages ApplicationSet resources that generate multiple Applications from a single definition.
# ArgoCD reconciliation loop (runs every 3 minutes by default):
# 1. Application Controller reads the Application CRD
# 2. Asks Repo Server to fetch and render manifests from Git
# 3. Controller compares rendered manifests with live cluster state
# 4. If they differ:
# - With auto-sync: Controller applies the changes
# - Without auto-sync: Controller marks the app as OutOfSync
# 5. Controller updates Application status, loop repeats
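The decision at step 4 can be sketched in a few lines of Python. This is a hypothetical simplification: real ArgoCD diffs rendered manifests object by object through the Kubernetes API rather than comparing whole dictionaries, but the shape of the decision is the same.

```python
# Sketch of ArgoCD's sync decision (simplified: real ArgoCD diffs
# individual Kubernetes objects, not whole state dicts).

def reconcile(desired: dict, live: dict, auto_sync: bool) -> tuple[str, dict]:
    """Compare desired state (from Git) with live state and decide what to do."""
    if desired == live:
        return "Synced", live
    if auto_sync:
        # With auto-sync the controller applies the desired state.
        return "Synced", desired
    # Without auto-sync the app is only marked OutOfSync.
    return "OutOfSync", live

desired = {"replicas": 3, "image": "my-app:v1.2.3"}
live = {"replicas": 3, "image": "my-app:v1.2.2"}  # drifted

status, new_live = reconcile(desired, live, auto_sync=True)
print(status, new_live["image"])   # Synced my-app:v1.2.3

status, _ = reconcile(desired, live, auto_sync=False)
print(status)                      # OutOfSync
```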
Installing ArgoCD
A convenient and repeatable way to install ArgoCD is the community Helm chart. Create the namespace and add the chart repository:
kubectl create namespace argocd
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
Here is a production-ready values file:
# argocd-values.yaml
configs:
  params:
    server.insecure: true
    timeout.reconciliation: 180s
  cm:
    statusbadge.enabled: "true"
    kustomize.buildOptions: "--enable-helm"
server:
  replicas: 2
  ingress:
    enabled: true
    ingressClassName: nginx
    hostname: argocd.example.com
    tls: true
controller:
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      memory: 1Gi
repoServer:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      memory: 512Mi
helm install argocd argo/argo-cd \
--namespace argocd \
--values argocd-values.yaml \
--wait
# Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d
# Login and change password
argocd login argocd.example.com --username admin --password <your-password>
argocd account update-password
Application CRDs
The Application CRD is the fundamental building block. It defines what to deploy, where, and how to keep it in sync:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/kainlite/my-app-manifests
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true     # Delete resources no longer in Git
      selfHeal: true  # Revert manual changes
    syncOptions:
      - CreateNamespace=true
      - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas  # Ignore if HPA manages replicas
- source: Where to find the manifests. repoURL, targetRevision (branch/tag), and path (directory within the repo).
- destination: Where to deploy. server is the Kubernetes API endpoint, namespace is the target namespace.
- syncPolicy: How to keep things in sync. automated enables auto-sync, prune deletes removed resources, selfHeal reverts manual changes.
- ignoreDifferences: Fields to ignore when comparing desired vs. live state, useful for fields set dynamically by the cluster.
The App of Apps pattern
When you have many applications, managing each Application resource individually becomes tedious. The App of Apps pattern creates a parent Application that manages child Application manifests.
# Repository structure
gitops-repo/
├── apps/ # Parent app points here
│ ├── my-app.yaml # Child Application manifests
│ ├── monitoring.yaml
│ ├── cert-manager.yaml
│ └── ingress-nginx.yaml
├── my-app/
│ ├── base/
│ └── overlays/
│ ├── staging/
│ └── production/
└── monitoring/
The parent Application:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/kainlite/gitops-repo
    targetRevision: main
    path: apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
Child applications use sync-wave annotations to control deployment order. Infrastructure components get
wave 0, application workloads get wave 2:
# apps/cert-manager.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"  # Deploy infrastructure first
spec:
  project: default
  source:
    repoURL: https://charts.jetstack.io
    chart: cert-manager
    targetRevision: v1.16.3
    helm:
      releaseName: cert-manager
      values: |
        installCRDs: true
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
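The wave ordering can be illustrated with a short Python sketch. This is a hypothetical simplification: ArgoCD sorts resources by the sync-wave annotation (treating a missing annotation as wave 0) and waits for each wave to become healthy before starting the next.

```python
# Sketch of sync-wave ordering (simplified: real ArgoCD also waits for
# health between waves and orders by resource kind within a wave).

SYNC_WAVE = "argocd.argoproj.io/sync-wave"

apps = [
    {"name": "my-app", "annotations": {SYNC_WAVE: "2"}},
    {"name": "cert-manager", "annotations": {SYNC_WAVE: "0"}},
    {"name": "ingress-nginx", "annotations": {SYNC_WAVE: "0"}},
    {"name": "monitoring", "annotations": {}},  # no annotation -> wave 0
]

def wave(app: dict) -> int:
    """Read the sync-wave annotation, defaulting to 0."""
    return int(app["annotations"].get(SYNC_WAVE, "0"))

for app in sorted(apps, key=wave):
    print(wave(app), app["name"])
```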
Sync strategies
ArgoCD gives you fine-grained control over how and when syncs happen. A common pattern is auto-sync for staging and manual for production:
# Auto-sync: changes applied automatically
syncPolicy:
  automated:
    prune: true     # Delete resources removed from Git
    selfHeal: true  # Revert manual cluster changes

# Manual sync: omit the automated section
syncPolicy:
  syncOptions:
    - CreateNamespace=true
Retry policies handle transient failures:
syncPolicy:
  retry:
    limit: 5
    backoff:
      duration: 5s
      factor: 2
      maxDuration: 3m
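Assuming the usual exponential-backoff semantics (each delay is the previous one multiplied by factor, capped at maxDuration), the schedule above works out like this:

```python
# Compute the retry delay schedule for an exponential backoff policy.
# Assumption: delays grow geometrically by `factor` and are capped at
# `max_duration`, matching common exponential-backoff semantics.

def backoff_delays(duration: float, factor: float,
                   max_duration: float, limit: int) -> list[float]:
    delay = duration
    delays = []
    for _ in range(limit):
        delays.append(min(delay, max_duration))
        delay *= factor
    return delays

# duration: 5s, factor: 2, maxDuration: 3m (180s), limit: 5
print(backoff_delays(5, 2, 180, 5))  # [5, 10, 20, 40, 80]
```

With limit 5 the cap is never reached; raise the limit to 7 and the last attempts would be clamped to 180 seconds.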
Sync windows restrict when ArgoCD can sync, useful for change freezes:
# In the AppProject spec
spec:
  syncWindows:
    - kind: allow
      schedule: "0 9 * * 1-5"  # Mon-Fri at 9am
      duration: 8h
      applications: ["*"]
    - kind: deny
      schedule: "0 0 20 12 *"  # Holiday freeze
      duration: 336h
      applications: ["*"]
      clusters: ["production"]
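A simplified sketch of how such windows are evaluated. Assumptions: the real implementation parses the cron schedule to find window start times, which are given directly here, and deny windows take precedence over allow windows.

```python
# Simplified sync-window evaluation: a sync is allowed only when `now`
# falls inside an allow window and outside every deny window.
from datetime import datetime, timedelta

def sync_allowed(now, allow_windows, deny_windows) -> bool:
    def inside(window):
        start, duration = window
        return start <= now < start + duration
    if any(inside(w) for w in deny_windows):
        return False  # deny wins over allow
    return any(inside(w) for w in allow_windows)

allow = [(datetime(2025, 1, 6, 9, 0), timedelta(hours=8))]  # Mon 9am, 8h
deny = [(datetime(2025, 12, 20), timedelta(hours=336))]     # holiday freeze

print(sync_allowed(datetime(2025, 1, 6, 10, 0), allow, deny))    # True
print(sync_allowed(datetime(2025, 1, 6, 20, 0), allow, deny))    # False
print(sync_allowed(datetime(2025, 12, 25, 10, 0), allow, deny))  # False
```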
Health checks and custom health
ArgoCD has built-in health checks for standard Kubernetes resources. For custom resources (CRDs), you can write Lua health check scripts:
- Healthy: The resource is operating correctly
- Progressing: Not yet healthy but making progress
- Degraded: The resource has an error
- Suspended: The resource is paused
- Missing: The resource does not exist
# Custom health check for cert-manager Certificate (in argocd-cm ConfigMap)
resource.customizations.health.cert-manager.io_Certificate: |
  hs = {}
  if obj.status ~= nil then
    if obj.status.conditions ~= nil then
      for i, condition in ipairs(obj.status.conditions) do
        if condition.type == "Ready" and condition.status == "False" then
          hs.status = "Degraded"
          hs.message = condition.message
          return hs
        end
        if condition.type == "Ready" and condition.status == "True" then
          hs.status = "Healthy"
          hs.message = condition.message
          return hs
        end
      end
    end
  end
  hs.status = "Progressing"
  hs.message = "Waiting for certificate"
  return hs
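For readers less familiar with Lua, the same logic reads like this in Python (illustrative only; ArgoCD evaluates the Lua script, not Python):

```python
# Python mirror of the Lua health check: scan status.conditions for the
# Ready condition and map it to an ArgoCD health status.

def certificate_health(obj: dict) -> tuple[str, str]:
    for condition in obj.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            if condition.get("status") == "False":
                return "Degraded", condition.get("message", "")
            if condition.get("status") == "True":
                return "Healthy", condition.get("message", "")
    # No Ready condition yet: still being issued.
    return "Progressing", "Waiting for certificate"

ready = {"status": {"conditions": [
    {"type": "Ready", "status": "True", "message": "Certificate is up to date"}]}}
print(certificate_health(ready))  # ('Healthy', 'Certificate is up to date')
```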
Rollback patterns
One of the biggest advantages of GitOps is that rollback is just a git revert. ArgoCD also provides
its own rollback mechanisms for emergencies.
# The GitOps way: revert the commit in Git
git revert HEAD --no-edit
git push
# ArgoCD detects the change and syncs automatically
# ArgoCD history-based rollback
argocd app history my-app
argocd app rollback my-app 2
# Note: this does not revert Git, so auto-sync will eventually re-apply
# Disable auto-sync first or also revert in Git
Multi-cluster management with ApplicationSets
ApplicationSets generate multiple Applications from a template using generators. Instead of manually creating an Application for each cluster, you define a template and a generator that produces the variations.
List generator: Provide explicit parameter sets:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-app
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: staging
            url: https://staging-api.example.com
          - cluster: production
            url: https://production-api.example.com
  template:
    metadata:
      name: "my-app-{{cluster}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/kainlite/gitops-repo
        targetRevision: main
        path: "my-app/overlays/{{cluster}}"
      destination:
        server: "{{url}}"
        namespace: my-app
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
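Under the hood the generator is doing parameter substitution: each element's fields are spliced into the template to produce one Application. A hypothetical sketch (plain string replacement standing in for ArgoCD's template engine):

```python
# Sketch of list-generator expansion: one rendered Application per element.
# Assumption: simple {{key}} string substitution approximates the real
# templating, which supports more features.

elements = [
    {"cluster": "staging", "url": "https://staging-api.example.com"},
    {"cluster": "production", "url": "https://production-api.example.com"},
]

def render(template: str, params: dict) -> str:
    for key, value in params.items():
        template = template.replace("{{" + key + "}}", value)
    return template

for el in elements:
    name = render("my-app-{{cluster}}", el)
    path = render("my-app/overlays/{{cluster}}", el)
    print(name, "->", el["url"], path)
```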
Cluster generator: Automatically creates Applications for every matching cluster:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: monitoring-stack
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            environment: production
  template:
    metadata:
      name: "monitoring-{{name}}"
    spec:
      project: monitoring
      source:
        repoURL: https://github.com/kainlite/gitops-repo
        targetRevision: main
        path: monitoring
      destination:
        server: "{{server}}"
        namespace: monitoring
Git generator: Creates Applications based on directory structure or config files:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: team-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/kainlite/gitops-repo
        revision: main
        directories:
          - path: "teams/*/apps/*"
  template:
    metadata:
      name: "{{path.basename}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/kainlite/gitops-repo
        targetRevision: main
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
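A sketch of the parameters the directory generator derives from each matching path: {{path}} is the directory itself and {{path.basename}} its last segment. Python's fnmatch stands in for ArgoCD's glob matcher here, whose semantics differ slightly, so treat this as illustrative.

```python
# Sketch of git directory-generator matching: directories matching the
# glob each yield a parameter set for the template.
from fnmatch import fnmatch
from pathlib import PurePosixPath

repo_dirs = [
    "teams/payments/apps/checkout",
    "teams/payments/apps/ledger",
    "teams/platform/apps/gateway",
    "teams/platform/docs",  # does not match the pattern
]

for d in repo_dirs:
    if fnmatch(d, "teams/*/apps/*"):
        print({"path": d, "path.basename": PurePosixPath(d).name})
```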
Kustomize and Helm integration
ArgoCD natively supports both Kustomize and Helm. It renders manifests at sync time, so you do not need to run these tools in your CI pipeline.
For Kustomize, just point the Application source to the overlay directory:
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: my-app
    patch: |
      - op: replace
        path: /spec/replicas
        value: 3
images:
  - name: kainlite/my-app
    newTag: v1.2.3
For Helm charts from a chart repository:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
spec:
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 67.9.0
    helm:
      releaseName: prometheus
      values: |
        prometheus:
          prometheusSpec:
            retention: 30d
            storageSpec:
              volumeClaimTemplate:
                spec:
                  resources:
                    requests:
                      storage: 50Gi
        grafana:
          enabled: true
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    syncOptions:
      - ServerSideApply=true
RBAC and SSO
Projects are the primary mechanism for restricting access. Each project defines which repositories, clusters, and namespaces an application can use:
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payments-team
  namespace: argocd
spec:
  description: "Payments team project"
  sourceRepos:
    - "https://github.com/kainlite/payments-*"
  destinations:
    - server: https://kubernetes.default.svc
      namespace: "payments-*"
  namespaceResourceWhitelist:
    - group: "apps"
      kind: Deployment
    - group: ""
      kind: Service
    - group: ""
      kind: ConfigMap
    - group: ""
      kind: Secret
  roles:
    - name: developer
      policies:
        - p, proj:payments-team:developer, applications, get, payments-team/*, allow
        - p, proj:payments-team:developer, applications, sync, payments-team/*, allow
      groups:
        - payments-developers
    - name: admin
      policies:
        - p, proj:payments-team:admin, applications, *, payments-team/*, allow
      groups:
        - payments-admins
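Conceptually, each policy line is a tuple matched against the requested action and object. A simplified sketch (real ArgoCD delegates this to casbin, whose glob semantics differ slightly):

```python
# Sketch of RBAC policy matching: a request is allowed if any policy
# tuple for the role matches the action and object globs.
from fnmatch import fnmatch

policies = [
    # (role, resource, action pattern, object pattern)
    ("proj:payments-team:developer", "applications", "get", "payments-team/*"),
    ("proj:payments-team:developer", "applications", "sync", "payments-team/*"),
    ("proj:payments-team:admin", "applications", "*", "payments-team/*"),
]

def allowed(role: str, resource: str, action: str, obj: str) -> bool:
    return any(
        role == p_role and resource == p_res
        and fnmatch(action, p_act) and fnmatch(obj, p_obj)
        for p_role, p_res, p_act, p_obj in policies
    )

print(allowed("proj:payments-team:developer", "applications", "sync",
              "payments-team/my-app"))   # True
print(allowed("proj:payments-team:developer", "applications", "delete",
              "payments-team/my-app"))   # False
```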
SSO with OIDC:
# In argocd-cm ConfigMap
oidc.config: |
  name: Keycloak
  issuer: https://keycloak.example.com/realms/engineering
  clientID: argocd
  clientSecret: $oidc.keycloak.clientSecret
  requestedScopes: ["openid", "profile", "email", "groups"]

# In argocd-rbac-cm ConfigMap
policy.default: role:readonly
policy.csv: |
  g, platform-admins, role:admin
  g, payments-developers, proj:payments-team:developer
  p, role:readonly, applications, get, */*, allow
Notifications
ArgoCD Notifications sends alerts on sync events. It has shipped bundled with ArgoCD (and its Helm chart) since v2.3, so no separate install is needed:
# argocd-notifications-cm ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.slack: |
    token: $slack-token
  template.app-sync-succeeded: |
    slack:
      attachments: |
        [{"color": "#18be52", "title": "{{.app.metadata.name}} synced successfully",
          "fields": [
            {"title": "Revision", "value": "{{.app.status.sync.revision}}", "short": true},
            {"title": "Namespace", "value": "{{.app.spec.destination.namespace}}", "short": true}
          ]}]
  template.app-sync-failed: |
    slack:
      attachments: |
        [{"color": "#E96D76", "title": "{{.app.metadata.name}} sync FAILED",
          "fields": [
            {"title": "Error", "value": "{{range .app.status.conditions}}{{.message}}{{end}}"}
          ]}]
  trigger.on-sync-succeeded: |
    - when: app.status.operationState.phase in ['Succeeded']
      send: [app-sync-succeeded]
  trigger.on-sync-failed: |
    - when: app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]
Subscribe applications to notifications with annotations:
metadata:
  annotations:
    notifications.argoproj.io/subscribe.on-sync-succeeded.slack: deployments
    notifications.argoproj.io/subscribe.on-sync-failed.slack: deployments-alerts
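Putting triggers and subscriptions together, the evaluation can be sketched like this (simplified: real triggers are expressions evaluated against the full Application object, not just its phase):

```python
# Sketch of notification dispatch: a trigger whose condition matches fires
# its templates to the channel the app subscribed with.

triggers = {
    "on-sync-succeeded": (lambda app: app["phase"] in ["Succeeded"],
                          ["app-sync-succeeded"]),
    "on-sync-failed": (lambda app: app["phase"] in ["Error", "Failed"],
                       ["app-sync-failed"]),
}

subscriptions = {  # from the annotations above: trigger -> Slack channel
    "on-sync-succeeded": "deployments",
    "on-sync-failed": "deployments-alerts",
}

def notifications_for(app: dict) -> list[tuple[str, str]]:
    sent = []
    for name, (condition, templates) in triggers.items():
        if condition(app) and name in subscriptions:
            sent.extend((t, subscriptions[name]) for t in templates)
    return sent

print(notifications_for({"phase": "Failed"}))
# [('app-sync-failed', 'deployments-alerts')]
```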
Monitoring ArgoCD itself
ArgoCD exposes Prometheus metrics out of the box. Here are the key metrics to watch:
- argocd_app_info: Gauge with sync status and health per application
- argocd_app_sync_total: Counter of sync operations (track deployment frequency)
- argocd_app_reconcile_bucket: Histogram of reconciliation duration
- argocd_git_request_total: Counter of Git requests (failures mean ArgoCD cannot reach your repos)
- argocd_cluster_api_resource_objects: Gauge of tracked objects per cluster (memory planning)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: monitoring
spec:
  groups:
    - name: argocd.rules
      rules:
        - alert: ArgoCDAppOutOfSync
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD app {{ $labels.name }} out of sync for 30m+"
        - alert: ArgoCDAppUnhealthy
          expr: argocd_app_info{health_status!~"Healthy|Progressing"} == 1
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: "ArgoCD app {{ $labels.name }} is {{ $labels.health_status }}"
        - alert: ArgoCDSyncFailing
          expr: increase(argocd_app_sync_total{phase!="Succeeded"}[1h]) > 3
          labels:
            severity: critical
          annotations:
            summary: "More than 3 failed syncs in 1h for {{ $labels.name }}"
        - alert: ArgoCDGitFetchErrors
          expr: increase(argocd_git_request_total{request_type="fetch", result="error"}[10m]) > 5
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD cannot fetch from Git repositories"
Make sure Prometheus scrapes ArgoCD metrics:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/part-of: argocd
  namespaceSelector:
    matchNames: [argocd]
  endpoints:
    - port: metrics
      interval: 30s
Closing notes
GitOps with ArgoCD gives you a deployment workflow that is auditable, repeatable, and self-healing. By treating Git as the single source of truth and letting a controller handle reconciliation, you eliminate an entire class of problems related to configuration drift and manual deployments. The combination of Application CRDs, the App of Apps pattern, ApplicationSets, and proper RBAC gives you a solid foundation for managing anything from a single cluster to a fleet of clusters across multiple environments.
This article continues the SRE series where we have been building up the practices and tools needed to run reliable systems. GitOps is the glue that ties everything together, because all the SLO definitions, monitoring configurations, and infrastructure changes we covered in previous articles should flow through Git and be reconciled by ArgoCD.
Hope you found this useful and enjoyed reading it, until next time!
Errata
If you spot any error or have any suggestion, please send me a message so it gets fixed.
Also, you can check the source code and changes in the sources here