SRE: SLIs, SLOs, and Automations That Actually Help
We will explore how to define SLIs and SLOs as code, deploy them with ArgoCD, and use MCP servers to automate SRE workflows...
SRE: Incident Management, On-Call, and Postmortems as Code
We will explore how to build an effective incident management workflow, set up on-call rotations that don't burn people out, write runbooks as code, and run blameless postmortems...
SRE: Observability Deep Dive: Traces, Logs, and Metrics
We will explore the three pillars of observability, how to instrument your applications with OpenTelemetry, build useful dashboards in Grafana, and set up log aggregation that actually helps during incidents...
SRE: Chaos Engineering, Breaking Things on Purpose
We will explore chaos engineering in Kubernetes using Litmus and Chaos Mesh, how to plan and run game days, and why breaking things on purpose is the best way to build reliable systems...
SRE: Capacity Planning, Autoscaling, and Load Testing
We will explore how to right-size your Kubernetes workloads, configure HPA and VPA for automatic scaling, use KEDA for event-driven scaling, and load test with k6 to validate your capacity...
SRE: Secrets Management in Kubernetes
We will explore secrets management in Kubernetes, from Sealed Secrets and External Secrets Operator to HashiCorp Vault integration, secret rotation strategies, and SOPS for encrypting secrets in Git...
SRE: GitOps with ArgoCD
We will explore GitOps principles with ArgoCD, from Application CRDs and App of Apps patterns to sync strategies, multi-cluster management with ApplicationSets, and monitoring your GitOps workflows...
SRE: Cost Optimization in the Cloud
We will explore FinOps principles and cost optimization strategies for Kubernetes and cloud infrastructure, from right-sizing workloads and spot instances to Kubecost visibility and cost-aware SLOs...
SRE: Dependency Management and Graceful Degradation
We will explore how to manage service dependencies reliably, from circuit breakers and bulkhead patterns to graceful degradation strategies and dependency SLOs, with practical Elixir and Kubernetes examples...
SRE: Release Engineering and Progressive Delivery
We will explore release engineering practices for reliable deployments, from canary releases with Argo Rollouts and blue-green deployments to feature flags, rollback automation, and deployment SLOs...
SRE: Database Reliability
We will explore database reliability patterns for PostgreSQL in Kubernetes, from connection pooling and backup strategies to zero-downtime migrations, the CloudNativePG operator, and failover automation...