SRE: Chaos Engineering, Breaking Things on Purpose
We will explore chaos engineering in Kubernetes using Litmus and Chaos Mesh, how to plan and run game days, and why breaking things on purpose is the best way to build reliable systems...
SRE: Dependency Management and Graceful Degradation
We will explore how to manage service dependencies reliably, from circuit breakers and bulkhead patterns to graceful degradation strategies and dependency SLOs, with practical Elixir and Kubernetes examples...
SRE: Database Reliability
We will explore database reliability patterns for PostgreSQL in Kubernetes, from connection pooling and backup strategies to zero-downtime migrations, the CloudNativePG operator, and failover automation...
SRE: Disaster Recovery and Business Continuity
We will explore disaster recovery planning for Kubernetes, from RPO and RTO targets to Velero backups, etcd recovery, multi-region strategies, DR drills, and runbooks for full cluster recovery...