DevOps from Zero to Hero: Infrastructure as Code with Terraform

2026-05-09 | Gabriel Garrido | 9 min read


Introduction

Welcome to article seven of the DevOps from Zero to Hero series. In the previous article we explored AWS networking: VPCs, subnets, route tables, and security groups. Now it is time to stop clicking around in the AWS console and start defining infrastructure the same way we define application code: in files, under version control, with repeatable results.


This is Infrastructure as Code (IaC), and it is one of the most important practices in modern DevOps. If you have ever manually created an EC2 instance, realized you forgot a tag, created another one differently, and then had no idea which was “the right one,” you already understand the problem IaC solves.


We will cover what IaC is, walk through the core Terraform workflow, learn how to manage state safely, and build a real VPC with public and private subnets using HCL files. If you want to go deeper after this, check out Getting started with Terraform modules and Brief introduction to Terratest.


Let’s get into it.


What is Infrastructure as Code?

IaC means defining your infrastructure (servers, networks, databases, load balancers, DNS records) in declarative configuration files rather than creating them manually through a web console.


  • Reproducibility: Recreate your entire infrastructure from scratch with a single command. No more “it works in staging but not in production” because someone configured something differently.
  • Version control: Every change is tracked in Git. You can see who changed what, when, and why.
  • Collaboration: Infrastructure changes go through pull requests just like code changes.
  • Drift detection: running a plan reveals when real infrastructure has drifted from the declared configuration, so you can review and reconcile the difference.
  • Documentation: Your code IS your documentation. Always up to date because it is the source of truth.

IaC vs ClickOps

“ClickOps” is the term for managing infrastructure by clicking through a cloud console. It is fine for learning but falls apart in teams:


  • No audit trail: Someone changes a security group rule. Three months later, nobody remembers who or why.
  • Snowflake servers: Each environment is slightly different because different people configured them at different times.
  • No reproducibility: Could you recreate your production environment from scratch? How long would it take?
  • Human error: At 2 AM you accidentally delete a production database because you were in the wrong tab.
  • Knowledge silos: Only one person knows how the network is configured because they set it up manually.

IaC addresses all of these problems: infrastructure is defined in code, reviewed by the team, tracked in Git, and reproducible at any time.


Why Terraform?

Several IaC tools exist:


  • CloudFormation: AWS-native, JSON/YAML. AWS-only, verbose, but deep AWS integration.
  • Pulumi: Infrastructure in real programming languages (TypeScript, Python, Go). Great DX, smaller community.
  • AWS CDK: Generates CloudFormation using TypeScript or Python. AWS-only, nicer than raw CloudFormation.
  • Terraform: HashiCorp’s tool using HCL. Works across AWS, GCP, Azure, Kubernetes, and hundreds of providers.

We use Terraform because it works across clouds, has the largest ecosystem, and is what most teams use. The concepts (state, plans, declarative config) transfer to any IaC tool.


Terraform basics: the building blocks

Terraform uses HCL (HashiCorp Configuration Language), a declarative language for describing infrastructure.


Providers are plugins that let Terraform talk to a cloud or service:


terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

Resources describe a piece of infrastructure:


resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  tags = { Name = "web-server" }
}

Data sources read information without creating anything:


data "aws_ami" "ubuntu" {
  most_recent = true
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
  owners = ["099720109477"]
}
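
A data source is most useful when its result feeds a resource. For example, the AMI lookup above can replace a hardcoded, region-specific AMI ID (the resource name here is illustrative):

```hcl
# Launch an instance from the newest Ubuntu 22.04 AMI found
# by the data source, instead of pinning a hardcoded AMI ID.
resource "aws_instance" "ubuntu_web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  tags          = { Name = "web-server" }
}
```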

Variables parameterize your configuration:


variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "dev"
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"
}
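
Variables are referenced with var.<name>. A quick sketch wiring the two variables above into a resource (the resource itself is illustrative):

```hcl
resource "aws_instance" "app" {
  ami           = "ami-0c55b159cbfafe1f0"  # example AMI from earlier
  instance_type = var.instance_type        # "t3.micro" unless overridden
  tags = {
    Name        = "${var.environment}-app" # string interpolation
    Environment = var.environment
  }
}
```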

Outputs extract values after creation:


output "instance_public_ip" {
  description = "Public IP of the web server"
  value       = aws_instance.web.public_ip
}
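
Once resources exist, outputs can also be read from the CLI; the -raw and -json flags are handy in scripts:

```shell
# Show all outputs
terraform output

# Print just the value, e.g. to feed into ssh or curl
terraform output -raw instance_public_ip

# Machine-readable form for scripts and CI
terraform output -json
```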

The Terraform workflow: init, plan, apply, destroy

terraform init downloads providers and sets up the backend:


$ terraform init
Initializing provider plugins...
- Installing hashicorp/aws v5.82.1...
Terraform has been successfully initialized!

terraform plan shows what would change without changing anything:


$ terraform plan
  # aws_instance.web will be created
  + resource "aws_instance" "web" {
      + ami           = "ami-0c55b159cbfafe1f0"
      + instance_type = "t3.micro"
    }
Plan: 1 to add, 0 to change, 0 to destroy.

The symbols: + create, ~ modify, - destroy, -/+ replace. Always read the plan before applying.
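
To make the other symbols concrete, here is a hypothetical plan after changing instance_type (updated in place) and a subnet's cidr_block (forces replacement); output abbreviated:

```
  # aws_instance.web will be updated in-place
  ~ resource "aws_instance" "web" {
      ~ instance_type = "t3.micro" -> "t3.small"
    }

  # aws_subnet.public will be replaced
-/+ resource "aws_subnet" "public" {
      ~ cidr_block = "10.0.1.0/24" -> "10.0.5.0/24" # forces replacement
    }

Plan: 1 to add, 1 to change, 1 to destroy.
```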


terraform apply makes changes real (asks for confirmation):


$ terraform apply
aws_instance.web: Creating...
aws_instance.web: Creation complete after 32s [id=i-0abc123def456789]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

terraform destroy tears everything down when you no longer need it.
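
If you want to preview what a destroy would remove first, Terraform supports planning in destroy mode:

```shell
# Preview the destruction plan without touching anything
terraform plan -destroy

# Tear everything down (asks for confirmation)
terraform destroy
```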


State management

Terraform records what it created in a state file. By default this is local (terraform.tfstate), which breaks in teams:


  • No sharing: Teammates cannot run Terraform without the state file.
  • No locking: Two concurrent applies can corrupt state or create duplicates.
  • Risk of loss: Laptop dies, state is gone, Terraform forgets your infrastructure.

The solution is remote state with S3 + DynamoDB locking:


# Create S3 bucket for state (one-time setup)
aws s3api create-bucket --bucket my-terraform-state --region us-east-1
aws s3api put-bucket-versioning --bucket my-terraform-state \
  --versioning-configuration Status=Enabled

# Create DynamoDB table for locking
aws dynamodb create-table --table-name terraform-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST --region us-east-1

Then configure the backend:


terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-lock"
    encrypt        = true
  }
}

Now state is shared, versioned, encrypted, and locked during applies.
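
If you already have local state, Terraform can move it into the new backend for you: after adding the backend block, re-run init with the standard -migrate-state flag:

```shell
# Copy existing terraform.tfstate into the S3 backend,
# prompting before anything is overwritten
terraform init -migrate-state
```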


Practical example: provisioning a VPC

Let’s build a VPC with public and private subnets, an internet gateway, route tables, and a security group: the same architecture from the networking article, but as code.


variables.tf


variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "dev"
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "public_subnet_cidrs" {
  type    = list(string)
  default = ["10.0.1.0/24", "10.0.2.0/24"]
}

variable "private_subnet_cidrs" {
  type    = list(string)
  default = ["10.0.10.0/24", "10.0.11.0/24"]
}

variable "allowed_ssh_cidr" {
  description = "CIDR block allowed to SSH in (restrict this in production)"
  type        = string
  default     = "0.0.0.0/0"
}

main.tf


data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags = {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  tags   = { Name = "${var.environment}-igw" }
}

resource "aws_subnet" "public" {
  count                   = length(var.public_subnet_cidrs)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidrs[count.index]
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true
  tags = { Name = "${var.environment}-public-${count.index + 1}", Tier = "public" }
}

resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = data.aws_availability_zones.available.names[count.index]
  tags = { Name = "${var.environment}-private-${count.index + 1}", Tier = "private" }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
  tags = { Name = "${var.environment}-public-rt" }
}

resource "aws_route_table_association" "public" {
  count          = length(var.public_subnet_cidrs)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id
  tags   = { Name = "${var.environment}-private-rt" }
}

resource "aws_route_table_association" "private" {
  count          = length(var.private_subnet_cidrs)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}

resource "aws_security_group" "web" {
  name        = "${var.environment}-web-sg"
  description = "Allow HTTP, HTTPS, and SSH"
  vpc_id      = aws_vpc.main.id
  tags = { Name = "${var.environment}-web-sg" }
}

resource "aws_vpc_security_group_ingress_rule" "http" {
  security_group_id = aws_security_group.web.id
  cidr_ipv4         = "0.0.0.0/0"
  from_port         = 80
  to_port           = 80
  ip_protocol       = "tcp"
}

resource "aws_vpc_security_group_ingress_rule" "https" {
  security_group_id = aws_security_group.web.id
  cidr_ipv4         = "0.0.0.0/0"
  from_port         = 443
  to_port           = 443
  ip_protocol       = "tcp"
}

resource "aws_vpc_security_group_ingress_rule" "ssh" {
  security_group_id = aws_security_group.web.id
  cidr_ipv4         = var.allowed_ssh_cidr
  from_port         = 22
  to_port           = 22
  ip_protocol       = "tcp"
}

resource "aws_vpc_security_group_egress_rule" "all_outbound" {
  security_group_id = aws_security_group.web.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1"
}

outputs.tf


output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

output "security_group_id" {
  value = aws_security_group.web.id
}

What this creates:


  • VPC with DNS support using the configured CIDR block
  • Internet Gateway attached to the VPC for public internet access
  • Public subnets across availability zones with automatic public IP assignment
  • Private subnets with no internet route, keeping resources isolated
  • Route tables directing public traffic through the gateway
  • Security group allowing HTTP, HTTPS, SSH inbound and all outbound
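
A nice payoff of combining these outputs with remote state: other configurations can read them instead of hardcoding IDs. A sketch using the terraform_remote_state data source, assuming the bucket and key from the backend example earlier (AMI and resource name are illustrative):

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Place an instance in the first public subnet created by this config
resource "aws_instance" "app" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.public_subnet_ids[0]
}
```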

Variables and tfvars

Use .tfvars files for environment-specific values:


# terraform.tfvars (dev defaults)
aws_region  = "us-east-1"
environment = "dev"
vpc_cidr    = "10.0.0.0/16"

# prod.tfvars
aws_region           = "us-east-1"
environment          = "prod"
vpc_cidr             = "10.1.0.0/16"
public_subnet_cidrs  = ["10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
private_subnet_cidrs = ["10.1.10.0/24", "10.1.11.0/24", "10.1.12.0/24"]
allowed_ssh_cidr     = "203.0.113.0/24"

# Uses terraform.tfvars automatically
terraform plan

# Uses a specific file
terraform plan -var-file="prod.tfvars"

# Or pass directly
terraform plan -var="environment=staging"

# Or use environment variables
export TF_VAR_environment="staging"

Precedence (lowest to highest): variable defaults, TF_VAR_ environment variables, terraform.tfvars, *.auto.tfvars, then -var and -var-file flags, applied in the order they appear on the command line.


Running the example
terraform init       # Download providers
terraform fmt        # Format code
terraform validate   # Check syntax
terraform plan       # Preview changes
terraform apply      # Create infrastructure
terraform output     # Show outputs
terraform state list # List managed resources
terraform destroy    # Clean up when done

Best practices

A few things to keep in mind:


  • Never commit state files to Git. They contain sensitive data. Use remote state.
  • Do commit .terraform.lock.hcl. It pins provider versions like package-lock.json.
  • Be careful with .tfvars. If they contain secrets, use environment variables or a secrets manager instead.
  • Tag everything with ManagedBy = "terraform" so you can distinguish IaC-managed resources from manual ones.
  • Use plan -out=tfplan in CI/CD to save a plan file and apply exactly what was reviewed.
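
The saved-plan workflow from the last bullet looks like this in practice, using standard Terraform flags:

```shell
# Review stage: produce a plan artifact and display it
terraform plan -out=tfplan
terraform show tfplan

# Apply stage (after approval): apply exactly the reviewed plan;
# a saved plan skips the interactive confirmation
terraform apply tfplan
```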

Closing notes

Infrastructure as Code changes how you think about infrastructure. Instead of fragile, manually configured environments, you get reproducible, version-controlled definitions that anyone on the team can read and modify.


Terraform is not the only tool, but it is a great starting point. The declarative approach (describe what you want, Terraform figures out how to get there) makes it accessible, and the plan-before-apply workflow gives you a safety net that clicking through a console never could.


Start small. One resource, one plan, one apply. Then add more. Before long your entire infrastructure lives in a handful of files and you will wonder how you ever managed without it.


Hope you found this useful and enjoyed reading it, until next time!


Errata

If you spot any error or have any suggestion, please send me a message so it gets fixed.

Also, you can check the source code and changes in the sources here


