A client came to us with a $14k/month AWS bill on a workload that should have been closer to $4k. Three weeks of audit and quiet refactors later, the bill landed at $5.3k — a 62% cut, with the same SLOs and the same engineering team.

The audit framework

We start with Cost Explorer grouped by service, then by usage type. The 80/20 rule applies aggressively: usually three line items account for about 70% of the bill. Don't optimize the long tail until the head is fixed.
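As a rough illustration of that first pass, the sketch below flattens a Cost Explorer result and returns the smallest set of line items that covers a target share of the bill. It assumes the input has the shape of a boto3 `get_cost_and_usage` response grouped by `SERVICE` and `USAGE_TYPE`. The function names, the 70% default, and the sample figures are ours for illustration, not the client's data.

```python
from collections import defaultdict


def line_items(response, metric="UnblendedCost"):
    """Flatten a get_cost_and_usage-style response into {(service, usage_type): cost}."""
    totals = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            key = tuple(group["Keys"])
            totals[key] += float(group["Metrics"][metric]["Amount"])
    return dict(totals)


def head_of_bill(costs, share=0.70):
    """Return the largest line items, in descending order, until `share` of the total is covered."""
    total = sum(costs.values())
    ranked = sorted(costs.items(), key=lambda kv: kv[1], reverse=True)
    head, running = [], 0.0
    for key, cost in ranked:
        if running >= share * total:
            break
        head.append((key, cost))
        running += cost
    return head


def _group(service, usage_type, amount):
    # Helper for building the hypothetical sample below.
    return {
        "Keys": [service, usage_type],
        "Metrics": {"UnblendedCost": {"Amount": amount, "Unit": "USD"}},
    }


# Hypothetical sample: made-up services, usage types, and amounts.
SAMPLE = {
    "ResultsByTime": [
        {
            "Groups": [
                _group("Compute", "container-hours", "500"),
                _group("Storage", "standard-byte-hours", "300"),
                _group("Networking", "nat-bytes", "100"),
                _group("Database", "instance-hours", "60"),
                _group("Other", "misc", "40"),
            ]
        }
    ]
}

if __name__ == "__main__":
    for (service, usage_type), cost in head_of_bill(line_items(SAMPLE)):
        print(f"{service:12s} {usage_type:22s} ${cost:,.2f}")
```

On the sample, two of the five line items already cover more than 70% of the total, and that is where the audit would start.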

Easy wins we hit first

  • S3 lifecycle policies: moved cold backup data to Glacier Deep Archive. Cut the storage line item by 78%.
  • NAT Gateway egress: added VPC endpoints for S3, DynamoDB, and SQS, so that traffic no longer pays the NAT Gateway's per-GB processing charge. Removed a recurring cost we'd been bleeding for two years.
  • RDS right-sizing: staging was on the same instance class as prod for no defensible reason, so we sized it down.
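For the S3 item, a lifecycle rule along these lines is one way to express the transition. This is a minimal sketch: the `backups/` prefix, the 30-day threshold, and the function name are assumptions for illustration, not the client's actual configuration.

```python
def deep_archive_lifecycle(prefix, after_days=30):
    """Build a lifecycle configuration that moves objects under `prefix`
    to Glacier Deep Archive once they are `after_days` old."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": after_days, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    }


# Hypothetical prefix and age threshold.
config = deep_archive_lifecycle("backups/", after_days=30)
```

The resulting dict is shaped to be passed as `LifecycleConfiguration` to boto3's `put_bucket_lifecycle_configuration`. Deep Archive has a 180-day minimum storage duration and slow retrieval, so a rule like this only fits data that is genuinely cold.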

The architectural change that mattered

The biggest win was nothing flashy: moving the background workers from always-on ECS tasks to Fargate Spot. The workloads were idempotent and could tolerate restarts, so Spot was close to a free lunch: about 70% off the compute for that workload, which was a quarter of the total bill.
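To put that in dollars, here is the back-of-the-envelope arithmetic. It assumes "a quarter of the total bill" refers to the original $14k/month and that the full 70% discount applied to all of that workload's compute, so treat the result as an upper-bound estimate.

```python
def spot_savings(monthly_bill, workload_share_pct, discount_pct):
    """Monthly savings from discounting one workload's share of the bill."""
    workload_cost = monthly_bill * workload_share_pct / 100
    return workload_cost * discount_pct / 100


original_bill = 14_000  # $/month, from the intro
final_bill = 5_300      # $/month, from the intro

saving = spot_savings(original_bill, workload_share_pct=25, discount_pct=70)
share_of_total_cut = saving / (original_bill - final_bill)

if __name__ == "__main__":
    print(f"Spot saving: ${saving:,.0f}/month")               # $2,450/month
    print(f"Share of the total cut: {share_of_total_cut:.0%}")  # 28%
```

Under those assumptions, the Spot migration alone accounts for roughly $2,450 of the $8,700 monthly reduction.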

Reserved Instances came last. They're real money, but they bind you to a usage shape — and if you're about to refactor that workload, you've just paid for a year of the old shape. Always optimize architecture first, then commit.
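A toy model makes the ordering argument concrete. Every number here is hypothetical: a workload that costs $1,000/month on demand, a refactor two months in that shrinks it to $400/month, and a flat 30% discount for a commitment. Real Reserved Instance pricing varies by term, payment option, and instance family.

```python
def window_cost(usage_by_month, committed, discount_pct, commit_from=0):
    """Total cost over the window when `committed` $/month of on-demand-equivalent
    usage is reserved from month index `commit_from` onward."""
    total = 0.0
    for month, usage in enumerate(usage_by_month):
        if month >= commit_from:
            # The reservation is paid whether or not it is used.
            reserved = committed * (100 - discount_pct) / 100
            # Usage above the commitment is billed on demand.
            overflow = max(0, usage - committed)
            total += reserved + overflow
        else:
            total += usage  # pure on-demand before committing
    return total


# Hypothetical usage: old shape for 2 months, refactored shape for 10.
usage = [1000] * 2 + [400] * 10

commit_first = window_cost(usage, committed=1000, discount_pct=30, commit_from=0)
optimize_first = window_cost(usage, committed=400, discount_pct=30, commit_from=2)
never_commit = float(sum(usage))

if __name__ == "__main__":
    print(f"commit to old shape now:      ${commit_first:,.0f}")    # $8,400
    print(f"refactor first, then commit:  ${optimize_first:,.0f}")  # $4,800
    print(f"no commitment at all:         ${never_commit:,.0f}")    # $6,000
```

In this made-up scenario, committing to the old shape costs more than never committing at all, which is the trap described above.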