The challenge
The client's AWS bill had grown to €47,000/month, roughly 3× what their workload should have cost. The causes were over-provisioned EC2 instances, unused RDS replicas, unoptimized S3 lifecycle policies, and no cost attribution per service. Their CTO knew they were overspending but had no clear picture of where the money went or how to fix it.
The hard constraint: the client processes real-time sensor data from 10,000+ IoT devices, and latency spikes would directly affect their SLA with enterprise customers. Any optimization had to be performance-neutral.
Our approach
We started with a one-week infrastructure audit: we mapped every AWS resource, added cost allocation tags where they were missing (34% of resources had none), and identified unused and over-provisioned instances.
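To make the tagging part of the audit concrete, here is a minimal sketch of how the share of untagged resources can be computed from a resource inventory. The tag keys `team` and `service` and the `Resource` record are assumptions for illustration; the actual tag schema and inventory format used in the audit are not described here.

```python
from dataclasses import dataclass, field

# Hypothetical required tag keys; the real tag schema is an assumption.
REQUIRED_TAGS = ("team", "service")


@dataclass
class Resource:
    """One entry in a resource inventory (illustrative shape only)."""
    arn: str
    tags: dict = field(default_factory=dict)


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or empty on a resource."""
    return [key for key in required if not resource.tags.get(key)]


def untagged_share(resources, required=REQUIRED_TAGS):
    """Fraction of resources missing at least one required tag."""
    if not resources:
        return 0.0
    flagged = sum(1 for r in resources if missing_tags(r, required))
    return flagged / len(resources)
```

Running `untagged_share` over the full inventory gives a single number to track as tagging coverage improves, and `missing_tags` tells you which keys to add to each resource.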
The next step was building visibility. We created a cost attribution model using AWS Cost Explorer combined with custom Datadog dashboards, broken down by team and service. Then we executed a phased optimization plan over six weeks.
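As an illustration of the attribution step, the sketch below sums cost per tag value from a response shaped like the one Cost Explorer's `get_cost_and_usage` API returns when results are grouped by a tag key. It assumes the grouping tag is a hypothetical `team` key and that group keys arrive as `key$value` strings, with an empty value for untagged spend. It is not the client's actual reporting code.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_tag(response, metric="UnblendedCost"):
    """Sum cost per tag value across all time periods in the response."""
    totals = defaultdict(Decimal)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            # Tag group keys look like "team$ingest"; an empty value
            # after the "$" means the spend carried no such tag.
            _, _, value = group["Keys"][0].partition("$")
            amount = Decimal(group["Metrics"][metric]["Amount"])
            totals[value or "(untagged)"] += amount
    return dict(totals)
```

Keeping the `(untagged)` bucket visible in the output is a deliberate choice: it shows how much spend still has no owner. `Decimal` is used so that summing many small amounts does not accumulate floating-point error.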
Key technical decisions:
- Migrated stateless workloads from EC2 to ECS Fargate — right-sizing plus scale-to-zero during off-peak
- Implemented S3 Intelligent-Tiering for 4 TB of sensor data
- Chose NOT to move to Kubernetes — their workload didn't justify the operational complexity
- Purchased Reserved Instances for the predictable baseline (RDS, core ECS services)
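For the S3 decision, a lifecycle rule that moves objects into Intelligent-Tiering can be expressed as a plain dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The sketch below only builds that dictionary; the `sensors/raw/` prefix in the usage note and the rule naming scheme are hypothetical, not the client's real layout.

```python
def intelligent_tiering_rule(prefix, days=0):
    """Build one lifecycle rule moving objects under `prefix` to Intelligent-Tiering."""
    if days < 0:
        raise ValueError("days must be non-negative")
    # Derive a readable rule ID from the prefix; "all" covers the empty prefix.
    slug = prefix.strip("/").replace("/", "-") or "all"
    return {
        "ID": "intelligent-tiering-" + slug,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [{"Days": days, "StorageClass": "INTELLIGENT_TIERING"}],
    }


def lifecycle_configuration(prefixes):
    """Combine one rule per prefix into a full lifecycle configuration."""
    return {"Rules": [intelligent_tiering_rule(p) for p in prefixes]}
```

Calling `lifecycle_configuration(["sensors/raw/"])` yields a configuration with a single rule. One caveat worth checking before applying something like this to sensor data: objects smaller than 128 KB are not auto-tiered by Intelligent-Tiering, so very small payloads may need to be batched first to see any savings.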
What we built
A leaner, fully observed infrastructure with clear cost ownership:
- Per-team cost dashboards with weekly automated reports
- Terraform-managed infrastructure with cost-aware resource sizing
- S3 lifecycle policies reducing storage costs by 40%
- Autoscaling policies tuned to actual traffic patterns
- Comprehensive runbook for ongoing FinOps practices
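As a rough illustration of what tuning autoscaling to actual traffic can mean, the sketch below derives minimum and maximum task counts from observed hourly peak request rates. The per-task throughput, the 30% headroom, and the function name are all assumptions for the example; the values used for the client were based on their own measurements and are not reproduced here.

```python
import math


def capacity_bounds(hourly_peak_rps, rps_per_task, headroom=0.3, floor=1):
    """Derive (min_tasks, max_tasks) from observed hourly peak request rates.

    hourly_peak_rps: peak requests/second seen in each observed hour
    rps_per_task:    sustained requests/second one task handles (assumed, measured separately)
    headroom:        extra capacity fraction kept to absorb spikes
    floor:           lowest allowed task count (0 permits scale-to-zero)
    """
    if not hourly_peak_rps:
        raise ValueError("need at least one observation")
    if rps_per_task <= 0:
        raise ValueError("rps_per_task must be positive")

    def tasks_for(rps):
        return math.ceil(rps * (1 + headroom) / rps_per_task)

    min_tasks = max(floor, tasks_for(min(hourly_peak_rps)))
    max_tasks = max(min_tasks, tasks_for(max(hourly_peak_rps)))
    return min_tasks, max_tasks
```

The headroom parameter is where the performance-neutral constraint shows up: a larger value costs more but leaves more room before latency is affected. Setting `floor=0` is only appropriate for workloads that can tolerate a cold start.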