The 2026 FinOps Roadmap: From Cost
My first AWS bill was $23,000. I had been working at the company for three weeks. Nobody told me. The bill just grew quietly in the background while I was proud of the feature I shipped: a Lambda function that called an external enrichment API on every user event. Clean code. Solid tests. Thirty-two million events that month, at $0.0007 per API call.

My engineering manager forwarded the invoice with two words: "Please explain."

That was the moment I discovered FinOps, not from a conference talk or a certification course, but from the specific shame of having written expensive code and not knowing it until the damage was done.

This roadmap is what I needed that day. A complete, honest guide to transforming from an engineer who builds things that work into an engineer who builds things that work and cost what they should. By the end of this guide, you'll have the skills, the scripts, and the vocabulary to talk about cloud spend the way a CFO and a CTO both want to hear.

Contents:
- What You'll Learn
- Prerequisites
- The Four Stages Overview
- Stage 1: The Cost-Aware Engineer — Months 1 to 3
- Stage 2: The Optimisation Specialist — Months 4 to 8
- Stage 3: The Automation Architect — Months 9 to 15
- Stage 4: The Cloud Financial Manager — Months 16 to 24
- Essential Tools and Certifications
- Your 90-Day Action Plan
- Best Practices Summary
- Resources

What you'll learn:
- How to read your AWS bill as an engineer, not as a passive observer
- The exact tagging strategy that makes cost attribution possible
- How to right-size EC2 and RDS instances using CloudWatch data you already have
- The correct sequence for purchasing Savings Plans, and why sequence matters more than the discount percentage
- How to build automated cleanup systems for orphaned resources
- How to present cloud cost findings to engineering leadership with data that drives decisions
- The chargeback and showback models that make cost accountability stick

Let's begin.

Before following this roadmap, you should have some skills and tools ready to go.
Knowledge:
- You can deploy an application to AWS (EC2, Lambda, or containers)
- You understand basic AWS services: S3, RDS, EC2, VPC, IAM
- You're comfortable reading Python and writing simple bash scripts
- You know what a pull request is and have gone through at least one code review

Access:
- Read-only access to your AWS billing console and Cost Explorer
- AWS CLI v2 configured with at least the ReadOnlyAccess policy attached
- Python 3.9 or later for running the audit scripts in this guide

Mindset: You don't need to be a finance expert. But you do need to be willing to look at numbers that might be uncomfortable. Every engineer I've worked with who became excellent at FinOps had one thing in common: they were willing to be the person who asked "but what does this cost?" in a room where nobody else wanted to.

Estimated time: This roadmap covers 24 months of deliberate skill-building. You can absorb the reading in a few evenings. The practice is the 24 months.

Before going deep, here's the complete picture of where you're going:

The reason this is a 24-month journey and not a weekend project: each stage builds on the previous one. Engineers who jump straight to Savings Plans without rightsizing first end up paying discounted prices for waste. Engineers who build dashboards before tagging get beautiful charts with no actionable data. The sequence isn't arbitrary.

The default AWS Cost Explorer view shows you service-level totals. That's accounting. What you need is engineering-level decomposition: which specific resources cost money, what business function they serve, and whether each dollar is justified.

Start by pulling a proper breakdown:

Save the output. Name the file aws-baseline-YYYY-MM.txt.

Most engineers stop at "what is this service?" and never reach the useful question. Here's the framework I use when I first audit an account:

The first question is whether you know what specific business function this service is performing. Not the product name, the function. "S3" isn't an answer. "Storing unprocessed video uploads that sit for 90 days before anyone watches them" is an answer.

The second question is whether the cost is growing, stable, or shrinking when you look at the past three months. A stable $12,000/month is a different problem from a $12,000/month line that was $4,000 six months ago.

The third question is what percentage of your total bill this service represents. Optimising a 1% line item while a 40% line item runs unchecked is a common time-wasting trap.

Here's the honest truth about tagging: most tagging strategies die within six months because they're designed for reporting rather than for engineers. Engineers don't tag things well when they're moving fast. The solution isn't to demand more discipline. It's to enforce tagging at the infrastructure layer.

Here's the minimal viable tag set (the six tags that cover 90% of attribution needs):

Enforce tags at the Terraform level so they can't be skipped:

Find everything that's currently untagged:

Once you start finding untagged resources, you'll discover a pattern: the oldest resources in the account are the least tagged, and they're often the most expensive. An EC2 instance from 2021 that predates your tagging policy is exactly the kind of thing that generates a $3,000/month line item nobody can explain.

The most underused FinOps practice in engineering teams is reviewing code changes for cost implications before they merge. It takes thirty seconds per PR once you build the habit, and it prevents the kind of problem that opened this guide: the expensive feature that nobody priced before shipping.

Add this section to your PR template:

The discipline is in making cost estimation a first-class review concern, not an afterthought that gets caught by the finance team on the 15th of the month.
By the end of month 3, you should have a baseline cost breakdown on file, 100% tag coverage on active resources, your top 5 cost drivers identified with specific reduction targets, and at least one expensive PR blocked with a cost justification that held up in review.

The single most reliable source of cloud waste I find in every account I audit is over-provisioned compute. The pattern is consistent: an engineer provisions an instance at a size that handles their anticipated peak load, the peak never quite materialises at the expected scale, and nobody revisits the instance size because there's no automatic signal that says "this machine is 75% empty."

Make sure you verify actual utilisation before changing anything:

A word of caution: CPU utilisation below 20% is a signal, not a verdict. Some workloads are memory-intensive or I/O-bound and will show low CPU while being correctly sized. Before acting on any right-sizing recommendation, check memory utilisation (requires the CloudWatch agent) and network I/O patterns alongside CPU.

S3 Standard costs $0.023 per GB per month. S3 Glacier Deep Archive costs $0.00099 per GB per month. The difference is a factor of 23. If you have data that you last accessed six months ago and you're keeping it in S3 Standard because nobody set up lifecycle policies, you're paying 23x more than necessary.

The complete S3 lifecycle policy for engineering teams:

A Savings Plan is a commitment to spend a minimum dollar amount per hour on AWS compute for one or three years, in exchange for discounts of 30–70% off On-Demand rates. The discount is real. The trap is buying before optimising.

The wrong order: You have a $50,000/month EC2 bill. You buy a Savings Plan covering $35,000/month. Then you implement right-sizing and Spot instances, and your actual spend drops to $22,000/month. You've committed to paying $35,000/month for 12 months against a need of $22,000. You're paying $13,000/month for compute you don't use, at a 30% discount. Congratulations on your discounted waste.

The right order:

Calculate what to commit to:

Orphaned resources are the cloud equivalent of a gym membership you forgot to cancel. They exist, they charge you, but nobody notices until the annual audit.

The root cause isn't laziness. It's the absence of lifecycle management at the infrastructure layer. When an engineer spins up an EC2 instance for a one-week experiment and then leaves the company, there's no automatic signal that the instance is now orphaned. It sits there, billing $140/month, until someone hunts it down.

The fix is a weekly automated audit that surfaces candidates for deletion and notifies the registered owner, not a process change that depends on engineers remembering to clean up.

The goal is to catch expensive infrastructure changes at the PR stage, before they deploy and before they generate a billing surprise.

By month 16, you have the data. What changes at Stage 4 is the audience. You're no longer presenting to engineers who understand instance types and NAT Gateway pricing. You're presenting to a CTO who wants to know if the infrastructure investment is proportional to the business value it produces, and a CFO who wants to know when the line will stop going up.

The vocabulary shift is simple but important. You stop saying "we right-sized our EC2 instances" and start saying "we reduced our infrastructure unit cost by 28% while maintaining the same request throughput." You stop saying "we eliminated NAT Gateway charges" and start saying "we closed a $6,400/month gap between what we were paying and what was necessary."

The metric that anchors every executive FinOps conversation is cost per business unit. Not the total bill, but cost per API call, cost per user, cost per transaction, or cost per model inference. That ratio tells the story of whether your infrastructure efficiency is improving as the business scales.

Chargeback means actually billing departments for their cloud usage.
Showback means showing departments their usage costs without the internal billing transfer. Both create the same outcome: engineers start caring about what they consume because someone they work with is paying attention to it.

The tools that matter at each stage of this roadmap:

The one certification worth your time: FinOps Certified Practitioner from the FinOps Foundation. It takes 20 hours to prepare and $300 to sit. It signals to hiring managers and clients that you understand the discipline formally, which matters when you're the person leading FinOps conversations at the executive level.

Month 1:
- Enable Cost Explorer if it isn't already on.
- Pull the baseline command from Section 1.1 and save the output.
- Run the untagged resource query from Section 1.2 and document how many resources are missing tags.
- Find your top three cost drivers.
- Present the findings to your engineering manager, not as a problem, but as an opportunity with a dollar figure attached.

Month 2:
- Run the rightsizing analyser from Section 2.1 on your EC2 fleet.
- Downsize the three highest-confidence candidates.
- Apply S3 lifecycle policies to your two largest buckets.
- Create VPC endpoints for S3, ECR, and DynamoDB.
- Estimate the savings from each action and document them against your baseline.

Month 3:
- Deploy the orphan reporter Lambda on a Sunday schedule.
- Add the cost check GitHub Action to your infrastructure repository.
- Start a monthly FinOps review meeting, even if it's just you and one other engineer. Build the habit before you need the audience.

✅ Do: Establish a cost baseline before any optimisation. The number is meaningless without a comparison point.
✅ Do: Right-size before buying Savings Plans. Always. The sequence changes the outcome.
✅ Do: Enforce tagging at the infrastructure layer (Terraform or CloudFormation), not as a process reminder.
✅ Do: Move staging and development to Spot instances. The interruption rate is manageable. A 70% cost difference is too large to leave on the table.
✅ Do: Add VPC endpoints for S3, ECR, and DynamoDB before reviewing data transfer costs. It's a 30-minute fix for a multi-thousand-dollar line item.
✅ Do: Present cost findings as cost-per-business-metric, not as total bill. "We reduced cost per transaction from $0.0021 to $0.0013" is a business result. "$38,000/month reduction" is an accounting result.
❌ Don't: Buy Savings Plans on an unoptimised baseline. You'll lock in discounted waste.
❌ Don't: Build FinOps dashboards before tagging is complete. Beautiful charts with no attribution data answer no questions.
❌ Don't: Run orphaned resource cleanup without human review first. Run in report-only mode for two weeks, verify the candidates are genuinely orphaned, then add deletion logic.

- FinOps Foundation Framework: The practitioner framework that defines the Inform, Optimise, and Operate cycle this roadmap is built on
- AWS Cost Explorer API Reference: Full reference for the cost query commands used throughout this guide
- AWS Compute Optimizer: AWS's own rightsizing recommendation service; complements the manual analysis in Stage 2
- Infracost Documentation: Setup guide for the PR-level cost estimation tool in Stage 3
- FinOps Certified Practitioner Exam: The certification referenced in the tools section
- AWS Savings Plans Documentation: The authoritative reference on commitment types, coverage rules, and purchase strategy
- Companion Repository: All scripts from this guide, including the rightsizing analyser, orphan reporter, and showback report generator

Ayobami Adejumo is a senior platform engineer and FinOps consultant. He has audited AWS infrastructure for 20+ Series A and Series B companies. He is an active FinOps Foundation Supporter.

What You'll Learn
Prerequisites
ReadOnlyAccess policy attached
The Four Stages Overview
Stage 1 — Cost-Aware Engineer (Months 1–3)
├── Read your cloud bill and understand it
├── Tag every resource with meaningful metadata
├── Identify your top 5 cost drivers
└── Block your first expensive PR with cost justification

Stage 2 — Optimisation Specialist (Months 4–8)
├── Right-size every over-provisioned resource
├── Implement storage lifecycle policies
├── Move non-production to Spot instances
└── Purchase your first Savings Plan in the right order

Stage 3 — Automation Architect (Months 9–15)
├── Build automated cleanup for orphaned resources
├── Add cost estimation to your CI/CD pipeline
├── Create cost-aware auto-scaling triggers
└── Deploy a self-service FinOps dashboard

Stage 4 — Cloud Financial Manager (Months 16–24)
├── Lead monthly FinOps reviews with engineering leadership
├── Build chargeback models for departments
├── Negotiate enterprise agreements with AWS
└── Forecast cloud spend within 5% variance

Stage 1: The Cost-Aware Engineer — Months 1 to 3
1.1 Reading the Bill Like an Engineer, Not an Accountant
    # Pull last month's cost breakdown grouped by service, highest cost first
    # Run this before touching any optimisation: this is your baseline
    # Note: date -d requires GNU date (Linux); anchoring on the 1st avoids
    # month-end surprises with 'last month'
    aws ce get-cost-and-usage \
      --time-period Start=$(date -d "$(date +%Y-%m-01) -1 month" +%Y-%m-%d),End=$(date +%Y-%m-01) \
      --granularity MONTHLY \
      --group-by Type=DIMENSION,Key=SERVICE \
      --metrics UnblendedCost \
      --query 'reverse(sort_by(ResultsByTime[0].Groups, &to_number(Metrics.UnblendedCost.Amount)))[].{Service:Keys[0],Cost:Metrics.UnblendedCost.Amount}' \
      --output table

Save the output as aws-baseline-YYYY-MM.txt. You'll compare every future month against this number. Without a baseline, you can't measure progress, and without measurable progress, you can't make the case to leadership that the work is worth engineering time.

Three questions for every service in your top 5:
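The trend and share-of-bill questions become mechanical once a few monthly totals are on file. The sketch below is a minimal illustration, assuming you have per-service monthly costs in oldest-to-newest order. The function names and the 10% tolerance band are illustrative choices, not part of any AWS API.

```python
def classify_trend(monthly_costs, tolerance=0.10):
    """Label a cost series as growing, stable, or shrinking.

    monthly_costs: costs in USD, oldest month first.
    tolerance: relative change treated as noise (an assumed 10% band).
    """
    first, last = monthly_costs[0], monthly_costs[-1]
    if first == 0:
        return "growing" if last > 0 else "stable"
    change = (last - first) / first
    if change > tolerance:
        return "growing"
    if change < -tolerance:
        return "shrinking"
    return "stable"


def share_of_bill(service_cost, total_bill):
    """Return the service's share of the total bill as a percentage."""
    if total_bill == 0:
        return 0.0
    return round(100 * service_cost / total_bill, 1)


if __name__ == "__main__":
    # A $12,000 line that was $4,000 earlier is a different problem
    # from a line that has sat near $12,000 all along.
    print(classify_trend([4000, 8000, 12000]))
    print(classify_trend([12000, 12100, 11900]))
    print(share_of_bill(12000, 30000))
```

The first question, what business function the service performs, has no formula. Someone has to be able to answer it in a sentence.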
1.2 The Tagging Strategy That Actually Survives
    # These six tags enable cost attribution, accountability, and automated remediation
    # Add these to every resource in your AWS account: EC2, RDS, S3, Lambda, everything
    Environment:  "production" | "staging" | "dev"
    Team:         "platform" | "backend" | "data" | "ml"
    Service:      "payment-api" | "fraud-detection" | "user-service"
    Owner:        "owner@example.com"   # Person responsible for this resource (placeholder address)
    CostCenter:   "engineering"         # For chargeback reporting
    AutoShutdown: "true" | "false"      # Enables automated remediation

    # variables.tf
    # Add this to your Terraform root module
    # A plan fails validation if required_tags omits Environment, Team, or Owner
    variable "required_tags" {
      description = "Tags required on every resource in this account"
      type        = map(string)

      validation {
        condition = (
          contains(keys(var.required_tags), "Environment") &&
          contains(keys(var.required_tags), "Team") &&
          contains(keys(var.required_tags), "Owner")
        )
        error_message = "required_tags must include Environment, Team, and Owner."
      }
    }

    # Apply in every resource
    resource "aws_instance" "app_server" {
      ami           = data.aws_ami.amazon_linux.id
      instance_type = "t3.medium"

      tags = merge(var.required_tags, {
        Name    = "app-server-${var.environment}"
        Service = "payment-api"
      })
    }

    # List EC2 instances missing the Team tag
    # Run this weekly until you hit zero results
    aws ec2 describe-instances \
      --query "Reservations[].Instances[?!not_null(Tags[?Key=='Team'].Value | [0])].[InstanceId, InstanceType, State.Name]" \
      --output table

1.3 The Cost-Aware Code Review
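Pricing a change before it merges is usually one multiplication. The sketch below uses the figures from the opening story (32 million events at $0.0007 per call). The function names and the $500 review threshold are assumptions for illustration.

```python
def estimate_monthly_call_cost(calls_per_month, cost_per_call_usd):
    """Estimated monthly spend for a per-call priced dependency, in USD."""
    return round(calls_per_month * cost_per_call_usd, 2)


def needs_cost_review(monthly_cost_usd, threshold_usd=500.0):
    """True when the estimate is large enough to justify a written cost note.

    threshold_usd is an assumed team policy, not an AWS setting.
    """
    return monthly_cost_usd > threshold_usd


if __name__ == "__main__":
    cost = estimate_monthly_call_cost(32_000_000, 0.0007)
    print(f"Estimated monthly cost: ${cost:,.2f}")
    print(f"Needs cost review: {needs_cost_review(cost)}")
```

An estimate like this takes seconds to produce during review, which is the point: the number exists before the invoice does.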
    ## Cost Impact (required for infrastructure and data changes)

    - [ ] This change does not affect cloud resource usage
    - [ ] New API calls introduced: estimated cost per call $______, calls/month ______
    - [ ] New data storage: estimated monthly delta $______
    - [ ] Cross-region data transfer introduced: yes / no
    - [ ] New external service dependency with per-call pricing: yes / no

    If any box other than the first is checked, add a cost estimate before requesting review.

Stage 1 Outcomes
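The 100% tag coverage target is easier to chase when it is a number you can recompute. A minimal sketch follows, assuming resources have already been fetched into dictionaries with a `tags` mapping. That input shape and both function names are hypothetical.

```python
REQUIRED_TAGS = ("Environment", "Team", "Service", "Owner", "CostCenter", "AutoShutdown")


def missing_tags(tags):
    """Return the required tag keys that are absent or empty."""
    return [key for key in REQUIRED_TAGS if not tags.get(key)]


def tag_coverage(resources):
    """Percentage of resources carrying every required tag."""
    if not resources:
        return 100.0
    compliant = sum(1 for r in resources if not missing_tags(r.get("tags", {})))
    return round(100 * compliant / len(resources), 1)


if __name__ == "__main__":
    full = {key: "x" for key in REQUIRED_TAGS}
    fleet = [
        {"id": "i-aaa", "tags": full},
        {"id": "i-bbb", "tags": full},
        {"id": "i-ccc", "tags": full},
        {"id": "i-ddd", "tags": {"Environment": "dev", "Team": "data"}},
    ]
    print(f"Tag coverage: {tag_coverage(fleet)}%")
    print(f"i-ddd is missing: {missing_tags(fleet[3]['tags'])}")
```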
Stage 2: The Optimisation Specialist — Months 4 to 8
2.1 Right-Sizing: The 80/20 of Cloud Savings
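Low CPU is a signal to investigate, so it helps to make the follow-up check explicit. The sketch below combines a CPU reading with an optional memory reading into a verdict. The 40% memory threshold and the verdict labels are assumptions for illustration, and memory figures are only available when the CloudWatch agent is installed.

```python
def rightsizing_verdict(avg_cpu_pct, avg_mem_pct=None,
                        cpu_threshold=20.0, mem_threshold=40.0):
    """Turn utilisation readings into a coarse right-sizing verdict.

    avg_mem_pct is None when no memory metric is available.
    """
    if avg_cpu_pct >= cpu_threshold:
        return "keep"
    if avg_mem_pct is None:
        # Low CPU without memory data is not enough to act on
        return "investigate"
    if avg_mem_pct >= mem_threshold:
        # Memory-bound workload: low CPU is expected
        return "keep"
    return "downsize-candidate"


if __name__ == "__main__":
    print(rightsizing_verdict(8.0))                   # no memory data
    print(rightsizing_verdict(8.0, avg_mem_pct=72.0))
    print(rightsizing_verdict(8.0, avg_mem_pct=15.0))
```

Network and disk I/O deserve the same treatment before any instance is resized. They are left out here to keep the sketch short.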
    # rightsize_analyzer.py
    # Finds running EC2 instances averaging below 20% CPU over the last 14 days
    # These are right-sizing candidates, not automatic deletions
    import boto3
    from datetime import datetime, timedelta, timezone


    def find_oversized_instances(region='us-east-1', cpu_threshold=20.0, days=14):
        """
        Returns instances whose average CPU stayed below cpu_threshold
        over the last `days` days.
        Low CPU alone doesn't mean right-size: check memory too if the
        CloudWatch agent is installed.
        """
        ec2 = boto3.client('ec2', region_name=region)
        cw = boto3.client('cloudwatch', region_name=region)
        end = datetime.now(timezone.utc)
        start = end - timedelta(days=days)

        candidates = []
        # Paginate so large fleets are covered in full
        pages = ec2.get_paginator('describe_instances').paginate(
            Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
        )
        for page in pages:
            for r in page['Reservations']:
                for inst in r['Instances']:
                    iid = inst['InstanceId']
                    tags = {t['Key']: t['Value'] for t in inst.get('Tags', [])}

                    # Pull one average-CPU datapoint per day, then average the days
                    points = cw.get_metric_statistics(
                        Namespace='AWS/EC2',
                        MetricName='CPUUtilization',
                        Dimensions=[{'Name': 'InstanceId', 'Value': iid}],
                        StartTime=start,
                        EndTime=end,
                        Period=86400,
                        Statistics=['Average'],
                    )['Datapoints']
                    if not points:
                        # No metrics yet: skip rather than report a misleading 0%
                        continue
                    avg_cpu = sum(p['Average'] for p in points) / len(points)

                    if avg_cpu < cpu_threshold:
                        candidates.append({
                            'instance_id': iid,
                            'instance_type': inst['InstanceType'],
                            'avg_cpu_pct': round(avg_cpu, 1),
                            'environment': tags.get('Environment', 'unknown'),
                            'owner': tags.get('Owner', 'unknown'),
                            'team': tags.get('Team', 'unknown'),
                        })

        return sorted(candidates, key=lambda x: x['avg_cpu_pct'])


    if __name__ == '__main__':
        results = find_oversized_instances()
        print(f"\nFound {len(results)} right-sizing candidates:\n")
        for r in results:
            print(f"  {r['instance_id']} ({r['instance_type']}): "
                  f"{r['avg_cpu_pct']}% avg CPU, owner: {r['owner']}")

2.2 Storage Tiering: Stop Paying Retail for Cold Data
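The 23x gap between storage classes is easier to feel with a concrete volume. This sketch uses the two per-GB prices quoted in this guide. The 10,000 GB volume is a made-up example, and the calculation ignores retrieval fees, request charges, and minimum storage durations.

```python
S3_STANDARD_PER_GB = 0.023        # USD per GB-month, as quoted in this guide
DEEP_ARCHIVE_PER_GB = 0.00099     # USD per GB-month, as quoted in this guide


def monthly_storage_cost(size_gb, price_per_gb):
    """Monthly storage charge in USD for a given volume and price."""
    return round(size_gb * price_per_gb, 2)


if __name__ == "__main__":
    size_gb = 10_000  # hypothetical cold data set
    standard = monthly_storage_cost(size_gb, S3_STANDARD_PER_GB)
    archive = monthly_storage_cost(size_gb, DEEP_ARCHIVE_PER_GB)
    ratio = round(S3_STANDARD_PER_GB / DEEP_ARCHIVE_PER_GB, 1)
    print(f"S3 Standard:  ${standard:,.2f}/month")
    print(f"Deep Archive: ${archive:,.2f}/month")
    print(f"Price ratio:  {ratio}x")
```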
    {
      "Rules": [
        {
          "ID": "application-logs-lifecycle",
          "Status": "Enabled",
          "Filter": { "Prefix": "logs/" },
          "Transitions": [
            { "Days": 30,  "StorageClass": "STANDARD_IA" },
            { "Days": 90,  "StorageClass": "GLACIER_IR" },
            { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
          ],
          "Expiration": { "Days": 2555 },
          "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
        },
        {
          "ID": "training-checkpoints-lifecycle",
          "Status": "Enabled",
          "Filter": { "Prefix": "ml-checkpoints/" },
          "Transitions": [
            { "Days": 30, "StorageClass": "STANDARD_IA" }
          ],
          "Expiration": { "Days": 90 }
        }
      ]
    }

S3 rejects lifecycle transitions to STANDARD_IA earlier than 30 days after object creation, so the checkpoint rule moves objects at day 30. It skips GLACIER_IR because objects that expire at day 90 would not meet that class's 90-day minimum storage duration.

    # Apply the lifecycle policy to a bucket
    aws s3api put-bucket-lifecycle-configuration \
      --bucket your-logs-bucket \
      --lifecycle-configuration file://lifecycle.json

    # Verify it applied correctly
    aws s3api get-bucket-lifecycle-configuration \
      --bucket your-logs-bucket

2.3 Savings Plans: The Sequence Is Everything
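The cost of buying in the wrong order is simple arithmetic. The sketch below replays this guide's example (a $35,000/month commitment against a $22,000/month need) and then sizes a commitment at 70% of the optimised baseline. The function names are illustrative. The 730 hours-per-month conversion is an assumption, and real Savings Plan commitments are expressed in discounted dollars per hour, so treat the hourly figure as a rough starting point.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def unused_commitment(commitment_monthly, actual_monthly):
    """Monthly commitment paid for but not consumed, in USD."""
    return max(commitment_monthly - actual_monthly, 0)


def recommended_commitment(optimised_monthly, coverage=0.70):
    """Monthly commitment covering a share of the optimised baseline."""
    return round(optimised_monthly * coverage, 2)


if __name__ == "__main__":
    # Wrong order: commit first, optimise later
    waste = unused_commitment(35_000, 22_000)
    print(f"Unused commitment: ${waste:,}/month")

    # Right order: optimise to $22,000/month, then cover 70% of it
    monthly = recommended_commitment(22_000)
    hourly = round(monthly / HOURS_PER_MONTH, 2)
    print(f"Commit roughly ${monthly:,.2f}/month (about ${hourly}/hour)")
```

Covering only part of the baseline leaves headroom for further optimisation without stranding the commitment.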
Month 1-2: Right-size all instances using VPA and CloudWatch data
Month 3:   Move staging and development to Spot instances
Month 4:   Migrate compatible workloads to Graviton (20% cheaper)
Month 5:   Add VPC endpoints to eliminate NAT Gateway charges
Month 6:   THEN look at your steady-state On-Demand spend
Month 6+:  Purchase Savings Plans covering 70% of that optimised baseline

    # Get your On-Demand EC2 spend for the last 30 days
    # This is your rightsized baseline: the number to commit against
    # If the PURCHASE_TYPE value returns nothing, list the valid values with
    # `aws ce get-dimension-values --dimension PURCHASE_TYPE --time-period ...`
    aws ce get-cost-and-usage \
      --time-period Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
      --granularity DAILY \
      --filter '{
        "And": [
          { "Dimensions": { "Key": "SERVICE",
              "Values": ["Amazon Elastic Compute Cloud - Compute"] } },
          { "Dimensions": { "Key": "PURCHASE_TYPE",
              "Values": ["On Demand Instances"] } }
        ]
      }' \
      --metrics UnblendedCost \
      --query 'ResultsByTime[*].{Date:TimePeriod.Start,Cost:Total.UnblendedCost.Amount}' \
      --output table

    # Get AWS's own recommendation for what to commit
    # (this operation lives under the Cost Explorer CLI, `aws ce`)
    aws ce get-savings-plans-purchase-recommendation \
      --savings-plans-type COMPUTE_SP \
      --term-in-years ONE_YEAR \
      --payment-option NO_UPFRONT \
      --lookback-period-in-days THIRTY_DAYS

Stage 3: The Automation Architect — Months 9 to 15
3.1 The Orphaned Resource Problem — And Why It Never Fixes Itself
    # orphan_reporter.py
    # Runs every Sunday via EventBridge -> Lambda
    # Posts a Slack report of orphaned resources for human review
    # DOES NOT auto-delete: deletion requires a human decision
    import boto3
    import json
    import urllib.request
    from datetime import datetime, timedelta, timezone

    SLACK_WEBHOOK = 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
    UNATTACHED_VOLUME_AGE_DAYS = 14
    SNAPSHOT_AGE_DAYS = 90


    def find_orphaned_resources():
        ec2 = boto3.client('ec2')
        now = datetime.now(timezone.utc)
        report = {'monthly_waste_usd': 0, 'items': []}

        # Unattached EBS volumes
        volume_pages = ec2.get_paginator('describe_volumes').paginate(
            Filters=[{'Name': 'status', 'Values': ['available']}]
        )
        for page in volume_pages:
            for vol in page['Volumes']:
                age = (now - vol['CreateTime']).days
                if age >= UNATTACHED_VOLUME_AGE_DAYS:
                    # Estimate at the gp3 rate; other volume types are priced differently
                    cost = round(vol['Size'] * 0.08, 2)
                    tags = {t['Key']: t['Value'] for t in vol.get('Tags', [])}
                    report['items'].append({
                        'type': 'Unattached EBS Volume',
                        'id': vol['VolumeId'],
                        'detail': f"{vol['Size']}GB {vol['VolumeType']}, {age} days old",
                        'owner': tags.get('Owner', 'unknown'),
                        'monthly_cost_usd': cost,
                    })
                    report['monthly_waste_usd'] += cost

        # Unassociated Elastic IPs
        for addr in ec2.describe_addresses()['Addresses']:
            if 'AssociationId' not in addr:
                report['items'].append({
                    'type': 'Unassociated Elastic IP',
                    'id': addr.get('AllocationId', addr['PublicIp']),
                    'detail': addr['PublicIp'],
                    'owner': 'unknown',
                    'monthly_cost_usd': 3.60,
                })
                report['monthly_waste_usd'] += 3.60

        # Old snapshots: compare datetimes directly rather than ISO strings
        cutoff = now - timedelta(days=SNAPSHOT_AGE_DAYS)
        snapshot_pages = ec2.get_paginator('describe_snapshots').paginate(OwnerIds=['self'])
        for page in snapshot_pages:
            for snap in page['Snapshots']:
                if snap['StartTime'] < cutoff:
                    cost = round(snap.get('VolumeSize', 0) * 0.05, 2)
                    report['items'].append({
                        'type': f'Snapshot ({SNAPSHOT_AGE_DAYS}+ days old)',
                        'id': snap['SnapshotId'],
                        'detail': f"Created {snap['StartTime'].strftime('%Y-%m-%d')}",
                        'owner': 'unknown',
                        'monthly_cost_usd': cost,
                    })
                    report['monthly_waste_usd'] += cost

        return report


    def post_to_slack(report):
        lines = [
            ":money_with_wings: *Weekly Orphaned Resource Report*",
            f"Found *{len(report['items'])} orphaned resources* "
            f"costing *${report['monthly_waste_usd']:.2f}/month*\n",
        ]
        for item in report['items'][:20]:  # Cap at 20 lines to stay readable
            lines.append(
                f"• `{item['type']}` {item['id']} - {item['detail']} "
                f"- *${item['monthly_cost_usd']:.2f}/mo* - owner: {item['owner']}"
            )
        lines.append("\nReview and delete anything no longer needed.")

        req = urllib.request.Request(
            SLACK_WEBHOOK,
            data=json.dumps({'text': '\n'.join(lines)}).encode(),
            headers={'Content-Type': 'application/json'},
        )
        urllib.request.urlopen(req)


    def lambda_handler(event, context):
        report = find_orphaned_resources()
        post_to_slack(report)
        return {
            'items_found': len(report['items']),
            'monthly_waste': report['monthly_waste_usd'],
        }

3.2 Cost Estimation in Your CI/CD Pipeline
    # .github/workflows/cost-check.yml
    # Runs on any PR that touches infrastructure files
    # Uses Infracost to estimate the monthly cost delta against the base branch
    name: Infrastructure Cost Check

    on:
      pull_request:
        paths:
          - 'terraform/**'
          - 'infrastructure/**'
          - '*.tf'

    permissions:
      contents: read
      pull-requests: write   # required to post the cost comment

    jobs:
      cost-estimate:
        name: Estimate monthly cost change
        runs-on: ubuntu-latest
        steps:
          - name: Setup Infracost
            uses: infracost/actions/setup@v3
            with:
              api-key: ${{ secrets.INFRACOST_API_KEY }}

          # A diff needs a baseline, so price the base branch first
          - name: Check out base branch
            uses: actions/checkout@v4
            with:
              ref: ${{ github.event.pull_request.base.ref }}

          - name: Generate baseline estimate
            run: |
              infracost breakdown \
                --path terraform/ \
                --format json \
                --out-file /tmp/infracost-base.json

          - name: Check out PR branch
            uses: actions/checkout@v4

          - name: Generate cost diff
            run: |
              infracost diff \
                --path terraform/ \
                --compare-to /tmp/infracost-base.json \
                --format json \
                --out-file /tmp/infracost.json

          - name: Post cost diff to PR
            run: |
              infracost comment github \
                --path /tmp/infracost.json \
                --repo "$GITHUB_REPOSITORY" \
                --github-token "${{ github.token }}" \
                --pull-request "${{ github.event.pull_request.number }}" \
                --behavior update

          - name: Block if monthly increase exceeds threshold
            run: |
              MONTHLY_DELTA=$(jq -r '.diffTotalMonthlyCost // "0"' /tmp/infracost.json)
              echo "Estimated monthly cost change: \$$MONTHLY_DELTA"
              # Fail the PR if this change adds more than $500/month
              python3 - "$MONTHLY_DELTA" <<'EOF'
              import sys
              delta = float(sys.argv[1])
              if delta > 500:
                  print(f'PR blocked: estimated +${delta:.2f}/month exceeds $500 threshold')
                  sys.exit(1)
              print(f'Cost check passed: estimated +${delta:.2f}/month')
              EOF

Stage 4: The Cloud Financial Manager — Months 16 to 24
4.1 Leading FinOps Reviews with Executives
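The executive framing rests on one ratio and its change over time. The sketch below shows how a bill can grow while unit cost falls. The $0.0021 and $0.0013 per-transaction figures are the ones quoted in this guide's best practices, while the monthly totals and transaction counts are hypothetical values chosen to produce them.

```python
def cost_per_unit(total_cost_usd, units):
    """Cost per business unit (transaction, user, API call)."""
    return total_cost_usd / units if units else 0.0


def pct_change(before, after):
    """Percentage change from before to after, rounded to one decimal."""
    return round(100 * (after - before) / before, 1)


if __name__ == "__main__":
    # Hypothetical months: the bill grows, but traffic grows faster
    before = cost_per_unit(42_000, 20_000_000)
    after = cost_per_unit(52_000, 40_000_000)
    print(f"Cost per transaction: ${before:.4f} -> ${after:.4f}")
    print(f"Total bill change: {pct_change(42_000, 52_000)}%")
    print(f"Unit cost change:  {pct_change(before, after)}%")
```

Reported on its own, the rising total looks like a problem. Alongside the unit cost, it reads as efficient growth.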
    # unit_economics.py
    # Calculate cost per transaction: the metric that matters to leadership
    import boto3
    from datetime import datetime, timedelta, timezone


    def calculate_cost_per_transaction(service_name, transaction_count, days_back=30):
        """
        Returns cost per transaction for a given service over the last N days.
        transaction_count: total transactions for the same period (from your metrics)
        """
        ce = boto3.client('ce')
        end = datetime.now(timezone.utc).date()
        start = end - timedelta(days=days_back)

        response = ce.get_cost_and_usage(
            TimePeriod={
                'Start': start.strftime('%Y-%m-%d'),
                'End': end.strftime('%Y-%m-%d'),  # End date is exclusive
            },
            # Daily buckets, so a window that straddles two months sums cleanly
            Granularity='DAILY',
            Metrics=['UnblendedCost'],
            Filter={
                'Tags': {
                    'Key': 'Service',
                    'Values': [service_name],
                }
            },
        )

        total_cost = sum(
            float(period['Total']['UnblendedCost']['Amount'])
            for period in response['ResultsByTime']
        )
        cost_per_txn = total_cost / transaction_count if transaction_count > 0 else 0

        return {
            'service': service_name,
            'period_days': days_back,
            'total_cost_usd': round(total_cost, 2),
            'transactions': transaction_count,
            'cost_per_txn_usd': round(cost_per_txn, 6),
        }


    if __name__ == '__main__':
        # Example: payment service processed 4.2M transactions this month
        result = calculate_cost_per_transaction('payment-api', 4_200_000)
        print(f"Cost per transaction: ${result['cost_per_txn_usd']:.6f}")
        print(f"Total infrastructure cost: ${result['total_cost_usd']:,.2f}")

4.2 The Chargeback and Showback Models
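Showback can leave untagged spend as its own line, but chargeback has to bill every dollar to someone. One possible policy, shown here purely as an assumption, is to spread untagged cost across teams in proportion to their tagged spend. The function name and the team figures are hypothetical.

```python
def allocate_untagged(team_costs, untagged_usd):
    """Spread untagged spend across teams in proportion to tagged spend.

    team_costs: mapping of team name to directly attributed cost in USD.
    Returns a new mapping with each team's share of untagged cost added.
    """
    total = sum(team_costs.values())
    if total == 0:
        return dict(team_costs)
    return {
        team: round(cost + untagged_usd * cost / total, 2)
        for team, cost in team_costs.items()
    }


if __name__ == "__main__":
    direct = {"platform": 6000.0, "backend": 3000.0, "data": 1000.0}
    billed = allocate_untagged(direct, untagged_usd=2000.0)
    for team, amount in billed.items():
        print(f"{team:<10} ${amount:>9,.2f}")
```

Proportional allocation is easy to explain but penalises well-tagged teams for other people's gaps, which is one more reason to drive tag coverage to 100% before moving from showback to chargeback.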
    # showback_report.py
    # Generates a month-to-date cost-by-team report for distribution to engineering leads
    import boto3
    from datetime import datetime, timezone


    def generate_team_showback():
        ce = boto3.client('ce')
        today = datetime.now(timezone.utc).date()
        month_start = today.replace(day=1)
        if month_start == today:
            # Cost Explorer requires Start to be earlier than End
            print("No month-to-date data yet. Run again from the 2nd onwards.")
            return

        response = ce.get_cost_and_usage(
            TimePeriod={
                'Start': month_start.strftime('%Y-%m-%d'),
                'End': today.strftime('%Y-%m-%d'),
            },
            Granularity='MONTHLY',
            Metrics=['UnblendedCost'],
            GroupBy=[
                {'Type': 'TAG', 'Key': 'Team'},
                {'Type': 'DIMENSION', 'Key': 'SERVICE'},
            ],
        )

        by_team = {}
        for group in response['ResultsByTime'][0].get('Groups', []):
            # Tag group keys arrive as "Team$<value>"; an empty value means untagged
            team = group['Keys'][0].replace('Team$', '') or 'untagged'
            service = group['Keys'][1]
            cost = float(group['Metrics']['UnblendedCost']['Amount'])

            if team not in by_team:
                by_team[team] = {'total': 0, 'services': {}}
            by_team[team]['total'] += cost
            by_team[team]['services'][service] = round(cost, 2)

        # Print sorted by total cost descending
        print(f"\n{'=' * 52}")
        print("  Month-to-Date Cloud Spend by Team")
        print(f"  Generated: {today.strftime('%Y-%m-%d')}")
        print(f"{'=' * 52}\n")

        for team, data in sorted(by_team.items(),
                                 key=lambda x: x[1]['total'], reverse=True):
            print(f"  {team:<20} ${data['total']:>10,.2f} month-to-date")
            top_services = sorted(data['services'].items(),
                                  key=lambda x: x[1], reverse=True)[:3]
            for svc, cost in top_services:
                print(f"    └─ {svc:<30} ${cost:>8,.2f}")
            print()


    if __name__ == '__main__':
        generate_team_showback()

Essential Tools and Certifications
| Stage | Tool | Why It Matters |
|---|---|---|
| 1 | AWS Cost Explorer | Free, built-in, the starting point for all cost analysis |
| 1 | AWS CLI ce commands | Scriptable cost queries; dashboards can't be automated |
| 2 | AWS Compute Optimizer | ML-powered rightsizing recommendations for EC2 and RDS |
| 2 | VPA (Kubernetes) | Pod-level rightsizing recommendations using actual usage |
| 3 | Infracost | PR-level cost estimation for Terraform changes |
| 3 | AWS Budgets | Proactive alerts that catch problems before the monthly invoice |
| 4 | AWS Cost and Usage Report + Athena | SQL-level billing analysis at any granularity |
| 4 | CloudHealth or Vantage | Multi-account, multi-cloud cost management |

Your 90-Day Action Plan
Month 1 — Foundation:
Month 2 — Quick Wins:
Month 3 — Automation and Habits:
Best Practices Summary
Resources