The True Cost of Toil: How to Calculate What Manual Ops Is Costing Your Team

Your 10-person engineering team is probably spending $400,000 to $600,000 a year on work that a script could do. That’s not a guess. That’s the math, and we’ll show you how to run it for your own team in the next five minutes.
Toil is the quiet tax on every engineering org. It doesn’t show up on your P&L. Nobody tracks it.
But it’s there. Every hand-edited YAML file. Every SSH session to restart a service. Every 45-minute deploy that should take 45 seconds. We’ve audited dozens of small engineering teams and the pattern is always the same: 30-40% of engineering hours burned on work that produces no lasting value.
This article gives you a formula, a one-week audit framework, and a clear path to cut it.
What Counts as Toil (And What Doesn’t)
Google’s SRE team coined the term. Their SRE book defines toil as work that is manual, repetitive, automatable, and tactical (interrupt-driven), that has no enduring value, and that scales linearly as a service grows. If you do the same thing every week and a script could do it instead, that’s toil.
At small teams, the most common toil we see includes:
- Manual deployments — someone runs a script or clicks through a CI dashboard every time code ships
- Hand-editing configs — YAML, env vars, DNS records updated by copy-paste
- Responding to the same alerts — the disk-full alert that fires every Thursday, the OOM kill on the staging box
- Certificate renewals — someone has a calendar reminder to renew certs manually
- One-off data fixes — running SQL queries to patch bad data because the app doesn’t handle edge cases
- Infrastructure provisioning — clicking through the AWS console to spin up a new environment
What’s NOT toil: on-call rotation (it’s operational work but not automatable in the same way), architecture decisions, debugging novel production issues, or writing documentation. Those require human judgment. Toil doesn’t.
For a deeper breakdown of how to spot toil in your stack, see our guide to identifying toil in your infrastructure.
The Toil Cost Formula
Here’s the formula we use when auditing client teams:
Annual Toil Cost = Number of Engineers × Average Fully-Loaded Salary × Toil Percentage
Let’s walk through each variable.
Number of engineers: Count everyone who touches infrastructure or ops tasks, not just people with “DevOps” in their title. At most small SaaS companies, that’s your entire backend team plus whoever gets paged at 2am.
Average fully-loaded salary: In 2026, the average DevOps engineer salary in the US sits around $135,000-$143,000 (Glassdoor, Salary.com). Add 25-30% for benefits, taxes, and overhead and you land between about $169,000 and $186,000. We use $175,000 fully loaded as a round working figure. For backend engineers doing ops on the side, use your actual comp data.
Toil percentage: Google’s SRE teams average 33% toil across the company, based on their quarterly internal surveys. Most small teams we audit fall in the 30-40% range. They simply haven’t invested in automation yet. Be honest with yourself here. If you aren’t sure, track it for a week. We’ll show you how in the next section.
Worked Example
A SaaS startup with 10 engineers, average fully-loaded cost of $175,000, and 35% toil:
10 × $175,000 × 0.35 = $612,500/year in toil
That’s over $600K a year in engineering salary going to work that doesn’t ship features, doesn’t reduce tech debt, and doesn’t make your product better. It just keeps the lights on.
Even a conservative estimate of $140,000 average salary and 30% toil still comes out to $420,000 a year. For a company doing $2-5M in revenue, that’s a meaningful chunk of engineering spend producing zero competitive advantage.
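If you’d rather plug in your own numbers than reach for a calculator, here is the same arithmetic as a minimal Python sketch. The function and parameter names are ours, picked for illustration. The inputs are the two cases from the worked example.

```python
def annual_toil_cost(engineers: int, loaded_salary: float, toil_pct: float) -> float:
    """Annual Toil Cost = engineers x fully-loaded salary x toil percentage."""
    if not 0 <= toil_pct <= 1:
        raise ValueError("toil_pct must be a fraction between 0 and 1")
    return engineers * loaded_salary * toil_pct


# Worked example: 10 engineers, $175,000 fully loaded, 35% toil
baseline = annual_toil_cost(10, 175_000, 0.35)

# Conservative case: $140,000 and 30% toil
conservative = annual_toil_cost(10, 140_000, 0.30)

print(f"Baseline:     ${baseline:,.0f}")
print(f"Conservative: ${conservative:,.0f}")
```

Swap in your own headcount, comp data, and measured toil percentage to get your number.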
Want us to run this calculation on your actual numbers? Request a free async audit and we’ll send you a written breakdown. No call required.
Beyond Salary: The Hidden Costs You’re Not Counting
The salary math is bad enough. But the real costs are the second-order effects that don’t show up in a spreadsheet.
Engineer Burnout and Turnover
Toil is the fastest path to burnout. Nobody goes into engineering to manually restart services and copy-paste config files. When your best people spend a third of their week on repetitive grind, they start looking for jobs where they don’t have to.
Replacing an engineer costs 50-200% of their annual salary. Recruiting, onboarding, the productivity dip during ramp-up, the institutional knowledge that walks out the door. At $175K fully-loaded, that’s $87K-$350K per departure.
Lose one engineer a year to toil-driven burnout and that amount lands on top of your salary-based toil cost.
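The replacement-cost range is easy to sanity-check. This is a small sketch using the 50-200% multiplier and the $175,000 fully-loaded figure from this article. The function name and the `departures` parameter are illustrative assumptions.

```python
LOADED_SALARY = 175_000         # fully-loaded figure used in this article
REPLACEMENT_RANGE = (0.5, 2.0)  # 50-200% of annual salary


def replacement_cost_range(salary: float, departures: int = 1) -> tuple[float, float]:
    """Low and high estimates for replacing `departures` engineers."""
    low_mult, high_mult = REPLACEMENT_RANGE
    return (salary * low_mult * departures, salary * high_mult * departures)


low, high = replacement_cost_range(LOADED_SALARY)
print(f"One departure: ${low:,.0f} to ${high:,.0f}")
```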
Slower Time-to-Market
Every hour on toil is an hour not shipping features. The DORA 2025 report found that teams with higher deployment frequency and lower lead time consistently outperform their peers on business metrics. But you can’t deploy more frequently when every deployment requires 45 minutes of manual steps. Your competitors who’ve automated their pipelines are shipping daily while you’re shipping weekly.
Human Error and Incidents
Manual processes are error-prone processes. We had a client where a mistyped environment variable in a hand-edited config file took down production for four hours. An automated deploy with validated templates would have caught it before it shipped. When you estimate your toil cost, add the cost of the incidents that wouldn’t have happened if the process had been automated.
Context-Switching Tax
Toil is interrupt-driven by nature. A deploy request, an alert, a “can you quickly provision this?” Slack message. Research from UC Irvine (Gloria Mark) found it takes an average of 23 minutes to refocus after an interruption. If an engineer gets pulled into three toil tasks during a day of focused development work, they’ve lost over an hour just to context-switching, on top of the time the tasks themselves take.
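To see how that adds up, here is a rough sketch of the refocus arithmetic. The 23-minute figure is the one cited above. The five-day working week is our assumption, and the result counts refocus time only, not the toil tasks themselves.

```python
REFOCUS_MINUTES = 23  # average refocus time cited above
WORKDAYS_PER_WEEK = 5  # assumption for the weekly estimate


def refocus_minutes_per_day(interruptions_per_day: int) -> int:
    """Minutes lost per day to refocusing alone."""
    return interruptions_per_day * REFOCUS_MINUTES


daily_minutes = refocus_minutes_per_day(3)
weekly_hours = daily_minutes * WORKDAYS_PER_WEEK / 60

print(f"{daily_minutes} minutes per day, about {weekly_hours:.2f} hours per week")
```

Three interruptions a day is 69 minutes of refocusing, which is where the “over an hour” figure comes from.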
How to Run a Toil Audit in One Week
You can’t fix what you don’t measure. Here’s the five-day framework we use with clients.
Monday-Tuesday: Track everything. Have every engineer log their tasks in 30-minute blocks. Use a simple spreadsheet with three columns: time, task description, and category (engineering or toil). Don’t overthink the categories yet. Just capture the data.
Wednesday: Categorize. Go through the logs and tag each task. If it’s manual, repetitive, and something a script could do, it’s toil. If it requires human judgment, creativity, or problem-solving for a novel issue, it’s engineering work. Be honest. “Reviewing the deploy” is engineering work. “Running the deploy script and watching the output” is toil.
Thursday: Calculate. Add up the toil hours per person. Divide by total hours to get your toil percentage. Multiply against your salary data using the formula above. Now you have a number.
Friday: Prioritize. Sort your toil items by hours × frequency. The task that takes 30 minutes and happens five times a week is costing you more than the task that takes two hours and happens once a month. Pick the top three. Those are your automation targets.
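The Thursday and Friday steps are simple enough to script once the log exists. The sketch below assumes each log row is a tuple of (engineer, task, category, hours). The sample rows and names are made up for illustration and would come from your own spreadsheet export.

```python
from collections import defaultdict

# Hypothetical sample log: (engineer, task, category, hours)
LOG = [
    ("ana", "feature work", "engineering", 6.0),
    ("ana", "manual deploy", "toil", 1.5),
    ("ana", "restart service", "toil", 0.5),
    ("ben", "code review", "engineering", 5.0),
    ("ben", "manual deploy", "toil", 1.5),
    ("ben", "edit DNS config", "toil", 1.5),
]


def toil_percentage(log):
    """Thursday: toil hours divided by total logged hours."""
    total = sum(hours for _, _, _, hours in log)
    toil = sum(hours for _, _, category, hours in log if category == "toil")
    return toil / total if total else 0.0


def top_toil_tasks(log, n=3):
    """Friday: rank toil tasks by total hours logged in the period."""
    totals = defaultdict(float)
    for _, task, category, hours in log:
        if category == "toil":
            totals[task] += hours
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


print(f"Toil: {toil_percentage(LOG):.1%}")
for task, hours in top_toil_tasks(LOG):
    print(f"{task}: {hours} h")
```

Summing logged hours per task over the tracking period is the same thing as duration × frequency, so a short task that recurs often rises to the top on its own.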
The goal isn’t zero toil. That’s unrealistic. Google targets 50% maximum, and their SRE teams average 33%. For a small team without dedicated ops, we recommend targeting under 25%. Getting from 35% to 25% on a 10-person team saves roughly $175,000 a year.
What to Do Once You Know Your Number
You’ve got the number. Now what? You have three options, and they’re not mutually exclusive.
Automate the Top 3 Toil Sources
Start with the highest-impact items from your audit. The usual suspects for small SaaS teams:
- CI/CD pipeline — automate builds, tests, and deploys end-to-end. This alone typically saves 5-10 hours per week across the team.
- Infrastructure provisioning — Terraform or Pulumi for environments. Stop clicking through the AWS console.
- Monitoring and alerting — proper thresholds, runbook links in alerts, auto-remediation for known issues. Stop waking people up for the same disk-full alert every week.
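As one illustration of auto-remediation for a known issue, here is a minimal sketch of a disk-usage check you could run on a schedule instead of waiting for the alert. The 85% threshold and the function names are assumptions, and the cleanup itself is left as a placeholder because it depends entirely on what fills your disk.

```python
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total * 100


def should_remediate(used_percent: float, threshold: float = 85.0) -> bool:
    """True when usage reaches the (hypothetical) cleanup threshold."""
    return used_percent >= threshold


if __name__ == "__main__":
    pct = disk_used_percent("/")
    if should_remediate(pct):
        # Placeholder: call your own cleanup here (log rotation, temp pruning).
        print(f"Disk at {pct:.1f}%: running cleanup")
    else:
        print(f"Disk at {pct:.1f}%: nothing to do")
```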
The catch with DIY automation: it has its own cost. Two engineers spending three months building automation is roughly $87K in fully-loaded cost before you see any return. The ROI is real: in the DIY scenario worked through later in this article, first-year savings are about twice the investment. But the payback period matters when you’re a small team that needs to ship features today.
Outsource What You Can’t Automate Yet
This is where fractional DevOps comes in. A senior DevOps consultant on retainer implements the automation, handles the remaining toil, and trains your team. The math is straightforward. A $3K-5K/month retainer is $36K-60K/year. If your toil cost is $400K+, you’re paying a fraction of the problem cost for someone who can actually fix it.
We had a client, a 12-person SaaS company, where four engineers were each spending about 15 hours a week on deploys, config changes, and incident response. That’s roughly $280K a year in toil. A $4K/month retainer paid for itself in the first month. Within 90 days, those 15 hours dropped to about two hours per engineer per week.
For more on whether hiring full-time makes sense for your team size, see our breakdown of whether you actually need a full-time DevOps engineer.
Set a Toil Budget and Track It
Make toil visible by tracking it monthly. Set a target (we recommend under 25% for small teams), measure against it, and treat regressions like you’d treat a performance regression in your app. If toil percentage starts creeping back up, that’s a signal that you’ve taken on new services without investing in the automation to support them.
Google’s SRE teams do quarterly toil surveys and explicitly plan work to reduce toil when it exceeds their target. You don’t need Google’s process. You need a monthly 10-minute check-in: “What percentage of our time went to toil this month? Is it going up or down?”
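The monthly check-in can be as small as the sketch below. The monthly percentages are invented sample data, and the 25% target is the one recommended above. Everything else, including the function name, is illustrative.

```python
TARGET = 0.25  # the under-25% target suggested above for small teams

# Hypothetical monthly toil percentages, oldest first
monthly_toil = {"Jan": 0.35, "Feb": 0.31, "Mar": 0.27, "Apr": 0.29}


def check_in(history: dict[str, float], target: float = TARGET) -> list[str]:
    """One status line per month: position against target, plus the trend."""
    lines = []
    previous = None
    for month, pct in history.items():
        status = "over target" if pct > target else "within target"
        if previous is None:
            trend = "baseline"
        elif pct > previous:
            trend = "up (regression)"
        elif pct < previous:
            trend = "down"
        else:
            trend = "flat"
        lines.append(f"{month}: {pct:.0%} {status}, {trend}")
        previous = pct
    return lines


report = check_in(monthly_toil)
print("\n".join(report))
```

A month flagged as a regression is the cue to ask which new service or process arrived without automation behind it.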
The ROI of Toil Reduction
For the spreadsheet people (we respect you), here’s the ROI formula:
ROI = (Annual Toil Cost Saved - Investment) / Investment × 100
Example Scenarios
DIY automation investment:
- Investment: $87K (2 engineers × 3 months)
- Annual toil savings: $175K (10-point reduction on a 10-person team)
- Year 1 ROI: 101%
- Payback period: ~6 months
Fractional DevOps retainer:
- Investment: $48K/year ($4K/month retainer)
- Annual toil savings: $175K+
- Year 1 ROI: 265%
- Payback period: ~3 months
The retainer has faster payback because you’re buying expertise that already exists rather than building it from scratch. You’re also not pulling two engineers off feature work for three months.
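Here are both scenarios run through the ROI formula as a short sketch. The payback calculation assumes savings accrue evenly across the year from day one. That is a simplification, since automation takes time to land. Function names are ours.

```python
def roi_percent(annual_savings: float, investment: float) -> float:
    """ROI = (Annual Toil Cost Saved - Investment) / Investment x 100."""
    return (annual_savings - investment) / investment * 100


def payback_months(annual_savings: float, investment: float) -> float:
    """Months until savings cover the investment, assuming even monthly savings."""
    return investment / (annual_savings / 12)


ANNUAL_SAVINGS = 175_000  # 10-point toil reduction on a 10-person team

scenarios = {
    "DIY automation": 87_000,            # 2 engineers x 3 months
    "Fractional DevOps retainer": 48_000,  # $4K/month
}

for name, investment in scenarios.items():
    roi = roi_percent(ANNUAL_SAVINGS, investment)
    months = payback_months(ANNUAL_SAVINGS, investment)
    print(f"{name}: ROI {roi:.0f}%, payback in about {months:.1f} months")
```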
FAQ
How much toil is normal? Google’s SRE teams average 33%. Most small engineering teams we audit fall in the 30-40% range. Aim for under 25%.
Can AI tools reduce toil? The DORA 2025 report found that AI coding assistants increase individual output by 21%, but organizational delivery metrics stay flat. AI helps with coding tasks but doesn’t meaningfully reduce operational toil like deploys, config management, and incident response. That still requires infrastructure automation.
What’s the difference between toil and operational work? All toil is operational work, but not all operational work is toil. On-call, incident response to novel issues, and capacity planning require human judgment. Toil is the subset that is repetitive, manual, and automatable. For a complete breakdown, see our guide to DevOps toil.
Should I automate everything or hire someone? Neither. Automate the highest-frequency tasks first (CI/CD, provisioning). For everything else, a fractional DevOps retainer gets you senior expertise without a $180K hire. If you’re consistently at 40+ engineers, then a full-time hire starts to make sense.
Start With the Number
The hardest part of reducing toil is admitting how much of it you have. Run the audit. Do the math. Once you see that your team is spending $400K-$600K a year on work that doesn’t move the product forward, the decision to invest in fixing it becomes obvious.
Most teams we work with see their first meaningful toil reduction within 30 days. The automation compounds. The engineers get happier. The product ships faster. It starts with knowing your number.
Want us to calculate your toil cost and identify the top three fixes? Get a free async audit. We’ll send you a Loom walkthrough and written report with specific recommendations. No call required.


