DevOps metrics are supposed to reduce arguments, not start new ones. But if you’ve ever stared at a dashboard on your iPhone and thought “is this good or just… a lot of charts?”, you’re not alone.
This guide explains the most common metrics and terms in plain English, using a scorecard format so you can compare “helpful signal” vs “busywork.”
One sentence to keep in mind: a metric is only useful if it changes what you do next.
Before we start, quick context: on iOS you’ll often check these metrics through mobile-friendly Google tools (Google Cloud Monitoring dashboards, Looker Studio reports, Google Sheets scorecards, or even shared links in Google Chat). The concepts stay the same regardless of where you view them.
The scorecard: how to judge any DevOps metric (0–2 points each)
Use this as a lightweight rubric. A “good” metric doesn’t need to be fancy—it needs to be decision-ready.
- User impact clarity: 0 = unclear, 1 = indirect, 2 = directly maps to user experience
- Actionability: 0 = interesting, 1 = sometimes actionable, 2 = has a clear owner and playbook
- Gaming resistance: 0 = easy to “improve” without real improvement, 1 = somewhat gameable, 2 = hard to fake
- Stability: 0 = noisy, 1 = moderate noise, 2 = stable enough to compare week to week
- Cost to measure: 0 = expensive/complex, 1 = moderate, 2 = cheap and low-maintenance
If a metric totals 8–10, it belongs on a shared scorecard. If it’s 5–7, keep it for debugging. If it’s 0–4, treat it as a vanity chart until proven otherwise.
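The rubric above is simple enough to automate. Here’s a minimal sketch in Python; the dimension names and the example scores are illustrative, not taken from any real tool.

```python
# A sketch of the 0–2 rubric above: five dimensions, each scored 0–2,
# with totals mapped to the three tiers described in the text.

RUBRIC = ["user_impact_clarity", "actionability", "gaming_resistance",
          "stability", "cost_to_measure"]

def verdict(scores: dict[str, int]) -> str:
    """Total a metric's rubric scores and map them to a tier."""
    assert set(scores) == set(RUBRIC), "score every dimension"
    assert all(0 <= s <= 2 for s in scores.values())
    total = sum(scores.values())
    if total >= 8:
        return "shared scorecard"
    if total >= 5:
        return "debugging view"
    return "vanity chart"

# Example: a well-defined, user-facing metric (scores are made up)
print(verdict({"user_impact_clarity": 2, "actionability": 2,
               "gaming_resistance": 2, "stability": 1,
               "cost_to_measure": 1}))  # shared scorecard (total = 8)
```

Running this during a metrics review forces the useful conversation: not “is this chart pretty?” but “which dimension scored a zero, and why?”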
SLO, SLI, and SLA: the “promise stack” (and what each one is for)
These get mixed up constantly, so here’s the simplest way to separate them.
- SLI (Service Level Indicator): the raw measurement. Example: “% of requests under 300 ms” or “% of successful logins.”
- SLO (Service Level Objective): the target you aim for. Example: “99.9% of logins succeed over 28 days.”
- SLA (Service Level Agreement): the external contract (often with penalties). Example: “If uptime drops below 99.9%, customer gets credits.”
Scorecard note: SLOs are usually high-scoring because they connect technical reality to a clear “are we meeting expectations?” question.
A useful mental model: SLI is the thermometer, SLO is the healthy range, SLA is the paperwork.
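The thermometer/healthy-range split is easy to show in code. This sketch computes an SLI from raw request outcomes and checks it against an SLO target; the function names and request counts are invented for illustration.

```python
# Sketch: the SLI is the raw measurement, the SLO is the target it is
# judged against. Numbers below are made up.

def sli_success_rate(successes: int, total: int) -> float:
    """SLI: the raw measurement, e.g. fraction of successful logins."""
    return successes / total

def meets_slo(sli: float, target: float = 0.999) -> bool:
    """SLO: the objective the SLI is compared to (99.9% here)."""
    return sli >= target

sli = sli_success_rate(successes=999_200, total=1_000_000)
print(f"SLI: {sli:.4%}, meets 99.9% SLO: {meets_slo(sli)}")
# SLI: 99.9200%, meets 99.9% SLO: True
```

Notice the SLA never appears in the code: it lives in the contract, not the monitoring stack.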
Error budget: the “permission slip” to change things
Error budget sounds negative, but it’s actually how you keep reliability work from becoming endless fear of shipping.
Plain English: if your SLO is 99.9% over 30 days, you’re allowed 0.1% “badness” in that window. That allowance is the error budget.
- If you’re spending error budget fast, slow down risky releases and prioritize stability.
- If you’re not spending any, you may be over-investing in caution (or your SLO is too easy).
Scorecard note: error budget is hard to game and highly actionable—when it’s tied to real user-facing SLIs.
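The arithmetic behind the “permission slip” is small enough to fit in a few lines. This is a sketch with invented request counts; real systems would usually compute burn rate over a rolling window.

```python
# Sketch: error budget math for a 99.9% SLO. The allowance is
# (1 - SLO) of the total traffic in the window.

def error_budget(slo: float, total_requests: int) -> int:
    """Allowed 'bad' requests in the window: (1 - SLO) * total."""
    return round((1 - slo) * total_requests)

def budget_remaining(slo: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative = blown)."""
    budget = error_budget(slo, total)
    return (budget - failed) / budget

total, failed = 10_000_000, 6_500
print(error_budget(0.999, total))              # 10000 bad requests allowed
print(budget_remaining(0.999, total, failed))  # 0.35 -> 35% of budget left
```

A remaining fraction near zero is the signal to slow risky releases; a fraction that never drops below, say, 90% suggests the SLO is too easy.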
MTTR, MTTD, and MTBF: incident time metrics without the confusion
These acronyms are everywhere. The trap is treating them like “team grades” instead of system signals.
- MTTD (Mean Time To Detect): how long a problem exists before you notice.
- MTTR (Mean Time To Restore/Recover): how long until service is back to acceptable.
- MTBF (Mean Time Between Failures): average time between incidents.
How to use them without getting misleading results:
- Segment by incident type (deploy-related vs third-party vs capacity). One blended MTTR hides the story.
- Track medians and percentiles, not just averages. One multi-day incident can distort the mean.
- Define “restore”. Is it “alerts stopped” or “user impact resolved”?
Scorecard note: MTTR can be good, but it’s somewhat gameable if teams “stop the bleeding” without fixing underlying causes. Pair it with post-incident follow-through.
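The “medians, not just averages” point is worth seeing with numbers. This sketch uses invented restore times (in minutes) with one multi-day outage mixed in.

```python
# Sketch: why one long incident distorts mean MTTR. Durations in
# minutes are made up; the last entry is a two-day outage.

import statistics

restore_times = [18, 25, 30, 42, 55, 2880]

mean_mttr = statistics.mean(restore_times)
median_mttr = statistics.median(restore_times)

print(f"mean MTTR:   {mean_mttr:.0f} min")   # ~508 min, dominated by the outlier
print(f"median MTTR: {median_mttr:.0f} min") # 36 min, the typical incident
```

If you only reported the mean here, a team whose typical incident resolves in about half an hour would look like it takes most of a day.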
DORA metrics, translated: what they really tell you (and what they don’t)
DORA metrics are popular because they’re simple and comparable across teams—but only if you define them carefully.
- Deployment frequency: how often you release. Useful for seeing bottlenecks, but easy to game by shipping tiny changes.
- Lead time for changes: time from “code committed” to “running in production.” Great for spotting review/CI/release friction.
- Change failure rate: % of deployments that cause incidents, rollbacks, or hotfixes. Strong quality signal if your definition of “failure” is consistent.
- Time to restore service: essentially MTTR as described above, included in the DORA set so speed and recovery can be read together.
Scorecard note: the best pairing is deployment frequency + change failure rate. Speed without safety is chaos; safety without speed can turn into stagnation.
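Two of these metrics fall straight out of a deploy log. The record format and timestamps below are invented for illustration; a real pipeline would pull this from CI/CD and incident tooling.

```python
# Sketch: lead time for changes and change failure rate from a
# hypothetical deploy log.

from datetime import datetime, timedelta

deploys = [
    # (commit time, deploy time, caused an incident/rollback/hotfix?)
    (datetime(2024, 5, 1, 9, 0),  datetime(2024, 5, 1, 13, 0), False),
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 2, 11, 30), True),
    (datetime(2024, 5, 3, 8, 0),  datetime(2024, 5, 3, 14, 0), False),
    (datetime(2024, 5, 3, 15, 0), datetime(2024, 5, 3, 16, 0), False),
]

lead_times = [deployed - committed for committed, deployed, _ in deploys]
avg_lead = sum(lead_times, timedelta()) / len(deploys)
failure_rate = sum(1 for *_, failed in deploys if failed) / len(deploys)

print(f"lead time (avg): {avg_lead}")              # 3:07:30
print(f"change failure rate: {failure_rate:.0%}")  # 25%
```

The hard part isn’t the arithmetic, it’s the third column: your definition of “failure” must stay consistent or the rate is meaningless.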
Latency, throughput, errors, saturation: a practical “what’s wrong?” map
If you only remember one diagnostic set, remember these four. They help you ask: is the system slow, busy, failing, or out of room?
- Latency: how long requests take. Watch p95/p99, not just average.
- Throughput: how much work is happening (requests per second, jobs per minute).
- Errors: failures (5xx, failed tasks, exception rate). Define which errors matter to users.
- Saturation: how close you are to capacity (CPU, memory, queue depth, DB connections).
Scorecard note: these are excellent for on-call response, but they need context to become leadership metrics. For example, “CPU 85%” is not automatically bad if latency and errors are fine.
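Here’s what “watch p95/p99, not just average” looks like in practice. This sketch uses a simple nearest-rank percentile on invented latency samples, so you can see how one slow request hides in the mean but shows up in the tail.

```python
# Sketch: nearest-rank percentile over raw latency samples (ms).
# Samples are made up; one slow outlier is included deliberately.

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: simple and dependency-free."""
    ordered = sorted(samples)
    rank = max(0, round(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

latencies_ms = [12, 15, 14, 18, 22, 16, 13, 210, 17, 19]

print(percentile(latencies_ms, 50))  # 16  -> the typical request
print(percentile(latencies_ms, 95))  # 210 -> the tail users complain about
```

Production systems usually compute percentiles from histograms rather than raw samples, but the reading is the same: the median tells you about the typical user, the tail tells you who is suffering.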
Takeaway: build a “one-screen” scorecard you can trust
On iOS, the winning setup is usually a single, mobile-readable page (often a Google dashboard link) that shows: your SLO status, error budget burn, a couple of DORA metrics, and the four golden signals for your critical path.
When in doubt, keep metrics that answer: Are users okay? and Do we know what to do next?
If a number can’t do either, it belongs in a debugging view—not the main scoreboard.