APIs can feel mysterious because the numbers sound technical: p95 latency, 429s, quotas, retries, timeouts. This guide translates the most common API metrics and terms into plain English, using a simple iPhone + Google-app mental model: a mobile client calling services across the network.

Think “what happened, how often, and how bad was it?”

The goal isn’t to turn you into an SRE. It’s to help you read a dashboard or incident summary and quickly understand what it’s actually saying.

First: a simple picture of an API request (so metrics make sense)

An API request is one round trip: the app sends a request, the server processes it, and the app receives a response. Most metrics are just measurements of one of those steps.

On iOS (including the Google app), the “felt” experience can also be shaped by radio conditions (Wi‑Fi vs cellular), backgrounding, and DNS/TLS setup—so the same API can look different depending on where and how it’s called.

When you see a metric, ask: is it measuring the client side, the server side, or the network in between?

Latency, response time, and “why p95 matters more than average”

Latency / response time is how long a request takes from start to finish. If an API “feels slow,” this is usually the first number people quote.

Average latency is the mean. It's easy to compute, and just as easy to be misled by: a handful of extremely slow (or extremely fast) requests can pull it far away from what a typical user experiences.

Median (p50) is the “typical” request. Half are faster, half are slower.

p95 / p99 are the “tail” latencies: 95% (or 99%) of requests are faster than this number, but a small slice is much slower. On mobile, that slow slice is often what users remember (the spinner, the retry, the “something went wrong”).

  • If p50 is fine but p95 is bad, your system is usually “mostly okay, sometimes painful.” Look for sporadic network issues, overloaded shards, GC pauses, cold caches, or retries stacking.
  • If p50 and p95 are both bad, you likely have a broad slowdown (capacity, a slow dependency, a bad release, or a widespread network problem).
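To see why the tail matters, here's a quick sketch that computes the mean, p50, and p95 of a set of request timings (the numbers are invented for illustration, and the percentile helper uses a simple nearest-rank method):

```python
import statistics

# Hypothetical latencies in milliseconds: most requests are fast,
# but a few slow outliers form the "tail" that users remember.
latencies_ms = [80, 90, 95, 100, 105, 110, 120, 130, 900, 2500]

def percentile(values, p):
    """Nearest-rank percentile: the value that p% of samples fall at or below."""
    ordered = sorted(values)
    rank = max(1, round(p / 100 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

mean = statistics.mean(latencies_ms)   # pulled upward by the outliers
p50 = percentile(latencies_ms, 50)     # the "typical" request
p95 = percentile(latencies_ms, 95)     # the painful tail

print(f"mean={mean:.0f}ms  p50={p50}ms  p95={p95}ms")
```

Note how the mean (423 ms) sits nowhere near the typical request (105 ms): the average describes almost nobody's experience, while p50 and p95 describe the typical case and the painful case separately.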

Status codes: what 200, 304, 401, 403, 404, 429, and 5xx really mean

Status codes are the API’s short way of saying “here’s what happened.” They’re not just developer trivia—they’re how you separate “bad request,” “blocked,” “rate-limited,” and “server down.”

  • 200 OK: The request worked.
  • 304 Not Modified: Usually cache-related. The client asked “has it changed?” and the server said “no.” Often good news (less data transferred).
  • 401 Unauthorized: Missing or expired auth (despite the name, it really means "not authenticated"). Often fixable by refreshing tokens or re-authenticating.
  • 403 Forbidden: Auth was understood, but access is denied (policy, permissions, region, org rules).
  • 404 Not Found: The endpoint/resource isn’t there. Can be a bad URL, deleted object, or wrong environment.
  • 429 Too Many Requests: Rate limit hit. The client is sending too much too fast (or quota is too low).
  • 5xx (500/502/503/504): Server-side failure. Could be your service, or something it depends on.

Plain-English shortcut: 4xx = the request wasn’t acceptable (often client/config/auth). 5xx = the server couldn’t fulfill it (capacity/bugs/dependencies).
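That shortcut can be written down directly. This tiny classifier (the function name and buckets are illustrative, not from any particular library) maps a status code to the plain-English categories above:

```python
def classify(status: int) -> str:
    """Translate an HTTP status code into a plain-English bucket."""
    if status == 429:
        return "rate-limited"      # slow down, or the quota is too low
    if status in (401, 403):
        return "blocked"           # auth missing/expired, or access denied
    if 200 <= status < 400:
        return "ok"                # includes 304 cache "not modified" hits
    if 400 <= status < 500:
        return "bad request"       # client/config problem
    return "server down"           # 5xx: server-side failure

print(classify(304))  # ok
print(classify(429))  # rate-limited
print(classify(503))  # server down
```

The ordering matters: 429 and 401/403 are checked before the generic 4xx bucket because they call for different fixes (backoff vs. re-auth vs. fixing the request).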

Error rate vs failure rate vs “is it actually impacting users?”

Error rate is the percentage of requests that return an error (commonly non-2xx). But dashboards vary: some count 3xx as success, others don’t; some treat 404s as “expected,” others treat them as errors.

Failure rate is often a stricter idea: requests that did not accomplish the user goal. Example: a 200 with an empty payload might still be a “failure” for a feature, even though HTTP says “OK.”

User impact is the part people care about: did the app successfully do the thing? On iOS, you can have low server error rate but high perceived failure because of timeouts, offline conditions, or aggressive client-side time limits.

When you read an incident note, look for the definition: “errors” according to what rule?
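The same traffic can produce very different "error rates" depending on the counting rule. A toy illustration (all counts invented):

```python
# A hypothetical minute of traffic: (status_code, request_count)
traffic = [(200, 950), (304, 20), (404, 15), (429, 5), (500, 10)]
total = sum(count for _, count in traffic)

# Rule A (strict): anything non-2xx is an error, so 304s and 404s count.
errors_strict = sum(c for status, c in traffic if not 200 <= status < 300)

# Rule B (lenient): 3xx is success and 404s are "expected"
# (e.g. the client probes for optional resources).
errors_lenient = sum(c for status, c in traffic
                     if status >= 400 and status != 404)

print(f"strict:  {errors_strict / total:.1%}")   # 5.0%
print(f"lenient: {errors_lenient / total:.1%}")  # 1.5%
```

Same minute of traffic, a 3x difference in the headline number. That's why the definition matters more than the percentage.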

Rate limits, quotas, and throttling (429s without the drama)

These terms all mean “there’s a cap.” The difference is the time window and who enforces it.

  • Rate limit: A short-term cap (per second/minute). Exceed it and you’ll see 429 or slowed responses.
  • Quota: A longer-term allowance (per day/month/project). You can be “fine” for hours and then suddenly blocked.
  • Throttling: The system intentionally slows or rejects requests to protect itself or keep fairness.

What to check in plain English: is the app sending bursts (scrolling, auto-refresh, background fetch), or did traffic spike (release, campaign, bug loop)?
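Rate limiters are commonly implemented as a token bucket: each request spends a token, tokens refill at a steady rate, and an empty bucket means 429. A minimal sketch (class name and numbers are illustrative):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, sustained rate of `refill_per_sec`."""
    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket's capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a server would respond 429 here

bucket = TokenBucket(capacity=3, refill_per_sec=1)
results = [bucket.allow() for _ in range(5)]  # a burst of 5 instant requests
print(results)  # first 3 pass, then the bucket is empty
```

This also explains the "fine for hours, then suddenly blocked" quota pattern: a bucket with a large capacity but slow refill tolerates normal use for a long time before running dry.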

Timeouts, retries, and why they can make things worse

Timeout means the client or a proxy gave up waiting. On mobile, timeouts can happen even if the server is “fine,” because the network is unstable or the phone switched networks.

Retry means trying again after failure. Retries are helpful when failures are random—but harmful when the server is overloaded, because retries add extra load at the worst time.

  • Retry storm: lots of clients retrying at once, multiplying traffic.
  • Backoff: waiting longer between retries (usually exponentially) to reduce pileups.
  • Jitter: adding randomness so clients don’t all retry at the same second.

A practical read: if you see rising latency and rising request volume at the same time, retries may be amplifying the issue.
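Backoff and jitter combine into one small function on the client. This sketch uses "full jitter" (each wait is a random value under an exponentially growing ceiling); the base and cap values are assumptions, not a prescription:

```python
import random

def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Exponential backoff with full jitter: each delay is a random value
    between 0 and min(cap, base * 2**attempt), so a fleet of clients
    doesn't retry in lockstep and cause a retry storm."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

# Five retry delays: the *ceilings* are 0.5, 1, 2, 4, 8 seconds,
# but the actual waits are randomized below those ceilings.
for i, delay in enumerate(backoff_delays(5)):
    print(f"retry {i + 1}: wait up to {delay:.2f}s")
```

Without the jitter, every client that failed at the same instant would retry at the same instant too, turning one blip into a synchronized traffic spike.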

A quick checklist: when an API “feels slow” on iOS, what numbers to read first

  • p95 latency (not just average): are “slow cases” getting worse?
  • Timeout rate: are requests failing because the client gives up?
  • 5xx rate: is the server failing under load or due to a bad dependency?
  • 429 rate: are you being rate-limited?
  • Request volume: did traffic spike, or did a bug cause a loop?
  • Payload size: are responses suddenly heavier (images, large JSON)?
  • Dependency latency (DB, auth, cache): is “your API” waiting on something else?

If you only have time for one sanity check: compare p95 latency and error/timeout rate before and after the reported time.
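That before/after comparison can be as simple as putting the two windows side by side and flagging what moved (all values here are invented dashboard readings):

```python
# Hypothetical readings before and after the reported incident time.
before = {"p95_ms": 420, "error_rate": 0.004, "timeout_rate": 0.002}
after  = {"p95_ms": 1800, "error_rate": 0.031, "timeout_rate": 0.018}

for metric in before:
    ratio = after[metric] / before[metric]
    flag = "  <-- degraded" if ratio > 1.5 else ""
    print(f"{metric:13} {before[metric]:>8} -> {after[metric]:>8} (x{ratio:.1f}){flag}")
```

If p95 and timeout rate both jumped while 5xx stayed flat, that points toward the network or the client giving up; if 5xx jumped too, look at the server and its dependencies.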

Takeaway: translate metrics into a sentence you’d tell a teammate

Good API monitoring is mostly good translation. Turn charts into a plain-English statement like: “Most requests are fast, but 5% are timing out on cellular,” or “We’re healthy, but we’re rate-limiting bursts and need backoff.”

Once you can say it clearly, the next troubleshooting step usually becomes obvious.