What is Throughput?

Throughput is the amount of work a serving system completes per unit of time, such as requests per second, output tokens per second, or total tokens per second.

How It Works

Throughput describes capacity, not just speed: how much work the whole system finishes, rather than how quickly any single request returns. In LLM serving, teams may measure it as completed requests per second, generated (output) tokens per second, or total input plus output tokens per second; some also track a cost-normalized figure such as useful answers per dollar. A system can raise throughput by batching aggressively, but larger batches may increase latency for individual users. Good capacity planning therefore reports throughput together with latency percentiles, time to first token (TTFT), input and output lengths, hardware utilization, error rate, and workload mix.
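
As a rough illustration of the per-second metrics named above, the following Python sketch computes them from a list of completed requests. The `CompletedRequest` record and its field names are hypothetical, invented for this example rather than taken from any serving engine.

```python
from dataclasses import dataclass


@dataclass
class CompletedRequest:
    # Hypothetical log record; the field names are assumptions for illustration.
    start_s: float       # time the request arrived, in seconds
    end_s: float         # time the last token was returned, in seconds
    input_tokens: int
    output_tokens: int


def throughput_summary(requests: list[CompletedRequest]) -> dict[str, float]:
    """Compute throughput over the wall-clock window spanned by the requests."""
    if not requests:
        raise ValueError("need at least one completed request")
    window_s = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    if window_s <= 0:
        raise ValueError("measurement window must be positive")
    output_tokens = sum(r.output_tokens for r in requests)
    total_tokens = output_tokens + sum(r.input_tokens for r in requests)
    return {
        "requests_per_s": len(requests) / window_s,
        "output_tokens_per_s": output_tokens / window_s,
        "total_tokens_per_s": total_tokens / window_s,
    }


# Three overlapping requests completed within a 4-second window.
log = [
    CompletedRequest(0.0, 2.0, input_tokens=100, output_tokens=50),
    CompletedRequest(0.5, 3.0, input_tokens=200, output_tokens=100),
    CompletedRequest(1.0, 4.0, input_tokens=300, output_tokens=50),
]
summary = throughput_summary(log)
print(summary)
```

Dividing by the shared wall-clock window, rather than summing per-request rates, is what makes this a measure of system capacity instead of individual speed.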

Key Characteristics

  • Measures completed work over time rather than one user's wait time
  • Can be expressed in requests, output tokens, total tokens, or business units
  • Strongly affected by batching, hardware, model size, quantization, and traffic shape
  • Often trades off against latency under high load
  • Requires workload definitions to make benchmarks meaningful
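
The batching trade-off listed above can be sketched with a toy model in Python. It assumes each decode step costs a fixed overhead plus a small per-request cost; the function name and both constants are made up for illustration and are not measurements of any real system.

```python
def batch_step_model(
    batch_size: int,
    fixed_ms: float = 20.0,        # assumed fixed cost per decode step
    per_request_ms: float = 2.0,   # assumed extra cost per request in the batch
) -> tuple[float, float]:
    """Return (step latency in ms, throughput in tokens/s) for one batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    step_ms = fixed_ms + per_request_ms * batch_size
    # Each request in the batch produces one token per step.
    tokens_per_s = batch_size * 1000.0 / step_ms
    return step_ms, tokens_per_s


for b in (1, 8, 32):
    step_ms, tps = batch_step_model(b)
    print(f"batch={b:>2}  step latency={step_ms:.0f} ms  throughput={tps:.0f} tokens/s")
```

Under these assumptions, larger batches raise both throughput and per-step latency, with diminishing throughput gains as the per-request cost starts to dominate. Real engines behave less smoothly, so the numbers only show the shape of the trade-off.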

Common Use Cases

  1. Sizing GPU capacity for a production LLM API
  2. Comparing serving engines under the same traffic mix
  3. Evaluating continuous batching and quantization benefits
  4. Planning cost per million tokens or requests
  5. Detecting capacity regressions after model or configuration changes
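
For the sizing and cost use cases, the arithmetic can be sketched in Python as follows. The function names, the hourly price, the per-GPU throughput, and the 70% utilization target are all hypothetical inputs chosen for the example.

```python
import math


def cost_per_million_tokens(gpu_hourly_usd: float, num_gpus: int, tokens_per_s: float) -> float:
    """Hardware cost per one million tokens at a sustained throughput."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    tokens_per_hour = tokens_per_s * 3600
    return gpu_hourly_usd * num_gpus / tokens_per_hour * 1_000_000


def gpus_needed(target_tokens_per_s: float, per_gpu_tokens_per_s: float,
                utilization_target: float = 0.7) -> int:
    """GPUs required to serve a target load while leaving headroom.

    utilization_target is an assumed planning margin, not a universal value.
    """
    if per_gpu_tokens_per_s <= 0 or not 0 < utilization_target <= 1:
        raise ValueError("invalid per-GPU throughput or utilization target")
    return math.ceil(target_tokens_per_s / (per_gpu_tokens_per_s * utilization_target))


# Assumed inputs: one GPU at $2.00/hour sustaining 1,000 tokens/s.
cost = cost_per_million_tokens(gpu_hourly_usd=2.0, num_gpus=1, tokens_per_s=1000.0)
fleet = gpus_needed(target_tokens_per_s=5000.0, per_gpu_tokens_per_s=1000.0)
print(f"cost per million tokens: ${cost:.2f}, GPUs needed: {fleet}")
```

The per-GPU throughput fed into such a calculation should come from a benchmark run on the intended workload mix, since the same hardware can sustain very different rates for short and long prompts.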

Example
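
As one illustrative sketch in Python, a benchmark report can pair a throughput figure with tail latency so that neither is read in isolation. The helper names, the sample latencies, and the 2-second target are assumptions made up for this example; the percentile uses the simple nearest-rank method.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile for pct in 1..100."""
    if not values or not 1 <= pct <= 100:
        raise ValueError("need values and a percentile between 1 and 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def benchmark_report(latencies_s: list[float], output_tokens: int,
                     window_s: float, latency_target_s: float) -> dict:
    """Report throughput alongside p95/p99 latency and a pass/fail flag."""
    p95 = percentile(latencies_s, 95)
    p99 = percentile(latencies_s, 99)
    return {
        "output_tokens_per_s": output_tokens / window_s,
        "p95_s": p95,
        "p99_s": p99,
        "meets_p99_target": p99 <= latency_target_s,
    }


# Made-up run: 20 requests over 10 seconds, one of them very slow.
latencies = [1.0] * 19 + [5.0]
report = benchmark_report(latencies, output_tokens=2000, window_s=10.0,
                          latency_target_s=2.0)
print(report)
```

In this made-up run the throughput and p95 look healthy, while the single slow request pushes p99 past the target, which is the kind of gap a throughput-only report would hide.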


Frequently Asked Questions

Is throughput the same as tokens per second?

Tokens per second is one throughput metric, but throughput can also be measured in requests, batches, or useful completed tasks.

Can higher throughput make latency worse?

Yes. Larger batches can improve hardware utilization while making individual requests wait longer.

How should LLM throughput be benchmarked?

Use realistic prompt lengths, output lengths, concurrency, and sampling settings on the target hardware, and measure against explicit latency targets.

Why report latency with throughput?

A high-throughput system may still be unusable if p95 or p99 latency is too high for the product.
