
Input vs Output Tokens: Why Your LLM Bill Is Higher Than You Think

Published 2026-06-25 · 4 min read

Open any LLM pricing page and you will see two numbers: a price for input tokens and a higher price for output tokens. Understanding that gap is the fastest way to cut your bill.

Why output costs more

Input tokens are processed in parallel in a single forward pass. Output tokens are generated one at a time, and each one needs its own pass over the model, which uses the hardware far less efficiently per token. Providers price that reality in: output is typically 2–5× the input price, and sometimes more. Per million tokens, GPT-5.4 is $2.50 in but $15 out (6×); Claude Sonnet 4.6 is $3 in but $15 out (5×).
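To make the gap concrete, here is a minimal sketch that turns the prices quoted above into a per-request cost. It assumes those prices are in USD per million tokens; the `PRICES` table and the `request_cost` and `output_ratio` helpers are illustrative names, not part of any provider's SDK.

```python
# Prices from the paragraph above, assumed to be USD per 1M tokens.
PRICES = {
    "GPT-5.4": {"input": 2.50, "output": 15.00},
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def output_ratio(model: str) -> float:
    """Return how many times more an output token costs than an input token."""
    p = PRICES[model]
    return p["output"] / p["input"]


for name in PRICES:
    print(
        f"{name}: output costs {output_ratio(name):.1f}x input; "
        f"1,000 in + 1,000 out = ${request_cost(name, 1_000, 1_000):.4f}"
    )
```

Even with equal token counts on both sides, most of the cost of each request in this sketch comes from the output half.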

What it means for you

  • Long prompts are cheaper than long answers. At a 5× price ratio, a 10,000-token document in your prompt costs the same as a 2,000-token essay out; at a higher ratio it costs less.
  • Cap output aggressively. A sensible max_tokens limit protects against runaway generations, a common cause of a blown budget.
  • Ask for structure, not prose. Requesting JSON or a short list instead of a verbose explanation can halve output tokens with no loss of value.
  • Reasoning models hide output cost. "Thinking" models emit internal tokens you still pay for at the output rate — budget accordingly.
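The points above can be checked with a little arithmetic. The sketch below is a rough illustration, not a billing formula from any provider: `billed_cost` is a hypothetical helper, the default prices are the Claude Sonnet 4.6 figures quoted earlier (assumed to be USD per million tokens), and the token counts, including the 4,000 reasoning tokens, are made up for the example.

```python
def billed_cost(
    input_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int = 0,
    input_price: float = 3.00,    # assumed USD per 1M input tokens
    output_price: float = 15.00,  # assumed USD per 1M output tokens
) -> float:
    """USD cost of one request; reasoning tokens are billed at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000


# Long prompt vs long answer at a 5x price ratio.
long_prompt = billed_cost(10_000, 0)  # 10,000-token document in
long_answer = billed_cost(0, 2_000)   # 2,000-token essay out

# Same visible answer, with and without hypothetical hidden reasoning tokens.
plain = billed_cost(1_000, 500)
with_thinking = billed_cost(1_000, 500, reasoning_tokens=4_000)

# An output cap bounds the worst case: spend cannot exceed this per request.
max_tokens = 1_000
worst_case = billed_cost(1_000, max_tokens)

print(f"long prompt ${long_prompt:.4f} vs long answer ${long_answer:.4f}")
print(f"plain ${plain:.4f} vs with thinking ${with_thinking:.4f}")
print(f"worst case with max_tokens={max_tokens}: ${worst_case:.4f}")
```

Whether an output cap also counts reasoning tokens varies by provider, so it is worth checking the API documentation before relying on the worst-case figure.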

Put a number on it

The cost calculator lets you set input and output tokens independently, so you can see exactly how trimming output changes the monthly total across every model. For the lowest output prices specifically, see the cheapest LLM APIs.
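For a quick estimate without the calculator, the same arithmetic fits in a few lines. This is a back-of-the-envelope sketch, not the calculator's actual logic: `monthly_cost` is a hypothetical helper, the workload (10,000 requests a day, 2,000 input tokens each) is invented, and the prices are the GPT-5.4 figures quoted earlier, assumed to be USD per million tokens.

```python
def monthly_cost(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    input_price: float,   # assumed USD per 1M input tokens
    output_price: float,  # assumed USD per 1M output tokens
    days: int = 30,
) -> float:
    """Estimate the monthly USD bill for a steady workload."""
    per_request = (
        input_tokens * input_price + output_tokens * output_price
    ) / 1_000_000
    return per_request * requests_per_day * days


# Hypothetical workload: same prompts, output trimmed from 800 to 400 tokens.
baseline = monthly_cost(10_000, 2_000, 800, 2.50, 15.00)
trimmed = monthly_cost(10_000, 2_000, 400, 2.50, 15.00)

print(f"baseline: ${baseline:,.2f} per month")
print(f"trimmed:  ${trimmed:,.2f} per month")
print(f"saved:    ${baseline - trimmed:,.2f} ({1 - trimmed / baseline:.0%})")
```

In this made-up workload, halving the output cuts the bill by roughly a third, because output accounts for most of the per-request cost.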

Compare every model's price in the live pricing table, or estimate your own bill with the cost calculator.