Input vs Output Token Pricing Explained (2026)

Open any LLM pricing page and you will see two numbers: a price for input tokens and a higher price for output tokens. Understanding that gap is one of the fastest ways to cut your bill.

Why output costs more

Input tokens are processed in parallel in a single forward pass. Output tokens are generated one at a time, and each new token needs its own pass through the model, so generation cannot be parallelized across the sequence the way prompt processing can and uses the hardware far less efficiently per token. Providers price that reality in: output is typically 2–6× the input price. Per million tokens, GPT-5.4 is $2.50 in but $15 out (6×); Claude Sonnet 4.6 is $3 in but $15 out (5×).
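
To make the gap concrete, here is a minimal sketch that computes the output-to-input price ratio for the two models quoted above. It assumes the quoted prices are in USD per million tokens; the PRICES table and the output_to_input_ratio helper are illustrative names, not part of any provider's API.

```python
# Prices as quoted in this article, assumed to be USD per million tokens.
PRICES = {
    "GPT-5.4": {"input": 2.50, "output": 15.00},
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00},
}


def output_to_input_ratio(price: dict[str, float]) -> float:
    """Return how many times more an output token costs than an input token."""
    return price["output"] / price["input"]


for model, price in PRICES.items():
    print(f"{model}: output costs {output_to_input_ratio(price):.1f}x input")
```

Swap in the current numbers from your provider's pricing page before relying on the result.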

What it means for you

Long prompts are cheaper than long answers. At a 5× price ratio, a 10,000-token document in your prompt costs the same as a 2,000-token essay out; at any higher ratio it costs less.
Cap output aggressively. A sensible max_tokens bounds the worst-case cost of every request and protects against runaway generations, a common cause of a blown budget.
Ask for structure, not prose. Requesting JSON or a short list instead of a verbose explanation can halve output tokens with no loss of value.
Reasoning models hide output cost. "Thinking" models emit internal tokens you still pay for at the output rate — budget accordingly.
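
The points above can be checked with a few lines of arithmetic. This is a rough sketch, not a billing implementation: request_cost is a hypothetical helper, the prices are the Claude Sonnet 4.6 figures quoted earlier (assumed to be USD per million tokens), and the 3,000 reasoning tokens and the 500-token cap are made-up values for illustration. It also assumes the cap covers every billed output token, which may not hold for every provider or model.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 reasoning_tokens: int = 0) -> float:
    """Cost of one request in USD, with prices given per million tokens.

    Reasoning ("thinking") tokens are billed at the output rate.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000


# Claude Sonnet 4.6 prices as quoted in this article (assumed USD per million tokens).
IN_PRICE, OUT_PRICE = 3.00, 15.00

# A 10,000-token prompt versus a 2,000-token answer, each priced in isolation.
long_prompt = request_cost(10_000, 0, IN_PRICE, OUT_PRICE)
long_answer = request_cost(0, 2_000, IN_PRICE, OUT_PRICE)

# The same answer preceded by 3,000 hypothetical reasoning tokens.
with_thinking = request_cost(0, 2_000, IN_PRICE, OUT_PRICE, reasoning_tokens=3_000)

# Worst case for a 10,000-token prompt when output is capped at 500 tokens.
max_tokens = 500
worst_case = request_cost(10_000, max_tokens, IN_PRICE, OUT_PRICE)

print(f"long prompt:   ${long_prompt:.4f}")
print(f"long answer:   ${long_answer:.4f}")
print(f"with thinking: ${with_thinking:.4f}")
print(f"capped worst:  ${worst_case:.4f}")
```

At this 5× ratio the prompt and the answer cost the same, and the hidden reasoning tokens more than double the cost of the answer.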

Put a number on it

The cost calculator lets you set input and output tokens independently, so you can see exactly how trimming output changes the monthly total across every model. For the lowest output prices specifically, see the cheapest LLM APIs.
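
If you want to approximate the same calculation yourself, the sketch below treats input and output tokens as independent knobs and shows how trimming output moves the monthly total. It is not the calculator's actual logic. The monthly_cost helper, the request volume, and the token counts are hypothetical; the prices are the two examples quoted in this article, assumed to be USD per million tokens.

```python
# Prices as quoted in this article, assumed to be USD per million tokens.
PRICES = {
    "GPT-5.4": {"input": 2.50, "output": 15.00},
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00},
}


def monthly_cost(requests: int, input_tokens: int, output_tokens: int,
                 price: dict[str, float]) -> float:
    """Monthly spend in USD for a fixed per-request token profile."""
    input_cost = requests * input_tokens * price["input"] / 1_000_000
    output_cost = requests * output_tokens * price["output"] / 1_000_000
    return input_cost + output_cost


# Hypothetical workload: 100,000 requests a month, 1,500 input tokens each.
REQUESTS = 100_000
INPUT_TOKENS = 1_500

for model, price in PRICES.items():
    before = monthly_cost(REQUESTS, INPUT_TOKENS, 800, price)  # 800 output tokens
    after = monthly_cost(REQUESTS, INPUT_TOKENS, 400, price)   # output halved
    print(f"{model}: ${before:,.2f} -> ${after:,.2f} per month")
```

With these made-up volumes, halving output from 800 to 400 tokens takes the Claude Sonnet 4.6 total from $1,650 to $1,050 a month while the input side stays at $450.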
