Best LLM for Long Context (2026)
Feeding whole books, codebases or document sets to a model needs a large context window — and the cost scales with every token you send. Here are the models with the biggest windows and what they cost to fill them.
Top picks, with price
1. Gemini 2.5 Pro · Google
A 1M-token window and cheap per input token for its tier — great for very long inputs.
$1.25 in · $10.00 out (per 1M tokens) · 1M ctx
2. Gemini 2.0 Flash · Google
1M tokens at a tiny price: the cheapest model on this list for very long inputs.
$0.10 in · $0.40 out (per 1M tokens) · 1M ctx
3. GPT-5.4 · OpenAI
Large context with strong reasoning; prompt caching keeps re-reads affordable.
$2.50 in · $15.00 out (per 1M tokens) · 400K ctx
4. Claude Sonnet 4.6 · Anthropic
1M context with best-in-class caching — often cheaper in practice for repeated long prompts.
$3.00 in · $15.00 out (per 1M tokens) · 1M ctx
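To make "what they cost to fill" concrete, here is a minimal Python sketch that turns the four price cards above into a per-call cost. It assumes the listed prices are USD per 1M tokens and that pricing is flat across the whole window; real providers may charge differently for very long prompts. The names `ModelPrice`, `call_cost` and `fill_cost` are hypothetical helpers for illustration, not part of any vendor SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_usd_per_m: float   # assumption: USD per 1M input tokens
    output_usd_per_m: float  # assumption: USD per 1M output tokens
    context_tokens: int      # advertised context window


# Figures copied from the list above.
MODELS = [
    ModelPrice("Gemini 2.5 Pro", 1.25, 10.00, 1_000_000),
    ModelPrice("Gemini 2.0 Flash", 0.10, 0.40, 1_000_000),
    ModelPrice("GPT-5.4", 2.50, 15.00, 400_000),
    ModelPrice("Claude Sonnet 4.6", 3.00, 15.00, 1_000_000),
]


def call_cost(model: ModelPrice, input_tokens: int, output_tokens: int = 0) -> float:
    """Cost in USD of one call, assuming flat per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Simplification: only the input is checked against the window.
    if input_tokens > model.context_tokens:
        raise ValueError(f"{model.name}: input exceeds the context window")
    total = input_tokens * model.input_usd_per_m + output_tokens * model.output_usd_per_m
    return total / 1_000_000


def fill_cost(model: ModelPrice) -> float:
    """Input-only cost of filling the whole window once."""
    return call_cost(model, model.context_tokens)


if __name__ == "__main__":
    for m in sorted(MODELS, key=fill_cost):
        print(f"{m.name}: ${fill_cost(m):.2f} per full-window call")
```

Under these assumptions a single full-window call costs $0.10 on Gemini 2.0 Flash, $1.00 on GPT-5.4 (its window is 400K), $1.25 on Gemini 2.5 Pro and $3.00 on Claude Sonnet 4.6, before any output tokens.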
What to watch out for
- A big window is expensive to use: at the prices listed above, 1M input tokens cost $1.25 to $3.00 per call on the flagship models, before any output tokens.
- Prompt caching is the real long-context cost lever: re-reading a cached document can cost up to 90% less than sending it fresh.
- Retrieval (RAG) is usually cheaper and more accurate than stuffing everything into context.
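The caching point can be sketched with simple arithmetic. The Python snippet below compares re-reading the same document several times with and without a cache, assuming cached reads are discounted by the 90% figure mentioned above. It deliberately ignores cache-write surcharges, cache expiry and output tokens, all of which vary by provider, and `repeated_read_cost` is a hypothetical helper rather than a real API.

```python
def repeated_read_cost(
    doc_tokens: int,
    reads: int,
    input_usd_per_m: float,
    cached_discount: float = 0.90,  # assumption: cached reads cost 90% less
) -> tuple[float, float]:
    """Return (uncached_total, cached_total) in USD for `reads` calls
    that each send the same `doc_tokens`-token document."""
    if doc_tokens < 0 or reads < 0:
        raise ValueError("doc_tokens and reads must be non-negative")
    if not 0.0 <= cached_discount <= 1.0:
        raise ValueError("cached_discount must be between 0 and 1")

    full_read = doc_tokens / 1_000_000 * input_usd_per_m
    uncached_total = full_read * reads

    # The first read pays full price; later reads hit the cache.
    first_read = full_read if reads >= 1 else 0.0
    later_reads = max(reads - 1, 0) * full_read * (1.0 - cached_discount)
    return uncached_total, first_read + later_reads


if __name__ == "__main__":
    # Example: a 1M-token document read 10 times at $3.00 per 1M input tokens.
    uncached, cached = repeated_read_cost(1_000_000, 10, 3.00)
    print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

With these inputs the uncached total is about $30.00 and the cached total about $5.70, which is why a model with a higher list price can still come out cheaper for repeated long prompts.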
Run your own numbers in the cost calculator, or browse the full price comparison.