Cheapest LLM API That Is Still Good (2026)
When you are processing huge volumes (classification, extraction, summarisation), price per token dominates the bill. These are the lowest-cost capable models on the market, with input price, output price and context window for each.
Top picks, with price (USD per 1M tokens)
1. Gemini 1.5 Flash-8B · Google
The cheapest model on this list for both input and output tokens. Ideal for high-volume classification and extraction.
$0.037 in
$0.150 out
1M ctx
2. Gemini 2.0 Flash · Google
A small step up in quality, still extremely cheap, with a 1M-token context.
$0.100 in
$0.400 out
1M ctx
3. GPT-5.4 nano · OpenAI
OpenAI's cheapest tier — an easy drop-in if you already use OpenAI.
$0.200 in
$1.25 out
400K ctx
4. DeepSeek-V4 Flash · DeepSeek
Flagship-adjacent quality at fast-tier prices. Open weights if you want to self-host later.
$0.140 in
$0.280 out
128K ctx
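To compare these on one number, you can fold input and output prices into a blended cost per request. The sketch below is only an illustration: it assumes the prices above are USD per 1M tokens, and the "typical" request of 1,000 input tokens and 200 output tokens is a made-up shape, not a measured one. The names Model and request_cost are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_per_m: float   # USD per 1M input tokens (unit is an assumption)
    output_per_m: float  # USD per 1M output tokens (unit is an assumption)


# Prices copied from the list above.
MODELS = [
    Model("Gemini 1.5 Flash-8B", 0.037, 0.150),
    Model("Gemini 2.0 Flash", 0.100, 0.400),
    Model("GPT-5.4 nano", 0.200, 1.25),
    Model("DeepSeek-V4 Flash", 0.140, 0.280),
]


def request_cost(model: Model, input_tokens: int, output_tokens: int) -> float:
    """Blended cost in USD of one request with the given token counts."""
    total = input_tokens * model.input_per_m + output_tokens * model.output_per_m
    return total / 1_000_000


# Hypothetical request shape: 1,000 tokens in, 200 tokens out.
IN_TOKENS, OUT_TOKENS = 1_000, 200

ranked = sorted(MODELS, key=lambda m: request_cost(m, IN_TOKENS, OUT_TOKENS))
for m in ranked:
    per_million = request_cost(m, IN_TOKENS, OUT_TOKENS) * 1_000_000
    print(f"{m.name:<22} ${per_million:,.2f} per 1M requests")
```

With this input-heavy shape, Gemini 2.0 Flash comes out slightly cheaper than DeepSeek-V4 Flash (about $180 versus $196 per million requests). At 1,000 tokens in and 1,000 out the order flips, so plug in your own token counts before picking.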
What to watch out for
- Cheap models cost more if they fail and you have to retry or escalate — measure success rate, not just sticker price.
- Output tokens cost roughly 2–6× as much as input tokens for the models above (2× for DeepSeek-V4 Flash, 6.25× for GPT-5.4 nano); cap max_tokens to control spend.
- For massive batch jobs, check each provider’s Batch API — it is usually ~50% cheaper.
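The three points above can be turned into rough arithmetic. This is a minimal sketch under stated assumptions: failures on the cheap model are escalated exactly once to a fallback model that always succeeds, the 90% success rate is a placeholder you would replace with your own measurement, and the 50% batch discount is the approximate figure quoted above rather than any provider's published rate. All function names are hypothetical.

```python
def cost_per_success(cheap_cost: float, success_rate: float, fallback_cost: float) -> float:
    """Expected USD per request when each cheap-model failure is escalated once.

    Assumes the fallback model always succeeds (a simplification).
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    return cheap_cost + (1.0 - success_rate) * fallback_cost


def worst_case_output_cost(max_tokens: int, output_per_m: float) -> float:
    """Upper bound in USD on output spend for one request capped at max_tokens."""
    return max_tokens * output_per_m / 1_000_000


def batch_cost(cost: float, discount: float = 0.5) -> float:
    """Cost after a batch discount; 0.5 mirrors the ~50% figure, not a quoted rate."""
    return cost * (1.0 - discount)


# Per-request costs for a hypothetical 1,000-in / 200-out request,
# assuming the listed prices are USD per 1M tokens.
flash_8b = 0.000067   # Gemini 1.5 Flash-8B
nano = 0.00045        # GPT-5.4 nano, used here as the escalation target

effective = cost_per_success(flash_8b, success_rate=0.90, fallback_cost=nano)
print(f"sticker price:     ${flash_8b:.6f}")
print(f"with escalation:   ${effective:.6f}")
print(f"same, via batch:   ${batch_cost(effective):.6f}")
print(f"output cap (256):  ${worst_case_output_cost(256, 0.150):.6f}")
```

Under these placeholder numbers, a 10% escalation rate raises the effective cost from $0.000067 to $0.000112 per request, about 67% above the sticker price, which is why success rate belongs in the comparison.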