
Cheapest LLM API That Is Still Good (2026)

When you are processing huge volumes (classification, extraction, summarisation), price per token dominates the bill. These are the lowest-cost capable models on the market, with list prices per 1M input and output tokens.

Top picks, with prices

1. Gemini 1.5 Flash-8B · Google

The lowest per-token price in this list. Ideal for high-volume classification and extraction.

$0.037 in · $0.150 out per 1M tokens · 1M-token context


2. Gemini 2.0 Flash · Google

A small step up in quality, still extremely cheap, with a 1M-token context.

$0.100 in · $0.400 out per 1M tokens · 1M-token context


3. GPT-5.4 nano · OpenAI

OpenAI's cheapest tier — an easy drop-in if you already use OpenAI.

$0.200 in · $1.25 out per 1M tokens · 400K-token context


4. DeepSeek-V4 Flash · DeepSeek

Flagship-adjacent quality at fast-tier prices. Open weights if you want to self-host later.

$0.140 in · $0.280 out per 1M tokens · 128K-token context

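To turn per-1M-token list prices into a cost per request, weight each price by how many input and output tokens a request actually uses. The Python sketch below does that for the four picks, using the prices listed above. The 1,000-in / 200-out token mix is a hypothetical "typical" extraction request, not a measured figure, so substitute your own averages.

```python
# Per-request cost from per-1M-token list prices (USD), copied from the picks above.
# The token counts used below are hypothetical; replace them with your own averages.
PRICES_PER_M = {
    "Gemini 1.5 Flash-8B": (0.037, 0.150),
    "Gemini 2.0 Flash": (0.100, 0.400),
    "GPT-5.4 nano": (0.200, 1.25),
    "DeepSeek-V4 Flash": (0.140, 0.280),
}


def request_cost(tokens_in: int, tokens_out: int, price_in: float, price_out: float) -> float:
    """Dollar cost of one request, given input/output prices per 1M tokens."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def blended_price_per_m(tokens_in: int, tokens_out: int, price_in: float, price_out: float) -> float:
    """Effective price per 1M tokens, weighted by this request's input/output mix."""
    cost = request_cost(tokens_in, tokens_out, price_in, price_out)
    return cost * 1_000_000 / (tokens_in + tokens_out)


if __name__ == "__main__":
    tokens_in, tokens_out = 1_000, 200  # assumed "typical" extraction request
    # Rank models from cheapest to most expensive for this token mix.
    ranked = sorted(
        PRICES_PER_M.items(),
        key=lambda item: request_cost(tokens_in, tokens_out, *item[1]),
    )
    for model, (p_in, p_out) in ranked:
        cost = request_cost(tokens_in, tokens_out, p_in, p_out)
        blended = blended_price_per_m(tokens_in, tokens_out, p_in, p_out)
        print(
            f"{model:<20} ${cost:.6f}/request  "
            f"${cost * 1_000_000:,.0f} per 1M requests  "
            f"blended ${blended:.3f} per 1M tokens"
        )
```

Under that assumed mix, Gemini 1.5 Flash-8B comes to about $67 per million requests, Gemini 2.0 Flash about $180, DeepSeek-V4 Flash about $196 and GPT-5.4 nano about $450. The ordering depends on the mix: with these prices, DeepSeek-V4 Flash undercuts Gemini 2.0 Flash once output exceeds about a third of the input tokens, because its output price is lower even though its input price is higher.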

What to watch out for

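Cheap models end up costing more if they fail and you have to retry or escalate to a stronger model, so measure success rate, not just sticker price.
Output tokens cost several times the input price (from 2× for DeepSeek-V4 Flash to about 6× for GPT-5.4 nano among the picks above), so cap the output length with max_tokens or your provider's equivalent to control spend.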
For massive batch jobs, check each provider’s Batch API — it is usually ~50% cheaper.
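These caveats can be folded into a rough effective-cost estimate. The Python sketch below uses the simplest model of each, all of them assumptions: retries on the same model repeat until success (so expected attempts are 1 / success_rate), escalated failures go once to a pricier model that is assumed to always succeed, an output cap bounds the worst-case bill, and batch pricing is a flat discount (0.5 here, echoing the ~50% figure; check your provider). The 90% success rate and the use of GPT-5.4 nano prices as the fallback are placeholders for illustration only.

```python
# Effective-cost sketch. All inputs are illustrative assumptions:
# plug in your own measured success rates and prices (USD per 1M tokens).


def request_cost(tokens_in: int, tokens_out: int, price_in: float, price_out: float) -> float:
    """Dollar cost of one request from per-1M-token prices."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def worst_case_cost(tokens_in: int, max_tokens: int, price_in: float, price_out: float) -> float:
    """Upper bound on one request's cost when output is capped at max_tokens."""
    return request_cost(tokens_in, max_tokens, price_in, price_out)


def cost_per_success_with_retries(cost: float, success_rate: float) -> float:
    """Retry on the same model until success: expected attempts = 1 / success_rate."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost / success_rate


def cost_with_escalation(cheap_cost: float, success_rate: float, strong_cost: float) -> float:
    """Try the cheap model once; detected failures go to a fallback assumed to succeed."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be in [0, 1]")
    return cheap_cost + (1.0 - success_rate) * strong_cost


def batch_cost(cost: float, discount: float = 0.5) -> float:
    """Apply a flat batch discount (provider-specific; 0.5 is only a placeholder)."""
    return cost * (1.0 - discount)


if __name__ == "__main__":
    cheap = request_cost(1_000, 200, 0.037, 0.150)    # Gemini 1.5 Flash-8B list prices
    fallback = request_cost(1_000, 200, 0.200, 1.25)  # GPT-5.4 nano prices, as a stand-in
    print(f"capped at 512 output tokens: <= ${worst_case_cost(1_000, 512, 0.037, 0.150):.6f}")
    print(f"retry until success (90%):   ${cost_per_success_with_retries(cheap, 0.9):.6f}")
    print(f"escalate failures (90%):     ${cost_with_escalation(cheap, 0.9, fallback):.6f}")
    print(f"same request via batch:      ${batch_cost(cheap):.6f}")
```

With a 90% success rate, retrying Flash-8B works out to about $0.000074 per good result and escalating failures to about $0.000112, both still well below a single GPT-5.4 nano call ($0.000450) at the same token mix. As the success rate drops, the escalation term grows quickly, which is why the measured success rate matters as much as the list price.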

Run your own numbers in the cost calculator, or browse the full price comparison.
