Updated 2026-06-26

Compare AI & LLM API prices in one place

Side-by-side API pricing for every major model — GPT-5, Claude, Gemini, Llama, Mistral, DeepSeek and more. Sort by input or output cost per token, see the context window, and find the cheapest model that does the job.

Output price gap: up to 333× (Gemini 1.5 Flash-8B vs Claude Fable 5; same task, wildly different bill)
Cheapest output: $0.150 per 1M tokens (Gemini 1.5 Flash-8B)
Biggest context: 2M (Grok 4 Fast)
Tracked: 29 models, 8 providers

LLM API pricing comparison

All prices are in USD per 1,000,000 tokens. Rows are sorted by reference cost, cheapest first.

| Model | Provider | Input /1M | Output /1M | Cost / call* | Context | Type | Open | Notes |
|---|---|---|---|---|---|---|---|---|
| Gemini 1.5 Flash-8B | Google | $0.037 | $0.150 | $0.0002 | 1M | Fast / cheap | | Cheapest capable model |
| Command R7B | Cohere | $0.037 | $0.150 | $0.0002 | 128K | Fast / cheap | | Cheapest Cohere tier |
| Grok 3 mini | xAI (Grok) | $0.100 | $0.300 | $0.0004 | 128K | Fast / cheap | | |
| Llama 3.1 8B | Meta (Llama) | $0.200 | $0.200 | $0.0004 | 128K | Open weights | open | Open weights |
| DeepSeek-V4 Flash | DeepSeek | $0.140 | $0.280 | $0.0004 | 128K | Fast / cheap | open | Non-thinking mode, very cheap |
| Llama 4 Scout | Meta (Llama) | $0.110 | $0.340 | $0.0005 | 128K | Open weights | open | Open weights (Groq) |
| Gemini 2.0 Flash | Google | $0.100 | $0.400 | $0.0005 | 1M | Fast / cheap | | |
| Grok 4 Fast | xAI (Grok) | $0.200 | $0.500 | $0.0007 | 2M | Fast / cheap | | 2M context |
| Mistral Small 4 | Mistral | $0.150 | $0.600 | $0.0007 | 128K | Fast / cheap | | |
| Command R | Cohere | $0.150 | $0.600 | $0.0007 | 128K | Balanced | | |
| Llama 4 Maverick | Meta (Llama) | $0.270 | $0.850 | $0.0011 | 500K | Open weights | open | Open weights (Together AI) |
| Codestral | Mistral | $0.300 | $0.900 | $0.0012 | 32K | Balanced | | Code-specialised |
| DeepSeek-V4 Pro | DeepSeek | $0.435 | $0.870 | $0.0013 | 128K | Reasoning | open | Thinking mode |
| GPT-5.4 nano | OpenAI | $0.200 | $1.25 | $0.0015 | 400K | Fast / cheap | | Cheapest OpenAI tier |
| Grok Code Fast 1 | xAI (Grok) | $0.200 | $1.50 | $0.0017 | 256K | Balanced | | Code-specialised |
| Mistral Large 3 | Mistral | $0.500 | $1.50 | $0.0020 | 256K | Flagship | | Flagship |
| Llama 3.3 70B | Meta (Llama) | $1.04 | $1.04 | $0.0021 | 128K | Open weights | open | Open weights |
| Gemini 2.5 Flash | Google | $0.300 | $2.50 | $0.0028 | 1M | Balanced | | |
| GPT-5.4 mini | OpenAI | $0.750 | $4.50 | $0.0053 | 400K | Balanced | | |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | $0.0060 | 200K | Fast / cheap | | Fastest, near-frontier |
| Mistral Medium 3.5 | Mistral | $1.50 | $7.50 | $0.0090 | 256K | Balanced | | |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | $0.011 | 1M | Flagship | | 1M context (≤200k tier) |
| Command A | Cohere | $2.50 | $10.00 | $0.013 | 256K | Flagship | | Flagship, RAG-tuned |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | $0.017 | 400K | Flagship | | Balanced flagship |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $0.018 | 1M | Balanced | | Best speed/intelligence; caching cuts input ~90% |
| Grok 4 | xAI (Grok) | $3.00 | $15.00 | $0.018 | 256K | Flagship | | Flagship |
| Claude Opus 4.8 | Anthropic | $5.00 | $25.00 | $0.030 | 1M | Flagship | | Top Opus reasoning/agentic |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | $0.035 | 400K | Flagship | | Flagship |
| Claude Fable 5 | Anthropic | $10.00 | $50.00 | $0.060 | 1M | Flagship | | Most capable widely released |

*Reference cost of one call with 1,000 input + 1,000 output tokens, used as a neutral yardstick. Your real cost depends on your own mix of input and output tokens.
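
The reference cost column can be reproduced from the two price columns. The sketch below is illustrative only: the `reference_cost` helper and the three-row sample are assumptions made for this example, with prices copied from the table above.

```python
# Illustrative sketch: reproduce the "Cost / call" column for a few rows.
# Prices are USD per 1,000,000 tokens, copied from the table above.
PRICES = {
    "Claude Fable 5": (10.00, 50.00),
    "Gemini 1.5 Flash-8B": (0.037, 0.150),
    "Claude Haiku 4.5": (1.00, 5.00),
}

def reference_cost(input_per_m, output_per_m, tokens_in=1_000, tokens_out=1_000):
    """Cost in USD of one call with the given token counts."""
    return (tokens_in * input_per_m + tokens_out * output_per_m) / 1_000_000

# Sort cheapest reference cost first, matching the table's default order.
ranked = sorted(PRICES, key=lambda name: reference_cost(*PRICES[name]))
for name in ranked:
    print(f"{name}: ${reference_cost(*PRICES[name]):.4f}")
```

Changing `tokens_in` and `tokens_out` to match a real workload can reorder the ranking, because models differ in how much more they charge for output than for input.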

Frequently asked questions

How is LLM API pricing calculated?

Almost every LLM API charges per token, split into an input (prompt) price and a usually higher output (completion) price, each quoted per 1,000,000 tokens. Your bill is (input tokens × input price) + (output tokens × output price), with each quoted price divided by 1,000,000 to get the per-token rate. In English text, a token is roughly ¾ of a word.
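
As a rough illustration of that formula, the sketch below estimates a bill from word counts. The function names and the workload (10,000 requests, 750-word prompts, 150-word replies) are hypothetical; the prices are the GPT-5.4 row from the table, and the ¾-word rule is only an approximation.

```python
# Illustrative sketch of the billing formula; all names here are hypothetical.
TOKENS_PER_WORD = 4 / 3  # rule of thumb: one token is roughly 3/4 of a word

def words_to_tokens(words):
    """Rough token estimate for English text."""
    return round(words * TOKENS_PER_WORD)

def bill_usd(input_tokens, output_tokens, input_per_m, output_per_m):
    """(input tokens x input price) + (output tokens x output price),
    where both prices are quoted per 1,000,000 tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# Example workload: 10,000 requests, 750-word prompt and 150-word reply each,
# priced at $2.50 input / $15.00 output per 1M tokens.
requests = 10_000
total_in = requests * words_to_tokens(750)   # about 1,000 tokens per prompt
total_out = requests * words_to_tokens(150)  # about 200 tokens per reply
print(f"${bill_usd(total_in, total_out, 2.50, 15.00):.2f}")
```

Under these assumptions the total is $55.00: $25.00 for 10M input tokens plus $30.00 for 2M output tokens. Real token counts come from the provider's tokenizer and can differ noticeably from a word-based estimate.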

Which LLM API is cheapest?

For high-volume work the cheapest capable models are Google Gemini 1.5 Flash-8B and 2.0 Flash, OpenAI GPT-5.4 nano, and DeepSeek-V4 Flash. The lowest sticker price is not always the cheapest in practice — a weaker model that needs retries or escalation can cost more overall.
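
One way to reason about the retry and escalation point is an expected-cost calculation. The sketch below assumes a simple policy in which every task is first sent to a cheap model and failed tasks are re-run once on a stronger one. The failure rates are hypothetical placeholders, not measurements, and the helper names are invented for this example.

```python
# Illustrative sketch: expected cost per task when a cheap model sometimes
# fails and the task is escalated to a stronger model.
# The failure rates below are hypothetical, not measured.

def call_cost(input_per_m, output_per_m, tokens_in, tokens_out):
    """Cost in USD of one call, with prices quoted per 1,000,000 tokens."""
    return (tokens_in * input_per_m + tokens_out * output_per_m) / 1_000_000

def expected_cost_with_escalation(cheap_cost, strong_cost, failure_rate):
    """Every task pays for the cheap call; failed tasks also pay for the strong one."""
    return cheap_cost + failure_rate * strong_cost

cheap = call_cost(0.037, 0.150, 1_000, 1_000)  # Gemini 1.5 Flash-8B row
strong = call_cost(3.00, 15.00, 1_000, 1_000)  # Claude Sonnet 4.6 row

for failure_rate in (0.0, 0.05, 0.25):
    blended = expected_cost_with_escalation(cheap, strong, failure_rate)
    print(f"failure rate {failure_rate:.0%}: ${blended:.6f} per task")
```

With these assumed numbers, a 5% escalation rate brings the blended cost to roughly $0.0011 per task, several times the cheap model's sticker cost and above the $0.0007 reference cost of a mid-priced row such as Mistral Small 4. Whether that trade-off holds in practice depends entirely on the real failure rate for the task.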

What is the difference between input and output token pricing?

Input tokens are what you send (the prompt, system message, context, documents). Output tokens are what the model generates. Output is typically 2–5× more expensive than input, and about 8× for the Gemini 2.5 models in the table above, so capping maximum output length is often the fastest way to cut cost.
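
To make the effect of an output cap concrete, the sketch below compares one call with and without a cap. The prices are the Claude Sonnet 4.6 row from the table; the token counts and the `max_output_tokens` parameter name are assumptions for illustration, since the actual setting is named differently by each provider.

```python
# Illustrative sketch: how an output cap changes the cost of one call.
# Prices are USD per 1M tokens (Claude Sonnet 4.6 row in the table above);
# the token counts are made-up example values.
INPUT_PER_M, OUTPUT_PER_M = 3.00, 15.00

def call_cost(tokens_in, tokens_out, max_output_tokens=None):
    """Cost in USD; if a cap is given, billable output is at most the cap."""
    if max_output_tokens is not None:
        tokens_out = min(tokens_out, max_output_tokens)
    return (tokens_in * INPUT_PER_M + tokens_out * OUTPUT_PER_M) / 1_000_000

uncapped = call_cost(2_000, 1_500)
capped = call_cost(2_000, 1_500, max_output_tokens=500)
print(f"output/input price ratio: {OUTPUT_PER_M / INPUT_PER_M:.1f}x")
print(f"uncapped: ${uncapped:.4f}  capped: ${capped:.4f}")
```

In this example the cap cuts the call from $0.0285 to $0.0135. A cap truncates the response rather than making the model more concise, so it usually needs to be paired with a prompt that asks for shorter answers.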

Are these prices up to date?

Prices are list prices last verified on 2026-06-26 and link to each provider's official pricing page. The AI market moves fast — always confirm the current price with the provider before committing to volume.

How we keep prices honest

Every number is an official list price last verified on 2026-06-26, and each provider page links to the source. Spotted a stale price? The whole catalog lives in one file so it is fast to refresh.