Updated 2026-06-26

Compare AI & LLM API prices in one place

Side-by-side API pricing for every major model — GPT-5, Claude, Gemini, Llama, Mistral, DeepSeek and more. Sort by input or output cost per token, see the context window, and find the cheapest model that does the job.

Output price gap

up to 333×

Gemini 1.5 Flash-8B vs Claude Fable 5 — same task, wildly different bill

Cheapest output

$0.150

Gemini 1.5 Flash-8B · per 1M tokens

Biggest context

Grok 4 Fast

Tracked

29 models · 8 providers

LLM API pricing comparison

All prices are in USD per 1,000,000 tokens. The table is sorted by reference cost, cheapest first.

Model	Provider	Input /1M	Output /1M	Cost / call*	Context	Type
Gemini 1.5 Flash-8B (Cheapest capable model)	Google	$0.037	$0.150	$0.0002	1M	Fast / cheap
Command R7B (Cheapest Cohere tier)	Cohere	$0.037	$0.150	$0.0002	128K	Fast / cheap
Grok 3 mini	xAI (Grok)	$0.100	$0.300	$0.0004	128K	Fast / cheap
Llama 3.1 8B (Open weights)	Meta (Llama)	$0.200	$0.200	$0.0004	128K	Open weights
DeepSeek-V4 Flash (Non-thinking mode, very cheap)	DeepSeek	$0.140	$0.280	$0.0004	128K	Fast / cheap (open weights)
Llama 4 Scout (Open weights, Groq)	Meta (Llama)	$0.110	$0.340	$0.0005	128K	Open weights
Gemini 2.0 Flash	Google	$0.100	$0.400	$0.0005	1M	Fast / cheap
Grok 4 Fast (2M context)	xAI (Grok)	$0.200	$0.500	$0.0007	2M	Fast / cheap
Mistral Small 4	Mistral	$0.150	$0.600	$0.0007	128K	Fast / cheap
Command R	Cohere	$0.150	$0.600	$0.0007	128K	Balanced
Llama 4 Maverick (Open weights, Together AI)	Meta (Llama)	$0.270	$0.850	$0.0011	500K	Open weights
Codestral (Code-specialised)	Mistral	$0.300	$0.900	$0.0012	32K	Balanced
DeepSeek-V4 Pro (Thinking mode)	DeepSeek	$0.435	$0.870	$0.0013	128K	Reasoning (open weights)
GPT-5.4 nano (Cheapest OpenAI tier)	OpenAI	$0.200	$1.25	$0.0015	400K	Fast / cheap
Grok Code Fast 1 (Code-specialised)	xAI (Grok)	$0.200	$1.50	$0.0017	256K	Balanced
Mistral Large 3 (Flagship)	Mistral	$0.500	$1.50	$0.0020	256K	Flagship
Llama 3.3 70B (Open weights)	Meta (Llama)	$1.04	$1.04	$0.0021	128K	Open weights
Gemini 2.5 Flash	Google	$0.300	$2.50	$0.0028	1M	Balanced
GPT-5.4 mini	OpenAI	$0.750	$4.50	$0.0053	400K	Balanced
Claude Haiku 4.5 (Fastest, near-frontier)	Anthropic	$1.00	$5.00	$0.0060	200K	Fast / cheap
Mistral Medium 3.5	Mistral	$1.50	$7.50	$0.0090	256K	Balanced
Gemini 2.5 Pro (1M context, ≤200k tier)	Google	$1.25	$10.00	$0.011	1M	Flagship
Command A (Flagship, RAG-tuned)	Cohere	$2.50	$10.00	$0.013	256K	Flagship
GPT-5.4 (Balanced flagship)	OpenAI	$2.50	$15.00	$0.017	400K	Flagship
Claude Sonnet 4.6 (Best speed/intelligence; caching cuts input ~90%)	Anthropic	$3.00	$15.00	$0.018	1M	Balanced
Grok 4 (Flagship)	xAI (Grok)	$3.00	$15.00	$0.018	256K	Flagship
Claude Opus 4.8 (Top Opus reasoning/agentic)	Anthropic	$5.00	$25.00	$0.030	1M	Flagship
GPT-5.5 (Flagship)	OpenAI	$5.00	$30.00	$0.035	400K	Flagship
Claude Fable 5 (Most capable widely released)	Anthropic	$10.00	$50.00	$0.060	1M	Flagship

*Reference cost: the price of one call with 1,000 input + 1,000 output tokens, a neutral yardstick. Use the calculator for your real usage.
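The reference cost is only one call shape, and the ranking can shift once your input and output volumes differ. The sketch below recomputes the yardstick for four rows copied from the table and re-ranks them for two other call shapes. The function names and the two alternative call shapes are illustrative assumptions, not part of the site's calculator.

```python
# List prices copied from the table above, USD per 1,000,000 tokens: (input, output).
PRICES = {
    "Gemini 1.5 Flash-8B": (0.037, 0.150),
    "Grok 3 mini": (0.100, 0.300),
    "Llama 3.1 8B": (0.200, 0.200),
    "Claude Fable 5": (10.00, 50.00),
}

def call_cost(model, input_tokens, output_tokens):
    """USD cost of one call at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

def rank(input_tokens, output_tokens):
    """Model names ordered from cheapest to most expensive for this call shape."""
    return sorted(PRICES, key=lambda m: call_cost(m, input_tokens, output_tokens))

# Reference yardstick: 1,000 input + 1,000 output tokens.
for model in rank(1_000, 1_000):
    print(f"{model:22s} ${call_cost(model, 1_000, 1_000):.4f}")

# Hypothetical call shapes: input-heavy (10,000 in, 200 out) and output-heavy (200 in, 2,000 out).
print(rank(10_000, 200))
print(rank(200, 2_000))
```

At the reference shape Grok 3 mini and Llama 3.1 8B cost the same. The input-heavy shape favours Grok 3 mini and the output-heavy shape favours Llama 3.1 8B, which is why the yardstick should not replace an estimate based on your own traffic.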

Go deeper

▣ Cost calculator

Enter your tokens and request volume to see the real monthly cost across every model, ranked cheapest first.

▤ Local VRAM calculator

Running models locally? Find out how much GPU VRAM a model needs at each quantization — and which card fits.

◇ Cheapest LLM API

The lowest-cost capable models for high-volume work, with the trade-offs that matter at scale.

Frequently asked questions

How is LLM API pricing calculated?

Almost every LLM API charges per token, split into an input (prompt) price and a usually higher output (completion) price, both quoted per 1,000,000 tokens. Your bill is (input tokens × input price + output tokens × output price) ÷ 1,000,000. In English text, a token is roughly ¾ of a word.
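As a worked example of that formula, here is a minimal sketch that turns word counts into a monthly bill. The prices are the GPT-5.4 list prices from the table. The workload (600-word prompts, 300-word replies, 100,000 requests a month) and the helper names are assumptions made up for illustration, and the ¾-word rule is only a rough estimate for English text.

```python
def estimate_tokens(word_count):
    """Rough English-only estimate: one token is about 3/4 of a word."""
    return round(word_count / 0.75)

def bill_usd(input_tokens, output_tokens, input_price, output_price):
    """Prices are USD per 1,000,000 tokens, so divide by one million at the end."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# GPT-5.4 list prices from the table: $2.50 input, $15.00 output per 1M tokens.
INPUT_PRICE, OUTPUT_PRICE = 2.50, 15.00

# Hypothetical workload: 600-word prompts, 300-word replies, 100,000 requests a month.
requests_per_month = 100_000
input_tokens = estimate_tokens(600) * requests_per_month   # 800 tokens per prompt
output_tokens = estimate_tokens(300) * requests_per_month  # 400 tokens per reply

monthly = bill_usd(input_tokens, output_tokens, INPUT_PRICE, OUTPUT_PRICE)
print(f"${monthly:,.2f} per month")  # $800.00 per month
```

Of that $800, $600 comes from output even though output is only a third of the tokens.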

Which LLM API is cheapest?

For high-volume work the cheapest capable models are Google Gemini 1.5 Flash-8B and 2.0 Flash, OpenAI GPT-5.4 nano, and DeepSeek-V4 Flash. The lowest sticker price is not always the cheapest in practice — a weaker model that needs retries or escalation can cost more overall.
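To see how retries or escalation can erase a sticker-price advantage, the sketch below compares two simple strategies. The per-call costs are derived from the GPT-5.4 mini and GPT-5.4 rows of the table at the 1,000 + 1,000 token reference shape. The success rates are hypothetical, and both cost models (one escalation after a failure, independent retries) are simplifying assumptions rather than measured behaviour.

```python
def cost_with_escalation(cheap_cost, strong_cost, cheap_success_rate):
    """Expected cost per task when every task tries the cheap model first
    and each failure is re-run once on the strong model."""
    return cheap_cost + (1 - cheap_success_rate) * strong_cost

def cost_with_retries(cost_per_attempt, success_rate):
    """Expected cost when the same model is retried until it succeeds,
    assuming independent attempts (mean attempts = 1 / success_rate)."""
    return cost_per_attempt / success_rate

# Reference-call costs from the table's list prices (1,000 input + 1,000 output tokens).
CHEAP = (1_000 * 0.750 + 1_000 * 4.50) / 1_000_000    # GPT-5.4 mini
STRONG = (1_000 * 2.50 + 1_000 * 15.00) / 1_000_000   # GPT-5.4

# Hypothetical success rates for the cheap model on some task.
for rate in (0.9, 0.5, 0.2):
    escalated = cost_with_escalation(CHEAP, STRONG, rate)
    verdict = "cheaper" if escalated < STRONG else "more expensive"
    print(f"success {rate:.0%}: ${escalated:.5f} per task, {verdict} than going straight to the strong model")

# Escalation stops paying off once the cheap model's success rate drops below this ratio.
print(f"break-even success rate: {CHEAP / STRONG:.0%}")
```

Under these assumptions the cheap-first strategy wins only while the cheap model succeeds on more than about 30% of tasks; below that, paying for both calls costs more than calling the stronger model directly.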

What is the difference between input and output token pricing?

Input tokens are what you send (the prompt, system message, context, documents). Output tokens are what the model generates. In the table above, output costs 2–6× the input price for most models (from 1× for the Llama 3.x rows to about 8× for Gemini 2.5 Flash), so capping max output length is often the fastest way to cut cost.
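A quick way to check the effect of output length is to remove the same number of tokens from each side of a call and compare the savings. This sketch uses the Claude Sonnet 4.6 list prices from the table; the 1,000 + 1,000 token baseline and the 500-token cut are illustrative assumptions, and it presumes the shorter answer is still acceptable.

```python
# Claude Sonnet 4.6 list prices from the table, USD per 1,000,000 tokens.
INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00

def call_cost(input_tokens, output_tokens):
    """USD cost of one call at list price."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

baseline = call_cost(1_000, 1_000)      # 1,000 in, 1,000 out
capped_output = call_cost(1_000, 500)   # same prompt, output capped at 500 tokens
trimmed_input = call_cost(500, 1_000)   # prompt trimmed by 500 tokens instead

def saving(new_cost):
    """Fraction of the baseline cost saved."""
    return 1 - new_cost / baseline

print(f"cap output by 500 tokens: {saving(capped_output):.1%} saved")  # 41.7% saved
print(f"trim input by 500 tokens: {saving(trimmed_input):.1%} saved")  # 8.3% saved
```

Because output is priced at 5× input for this model, each output token removed saves five times as much as an input token.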

Are these prices up to date?

Prices are list prices last verified on 2026-06-26 and link to each provider's official pricing page. The AI market moves fast — always confirm the current price with the provider before committing to volume.

How we keep prices honest. Every number is an official list price last verified on 2026-06-26, and each provider page links to the source. Spotted a stale price? The whole catalog lives in one file so it is fast to refresh.