Compare AI & LLM API prices in one place
Side-by-side API pricing for every major model — GPT-5, Claude, Gemini, Llama, Mistral, DeepSeek and more. Sort by input or output cost per token, see the context window, and find the cheapest model that does the job.
LLM API pricing comparison
All prices in USD per 1,000,000 tokens. Click any column to sort — the table defaults to cheapest reference cost first. Click a model to see its full provider pricing.
| Model | Provider | Input /1M | Output /1M | Cost / call* | Context | Type | |
|---|---|---|---|---|---|---|---|
| Gemini 1.5 Flash-8B Cheapest capable model | $0.037 | $0.150 | $0.0002 | 1M | Fast / cheap | ||
| Command R7B Cheapest Cohere tier | Cohere | $0.037 | $0.150 | $0.0002 | 128K | Fast / cheap | |
| Grok 3 mini | xAI (Grok) | $0.100 | $0.300 | $0.0004 | 128K | Fast / cheap | |
| Llama 3.1 8B Open weights | Meta (Llama) | $0.200 | $0.200 | $0.0004 | 128K | Open weights open | |
| DeepSeek-V4 Flash Non-thinking mode, very cheap | DeepSeek | $0.140 | $0.280 | $0.0004 | 128K | Fast / cheap open | |
| Llama 4 Scout Open weights (Groq) | Meta (Llama) | $0.110 | $0.340 | $0.0005 | 128K | Open weights open | |
| Gemini 2.0 Flash | $0.100 | $0.400 | $0.0005 | 1M | Fast / cheap | ||
| Grok 4 Fast 2M context | xAI (Grok) | $0.200 | $0.500 | $0.0007 | 2M | Fast / cheap | |
| Mistral Small 4 | Mistral | $0.150 | $0.600 | $0.0007 | 128K | Fast / cheap | |
| Command R | Cohere | $0.150 | $0.600 | $0.0007 | 128K | Balanced | |
| Llama 4 Maverick Open weights (Together AI) | Meta (Llama) | $0.270 | $0.850 | $0.0011 | 500K | Open weights open | |
| Codestral Code-specialised | Mistral | $0.300 | $0.900 | $0.0012 | 32K | Balanced | |
| DeepSeek-V4 Pro Thinking mode | DeepSeek | $0.435 | $0.870 | $0.0013 | 128K | Reasoning open | |
| GPT-5.4 nano Cheapest OpenAI tier | OpenAI | $0.200 | $1.25 | $0.0015 | 400K | Fast / cheap | |
| Grok Code Fast 1 Code-specialised | xAI (Grok) | $0.200 | $1.50 | $0.0017 | 256K | Balanced | |
| Mistral Large 3 Flagship | Mistral | $0.500 | $1.50 | $0.0020 | 256K | Flagship | |
| Llama 3.3 70B Open weights | Meta (Llama) | $1.04 | $1.04 | $0.0021 | 128K | Open weights open | |
| Gemini 2.5 Flash | $0.300 | $2.50 | $0.0028 | 1M | Balanced | ||
| GPT-5.4 mini | OpenAI | $0.750 | $4.50 | $0.0053 | 400K | Balanced | |
| Claude Haiku 4.5 Fastest, near-frontier | Anthropic | $1.00 | $5.00 | $0.0060 | 200K | Fast / cheap | |
| Mistral Medium 3.5 | Mistral | $1.50 | $7.50 | $0.0090 | 256K | Balanced | |
| Gemini 2.5 Pro 1M context (≤200k tier) | $1.25 | $10.00 | $0.011 | 1M | Flagship | ||
| Command A Flagship, RAG-tuned | Cohere | $2.50 | $10.00 | $0.013 | 256K | Flagship | |
| GPT-5.4 Balanced flagship | OpenAI | $2.50 | $15.00 | $0.017 | 400K | Flagship | |
| Claude Sonnet 4.6 Best speed/intelligence; caching cuts input ~90% | Anthropic | $3.00 | $15.00 | $0.018 | 1M | Balanced | |
| Grok 4 Flagship | xAI (Grok) | $3.00 | $15.00 | $0.018 | 256K | Flagship | |
| Claude Opus 4.8 Top Opus reasoning/agentic | Anthropic | $5.00 | $25.00 | $0.030 | 1M | Flagship | |
| GPT-5.5 Flagship | OpenAI | $5.00 | $30.00 | $0.035 | 400K | Flagship | |
| Claude Fable 5 Most capable widely released | Anthropic | $10.00 | $50.00 | $0.060 | 1M | Flagship |
*Reference cost of one call with 1,000 input + 1,000 output tokens — a neutral yardstick. Use the calculator for your real usage. Cheapest row highlighted. Tick any models to build a shareable price card.
Share the image anywhere, or send the link — it reopens this exact comparison.
Go deeper
▣ Cost calculator
Enter your tokens and request volume to see the real monthly cost across every model, ranked cheapest first.
▤ Local VRAM calculator
Running models locally? Find out how much GPU VRAM a model needs at each quantization — and which card fits.
◇ Cheapest LLM API
The lowest-cost capable models for high-volume work, with the trade-offs that matter at scale.
Pricing by provider
Best LLM for…
Frequently asked questions
How is LLM API pricing calculated?
Almost every LLM API charges per token, split into an input (prompt) price and a usually higher output (completion) price, quoted per 1,000,000 tokens. Your bill is (input tokens × input price) + (output tokens × output price). A token is roughly ¾ of a word.
Which LLM API is cheapest?
For high-volume work the cheapest capable models are Google Gemini 1.5 Flash-8B and 2.0 Flash, OpenAI GPT-5.4 nano, and DeepSeek-V4 Flash. The lowest sticker price is not always the cheapest in practice — a weaker model that needs retries or escalation can cost more overall.
What is the difference between input and output token pricing?
Input tokens are what you send (the prompt, system message, context, documents). Output tokens are what the model generates. Output is typically 2–5× more expensive than input, so capping max output length is the fastest way to cut cost.
Are these prices up to date?
Prices are list prices last verified on 2026-06-26 and link to each provider's official pricing page. The AI market moves fast — always confirm the current price with the provider before committing to volume.