Free tool

LLM VRAM calculator

Running models locally? Pick a model size and quantization to see how much GPU VRAM you need — and which cards can actually run it.


Which GPUs can run it?

GPU | VRAM | Tier | Fits?

Estimates for inference. Long contexts grow the KV cache and need more VRAM; training needs far more. A model that "just fits" may be slow — leave headroom.

How much VRAM do you need to run an LLM?

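The rule of thumb: VRAM ≈ parameters × bytes-per-parameter × ~1.2. With parameters counted in billions, the result is in gigabytes. A model's weights are the bulk of it, and the bytes-per-parameter depends on the quantization you run: roughly 2 bytes at FP16, 1 byte at 8-bit, and about 0.5 bytes at 4-bit.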
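The rule of thumb can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's actual code: the names `estimate_vram_gb`, `fits`, and `BYTES_PER_PARAM` are assumptions made for this example, and the bytes-per-parameter figures are nominal values (real quantized files carry some extra metadata, so actual sizes run slightly higher).

```python
# Nominal bytes per parameter for common weight formats (assumed values;
# actual quantized files are slightly larger because of scales and metadata).
BYTES_PER_PARAM = {
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

# ~20% on top of the weights for KV cache and runtime overhead.
OVERHEAD = 1.2


def estimate_vram_gb(params_billions: float, quant: str,
                     overhead: float = OVERHEAD) -> float:
    """Estimate inference VRAM in GB: parameters x bytes-per-parameter x overhead.

    Parameters are given in billions, so billions x bytes comes out in GB.
    """
    return params_billions * BYTES_PER_PARAM[quant] * overhead


def fits(params_billions: float, quant: str, gpu_vram_gb: float) -> bool:
    """Return True if the estimate is within the card's VRAM (no extra headroom)."""
    return estimate_vram_gb(params_billions, quant) <= gpu_vram_gb


if __name__ == "__main__":
    for size, quant in [(7, "fp16"), (7, "int4"), (70, "int4")]:
        print(f"{size}B @ {quant}: ~{estimate_vram_gb(size, quant):.1f} GB")
```

By this estimate a 7B model needs about 16.8 GB at FP16 but only about 4.2 GB at 4-bit, while a 70B model at 4-bit still needs about 42 GB. For long contexts, pass a larger `overhead` than the default 1.2.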

The extra ~20% covers the KV cache (which grows with your context length) and runtime overhead. For long contexts, add more headroom. Want to skip the hardware entirely? Compare cheap hosted APIs instead — often cheaper than buying a GPU until you are at serious volume.