
GPT OSS 20B

OpenAI
Multilingual · Thinking · Tool Calls

GPT OSS 20B is a 21.51-billion-parameter Mixture-of-Experts model from OpenAI, optimized for low-latency reasoning on consumer hardware. With 32 experts and 4 active per token, it runs within 16 GB of memory using native MXFP4 quantization. The model offers configurable reasoning effort, function calling, and multilingual conversation across 12 languages. A 128K context window and flash attention support long-document tasks, while the Apache 2.0 license and small active footprint make it ideal for local and latency-sensitive deployments. GGUF quants are available for local inference with llama.cpp.
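The sparse activation described above (32 experts, 4 active per token) is what keeps the inference footprint small: a router scores all experts but only the top 4 run. A minimal sketch of top-k routing in pure Python; the renormalization scheme shown here is a common MoE convention and an illustrative assumption, not GPT OSS 20B's exact routing code.

```python
import math
import random

NUM_EXPERTS = 32  # experts per MoE layer (from the model card)
TOP_K = 4         # experts activated per token (from the model card)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_logits, top_k=TOP_K):
    """Select the top_k experts for one token and renormalize their
    router probabilities so the chosen experts' weights sum to 1.
    Illustrative only; the model's actual routing details may differ."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(probs[i] for i in chosen)
    return {i: probs[i] / total for i in chosen}

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)
print(len(weights))                     # 4 experts active for this token
print(round(sum(weights.values()), 6))  # 1.0 after renormalization
```

Only the 4 selected experts' feed-forward weights participate in the token's computation, which is why the active footprint stays far below the 21.51B total parameter count.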

Hardware Configuration

Quantization   Quality          Size
FP16           Full precision   12.85 GB
Q8_0           High             11.28 GB
Q8_K_XL        High             12.29 GB
Q6_K           High             11.21 GB
Q6_K_XL        High             11.21 GB
Q5_K_M         Medium           10.91 GB
Q5_K_S         Medium           10.91 GB
Q4_K_M         Medium           10.83 GB
Q4_K_S         Medium           10.82 GB
Q4_K_XL        Medium           11.06 GB
Q4_0           Medium           10.71 GB
Q4_1           Medium           10.78 GB
Q3_K_M         Low              10.72 GB
Q3_K_S         Low              10.68 GB
Q2_K           Low              10.68 GB
Q2_K_L         Low              10.95 GB
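A practical way to read the table above: pick the highest-quality quant whose file size, plus some headroom for the KV cache and activations, fits your memory budget. A minimal sketch using a subset of the listed sizes; the 2 GB headroom figure is an illustrative assumption, not a measured value.

```python
# A subset of the GGUF quant sizes from the table above, in GB,
# ordered from highest quality to smallest file.
QUANTS = [
    ("Q8_0",   "High",   11.28),
    ("Q6_K",   "High",   11.21),
    ("Q5_K_M", "Medium", 10.91),
    ("Q4_K_M", "Medium", 10.83),
    ("Q3_K_M", "Low",    10.72),
    ("Q2_K",   "Low",    10.68),
]

def pick_quant(budget_gb, headroom_gb=2.0):
    """Return the first (highest-quality) quant whose file size plus
    headroom fits the memory budget, or None if nothing fits.
    The headroom default is an assumed allowance for KV cache and
    activations, not a benchmarked number."""
    for name, quality, size_gb in QUANTS:
        if size_gb + headroom_gb <= budget_gb:
            return name, quality, size_gb
    return None

print(pick_quant(16.0))  # -> ('Q8_0', 'High', 11.28)
print(pick_quant(13.0))  # -> ('Q5_K_M', 'Medium', 10.91)
```

With a 16 GB budget, the quoted by the model card, even the Q8_0 file fits with room to spare; tighter budgets step down through the medium and low tiers, trading quality for memory.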
Last updated: March 5, 2026