GPT OSS 20B
OpenAI
Multilingual · Thinking · Tool Calls
GPT OSS 20B is a 21.51-billion-parameter Mixture-of-Experts model from OpenAI, optimized for low-latency reasoning on consumer hardware. With 32 experts and 4 active per token, it runs within 16 GB of memory using native MXFP4 quantization. The model offers configurable reasoning effort, function calling, and multilingual conversation across 12 languages. A 128K context window and flash attention support long-document tasks, while the Apache 2.0 license and small active footprint make it ideal for local and latency-sensitive deployments. GGUF quants are available for local inference with llama.cpp.
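A quick sanity check of the 16 GB claim: MXFP4 packs 4-bit values in blocks of 32 that share an 8-bit scale, for roughly 4.25 effective bits per weight. The sketch below is a back-of-the-envelope estimate, not an exact accounting (real GGUF files add tensor metadata and keep some tensors at higher precision):

```python
# Back-of-the-envelope check that native MXFP4 weights fit in 16 GB.
# MXFP4 stores 4-bit values in blocks of 32 sharing an 8-bit scale,
# i.e. about 4 + 8/32 = 4.25 effective bits per weight (an approximation;
# real files carry extra metadata and some higher-precision tensors).
TOTAL_PARAMS = 21.51e9        # total parameter count, from the model card
BITS_PER_WEIGHT = 4 + 8 / 32  # ~4.25 bits under MXFP4 block scaling

weight_bytes = TOTAL_PARAMS * BITS_PER_WEIGHT / 8
weight_gib = weight_bytes / 2**30

print(f"~{weight_gib:.1f} GiB of weights")  # ~10.6 GiB
assert weight_gib < 16  # leaves headroom for KV cache and runtime overhead
```

The result lands near the ~11 GB quant sizes listed below and comfortably under the 16 GB figure, which is what leaves room for the KV cache on 16 GB devices.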
GGUF Quantizations
| Quantization | Quality | File Size |
|---|---|---|
| FP16 | Full precision | 12.85 GB |
| Q8_0 | High | 11.28 GB |
| Q8_K_XL | High | 12.29 GB |
| Q6_K | High | 11.21 GB |
| Q6_K_XL | High | 11.21 GB |
| Q5_K_M | Medium | 10.91 GB |
| Q5_K_S | Medium | 10.91 GB |
| Q4_K_M | Medium | 10.83 GB |
| Q4_K_S | Medium | 10.82 GB |
| Q4_K_XL | Medium | 11.06 GB |
| Q4_0 | Medium | 10.71 GB |
| Q4_1 | Medium | 10.78 GB |
| Q3_K_M | Low | 10.72 GB |
| Q3_K_S | Low | 10.68 GB |
| Q2_K | Low | 10.68 GB |
| Q2_K_L | Low | 10.95 GB |
Last updated: March 5, 2026