GPT OSS 120B

OpenAI · Multilingual · Thinking · Tool Calls

GPT OSS 120B is a 120.41-billion-parameter Mixture-of-Experts model from OpenAI, trained with large-scale distillation and reinforcement learning for agentic reasoning. With 128 experts and 4 active per token, it fits on a single 80 GB GPU thanks to native MXFP4 quantization of MoE weights. The model supports configurable reasoning effort, function calling, and multilingual conversation across 12 languages. Its 128K context window and flash attention enable long-document workflows, while the Apache 2.0 license allows unrestricted commercial use. GGUF quants are available for self-hosted inference with llama.cpp.
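
A common way to exercise the configurable reasoning effort and function calling is through an OpenAI-compatible endpoint, such as the one llama.cpp's llama-server exposes. The sketch below is illustrative, not part of the model card: the base URL, the model alias, and the get_weather tool are assumptions, and the reasoning_effort field follows the OpenAI Chat Completions convention, which your server may or may not honor.

```python
# Sketch: calling GPT OSS 120B through an OpenAI-compatible server.
# base_url, model alias, and the get_weather tool are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

# A hypothetical tool definition to demonstrate function calling.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-oss-120b",        # server-side model alias (assumed)
    reasoning_effort="high",     # configurable reasoning effort: low / medium / high
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
)

msg = response.choices[0].message
if msg.tool_calls:  # the model may answer with a tool call instead of text
    print(msg.tool_calls[0].function.name, msg.tool_calls[0].function.arguments)
else:
    print(msg.content)
```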

Hardware Configuration

| Quantization | Quality        | Size     |
|--------------|----------------|----------|
| FP16         | Full precision | 60.88 GB |
| Q8_0         | High           | 59.03 GB |
| Q8_K_XL      | High           | 60.04 GB |
| Q6_K         | High           | 58.93 GB |
| Q6_K_XL      | High           | 58.93 GB |
| Q5_K_M       | Medium         | 58.57 GB |
| Q5_K_S       | Medium         | 58.56 GB |
| Q4_K_M       | Medium         | 58.46 GB |
| Q4_K_S       | Medium         | 58.45 GB |
| Q4_K_XL      | Medium         | 58.69 GB |
| Q4_0         | Medium         | 58.32 GB |
| Q4_1         | Medium         | 58.41 GB |
| Q3_K_M       | Low            | 58.32 GB |
| Q3_K_S       | Low            | 58.27 GB |
| Q2_K         | Low            | 58.27 GB |
| Q2_K_L       | Low            | 58.54 GB |
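
For self-hosted inference with one of the GGUF quants above, the llama-cpp-python bindings to llama.cpp are one option. The following is a minimal sketch assuming a locally downloaded Q4_K_M file; the file name is a placeholder, and the full 128K context at full GPU offload needs roughly the 80 GB of VRAM noted in the description.

```python
# Sketch: loading a GGUF quant of GPT OSS 120B with llama-cpp-python.
# model_path is a placeholder; substitute the quant file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b-Q4_K_M.gguf",  # ~58.46 GB per the table above
    n_ctx=131072,      # the model's full 128K context window
    n_gpu_layers=-1,   # offload every layer to the GPU
    flash_attn=True,   # flash attention, as noted in the model description
)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Summarize the Apache 2.0 license in one sentence."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```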
Last updated: March 5, 2026