GPT OSS 120B
OpenAI
Multilingual · Thinking · Tool Calls
GPT OSS 120B is a 120.41-billion-parameter Mixture-of-Experts model from OpenAI, trained with large-scale distillation and reinforcement learning for agentic reasoning. It routes each token through 4 of its 128 experts, and native MXFP4 quantization of the MoE weights lets it fit on a single 80 GB GPU. The model supports configurable reasoning effort, function calling, and multilingual conversation across 12 languages. Its 128K context window and flash attention enable long-document workflows, while the Apache 2.0 license allows unrestricted commercial use. GGUF quants are available for self-hosted inference with llama.cpp.
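As a minimal sketch of the self-hosted GGUF workflow mentioned above, the snippet below loads a local quant with the llama-cpp-python bindings and runs one chat completion. The file path and quant choice are illustrative, and the "Reasoning: high" system line mirrors the effort-control convention used by gpt-oss chat templates; verify the exact wording against your template before relying on it.

```python
# Minimal inference sketch using llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

# Path and quant are illustrative; substitute your downloaded GGUF file.
llm = Llama(
    model_path="./gpt-oss-120b-Q4_K_M.gguf",
    n_ctx=16384,      # raise toward 128K only if VRAM allows the larger KV cache
    n_gpu_layers=-1,  # offload every layer to the GPU
    flash_attn=True,  # flash attention, per the model's long-context support
)

response = llm.create_chat_completion(
    messages=[
        # Assumed effort-control convention; confirm against your chat template.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Summarize the tradeoffs of MoE models."},
    ],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```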
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| FP16 | Full precision | 60.88 GB |
| Q8_0 | High | 59.03 GB |
| Q8_K_XL | High | 60.04 GB |
| Q6_K | High | 58.93 GB |
| Q6_K_XL | High | 58.93 GB |
| Q5_K_M | Medium | 58.57 GB |
| Q5_K_S | Medium | 58.56 GB |
| Q4_K_M | Medium | 58.46 GB |
| Q4_K_S | Medium | 58.45 GB |
| Q4_K_XL | Medium | 58.69 GB |
| Q4_0 | Medium | 58.32 GB |
| Q4_1 | Medium | 58.41 GB |
| Q3_K_M | Low | 58.32 GB |
| Q3_K_S | Low | 58.27 GB |
| Q2_K | Low | 58.27 GB |
| Q2_K_L | Low | 58.54 GB |
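The sizes above cluster tightly because the MoE weights are natively MXFP4 and are largely untouched by requantization, so only the remaining tensors shrink. As a rough single-GPU fit check under that constraint, the sketch below adds an illustrative KV-cache and runtime allowance to the file size; the overhead figures are assumptions for demonstration, not measured values.

```python
# Rough fit heuristic: model file plus KV-cache and runtime allowances must
# fit in GPU memory. Overhead terms are illustrative assumptions.

def fits_on_gpu(quant_size_gb: float, vram_gb: float = 80.0,
                kv_cache_gb: float = 6.0, runtime_gb: float = 2.0) -> bool:
    """Return True if the quant plausibly fits with full GPU offload."""
    return quant_size_gb + kv_cache_gb + runtime_gb <= vram_gb

# Sizes taken from the table above.
quants = {"Q8_0": 59.03, "Q4_K_M": 58.46, "Q2_K": 58.27}
for name, size in quants.items():
    verdict = "fits" if fits_on_gpu(size) else "does not fit"
    print(f"{name}: {verdict} on an 80 GB GPU")
```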
Last updated: March 5, 2026