
NVIDIA Nemotron 3 Super 120B A12B

NVIDIA · Code · Multilingual · Thinking · Tool Calls

Nemotron 3 Super 120B A12B is a 123.61-billion-parameter hybrid Mamba-2/Transformer LatentMoE model from NVIDIA, activating 12 billion parameters per token across 22 of 512 routed experts plus 1 shared expert. Trained on over 25 trillion tokens, it targets agentic reasoning, code generation, tool calling, and multilingual conversation in 7 languages. A 256K context window, toggleable thinking mode, and multi-token prediction enable high-throughput inference for complex multi-agent workflows. Its sparse MoE weights quantize well to GGUF for self-hosted deployment on multi-GPU setups.

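To make those sparsity figures concrete, the short sketch below recomputes the activation ratios from the numbers quoted above. It is a back-of-envelope illustration only: the 12 billion active parameters also cover always-dense components (attention/Mamba-2 blocks, embeddings), so the parameter ratio and the expert ratio are not expected to match.

```python
# Back-of-envelope sparsity ratios for Nemotron 3 Super 120B A12B,
# using only the figures quoted in the description above.

TOTAL_PARAMS_B = 123.61   # total parameters, in billions
ACTIVE_PARAMS_B = 12.0    # parameters activated per token, in billions
ROUTED_EXPERTS = 512      # routed experts per MoE layer
ACTIVE_ROUTED = 22        # routed experts selected per token
SHARED_EXPERTS = 1        # always-active shared expert

active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
active_expert_fraction = (ACTIVE_ROUTED + SHARED_EXPERTS) / (ROUTED_EXPERTS + SHARED_EXPERTS)

print(f"Parameters active per token: {active_param_fraction:.1%}")    # ~9.7%
print(f"Experts active per MoE layer: {active_expert_fraction:.1%}")  # ~4.5%
```
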
Quantization   Quality     Size
MXFP4_MOE      Very high    76.42 GB
Q8_0           High        119.65 GB
Q8_K_XL        High        123.39 GB
Q6_K           High        106.87 GB
Q6_K_XL        High        109.75 GB
Q5_K_M         Medium       99.97 GB
Q5_K_S         Medium       83.56 GB
Q5_K_XL        Medium      100.19 GB
Q4_K_M         Medium       76.87 GB
Q4_K_S         Medium       73.59 GB
Q4_K_XL        Medium       78.02 GB
Q3_K_M         Low          57.48 GB
Q3_K_S         Low          57.48 GB
Q3_K_XL        Low          58.33 GB
Q2_K_XL        Low          50.90 GB
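As a rough guide to how these sizes map onto hardware, the sketch below checks whether a given quant's weights, plus an assumed fixed overhead for the KV cache and runtime buffers, fit in the combined VRAM of a multi-GPU setup. The OVERHEAD_GB value and the example four-GPU configuration are illustrative assumptions; real memory use depends on context length, KV cache precision, and how the runtime splits tensors across devices.

```python
# Pre-download sanity check: does a GGUF quant fit across a multi-GPU setup?
# File sizes are taken from the table above; OVERHEAD_GB is an assumed allowance
# for KV cache, activations, and runtime buffers, not a measured figure.

QUANT_SIZES_GB = {
    "MXFP4_MOE": 76.42, "Q8_0": 119.65, "Q8_K_XL": 123.39,
    "Q6_K": 106.87, "Q6_K_XL": 109.75,
    "Q5_K_M": 99.97, "Q5_K_S": 83.56, "Q5_K_XL": 100.19,
    "Q4_K_M": 76.87, "Q4_K_S": 73.59, "Q4_K_XL": 78.02,
    "Q3_K_M": 57.48, "Q3_K_S": 57.48, "Q3_K_XL": 58.33,
    "Q2_K_XL": 50.90,
}

OVERHEAD_GB = 8.0  # assumed headroom for KV cache and buffers (rough guess)

def fits(quant: str, gpu_vram_gb: list[float], overhead_gb: float = OVERHEAD_GB) -> bool:
    """Return True if the quant's weights plus the assumed overhead fit in total VRAM."""
    return QUANT_SIZES_GB[quant] + overhead_gb <= sum(gpu_vram_gb)

# Example setup (an illustrative assumption): four 24 GB GPUs, 96 GB total.
setup = [24.0, 24.0, 24.0, 24.0]
for name, size in QUANT_SIZES_GB.items():
    verdict = "fits" if fits(name, setup) else "does not fit"
    print(f"{name:10s} {size:7.2f} GB  {verdict} in {sum(setup):.0f} GB total VRAM")
```

Treat this as a sanity check before downloading a quant rather than a guarantee: filling a 256K-token context in particular can require far more headroom than the fixed allowance assumed here.
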
Last updated: March 12, 2026