
NVIDIA Nemotron 3 Nano 30B A3B

NVIDIA
Code · Multilingual · Thinking · Tool Calls

Nemotron 3 Nano 30B A3B is a 31.58-billion-parameter hybrid Mamba-2/Transformer MoE model from NVIDIA, trained on 25 trillion tokens for unified reasoning and agentic tasks. Of its 128 experts, 6 are routed per token alongside 1 shared expert, so only about 3.5B parameters are active per forward pass. The model supports code generation, tool calling, and multilingual conversation across 6 languages. A 256K context window and flash attention enable long-context workflows, and a toggleable reasoning mode lets you trade quality against latency. Thanks to its MoE sparsity, the model quantizes well to GGUF for self-hosted deployment.
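For self-hosted use, the GGUF builds listed below run under llama.cpp-based stacks. The following is a minimal sketch using llama-cpp-python; the model file name and the reasoning-toggle string are assumptions, so check the model card for the exact chat template and control syntax.

```python
# Minimal sketch: serving a GGUF build with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="nemotron-3-nano-30b-a3b-Q4_K_M.gguf",  # hypothetical local path
    n_ctx=32768,      # model supports up to 256K; size this to your memory
    n_gpu_layers=-1,  # offload all layers to the GPU if they fit
    flash_attn=True,  # flash attention helps at long context
)

out = llm.create_chat_completion(
    messages=[
        # Reasoning toggle: recent Nemotron releases gate "thinking" via the
        # system prompt; this exact control string is an assumption.
        {"role": "system", "content": "/no_think"},
        {"role": "user", "content": "Write a Python function that merges two sorted lists."},
    ],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```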

Hardware Configuration

Which quantization fits depends on your hardware; see the sketch after the table for a rough way to estimate fit.
| Quantization | Quality | Size     |
|--------------|---------|----------|
| Q8_0         | High    | 31.28 GB |
| Q8_K_XL      | High    | 37.67 GB |
| Q6_K         | High    | 31.21 GB |
| Q6_K_XL      | High    | 31.21 GB |
| Q5_K_M       | Medium  | 24.35 GB |
| Q5_K_S       | Medium  | 22.31 GB |
| Q5_K_XL      | Medium  | 25.62 GB |
| Q4_K_M       | Medium  | 22.89 GB |
| Q4_K_S       | Medium  | 20.51 GB |
| Q4_K_XL      | Medium  | 21.27 GB |
| Q4_0         | Medium  | 16.96 GB |
| Q4_1         | Medium  | 18.68 GB |
| Q3_K_M       | Low     | 18.63 GB |
| Q3_K_S       | Low     | 16.88 GB |
| Q3_K_XL      | Low     | 18.57 GB |
| Q2_K_L       | Low     | 16.85 GB |
| Q2_K_XL      | Low     | 18.55 GB |
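A rough way to shortlist a quantization is to pick the largest file that fits your memory budget with headroom left for the KV cache and activations. The sizes below are copied from the table; the 80% headroom factor and the helper itself are illustrative assumptions, not a vendor recommendation.

```python
# Sketch: pick the largest quantization that fits a memory budget.
QUANT_SIZES_GB = {
    "Q8_K_XL": 37.67, "Q8_0": 31.28, "Q6_K_XL": 31.21, "Q6_K": 31.21,
    "Q5_K_XL": 25.62, "Q5_K_M": 24.35, "Q5_K_S": 22.31, "Q4_K_M": 22.89,
    "Q4_K_XL": 21.27, "Q4_K_S": 20.51, "Q4_1": 18.68, "Q4_0": 16.96,
    "Q3_K_M": 18.63, "Q3_K_XL": 18.57, "Q3_K_S": 16.88,
    "Q2_K_XL": 18.55, "Q2_K_L": 16.85,
}

def pick_quant(vram_gb: float, headroom: float = 0.8) -> str | None:
    """Return the largest quant fitting within headroom * vram_gb.

    The 0.8 headroom factor is an assumed margin for KV cache and
    activations, not an official figure.
    """
    budget = vram_gb * headroom
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    return max(fitting)[1] if fitting else None

print(pick_quant(24.0))  # 24 GB card -> 'Q4_1' (18.68 GB within the 19.2 GB budget)
```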
Last updated: March 5, 2026