NVIDIA Nemotron 3 Nano 30B A3B
NVIDIA
Code · Multilingual · Thinking · Tool Calls
Nemotron 3 Nano 30B A3B is a 31.58-billion-parameter hybrid Mamba-2 Transformer MoE model from NVIDIA, trained on 25 trillion tokens for unified reasoning and agentic tasks. With 128 experts, of which 6 are routed per token alongside 1 shared expert, only about 3.5B parameters are active per forward pass. The model supports code generation, tool calling, and multilingual conversation across six languages. A 256K context window and flash attention enable long-context workflows, and a toggleable reasoning mode lets you trade answer quality against latency. Thanks to its MoE sparsity, the model quantizes well to GGUF for self-hosted deployment.
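For self-hosted use, a minimal sketch with llama-cpp-python might look like the following. The GGUF filename is hypothetical, and the `/think` system-prompt toggle is an assumption about how the reasoning mode is exposed, so check the model's chat template before relying on it.

```python
from llama_cpp import Llama

# Hypothetical GGUF filename; substitute the quantization you downloaded.
llm = Llama(
    model_path="nemotron-3-nano-30b-a3b-Q4_K_M.gguf",
    n_ctx=32768,      # the model supports up to 256K; start smaller to save memory
    n_gpu_layers=-1,  # offload all layers to the GPU if they fit
    flash_attn=True,
)

messages = [
    # Assumed toggle: "/think" enables reasoning mode, "/no_think" disables it.
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "Write a Python function that merges two sorted lists."},
]
result = llm.create_chat_completion(messages=messages, max_tokens=512, temperature=0.6)
print(result["choices"][0]["message"]["content"])
```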
GGUF Quantization Options
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 31.28 GB |
| Q8_K_XL | High | 37.67 GB |
| Q6_K | High | 31.21 GB |
| Q6_K_XL | High | 31.21 GB |
| Q5_K_M | Medium | 24.35 GB |
| Q5_K_S | Medium | 22.31 GB |
| Q5_K_XL | Medium | 25.62 GB |
| Q4_K_M | Medium | 22.89 GB |
| Q4_K_S | Medium | 20.51 GB |
| Q4_K_XL | Medium | 21.27 GB |
| Q4_0 | Medium | 16.96 GB |
| Q4_1 | Medium | 18.68 GB |
| Q3_K_M | Low | 18.63 GB |
| Q3_K_S | Low | 16.88 GB |
| Q3_K_XL | Low | 18.57 GB |
| Q2_K_L | Low | 16.85 GB |
| Q2_K_XL | Low | 18.55 GB |
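As a rough way to read the table, the sketch below picks the largest quantization whose weights, plus an assumed ~10% overhead for KV cache and runtime buffers, fit a given VRAM budget. The overhead factor is a guess rather than a measured figure, and long contexts will need considerably more headroom.

```python
# Sizes taken from the table above, in GB on disk.
QUANT_SIZES_GB = {
    "Q8_K_XL": 37.67, "Q8_0": 31.28, "Q6_K_XL": 31.21, "Q6_K": 31.21,
    "Q5_K_XL": 25.62, "Q5_K_M": 24.35, "Q4_K_M": 22.89, "Q5_K_S": 22.31,
    "Q4_K_XL": 21.27, "Q4_K_S": 20.51, "Q4_1": 18.68, "Q3_K_M": 18.63,
    "Q3_K_XL": 18.57, "Q2_K_XL": 18.55, "Q4_0": 16.96, "Q3_K_S": 16.88,
    "Q2_K_L": 16.85,
}

def best_fit(vram_gb: float, overhead: float = 1.10) -> str | None:
    """Return the largest quantization whose weights times an assumed
    overhead factor fit in vram_gb, or None if nothing fits."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s * overhead <= vram_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(best_fit(24.0))  # e.g. a 24 GB GPU -> "Q4_K_XL" with this margin
```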
Last updated: March 5, 2026