NVIDIA Nemotron 3 Nano 30B A3B
NVIDIA
Code · Multilingual · Thinking · Tool Calls
Nemotron 3 Nano 30B A3B is a 31.58-billion-parameter hybrid Mamba-2 Transformer MoE model from NVIDIA, trained on 25 trillion tokens for unified reasoning and agentic tasks. With 128 experts, of which 6 are routed per token alongside 1 shared expert, only about 3.5B parameters are active per forward pass. The model supports code generation, tool calling, and multilingual conversation across six languages. A 256K context window and flash attention enable long-context workflows, and a toggleable reasoning mode lets you trade answer quality against latency. Thanks to its MoE sparsity, the model quantizes well to GGUF for self-hosted deployment.
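For self-hosted use, a minimal sketch with llama-cpp-python might look like the following. The GGUF filename is hypothetical, and the `/think` system-prompt toggle is an assumption about how the reasoning mode is exposed, so check the model's chat template before relying on it.

```python
from llama_cpp import Llama

# Hypothetical GGUF filename; substitute the quantization you downloaded.
llm = Llama(
    model_path="nemotron-3-nano-30b-a3b-Q4_K_M.gguf",
    n_ctx=32768,      # the model supports up to 256K; start smaller to save memory
    n_gpu_layers=-1,  # offload all layers to the GPU if they fit
    flash_attn=True,
)

messages = [
    # Assumed toggle: "/think" enables reasoning mode, "/no_think" disables it.
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "Write a Python function that merges two sorted lists."},
]
result = llm.create_chat_completion(messages=messages, max_tokens=512, temperature=0.6)
print(result["choices"][0]["message"]["content"])
```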
GGUF Quantization Options
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 31.28 GB |
| Q8_K_XL | High | 37.67 GB |
| Q6_K | High | 31.21 GB |
| Q6_K_XL | High | 31.21 GB |
| Q5_K_M | Medium | 24.35 GB |
| Q5_K_S | Medium | 22.31 GB |
| Q5_K_XL | Medium | 25.62 GB |
| Q4_K_M | Medium | 22.89 GB |
| Q4_K_S | Medium | 20.51 GB |
| Q4_K_XL | Medium | 21.27 GB |
| Q4_0 | Medium | 16.96 GB |
| Q4_1 | Medium | 18.68 GB |
| Q3_K_M | Low | 18.63 GB |
| Q3_K_S | Low | 16.88 GB |
| Q3_K_XL | Low | 18.57 GB |
| Q2_K_L | Low | 16.85 GB |
| Q2_K_XL | Low | 18.55 GB |
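As a rough way to read the table, the sketch below picks the largest quantization whose weights, plus an assumed ~10% overhead for KV cache and runtime buffers, fit a given VRAM budget. The overhead factor is a guess rather than a measured figure, and long contexts will need considerably more headroom.

```python
# Sizes taken from the table above, in GB on disk.
QUANT_SIZES_GB = {
    "Q8_K_XL": 37.67, "Q8_0": 31.28, "Q6_K_XL": 31.21, "Q6_K": 31.21,
    "Q5_K_XL": 25.62, "Q5_K_M": 24.35, "Q4_K_M": 22.89, "Q5_K_S": 22.31,
    "Q4_K_XL": 21.27, "Q4_K_S": 20.51, "Q4_1": 18.68, "Q3_K_M": 18.63,
    "Q3_K_XL": 18.57, "Q2_K_XL": 18.55, "Q4_0": 16.96, "Q3_K_S": 16.88,
    "Q2_K_L": 16.85,
}

def best_fit(vram_gb: float, overhead: float = 1.10) -> str | None:
    """Return the largest quantization whose weights times an assumed
    overhead factor fit in vram_gb, or None if nothing fits."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s * overhead <= vram_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(best_fit(24.0))  # e.g. a 24 GB GPU -> "Q4_K_XL" with this margin
```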
Last updated: March 5, 2026