
NVIDIA Nemotron 3 Nano 4B

Publisher: NVIDIA
Capabilities: Code, Thinking, Tool Calls

Nemotron 3 Nano 4B is a 3.97-billion-parameter hybrid Mamba-2/Transformer dense model from NVIDIA, compressed from the larger 9B Nemotron Nano v2. It supports a toggleable thinking mode, tool calling, and code generation, making it well suited for agentic and reasoning workloads. A 262K-token context window and flash attention enable long-context workflows on modest hardware, and GGUF quantizations ranging from roughly 2 GB to 8 GB fit edge devices and consumer GPUs where memory is limited.
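Tool calling with a locally served GGUF typically goes through an OpenAI-compatible endpoint (for example, llama.cpp's llama-server). The sketch below only builds a request payload with one tool attached; the model id, endpoint URL, and the `get_weather` tool are illustrative assumptions, not part of this model card.

```python
import json

MODEL_ID = "nemotron-3-nano-4b"  # assumed server-side model id

# Hypothetical example tool in the OpenAI function-calling schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Chat-completions payload; "auto" lets the model decide whether to call the tool.
payload = {
    "model": MODEL_ID,
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": tools,
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
# To send it, POST the payload to the server's /v1/chat/completions route.
```

If the model decides to use the tool, the response contains a `tool_calls` entry whose arguments you execute locally before sending the result back in a `tool` role message.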

Hardware Configuration

Quantization   Quality          Size
BF16           Full precision   7.96 GB
Q8_0           High             4.23 GB
Q8_K_XL        High             5.63 GB
Q6_K           High             4.06 GB
Q6_K_XL        High             4.56 GB
Q5_K_M         Medium           3.16 GB
Q5_K_S         Medium           3.11 GB
Q5_K_XL        Medium           3.31 GB
Q4_K_M         Medium           2.90 GB
Q4_K_S         Medium           2.83 GB
Q4_K_XL        Medium           3.13 GB
IQ4_NL         Medium           2.57 GB
IQ4_XS         Medium           2.54 GB
Q4_0           Medium           2.53 GB
Q4_1           Medium           2.71 GB
Q3_K_M         Low              2.46 GB
Q3_K_S         Low              2.36 GB
Q3_K_XL        Low              2.68 GB
IQ3_XXS        Low              2.39 GB
Q2_K_XL        Low              2.50 GB
IQ2_M          Low              2.30 GB
IQ2_XXS        Low              2.18 GB
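The table can be sanity-checked with a little arithmetic: dividing each file size by the 3.97B parameter count gives the effective bits per weight, and comparing sizes against a memory budget suggests which quantizations fit. The 1 GB headroom for runtime buffers and cache is a rough assumption, not an official recommendation; only a subset of the table's rows is included for brevity.

```python
PARAMS = 3.97e9  # parameter count from the model card

# File sizes in decimal GB, taken from the table above (subset).
SIZES_GB = {
    "BF16": 7.96, "Q8_0": 4.23, "Q6_K": 4.06, "Q5_K_M": 3.16,
    "Q4_K_M": 2.90, "IQ4_XS": 2.54, "Q3_K_M": 2.46, "IQ2_M": 2.30,
}

def bits_per_weight(size_gb: float) -> float:
    """Effective bits stored per parameter for a given file size."""
    return size_gb * 1e9 * 8 / PARAMS

def fits(budget_gb: float, headroom_gb: float = 1.0) -> list[str]:
    """Quantizations whose file size leaves `headroom_gb` of memory free."""
    return [q for q, s in SIZES_GB.items() if s + headroom_gb <= budget_gb]

print(f"BF16: {bits_per_weight(SIZES_GB['BF16']):.1f} bpw")  # ~16, as expected
print("Fits in 4 GB:", fits(4.0))
```

The BF16 row working out to about 16 bits per weight confirms the sizes and parameter count are mutually consistent; the sub-3 GB quantizations are the ones that leave comfortable headroom on a 4 GB device.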
Last updated: March 17, 2026