NVIDIA Nemotron 3 Nano 4B
NVIDIA
Capabilities: Code, Thinking, Tool Calls
Nemotron 3 Nano 4B is a 3.97-billion-parameter hybrid Mamba-2/Transformer dense model from NVIDIA, compressed from the larger 9B Nano v2 model. It supports a toggleable thinking mode, tool calling, and code generation, making it well suited for agentic and reasoning workloads. A 262K context window and flash attention enable long-context workflows on modest hardware. GGUF quantizations range from 2 to 8 GB, making it ideal for edge devices and consumer GPUs where memory is limited.
Quantizations
| Quantization | Quality | Size |
|---|---|---|
| BF16 | Full precision | 7.96 GB |
| Q8_0 | High | 4.23 GB |
| Q8_K_XL | High | 5.63 GB |
| Q6_K | High | 4.06 GB |
| Q6_K_XL | High | 4.56 GB |
| Q5_K_M | Medium | 3.16 GB |
| Q5_K_S | Medium | 3.11 GB |
| Q5_K_XL | Medium | 3.31 GB |
| Q4_K_M | Medium | 2.90 GB |
| Q4_K_S | Medium | 2.83 GB |
| Q4_K_XL | Medium | 3.13 GB |
| IQ4_NL | Medium | 2.57 GB |
| IQ4_XS | Medium | 2.54 GB |
| Q4_0 | Medium | 2.53 GB |
| Q4_1 | Medium | 2.71 GB |
| Q3_K_M | Low | 2.46 GB |
| Q3_K_S | Low | 2.36 GB |
| Q3_K_XL | Low | 2.68 GB |
| IQ3_XXS | Low | 2.39 GB |
| Q2_K_XL | Low | 2.50 GB |
| IQ2_M | Low | 2.30 GB |
| IQ2_XXS | Low | 2.18 GB |
Last updated: March 17, 2026