Granite 4.0 Tiny Base Preview

Code Multilingual

Granite 4.0 Tiny Base Preview is a 6.67-billion-parameter fine-grained Mixture-of-Experts model from IBM, designed for efficient instruction following and code generation. Only 6 of its 62 experts are active per token, so it delivers strong reasoning at a fraction of the per-token compute cost of a dense model of the same size. The model supports code-related tasks and multilingual conversation across 12 languages, including English, Chinese, and Japanese. A 128K context window with flash attention enables long-document workflows, and GGUF quantizations ranging from 2.28 GB (Q2_K) to 6.62 GB (Q8_0) keep self-hosted deployments lightweight.
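To make "62 experts and 6 active per token" concrete, here is a minimal sketch of generic top-k expert routing. It is not Granite's actual router: the `route` function, the random logits, and the softmax-over-selected-experts weighting are all assumptions chosen for illustration. Only the two constants come from the description above.

```python
import math
import random

NUM_EXPERTS = 62  # from the model description
TOP_K = 6         # experts active per token, from the model description


def route(logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Hypothetical helper: a generic top-k gate, not IBM's implementation.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for one token (random, for illustration only).
rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(router_logits)
active_fraction = TOP_K / NUM_EXPERTS  # share of experts evaluated per token
```

In this sketch each token runs through 6 of the 62 experts, a little under 10% of the expert pool, which is where the compute saving relative to a dense model comes from.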

Quantization	Quality	Size	Fit
FP16	Full precision	12.44 GB	—
Q8_0	High	6.62 GB	—
Q6_K	High	5.11 GB	—
Q5_K_M	Medium	4.42 GB	—
Q5_K_S	Medium	4.3 GB	—
Q4_K_M	Medium	3.77 GB	—
Q4_K_S	Medium	3.56 GB	—
Q4_0	Medium	3.53 GB	—
Q4_1	Medium	3.91 GB	—
Q3_K_M	Low	2.98 GB	—
Q3_K_S	Low	2.71 GB	—
Q2_K	Low	2.28 GB	—
Q3_K_L	Low	3.2 GB	—
Q5_0	Low	4.3 GB	—
Q5_1	Low	4.68 GB	—
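As a rough way to read the table against a specific GPU, the sketch below picks the largest quantization whose file fits in a given amount of VRAM and estimates the effective bits per weight. The sizes and the 6.67B parameter count come from this page. The function names, the 1 GB headroom reserved for context and runtime buffers, and the reading of "GB" as GiB are assumptions, so treat the results as estimates rather than deployment guarantees.

```python
PARAMS = 6.67e9  # total parameter count, from the model description

# Quantization name -> file size in GB, copied from the table above.
SIZES_GB = {
    "FP16": 12.44, "Q8_0": 6.62, "Q6_K": 5.11, "Q5_K_M": 4.42,
    "Q5_K_S": 4.3, "Q4_K_M": 3.77, "Q4_K_S": 3.56, "Q4_0": 3.53,
    "Q4_1": 3.91, "Q3_K_M": 2.98, "Q3_K_S": 2.71, "Q2_K": 2.28,
    "Q3_K_L": 3.2, "Q5_0": 4.3, "Q5_1": 4.68,
}


def bits_per_weight(name):
    """Effective bits per parameter, assuming "GB" means GiB (2**30 bytes)."""
    return SIZES_GB[name] * 2**30 * 8 / PARAMS


def largest_fit(vram_gb, headroom_gb=1.0):
    """Largest quantization that fits in VRAM after reserving headroom.

    headroom_gb is a hypothetical allowance for the KV cache and runtime
    buffers; real needs depend on context length and the inference engine.
    Returns None when nothing fits.
    """
    budget = vram_gb - headroom_gb
    fitting = {name: size for name, size in SIZES_GB.items() if size <= budget}
    return max(fitting, key=fitting.get) if fitting else None
```

Under these assumptions, `largest_fit(8.0)` selects Q8_0 and `largest_fit(4.0)` selects Q3_K_M. Long contexts need more headroom than the 1 GB default used here.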

Last updated: March 24, 2026