Granite 4.0 Tiny Base Preview
IBM
Code Multilingual
Granite 4.0 Tiny Base Preview is a 6.67-billion-parameter fine-grained Mixture-of-Experts (MoE) model from IBM, designed for efficient instruction following and code generation. It has 62 experts, of which 6 are active per token, so it is intended to deliver strong reasoning at a fraction of the compute cost of a dense model of the same size. The model supports code-related tasks and multilingual conversation across 12 languages, including English, Chinese, and Japanese. A 128K context window with flash attention enables long-document workflows, and the model quantizes well to GGUF for lightweight self-hosted deployments.
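To make the "6 of 62 experts active per token" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It assumes a router that scores every expert and applies a softmax over only the selected experts, which is one common convention. This page does not describe the model's actual router, so `route_token` and the random logits are hypothetical and for illustration only.

```python
import math
import random

NUM_EXPERTS = 62  # expert count, from the model description
TOP_K = 6         # experts active per token, from the model description


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts for one token.

    Returns {expert_index: weight}, with weights summing to 1.
    Softmax over the selected logits is an assumption, not the
    documented behavior of this model.
    """
    # Indices of the top_k largest router scores.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    selected = ranked[:top_k]

    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in selected)
    exps = {i: math.exp(router_logits[i] - peak) for i in selected}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Hypothetical router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)
```

With this scheme only 6 of the 62 expert blocks (about 9.7%) run for each token, which is where the compute saving relative to a dense model comes from.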
Hardware Configuration
Optional: specify your hardware for precise deployment recommendations.
| Quantization | Quality | Size | Fit |
|---|---|---|---|
| FP16 | Full precision | 12.44 GB | — |
| Q8_0 | High | 6.62 GB | — |
| Q6_K | High | 5.11 GB | — |
| Q5_K_M | Medium | 4.42 GB | — |
| Q5_K_S | Medium | 4.30 GB | — |
| Q4_K_M | Medium | 3.77 GB | — |
| Q4_K_S | Medium | 3.56 GB | — |
| Q4_0 | Medium | 3.53 GB | — |
| Q4_1 | Medium | 3.91 GB | — |
| Q3_K_M | Low | 2.98 GB | — |
| Q3_K_S | Low | 2.71 GB | — |
| Q2_K | Low | 2.28 GB | — |
| Q3_K_L | Low | 3.20 GB | — |
| Q5_0 | Low | 4.30 GB | — |
| Q5_1 | Low | 4.68 GB | — |
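The Fit column is unfilled here, presumably because no hardware has been specified. As a rough illustration of how a fit check could work, the sketch below picks the largest quantization whose file size fits a memory budget. The sizes are copied from the table. The function name `largest_fitting_quant` and the 1.5 GB default overhead (a placeholder for KV cache and runtime buffers) are assumptions, not values from this page, and real overhead grows with context length.

```python
# File sizes in GB, copied from the quantization table.
QUANT_SIZES_GB = {
    "FP16": 12.44,
    "Q8_0": 6.62,
    "Q6_K": 5.11,
    "Q5_K_M": 4.42,
    "Q5_K_S": 4.30,
    "Q4_K_M": 3.77,
    "Q4_K_S": 3.56,
    "Q4_0": 3.53,
    "Q4_1": 3.91,
    "Q3_K_M": 2.98,
    "Q3_K_S": 2.71,
    "Q2_K": 2.28,
    "Q3_K_L": 3.20,
    "Q5_0": 4.30,
    "Q5_1": 4.68,
}


def largest_fitting_quant(memory_gb, overhead_gb=1.5):
    """Return (name, size_gb) of the largest quantization that fits.

    overhead_gb is a hypothetical allowance for KV cache and runtime
    buffers; tune it for your context length and runtime.
    Returns None when nothing fits.
    """
    budget = memory_gb - overhead_gb
    candidates = [(size, name) for name, size in QUANT_SIZES_GB.items()
                  if size <= budget]
    if not candidates:
        return None
    size, name = max(candidates)  # largest size wins; ties break by name
    return name, size


choice_8gb = largest_fitting_quant(8.0)
choice_16gb = largest_fitting_quant(16.0)
choice_2gb = largest_fitting_quant(2.0)
```

Under these assumptions, an 8 GB device leaves a 6.5 GB budget, which rules out Q8_0 (6.62 GB) and selects Q6_K (5.11 GB). Choosing the largest file that fits is only a proxy for quality; the Quality column is a better guide when two quantizations are close in size.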
Last updated: March 5, 2026