Gemma 4 E2B
Google
Code · Multilingual · Thinking · Tool Calls · Vision
Gemma 4 E2B is Google DeepMind's ultra-compact Effective 2B dense model, distilled from Gemini research for phones and constrained environments. It scores 60.0 on MMLU-Pro, 37.5 on AIME 2026, and 44.0 on LiveCodeBench v6, bringing genuine reasoning to the smallest form factor in the family. Natively multimodal, it processes text, images, and audio with built-in thinking and tool-calling capabilities across a 128K context window. Released under the Apache 2.0 license, it requires only about 3 GB of VRAM at Q4, making it ideal for self-hosted deployment on phones, laptops, and ultra-low-power edge devices.
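At the full 128K context, the KV cache can outweigh the quantized model itself, which matters on the memory-constrained devices this model targets. A back-of-envelope sketch in Python; note that the layer, head, and dimension values below are hypothetical placeholders for a small model, not published Gemma 4 E2B specifications:

```python
def kv_cache_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """Estimate KV-cache memory: one K and one V tensor per layer,
    each of shape (num_kv_heads, context_len, head_dim), fp16 by default."""
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# Hypothetical small-model shape (placeholder values, NOT official specs):
print(kv_cache_gb(num_layers=30, num_kv_heads=4, head_dim=128,
                  context_len=131072))  # ~8 GB at full 128K context in fp16
```

The point of the sketch: even with grouped-query attention (few KV heads), a maxed-out context window costs several gigabytes, so runtimes typically quantize the KV cache or cap the context on edge hardware.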
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| FP16 | Full precision | 8.67 GB |
| BF16 | Full precision | 8.67 GB |
| Q8_0 | High | 4.63 GB |
| Q8_K_XL | High | 4.91 GB |
| Q6_K | High | 4.19 GB |
| Q6_K_XL | High | 4.39 GB |
| Q5_K_M | Medium | 3.13 GB |
| Q5_K_S | Medium | 3.09 GB |
| Q5_K_XL | Medium | 4 GB |
| Q4_K_M | Medium | 2.89 GB |
| Q4_K_S | Medium | 2.83 GB |
| Q4_K_XL | Medium | 2.96 GB |
| IQ4_NL | Medium | 2.83 GB |
| IQ4_XS | Medium | 2.78 GB |
| Q4_0 | Medium | 2.83 GB |
| Q4_1 | Medium | 2.94 GB |
| Q3_K_M | Low | 2.36 GB |
| Q3_K_S | Low | 2.28 GB |
| Q3_K_XL | Low | 2.71 GB |
| IQ3_XXS | Low | 2.21 GB |
| Q2_K_XL | Low | 2.24 GB |
| IQ2_M | Low | 2.13 GB |
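The sizes in the table track bits per weight. A rough sketch in Python of how they relate, treating 1 GB as 10^9 bytes and back-deriving the raw parameter count from the FP16 row; both figures are estimates for illustration, not official numbers:

```python
def quantized_size_gb(num_params_b: float, bits_per_weight: float) -> float:
    """Rough on-disk size: parameter count (billions) times bits per weight,
    ignoring file metadata and any mixed-precision overhead."""
    return num_params_b * bits_per_weight / 8

# Back-derive raw parameter count from the FP16 row (16 bits = 2 bytes/weight).
params_b = 8.67 / 2  # ~4.34B weights (estimate)

# K-quants mix block sizes and keep some tensors at higher precision, so a
# "Q4" file runs above 4 effective bits per weight; 5.3 bpw is an assumption.
print(round(quantized_size_gb(params_b, 5.3), 2))  # → 2.87, near the Q4_K_M row
```

This is why the Q4 rows land near 2.8 to 3 GB rather than at a naive `params / 4` bytes, and why the `_XL` variants, which keep more tensors at higher precision, run larger than their base quants.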
Last updated: April 3, 2026