
Gemma 4 E2B

Google
Capabilities: Code, Multilingual, Thinking, Tool Calls, Vision

Gemma 4 E2B is Google DeepMind's ultra-compact Effective 2B dense model, distilled from Gemini research for phones and constrained environments. It scores 60.0 on MMLU-Pro, 37.5 on AIME 2026, and 44.0 on LiveCodeBench v6, bringing genuine reasoning to the smallest form factor in the family. Natively multimodal, it processes text, images, and audio with built-in thinking and tool-calling capabilities across a 128K context window. Released under the Apache 2.0 license, it requires only about 3 GB of VRAM at Q4, making it ideal for self-hosted deployment on phones, laptops, and ultra-low-power edge devices.
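Since the card highlights built-in tool calling, here is a minimal sketch of how a tool-calling request to a locally hosted copy of the model might be assembled, assuming an OpenAI-compatible chat API (as exposed by common local servers). The model tag `gemma-4-e2b` and the `get_weather` tool are illustrative assumptions, not documented on this page.

```python
import json

# Sketch: build an OpenAI-compatible chat payload that exposes one tool.
# Model tag and tool schema are hypothetical examples for illustration.
def build_tool_call_request(user_message: str) -> dict:
    """Assemble a chat request that lets the model call a weather tool."""
    return {
        "model": "gemma-4-e2b",  # assumed local model tag
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

payload = build_tool_call_request("What's the weather in Oslo?")
print(json.dumps(payload, indent=2))
```

The server's reply would then contain either a normal assistant message or a `tool_calls` entry naming `get_weather` with a JSON `city` argument, which the client executes and feeds back as a `tool` message.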

Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| FP16 | Full precision | 8.67 GB |
| BF16 | Full precision | 8.67 GB |
| Q8_0 | High | 4.63 GB |
| Q8_K_XL | High | 4.91 GB |
| Q6_K | High | 4.19 GB |
| Q6_K_XL | High | 4.39 GB |
| Q5_K_M | Medium | 3.13 GB |
| Q5_K_S | Medium | 3.09 GB |
| Q5_K_XL | Medium | 4.00 GB |
| Q4_K_M | Medium | 2.89 GB |
| Q4_K_S | Medium | 2.83 GB |
| Q4_K_XL | Medium | 2.96 GB |
| IQ4_NL | Medium | 2.83 GB |
| IQ4_XS | Medium | 2.78 GB |
| Q4_0 | Medium | 2.83 GB |
| Q4_1 | Medium | 2.94 GB |
| Q3_K_M | Low | 2.36 GB |
| Q3_K_S | Low | 2.28 GB |
| Q3_K_XL | Low | 2.71 GB |
| IQ3_XXS | Low | 2.21 GB |
| Q2_K_XL | Low | 2.24 GB |
| IQ2_M | Low | 2.13 GB |
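The file sizes above imply an average bits-per-weight for each quantization. A quick back-of-the-envelope check, assuming the FP16 build stores 16 bits per weight (so 8.67 GB corresponds to roughly 4.34B raw parameters; the "E2B" name refers to effective active parameters, and the raw count here is an inference from the FP16 size, not a figure from this page):

```python
# Derive effective bits-per-weight from the quantized file sizes listed above.
# Assumption: FP16 file = 16 bits/weight, so params ≈ 8.67e9 bytes / 2 bytes.
FP16_SIZE_GB = 8.67
PARAMS = FP16_SIZE_GB * 1e9 / 2  # ≈ 4.34e9 raw weights (inferred, not stated)

def effective_bpw(size_gb: float) -> float:
    """Average bits stored per weight for a quantized file of size_gb GB."""
    return size_gb * 1e9 * 8 / PARAMS

for name, size_gb in [("Q8_0", 4.63), ("Q4_K_M", 2.89), ("IQ2_M", 2.13)]:
    print(f"{name}: ~{effective_bpw(size_gb):.1f} bits/weight")
```

The estimates land near each scheme's nominal bit width (Q8 ≈ 8.5, Q4_K_M ≈ 5.3, IQ2_M ≈ 3.9 bits/weight), the overhead coming from scales and higher-precision tensors such as embeddings. Note that runtime VRAM also needs room for the KV cache, which grows with context length on top of these file sizes.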
Last updated: April 3, 2026