
Gemma 4 E2B

Google
Capabilities: Code, Multilingual, Thinking, Tool Calls, Vision

Gemma 4 E2B is Google DeepMind's ultra-compact Effective 2B dense model, distilled from Gemini research for phones and constrained environments. It scores 60.0 on MMLU-Pro, 37.5 on AIME 2026, and 44.0 on LiveCodeBench v6, bringing genuine reasoning to the smallest form factor in the family. Natively multimodal, it processes text, images, and audio with built-in thinking and tool-calling capabilities across a 128K context window. Released under the Apache 2.0 license, it requires only about 3 GB of VRAM at Q4, making it ideal for self-hosted deployment on phones, laptops, and ultra-low-power edge devices.
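Since the card highlights built-in tool calling, here is a minimal sketch of how a tool-calling request to a locally hosted copy of the model might be assembled, assuming an OpenAI-compatible chat API (as exposed by common local servers). The model tag `gemma-4-e2b` and the `get_weather` tool are illustrative assumptions, not documented on this page.

```python
import json

# Sketch: build an OpenAI-compatible chat payload that exposes one tool.
# Model tag and tool schema are hypothetical examples for illustration.
def build_tool_call_request(user_message: str) -> dict:
    """Assemble a chat request that lets the model call a weather tool."""
    return {
        "model": "gemma-4-e2b",  # assumed local model tag
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

payload = build_tool_call_request("What's the weather in Oslo?")
print(json.dumps(payload, indent=2))
```

The server's reply would then contain either a normal assistant message or a `tool_calls` entry naming `get_weather` with a JSON `city` argument, which the client executes and feeds back as a `tool` message.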

Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| FP16 | Full precision | 8.67 GB |
| BF16 | Full precision | 8.67 GB |
| Q8_0 | High | 4.63 GB |
| Q8_K_XL | High | 4.91 GB |
| Q6_K | High | 4.19 GB |
| Q6_K_XL | High | 4.39 GB |
| Q5_K_M | Medium | 3.13 GB |
| Q5_K_S | Medium | 3.09 GB |
| Q5_K_XL | Medium | 4.00 GB |
| Q4_K_M | Medium | 2.89 GB |
| Q4_K_S | Medium | 2.83 GB |
| Q4_K_XL | Medium | 2.96 GB |
| IQ4_NL | Medium | 2.83 GB |
| IQ4_XS | Medium | 2.78 GB |
| Q4_0 | Medium | 2.83 GB |
| Q4_1 | Medium | 2.94 GB |
| Q3_K_M | Low | 2.36 GB |
| Q3_K_S | Low | 2.28 GB |
| Q3_K_XL | Low | 2.71 GB |
| IQ3_XXS | Low | 2.21 GB |
| Q2_K_XL | Low | 2.24 GB |
| IQ2_M | Low | 2.13 GB |
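The file sizes above imply an average bits-per-weight for each quantization. A quick back-of-the-envelope check, assuming the FP16 build stores 16 bits per weight (so 8.67 GB corresponds to roughly 4.34B raw parameters; the "E2B" name refers to effective active parameters, and the raw count here is an inference from the FP16 size, not a figure from this page):

```python
# Derive effective bits-per-weight from the quantized file sizes listed above.
# Assumption: FP16 file = 16 bits/weight, so params ≈ 8.67e9 bytes / 2 bytes.
FP16_SIZE_GB = 8.67
PARAMS = FP16_SIZE_GB * 1e9 / 2  # ≈ 4.34e9 raw weights (inferred, not stated)

def effective_bpw(size_gb: float) -> float:
    """Average bits stored per weight for a quantized file of size_gb GB."""
    return size_gb * 1e9 * 8 / PARAMS

for name, size_gb in [("Q8_0", 4.63), ("Q4_K_M", 2.89), ("IQ2_M", 2.13)]:
    print(f"{name}: ~{effective_bpw(size_gb):.1f} bits/weight")
```

The estimates land near each scheme's nominal bit width (Q8 ≈ 8.5, Q4_K_M ≈ 5.3, IQ2_M ≈ 3.9 bits/weight), the overhead coming from scales and higher-precision tensors such as embeddings. Note that runtime VRAM also needs room for the KV cache, which grows with context length on top of these file sizes.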
Last updated: April 3, 2026