
Gemma 4 E4B

Google
Code, Multilingual, Thinking, Tool Calls, Vision

Gemma 4 E4B is Google DeepMind's Effective 4B dense edge model, distilled from Gemini research for on-device and embedded deployment. It scores 69.4 on MMLU-Pro, 42.5 on AIME 2026, and 52.0 on LiveCodeBench v6. It is natively multimodal, processing text, images, and audio, and has built-in thinking and tool-calling capabilities across a 128K context window. It is released under the Apache 2.0 license and needs only about 5 GB of VRAM at Q4, so it fits self-hosted deployment on consumer GPUs and edge devices.

Quantization  Quality         Size
FP16          Full precision  14.02 GB
BF16          Full precision  14.02 GB
Q8_0          High            7.48 GB
Q8_K_XL       High            8.06 GB
Q6_K          High            6.59 GB
Q6_K_XL       High            6.95 GB
Q5_K_M        Medium          5.11 GB
Q5_K_S        Medium          5.03 GB
Q5_K_XL       Medium          6.19 GB
Q4_K_M        Medium          4.97 GB
Q4_K_S        Medium          4.51 GB
Q4_K_XL       Medium          4.75 GB
IQ4_NL        Medium          4.5 GB
IQ4_XS        Medium          4.39 GB
Q4_0          Medium          4.5 GB
Q4_1          Medium          4.73 GB
Q3_K_M        Low             3.78 GB
Q3_K_S        Low             3.6 GB
Q3_K_XL       Low             4.25 GB
IQ3_XXS       Low             3.45 GB
Q2_K_XL       Low             3.49 GB
IQ2_M         Low             3.29 GB
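
As a rough illustration of how the sizes above could guide a deployment choice, the sketch below picks the largest quantization that fits a given VRAM budget. The sizes are copied from the table. The function name `largest_fitting_quant` and the default 1.5 GB headroom for the KV cache and runtime buffers are assumptions made for this example, not figures from the model card. Real headroom depends on context length and the inference runtime, and size is only a proxy for quality.

```python
# Sizes in GB, copied from the quantization table above.
QUANT_SIZES_GB = {
    "FP16": 14.02, "BF16": 14.02,
    "Q8_0": 7.48, "Q8_K_XL": 8.06,
    "Q6_K": 6.59, "Q6_K_XL": 6.95,
    "Q5_K_M": 5.11, "Q5_K_S": 5.03, "Q5_K_XL": 6.19,
    "Q4_K_M": 4.97, "Q4_K_S": 4.51, "Q4_K_XL": 4.75,
    "IQ4_NL": 4.5, "IQ4_XS": 4.39, "Q4_0": 4.5, "Q4_1": 4.73,
    "Q3_K_M": 3.78, "Q3_K_S": 3.6, "Q3_K_XL": 4.25,
    "IQ3_XXS": 3.45, "Q2_K_XL": 3.49, "IQ2_M": 3.29,
}


def largest_fitting_quant(vram_gb, overhead_gb=1.5):
    """Return (name, size_gb) of the largest quantization that fits, or None.

    overhead_gb is a hypothetical allowance for the KV cache and runtime
    buffers; it is an assumption of this sketch, not a measured value.
    """
    budget = vram_gb - overhead_gb
    candidates = [
        (size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget
    ]
    if not candidates:
        return None
    # Largest size wins; equal sizes are broken by name order.
    size, name = max(candidates)
    return name, size


if __name__ == "__main__":
    for vram in (4, 8, 24):
        print(vram, "GB ->", largest_fitting_quant(vram))
```

Treating the largest file as the best option is a simplification: the table's Quality column does not always track size (Q5_K_XL at 6.19 GB is rated Medium), so a real selection might filter on that column first.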
Last updated: April 3, 2026