Gemma 4 E4B

Code, Multilingual, Thinking, Tool Calls, Vision

Gemma 4 E4B is Google DeepMind's "Effective 4B" dense edge model, distilled from Gemini research and built for on-device and embedded deployment. It scores 69.4 on MMLU-Pro, 42.5 on AIME 2026, and 52.0 on LiveCodeBench v6. It is natively multimodal, processing text, images, and audio, and supports built-in thinking and tool calling across a 128K context window. It is released under the Apache 2.0 license and needs about 5 GB of VRAM at Q4 (the Q4 variants in the table below range from 4.39 GB to 4.97 GB), which makes it practical to self-host on consumer GPUs and edge devices.

Hardware Configuration

Quantization	Quality	Size	Fit
FP16	Full precision	14.02 GB	—
BF16	Full precision	14.02 GB	—
Q8_0	High	7.48 GB	—
Q8_K_XL	High	8.06 GB	—
Q6_K	High	6.59 GB	—
Q6_K_XL	High	6.95 GB	—
Q5_K_M	Medium	5.11 GB	—
Q5_K_S	Medium	5.03 GB	—
Q5_K_XL	Medium	6.19 GB	—
Q4_K_M	Medium	4.97 GB	—
Q4_K_S	Medium	4.51 GB	—
Q4_K_XL	Medium	4.75 GB	—
IQ4_NL	Medium	4.5 GB	—
IQ4_XS	Medium	4.39 GB	—
Q4_0	Medium	4.5 GB	—
Q4_1	Medium	4.73 GB	—
Q3_K_M	Low	3.78 GB	—
Q3_K_S	Low	3.6 GB	—
Q3_K_XL	Low	4.25 GB	—
IQ3_XXS	Low	3.45 GB	—
Q2_K_XL	Low	3.49 GB	—
IQ2_M	Low	3.29 GB	—
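
As a rough illustration of how the table above might be used, the following Python sketch picks the largest quantization that fits a given VRAM budget. The sizes are copied from the table; the function name `pick_quantization` and the 1 GB default headroom for KV cache and runtime overhead are assumptions made for this example, not figures from the source. Actual overhead depends on context length and the inference runtime.

```python
# Quantization sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "FP16": 14.02, "BF16": 14.02,
    "Q8_0": 7.48, "Q8_K_XL": 8.06,
    "Q6_K": 6.59, "Q6_K_XL": 6.95,
    "Q5_K_M": 5.11, "Q5_K_S": 5.03, "Q5_K_XL": 6.19,
    "Q4_K_M": 4.97, "Q4_K_S": 4.51, "Q4_K_XL": 4.75,
    "IQ4_NL": 4.5, "IQ4_XS": 4.39, "Q4_0": 4.5, "Q4_1": 4.73,
    "Q3_K_M": 3.78, "Q3_K_S": 3.6, "Q3_K_XL": 4.25,
    "IQ3_XXS": 3.45, "Q2_K_XL": 3.49, "IQ2_M": 3.29,
}


def pick_quantization(vram_gb, headroom_gb=1.0):
    """Return (name, size_gb) of the largest quantization that fits, or None.

    headroom_gb is a hypothetical allowance for KV cache and runtime
    overhead; it is an assumption of this sketch, not a measured value.
    """
    budget = vram_gb - headroom_gb
    candidates = [
        (size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget
    ]
    if not candidates:
        return None
    size, name = max(candidates)  # largest file that still fits the budget
    return name, size


if __name__ == "__main__":
    for vram in (4.0, 6.0, 8.0, 24.0):
        print(f"{vram:>5} GB VRAM -> {pick_quantization(vram)}")
```

File size is used here only as a proxy for quality: within this table a larger file generally means less aggressive quantization, but the sketch does not compare quality between variants of similar size.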

Last updated: April 3, 2026