Gemma 4 26B A4B

Google
Code Multilingual Thinking Tool Calls Vision

Gemma 4 26B A4B is Google DeepMind's Mixture-of-Experts model with 25.2 billion total parameters but only 3.8 billion active per token, distilled from Gemini research. It ranks #6 on the Arena AI leaderboard and scores 88.3 on AIME 2026, delivering near-flagship reasoning with a fraction of the compute. Natively multimodal, it processes text and images with built-in thinking and tool-calling capabilities across a 256K context window. Released under the Apache 2.0 license, it requires roughly 16 GB of VRAM at Q4, making it an exceptionally efficient choice for self-hosted deployment on consumer GPUs.

Hardware Configuration
Optional — for precise deployment recommendations
Quantization  Quality         Size      Fit
FP16          Full precision  47.04 GB
BF16          Full precision  47.03 GB
Q8_0          High            25.02 GB
Q8_K_XL       High            25.95 GB
Q6_K          High            21.33 GB
Q6_K_XL       High            22.19 GB
Q5_K_M        Medium          19.70 GB
Q5_K_S       Medium          17.48 GB
Q5_K_XL       Medium          19.81 GB
Q4_K_M        Medium          15.64 GB
Q4_K_S        Medium          15.27 GB
Q4_K_XL       Medium          15.97 GB
MXFP4_MOE     Medium          15.54 GB
IQ4_NL        Medium          12.50 GB
IQ4_XS        Medium          12.50 GB
Q3_K_M        Low             11.67 GB
Q3_K_S        Low             11.67 GB
Q3_K_XL       Low             12.04 GB
IQ3_S         Low             10.45 GB
IQ3_XXS       Low             10.45 GB
Q2_K_XL       Low              9.82 GB
IQ2_M         Low              9.29 GB
Last updated: April 29, 2026