Gemma 4 26B A4B

Google
Code Multilingual Thinking Tool Calls Vision

Gemma 4 26B A4B is Google DeepMind's Mixture-of-Experts model with 25.2 billion total parameters but only 3.8 billion active per token, distilled from Gemini research. It ranks #6 on the Arena AI leaderboard and scores 88.3 on AIME 2026, delivering near-flagship reasoning with a fraction of the compute. Natively multimodal, it processes text and images with built-in thinking and tool-calling capabilities across a 256K context window. Released under the Apache 2.0 license, it requires roughly 16 GB of VRAM at Q4, making it an exceptionally efficient choice for self-hosted deployment on consumer GPUs.

Hardware Configuration
Optional — for precise deployment recommendations
Quantization  Quality         Size      Fit
FP16          Full precision  47.04 GB
BF16          Full precision  47.03 GB
Q8_0          High            25.02 GB
Q8_K_XL       High            25.95 GB
Q6_K          High            21.33 GB
Q6_K_XL       High            22.19 GB
Q5_K_M        Medium          19.70 GB
Q5_K_S       Medium          17.48 GB
Q5_K_XL       Medium          19.81 GB
Q4_K_M        Medium          15.64 GB
Q4_K_S        Medium          15.27 GB
Q4_K_XL       Medium          15.97 GB
MXFP4_MOE     Medium          15.54 GB
IQ4_NL        Medium          12.50 GB
IQ4_XS        Medium          12.50 GB
Q3_K_M        Low             11.67 GB
Q3_K_S        Low             11.67 GB
Q3_K_XL       Low             12.04 GB
IQ3_S         Low             10.45 GB
IQ3_XXS       Low             10.45 GB
Q2_K_XL       Low              9.82 GB
IQ2_M         Low              9.29 GB
Last updated: April 29, 2026