Gemma 4 26B A4B
Google
Code · Multilingual · Thinking · Tool Calls · Vision
Gemma 4 26B A4B is Google DeepMind's Mixture-of-Experts model with 25.2 billion total parameters but only 3.8 billion active per token, distilled from Gemini research. It ranks #6 on the Arena AI leaderboard and scores 88.3 on AIME 2026, delivering near-flagship reasoning at a fraction of the compute cost. Natively multimodal, it processes text and images, with built-in thinking and tool-calling capabilities across a 256K context window. Released under the Apache 2.0 license, it requires roughly 16 GB of VRAM at Q4, making it an exceptionally efficient choice for self-hosted deployment on consumer GPUs.
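The efficiency claim follows directly from the MoE design: per-token compute scales with the *active* parameter count, not the total. A quick back-of-the-envelope check using the figures above:

```python
# Rough illustration: in an MoE model, per-token FLOPs are proportional to
# active parameters, so the compute fraction relative to an equally sized
# dense model is simply active / total.
total_b = 25.2   # total parameters, billions (from the card)
active_b = 3.8   # active parameters per token, billions (from the card)

fraction = active_b / total_b
print(f"active fraction per token: {fraction:.1%}")  # ~15.1%
```

So each forward pass touches only about 15% of the weights, which is what lets a 25B-parameter model run with the per-token compute of a ~4B dense model.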
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| FP16 | Full precision | 47.04 GB |
| BF16 | Full precision | 47.03 GB |
| Q8_0 | High | 25.02 GB |
| Q8_K_XL | High | 25.95 GB |
| Q6_K | High | 21.33 GB |
| Q6_K_XL | High | 22.19 GB |
| Q5_K_M | Medium | 19.70 GB |
| Q5_K_S | Medium | 17.48 GB |
| Q5_K_XL | Medium | 19.81 GB |
| Q4_K_M | Medium | 15.64 GB |
| Q4_K_S | Medium | 15.27 GB |
| Q4_K_XL | Medium | 15.97 GB |
| MXFP4_MOE | Medium | 15.54 GB |
| IQ4_NL | Medium | 12.50 GB |
| IQ4_XS | Medium | 12.50 GB |
| Q3_K_M | Low | 11.67 GB |
| Q3_K_S | Low | 11.67 GB |
| Q3_K_XL | Low | 12.04 GB |
| IQ3_S | Low | 10.45 GB |
| IQ3_XXS | Low | 10.45 GB |
| Q2_K_XL | Low | 9.82 GB |
| IQ2_M | Low | 9.29 GB |
Last updated: April 29, 2026