Gemma 4 31B
Google
Capabilities: Code, Multilingual, Thinking, Tool Calls, Vision
Gemma 4 31B is Google DeepMind's flagship open-weight dense model with 30.7 billion parameters, distilled from Gemini research. It ranks #3 on the Arena AI leaderboard and scores 85.2 on MMLU-Pro, 89.2 on AIME 2026, and 80.0 on LiveCodeBench v6, with a Codeforces Elo of 2,150. Natively multimodal, it processes text and images, with built-in thinking and tool-calling capabilities and a 256K context window. Released under the Apache 2.0 license, it fits in roughly 17 GB of VRAM at Q4 quantization, making it well suited to self-hosted deployment on high-end consumer GPUs.
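As a sanity check on the sizes above and in the table below, a dense model's file size is roughly parameters × bytes per weight; the listed figures line up when read as GiB. A minimal sketch (the effective bits-per-weight values are assumptions; K-quants mix precisions, so real files differ by a few percent):

```python
# Back-of-the-envelope checkpoint size: parameters x bits-per-weight / 8,
# reported in GiB. Effective bpw values below are approximations.

PARAMS = 30.7e9  # Gemma 4 31B dense parameter count

def size_gib(params: float, bits_per_weight: float) -> float:
    """Approximate file size in GiB for a given average bits-per-weight."""
    return params * bits_per_weight / 8 / 2**30

print(f"BF16   (16.0 bpw): {size_gib(PARAMS, 16.0):.1f} GiB")  # ~57.2, matches the table
print(f"Q8_0   ( 8.5 bpw): {size_gib(PARAMS, 8.5):.1f} GiB")   # ~30.4
print(f"Q4_K_M (~4.9 bpw): {size_gib(PARAMS, 4.85):.1f} GiB")  # ~17.3
```

Q8_0 works out to 8.5 bpw because each 32-weight block stores one 16-bit scale alongside the 8-bit values; the K-quant averages are similarly slightly above their nominal bit width.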
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| FP16 | Full precision | 57.2 GB |
| BF16 | Full precision | 57.2 GB |
| Q8_0 | High | 30.39 GB |
| Q8_K_XL | High | 32.61 GB |
| Q6_K | High | 23.47 GB |
| Q6_K_XL | High | 25.63 GB |
| Q5_K_M | Medium | 20.17 GB |
| Q5_K_S | Medium | 19.67 GB |
| Q5_K_XL | Medium | 20.39 GB |
| Q4_K_M | Medium | 17.4 GB |
| Q4_K_S | Medium | 16.2 GB |
| Q4_K_XL | Medium | 17.48 GB |
| IQ4_NL | Medium | 16.1 GB |
| IQ4_XS | Medium | 15.25 GB |
| Q4_0 | Medium | 16.15 GB |
| Q4_1 | Medium | 17.81 GB |
| Q3_K_M | Low | 13.72 GB |
| Q3_K_S | Low | 12.3 GB |
| Q3_K_XL | Low | 14.27 GB |
| IQ3_XXS | Low | 11.02 GB |
| Q2_K_XL | Low | 10.97 GB |
| IQ2_M | Low | 10.01 GB |
| IQ2_XXS | Low | 7.95 GB |
Last updated: April 29, 2026