Gemma 4 31B

Google
Code · Multilingual · Thinking · Tool Calls · Vision

Gemma 4 31B is Google DeepMind's flagship open-weight dense model with 30.7 billion parameters, distilled from Gemini research. It ranks #3 on the Arena AI leaderboard and scores 85.2 on MMLU-Pro, 89.2 on AIME 2026, and 80.0 on LiveCodeBench v6, with a Codeforces ELO of 2,150. Natively multimodal, it processes text and images with built-in thinking and tool-calling capabilities across a 256K context window. Released under the Apache 2.0 license, it fits in roughly 17 GB of VRAM at Q4, making it ideal for self-hosted deployment on high-end consumer GPUs.
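As a rough rule of thumb, a quantized model's weight footprint is approximately parameter count × bits per weight. A minimal sketch of that arithmetic, assuming illustrative bits-per-weight averages (the actual GGUF files in the table below differ somewhat, since real quants mix precisions across layers and include metadata):

```python
def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight size in GB: params * bits / 8 bits-per-byte / 1e9."""
    return n_params * bits_per_weight / 8 / 1e9

# Gemma 4 31B has 30.7 billion parameters (from the model card above).
PARAMS = 30.7e9

# Assumed effective bits per weight for common quant formats,
# including block-scale overhead; approximations, not official figures.
for name, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
    print(f"{name}: ~{estimate_size_gb(PARAMS, bpw):.1f} GB")
```

This is why the Q4 entry lands near 17 GB: at roughly 4.5–5 bits per weight, 30.7B parameters compress to under a third of the FP16 footprint.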

Hardware Configuration
Quantization  Quality         Size
FP16          Full precision  57.20 GB
BF16          Full precision  57.20 GB
Q8_0          High            30.39 GB
Q8_K_XL       High            32.61 GB
Q6_K          High            23.47 GB
Q6_K_XL       High            25.63 GB
Q5_K_M        Medium          20.17 GB
Q5_K_S        Medium          19.67 GB
Q5_K_XL       Medium          20.39 GB
Q4_K_M        Medium          17.40 GB
Q4_K_S        Medium          16.20 GB
Q4_K_XL       Medium          17.48 GB
IQ4_NL        Medium          16.10 GB
IQ4_XS        Medium          15.25 GB
Q4_0          Medium          16.15 GB
Q4_1          Medium          17.81 GB
Q3_K_M        Low             13.72 GB
Q3_K_S        Low             12.30 GB
Q3_K_XL       Low             14.27 GB
IQ3_XXS       Low             11.02 GB
Q2_K_XL       Low             10.97 GB
IQ2_M         Low             10.01 GB
IQ2_XXS       Low              7.95 GB
Last updated: April 29, 2026