Gemma 4 31B
Google
Capabilities: Code, Multilingual, Thinking, Tool Calls, Vision
Gemma 4 31B is Google DeepMind's flagship open-weight dense model with 30.7 billion parameters, distilled from Gemini research. It ranks #3 on the Arena AI leaderboard and scores 85.2 on MMLU-Pro, 89.2 on AIME 2026, and 80.0 on LiveCodeBench v6, with a Codeforces Elo of 2,150. Natively multimodal, it processes text and images, with built-in thinking and tool-calling capabilities and a 256K context window. Released under the Apache 2.0 license, it fits in roughly 17 GB of VRAM at Q4 quantization, making it well suited to self-hosted deployment on high-end consumer GPUs.
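As a sanity check on the sizes above and in the table below, a dense model's file size is roughly parameters × bytes per weight; the listed figures line up when read as GiB. A minimal sketch (the effective bits-per-weight values are assumptions; K-quants mix precisions, so real files differ by a few percent):

```python
# Back-of-the-envelope checkpoint size: parameters x bits-per-weight / 8,
# reported in GiB. Effective bpw values below are approximations.

PARAMS = 30.7e9  # Gemma 4 31B dense parameter count

def size_gib(params: float, bits_per_weight: float) -> float:
    """Approximate file size in GiB for a given average bits-per-weight."""
    return params * bits_per_weight / 8 / 2**30

print(f"BF16   (16.0 bpw): {size_gib(PARAMS, 16.0):.1f} GiB")  # ~57.2, matches the table
print(f"Q8_0   ( 8.5 bpw): {size_gib(PARAMS, 8.5):.1f} GiB")   # ~30.4
print(f"Q4_K_M (~4.9 bpw): {size_gib(PARAMS, 4.85):.1f} GiB")  # ~17.3
```

Q8_0 works out to 8.5 bpw because each 32-weight block stores one 16-bit scale alongside the 8-bit values; the K-quant averages are similarly slightly above their nominal bit width.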
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| FP16 | Full precision | 57.2 GB |
| BF16 | Full precision | 57.2 GB |
| Q8_0 | High | 30.39 GB |
| Q8_K_XL | High | 32.61 GB |
| Q6_K | High | 23.47 GB |
| Q6_K_XL | High | 25.63 GB |
| Q5_K_M | Medium | 20.17 GB |
| Q5_K_S | Medium | 19.67 GB |
| Q5_K_XL | Medium | 20.39 GB |
| Q4_K_M | Medium | 17.4 GB |
| Q4_K_S | Medium | 16.2 GB |
| Q4_K_XL | Medium | 17.48 GB |
| IQ4_NL | Medium | 16.1 GB |
| IQ4_XS | Medium | 15.25 GB |
| Q4_0 | Medium | 16.15 GB |
| Q4_1 | Medium | 17.81 GB |
| Q3_K_M | Low | 13.72 GB |
| Q3_K_S | Low | 12.3 GB |
| Q3_K_XL | Low | 14.27 GB |
| IQ3_XXS | Low | 11.02 GB |
| Q2_K_XL | Low | 10.97 GB |
| IQ2_M | Low | 10.01 GB |
| IQ2_XXS | Low | 7.95 GB |
Last updated: April 29, 2026