DeepSeek V3.1
Capabilities: Code, Multilingual, Thinking, Tool Calls
DeepSeek V3.1 is a 685-billion-parameter Mixture-of-Experts model from DeepSeek, activating 8 of 256 routed experts per token plus one always-on shared expert. It delivers frontier-level performance on code generation, reasoning, and multilingual tasks while using far fewer active parameters per inference step than comparably sized dense models. The model supports thinking mode, tool calling, and nine languages. Its 160K-token context window and sheer size call for multi-GPU or distributed setups, though quantizations down to Q2 shrink the VRAM footprint considerably.
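To make the routing concrete, here is a minimal NumPy sketch of the top-k expert selection described above. Only the 256 routed experts, 8 active per token, and the always-on shared expert come from the description; the toy hidden size, single-matrix experts, and softmax-over-selected-experts gating are illustrative assumptions, not DeepSeek's actual implementation.

```python
import numpy as np

NUM_EXPERTS = 256   # routed experts (from the description above)
TOP_K = 8           # experts activated per token
HIDDEN = 16         # toy hidden size; the real model's is far larger

rng = np.random.default_rng(0)

# Toy parameters: one gating matrix, one weight matrix per routed expert,
# and one shared expert that every token passes through.
gate_w = rng.standard_normal((HIDDEN, NUM_EXPERTS)) * 0.02
expert_w = rng.standard_normal((NUM_EXPERTS, HIDDEN, HIDDEN)) * 0.02
shared_w = rng.standard_normal((HIDDEN, HIDDEN)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token through its top-k routed experts plus the shared expert."""
    logits = x @ gate_w                           # gating scores, shape (NUM_EXPERTS,)
    top = np.argsort(logits)[-TOP_K:]             # indices of the 8 selected experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                      # softmax over the selected experts only
    out = sum(w * (x @ expert_w[i]) for i, w in zip(top, weights))
    return out + x @ shared_w                     # shared expert always contributes

token = rng.standard_normal(HIDDEN)
print(moe_forward(token).shape)                   # (16,)
```

Because only 8 of 256 routed experts run per token, the compute per step tracks the active parameter count rather than the full 685B, which is the efficiency the paragraph above refers to.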
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 664.33 GB |
| Q8_K_XL | High | 726.99 GB |
| Q6_K | High | 513.41 GB |
| Q6_K_XL | High | 535.03 GB |
| Q5_K_M | Medium | 443.48 GB |
| Q5_K_S | Medium | 430.87 GB |
| Q5_K_XL | Medium | 451.3 GB |
| Q4_K_M | Medium | 377.56 GB |
| Q4_K_S | Medium | 354.9 GB |
| Q4_K_XL | Medium | 360.33 GB |
| Q4_0 | Medium | 354 GB |
| Q4_1 | Medium | 391.86 GB |
| Q3_K_M | Low | 298.46 GB |
| Q3_K_S | Low | 270.49 GB |
| Q3_K_XL | Low | 279.43 GB |
| Q2_K | Low | 228.82 GB |
| Q2_K_L | Low | 229.02 GB |
| Q2_K_XL | Low | 238.17 GB |
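As a rough way to read the table, the sketch below picks the highest-quality quantization whose file size fits a given pooled-VRAM budget. The 10% reservation for KV cache and runtime buffers, the subset of quants included, and the example hardware are illustrative assumptions; real fit also depends on context length and the inference stack.

```python
# Weight file sizes (GB) taken from the table above (one representative per tier).
QUANT_SIZES_GB = {
    "Q8_0": 664.33, "Q6_K": 513.41, "Q5_K_M": 443.48,
    "Q4_K_M": 377.56, "Q3_K_M": 298.46, "Q2_K": 228.82,
}

def best_fit(total_vram_gb: float, overhead: float = 0.10) -> str | None:
    """Return the largest quant whose weights fit after reserving `overhead`
    of VRAM for KV cache and runtime buffers (an illustrative figure)."""
    budget = total_vram_gb * (1 - overhead)
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget}
    return max(fitting, key=fitting.get) if fitting else None

# Example: an 8x H100 80GB node pools roughly 640 GB of VRAM.
print(best_fit(640))   # Q6_K (513.41 GB fits within the 576 GB budget)
```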
Last updated: March 5, 2026