DeepSeek V3.1
Capabilities: Code, Multilingual, Thinking, Tool Calls
DeepSeek V3.1 is a 685-billion-parameter Mixture-of-Experts model from DeepSeek, activating 8 of 256 routed experts per token plus one always-on shared expert. It delivers frontier-level performance on code generation, reasoning, and multilingual tasks while using far fewer active parameters per inference step than comparably sized dense models. The model supports thinking mode, tool calling, and nine languages. Its 160K-token context window and sheer size call for multi-GPU or distributed setups, though quantizations down to Q2 shrink the VRAM footprint considerably.
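To make the routing concrete, here is a minimal NumPy sketch of the top-k expert selection described above. Only the 256 routed experts, 8 active per token, and the always-on shared expert come from the description; the toy hidden size, single-matrix experts, and softmax-over-selected-experts gating are illustrative assumptions, not DeepSeek's actual implementation.

```python
import numpy as np

NUM_EXPERTS = 256   # routed experts (from the description above)
TOP_K = 8           # experts activated per token
HIDDEN = 16         # toy hidden size; the real model's is far larger

rng = np.random.default_rng(0)

# Toy parameters: one gating matrix, one weight matrix per routed expert,
# and one shared expert that every token passes through.
gate_w = rng.standard_normal((HIDDEN, NUM_EXPERTS)) * 0.02
expert_w = rng.standard_normal((NUM_EXPERTS, HIDDEN, HIDDEN)) * 0.02
shared_w = rng.standard_normal((HIDDEN, HIDDEN)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token through its top-k routed experts plus the shared expert."""
    logits = x @ gate_w                           # gating scores, shape (NUM_EXPERTS,)
    top = np.argsort(logits)[-TOP_K:]             # indices of the 8 selected experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                      # softmax over the selected experts only
    out = sum(w * (x @ expert_w[i]) for i, w in zip(top, weights))
    return out + x @ shared_w                     # shared expert always contributes

token = rng.standard_normal(HIDDEN)
print(moe_forward(token).shape)                   # (16,)
```

Because only 8 of 256 routed experts run per token, the compute per step tracks the active parameter count rather than the full 685B, which is the efficiency the paragraph above refers to.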
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 664.33 GB |
| Q8_K_XL | High | 726.99 GB |
| Q6_K | High | 513.41 GB |
| Q6_K_XL | High | 535.03 GB |
| Q5_K_M | Medium | 443.48 GB |
| Q5_K_S | Medium | 430.87 GB |
| Q5_K_XL | Medium | 451.3 GB |
| Q4_K_M | Medium | 377.56 GB |
| Q4_K_S | Medium | 354.9 GB |
| Q4_K_XL | Medium | 360.33 GB |
| Q4_0 | Medium | 354 GB |
| Q4_1 | Medium | 391.86 GB |
| Q3_K_M | Low | 298.46 GB |
| Q3_K_S | Low | 270.49 GB |
| Q3_K_XL | Low | 279.43 GB |
| Q2_K | Low | 228.82 GB |
| Q2_K_L | Low | 229.02 GB |
| Q2_K_XL | Low | 238.17 GB |
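As a rough way to read the table, the sketch below picks the highest-quality quantization whose file size fits a given pooled-VRAM budget. The 10% reservation for KV cache and runtime buffers, the subset of quants included, and the example hardware are illustrative assumptions; real fit also depends on context length and the inference stack.

```python
# Weight file sizes (GB) taken from the table above (one representative per tier).
QUANT_SIZES_GB = {
    "Q8_0": 664.33, "Q6_K": 513.41, "Q5_K_M": 443.48,
    "Q4_K_M": 377.56, "Q3_K_M": 298.46, "Q2_K": 228.82,
}

def best_fit(total_vram_gb: float, overhead: float = 0.10) -> str | None:
    """Return the largest quant whose weights fit after reserving `overhead`
    of VRAM for KV cache and runtime buffers (an illustrative figure)."""
    budget = total_vram_gb * (1 - overhead)
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget}
    return max(fitting, key=fitting.get) if fitting else None

# Example: an 8x H100 80GB node pools roughly 640 GB of VRAM.
print(best_fit(640))   # Q6_K (513.41 GB fits within the 576 GB budget)
```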
Last updated: March 5, 2026