DeepSeek R1 Distill Llama 70B
DeepSeek
Code · Multilingual · Thinking · Tool Calls
DeepSeek R1 Distill Llama 70B is a 70.55-billion-parameter dense transformer from DeepSeek, distilled from the R1 reasoning model into a Llama-3-based architecture. It delivers frontier-level chain-of-thought reasoning in the 70B class, outperforming smaller reasoning models on math, code, and logic benchmarks. It supports code generation, tool calls, and nine languages including English, Chinese, and major European languages. With a 128K context window and flash attention, it suits multi-GPU self-hosted deployments and quantizes well to GGUF, with formats spanning FP16 down to Q2 (see the table below).
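For self-hosted use, the model is typically served behind an OpenAI-compatible API (for example with llama.cpp's server or vLLM). The sketch below shows one way to query it and separate the reasoning trace from the final answer; the base URL, port, API key, and served model name are assumptions to adapt to your own deployment.

```python
# Minimal sketch: querying a locally served DeepSeek R1 Distill Llama 70B
# through an OpenAI-compatible endpoint (e.g., a llama.cpp or vLLM server).
# The base URL and model name below are assumptions; match them to your
# own server configuration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",  # assumed served model name
    messages=[
        {"role": "user", "content": "How many prime numbers are below 50?"}
    ],
    temperature=0.6,   # DeepSeek recommends 0.5-0.7 for R1-series models
    max_tokens=4096,   # leave room for the reasoning trace
)

text = response.choices[0].message.content
# R1 distills emit their chain of thought inside <think>...</think> tags;
# strip it if you only want the final answer.
answer = text.split("</think>")[-1].strip()
print(answer)
```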
Quantization Options
| Quantization | Quality | Size |
|---|---|---|
| FP16 | Full precision | 131.43 GB |
| Q8_0 | High | 69.82 GB |
| Q8_K_XL | High | 75.66 GB |
| Q6_K | High | 53.91 GB |
| Q6_K_XL | High | 56.96 GB |
| Q5_K_M | Medium | 46.52 GB |
| Q5_K_S | Medium | 45.32 GB |
| Q5_K_XL | Medium | 46.54 GB |
| Q4_K_M | Medium | 39.60 GB |
| Q4_K_S | Medium | 37.58 GB |
| Q4_K_XL | Medium | 39.73 GB |
| Q4_0 | Medium | 37.36 GB |
| Q4_1 | Medium | 41.27 GB |
| Q3_K_M | Low | 31.91 GB |
| Q3_K_S | Low | 28.79 GB |
| Q3_K_XL | Low | 32.48 GB |
| Q2_K | Low | 24.56 GB |
| Q2_K_L | Low | 24.79 GB |
| Q2_K_XL | Low | 25.11 GB |
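The sizes above can drive a rough capacity check before downloading: the weights must fit in combined VRAM with headroom for the KV cache and runtime buffers. The sketch below is a first-pass estimate under an assumed flat overhead budget; actual overhead grows with context length and batch size, so treat the result as a starting point, not a guarantee.

```python
# Minimal sketch: a rough "will it fit?" check for the quantizations above.
# Sizes are taken from the table; the flat overhead budget is an assumption
# (real KV-cache and runtime overhead depend on context length, batch size,
# and backend).
QUANT_SIZES_GB = {
    "Q8_0": 69.82, "Q6_K_XL": 56.96, "Q5_K_M": 46.52,
    "Q4_K_M": 39.60, "Q3_K_M": 31.91, "Q2_K": 24.56,
}

def fits(quant: str, vram_gb: float, overhead_gb: float = 4.0) -> bool:
    """Return True if the quantized weights plus an assumed fixed
    overhead budget fit in the given total VRAM."""
    return QUANT_SIZES_GB[quant] + overhead_gb <= vram_gb

# Example: two 48 GB GPUs (96 GB total) comfortably hold Q8_0,
# while a single 48 GB GPU only reaches the Q4/Q3 range.
for q in QUANT_SIZES_GB:
    print(f"{q}: 96 GB -> {fits(q, 96)}, 48 GB -> {fits(q, 48)}")
```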
Last updated: March 5, 2026