DeepSeek R1 Distill Llama 70B

DeepSeek
Code · Multilingual · Thinking · Tool Calls

DeepSeek R1 Distill Llama 70B is a 70.55-billion-parameter dense transformer from DeepSeek, distilled from the R1 reasoning model into a Llama-3-based architecture. It delivers frontier-level chain-of-thought reasoning in the 70B class, outperforming smaller reasoning models on math, code, and logic benchmarks. It supports code generation, tool calls, and nine languages, including English, Chinese, and the major European languages. With a 128K-token context window and FlashAttention support, it suits self-hosted multi-GPU deployments and quantizes well, with GGUF builds available across a wide range of formats.
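Since tool calls are listed among the model's capabilities, here is a minimal sketch of exercising them through an OpenAI-compatible endpoint. The base URL assumes a local vLLM or llama.cpp server on port 8000 serving the model under its Hugging Face id; the get_weather tool is hypothetical, and whether the model actually emits tool calls depends on the server's chat template.

```python
# Minimal sketch: tool calls against a self-hosted DeepSeek R1 Distill Llama 70B
# via an OpenAI-compatible endpoint. base_url and model id are assumptions;
# adjust them to match your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local inference server
    api_key="not-needed-locally",
)

# One tool definition in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # the model chose to call the tool
    for call in message.tool_calls:
        print(call.function.name, call.function.arguments)
else:  # the model answered directly, typically with R1-style reasoning first
    print(message.content)
```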

Hardware Configuration

| Quantization | Quality        | Size      |
|--------------|----------------|-----------|
| FP16         | Full precision | 131.43 GB |
| Q8_0         | High           | 69.82 GB  |
| Q8_K_XL      | High           | 75.66 GB  |
| Q6_K         | High           | 107.82 GB |
| Q6_K_XL      | High           | 56.96 GB  |
| Q5_K_M       | Medium         | 46.52 GB  |
| Q5_K_S       | Medium         | 45.32 GB  |
| Q5_K_XL      | Medium         | 46.54 GB  |
| Q4_K_M       | Medium         | 39.60 GB  |
| Q4_K_S       | Medium         | 37.58 GB  |
| Q4_K_XL      | Medium         | 39.73 GB  |
| Q4_0         | Medium         | 37.36 GB  |
| Q4_1         | Medium         | 41.27 GB  |
| Q3_K_M       | Low            | 31.91 GB  |
| Q3_K_S       | Low            | 28.79 GB  |
| Q3_K_XL      | Low            | 32.48 GB  |
| Q2_K         | Low            | 24.56 GB  |
| Q2_K_L       | Low            | 24.79 GB  |
| Q2_K_XL      | Low            | 25.11 GB  |
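As a back-of-envelope check against the table, the sketch below compares a few of the listed quantization sizes to a total VRAM budget. The flat reserve for KV cache, activations, and runtime overhead is an assumption, not a measured value; real headroom depends on context length, batch size, and backend.

```python
# Rough sketch: which GGUF quantizations fit a given VRAM budget?
# Sizes are a subset of the table values above. The reserve_gb default is an
# assumed allowance for KV cache and runtime overhead -- tune it for your setup.
QUANT_SIZES_GB = {
    "Q8_0": 69.82, "Q6_K_XL": 56.96, "Q5_K_M": 46.52,
    "Q4_K_M": 39.60, "Q3_K_M": 31.91, "Q2_K": 24.56,
}

def fits(total_vram_gb: float, size_gb: float, reserve_gb: float = 8.0) -> bool:
    """True if the weights plus the assumed reserve fit in total VRAM."""
    return size_gb + reserve_gb <= total_vram_gb

vram = 2 * 24.0  # e.g. two 24 GB GPUs with tensor parallelism (assumption)
for name, size in QUANT_SIZES_GB.items():
    verdict = "fits" if fits(vram, size) else "too big"
    print(f"{name:8s} {size:6.2f} GB -> {verdict}")
```

Under these assumptions, a 48 GB budget lands at Q4_K_M and below, which matches the common rule of thumb that a 70B model at 4-bit is the practical floor for a dual-24 GB setup.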
Last updated: March 5, 2026