DeepSeek R1 Distill Llama 70B
DeepSeek
Code · Multilingual · Thinking · Tool Calls
DeepSeek R1 Distill Llama 70B is a 70.55-billion-parameter dense transformer from DeepSeek, distilled from the R1 reasoning model into a Llama-3-based architecture. It delivers frontier-level chain-of-thought reasoning in the 70B class, outperforming smaller reasoning models on math, code, and logic benchmarks. It supports code generation, tool calls, and nine languages including English, Chinese, and major European languages. With a 128K context window and flash attention, it suits multi-GPU self-hosted deployments and quantizes well to GGUF, with formats spanning FP16 down to Q2 (see the table below).
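For self-hosted use, the model is typically served behind an OpenAI-compatible API (for example with llama.cpp's server or vLLM). The sketch below shows one way to query it and separate the reasoning trace from the final answer; the base URL, port, API key, and served model name are assumptions to adapt to your own deployment.

```python
# Minimal sketch: querying a locally served DeepSeek R1 Distill Llama 70B
# through an OpenAI-compatible endpoint (e.g., a llama.cpp or vLLM server).
# The base URL and model name below are assumptions; match them to your
# own server configuration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",  # assumed served model name
    messages=[
        {"role": "user", "content": "How many prime numbers are below 50?"}
    ],
    temperature=0.6,   # DeepSeek recommends 0.5-0.7 for R1-series models
    max_tokens=4096,   # leave room for the reasoning trace
)

text = response.choices[0].message.content
# R1 distills emit their chain of thought inside <think>...</think> tags;
# strip it if you only want the final answer.
answer = text.split("</think>")[-1].strip()
print(answer)
```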
Quantization Options
| Quantization | Quality | Size |
|---|---|---|
| FP16 | Full precision | 131.43 GB |
| Q8_0 | High | 69.82 GB |
| Q8_K_XL | High | 75.66 GB |
| Q6_K | High | 53.91 GB |
| Q6_K_XL | High | 56.96 GB |
| Q5_K_M | Medium | 46.52 GB |
| Q5_K_S | Medium | 45.32 GB |
| Q5_K_XL | Medium | 46.54 GB |
| Q4_K_M | Medium | 39.60 GB |
| Q4_K_S | Medium | 37.58 GB |
| Q4_K_XL | Medium | 39.73 GB |
| Q4_0 | Medium | 37.36 GB |
| Q4_1 | Medium | 41.27 GB |
| Q3_K_M | Low | 31.91 GB |
| Q3_K_S | Low | 28.79 GB |
| Q3_K_XL | Low | 32.48 GB |
| Q2_K | Low | 24.56 GB |
| Q2_K_L | Low | 24.79 GB |
| Q2_K_XL | Low | 25.11 GB |
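The sizes above can drive a rough capacity check before downloading: the weights must fit in combined VRAM with headroom for the KV cache and runtime buffers. The sketch below is a first-pass estimate under an assumed flat overhead budget; actual overhead grows with context length and batch size, so treat the result as a starting point, not a guarantee.

```python
# Minimal sketch: a rough "will it fit?" check for the quantizations above.
# Sizes are taken from the table; the flat overhead budget is an assumption
# (real KV-cache and runtime overhead depend on context length, batch size,
# and backend).
QUANT_SIZES_GB = {
    "Q8_0": 69.82, "Q6_K_XL": 56.96, "Q5_K_M": 46.52,
    "Q4_K_M": 39.60, "Q3_K_M": 31.91, "Q2_K": 24.56,
}

def fits(quant: str, vram_gb: float, overhead_gb: float = 4.0) -> bool:
    """Return True if the quantized weights plus an assumed fixed
    overhead budget fit in the given total VRAM."""
    return QUANT_SIZES_GB[quant] + overhead_gb <= vram_gb

# Example: two 48 GB GPUs (96 GB total) comfortably hold Q8_0,
# while a single 48 GB GPU only reaches the Q4/Q3 range.
for q in QUANT_SIZES_GB:
    print(f"{q}: 96 GB -> {fits(q, 96)}, 48 GB -> {fits(q, 48)}")
```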
Last updated: March 5, 2026