
Llama 3.3 70B Instruct

Meta
Capabilities: Code, Multilingual, Tool Calls

Llama 3.3 70B Instruct is a 70-billion-parameter dense transformer model from Meta, optimized for instruction following, code generation, and multilingual conversation. It delivers performance competitive with larger models in the Llama family while remaining practical for single-node GPU deployments. The model supports tool calling and eight languages including English, French, Spanish, and German. With a 128K context window and grouped-query attention, it quantizes efficiently down to Q4 levels for self-hosted inference on consumer hardware.
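Since the model supports tool calling, a request can advertise callable functions alongside the chat messages. The sketch below builds such a request body in the OpenAI-compatible schema that common self-hosting servers (e.g. llama.cpp's server or vLLM) accept; the `get_weather` tool, the endpoint model name, and the schema details are illustrative assumptions, not part of this model card.

```python
import json

def build_tool_call_request(user_message: str) -> dict:
    """Assemble a chat request that offers the model one callable tool."""
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ]
    return {
        # The served model name depends on your server configuration.
        "model": "llama-3.3-70b-instruct",
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

body = build_tool_call_request("What's the weather in Paris?")
print(json.dumps(body, indent=2))
```

The model replies either with ordinary text or with a `tool_calls` entry naming the function and its JSON arguments, which your application executes and feeds back as a `tool` role message.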

Hardware Configuration

| Quantization | Quality        | Size      |
|--------------|----------------|-----------|
| FP16         | Full precision | 131.43 GB |
| Q8_0         | High           | 69.82 GB  |
| Q6_K_L       | Low            | 54.39 GB  |
| Q6_K         | High           | 53.91 GB  |
| Q5_K_L       | Low            | 47.13 GB  |
| Q5_K_M       | Medium         | 46.52 GB  |
| Q5_K_S       | Medium         | 45.32 GB  |
| Q4_K_L       | Low            | 40.33 GB  |
| Q4_K_M       | Medium         | 39.6 GB   |
| Q4_K_S       | Medium         | 37.58 GB  |
| Q4_0         | Medium         | 37.36 GB  |
| Q4_0_4_4     | Low            | 37.22 GB  |
| Q4_0_4_8     | Low            | 37.22 GB  |
| Q4_0_8_8     | Low            | 37.22 GB  |
| Q3_K_XL      | Low            | 35.45 GB  |
| Q3_K_L       | Low            | 34.59 GB  |
| Q3_K_M       | Low            | 31.91 GB  |
| Q3_K_S       | Low            | 28.79 GB  |
| Q2_K_L       | Low            | 25.52 GB  |
| Q2_K         | Low            | 24.56 GB  |
Last updated: March 5, 2026