Llama 3.3 70B Instruct
Meta
Code · Multilingual · Tool Calls
Llama 3.3 70B Instruct is a 70-billion-parameter dense transformer model from Meta, optimized for instruction following, code generation, and multilingual conversation. It delivers performance competitive with larger models in the Llama family while remaining practical for single-node GPU deployments. The model supports tool calling and eight languages including English, French, Spanish, and German. With a 128K context window and grouped-query attention, it quantizes efficiently down to Q4 levels for self-hosted inference on consumer hardware.
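Since the model supports tool calling, a request to a self-hosted server typically carries tool definitions in the OpenAI-compatible format that servers such as llama.cpp, vLLM, and Ollama accept. A minimal sketch of such a request body follows; the `get_weather` tool and the model identifier string are illustrative, not part of any real API.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "function" format
# commonly accepted by self-hosted inference servers for Llama 3.3.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative name, not a real service
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Body of a chat-completions request that offers the tool to the model.
request_body = {
    "model": "llama-3.3-70b-instruct",  # server-specific identifier
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(request_body, indent=2))
```

With `tool_choice` set to `"auto"`, the model may answer directly or emit a `tool_calls` entry naming `get_weather` with JSON arguments, which the caller executes and feeds back as a `tool` role message.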
Hardware Configuration
| Quantization | Quality | Size (GB) |
|---|---|---|
| FP16 | Full precision | 131.43 |
| Q8_0 | High | 69.82 |
| Q6_K_L | High | 54.39 |
| Q6_K | High | 53.91 |
| Q5_K_L | Medium | 47.13 |
| Q5_K_M | Medium | 46.52 |
| Q5_K_S | Medium | 45.32 |
| Q4_K_L | Medium | 40.33 |
| Q4_K_M | Medium | 39.6 |
| Q4_K_S | Medium | 37.58 |
| Q4_0 | Medium | 37.36 |
| Q4_0_4_4 | Medium | 37.22 |
| Q4_0_4_8 | Medium | 37.22 |
| Q4_0_8_8 | Medium | 37.22 |
| Q3_K_XL | Low | 35.45 |
| Q3_K_L | Low | 34.59 |
| Q3_K_M | Low | 31.91 |
| Q3_K_S | Low | 28.79 |
| Q2_K_L | Low | 25.52 |
| Q2_K | Low | 24.56 |
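Whether a given quantization fits on your hardware depends on the file size above plus the KV cache, which grows with context length. A rough sketch of that estimate follows, using the publicly documented Llama 3 70B architecture (80 layers, 8 KV heads via grouped-query attention, head dimension 128); the overhead constant is an assumption, and GB vs. GiB is treated loosely here.

```python
def kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Bytes of KV cache per token: K and V each store
    layers * kv_heads * head_dim elements (FP16 = 2 bytes each)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

def kv_cache_gib(context_tokens, **kw):
    """KV cache size in GiB for a given context length."""
    return kv_bytes_per_token(**kw) * context_tokens / 2**30

def fits(quant_size_gb, vram_gb, context_tokens=8192, overhead_gb=1.0):
    """Rough fit check: weights + KV cache + fixed overhead vs. VRAM.
    overhead_gb is an assumed allowance for activations and buffers;
    quant_size_gb comes straight from the table (decimal GB, close
    enough to GiB for a coarse estimate)."""
    return quant_size_gb + kv_cache_gib(context_tokens) + overhead_gb <= vram_gb

# Full 128K context costs 40 GiB of KV cache alone at FP16:
print(kv_cache_gib(131072))      # 40.0
# Q4_K_M (39.6 GB) at an 8K context fits in 48 GB but not 24 GB:
print(fits(39.6, 48))            # True
print(fits(39.6, 24))            # False
```

This is why the 128K window is rarely used in full on consumer hardware: even with grouped-query attention keeping the cache 8x smaller than full multi-head attention would, long contexts can rival the quantized weights themselves in memory.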
Last updated: March 5, 2026