Llama 4 Maverick 17B 128E Instruct
Meta
Code · Multilingual · Tool Calls · Vision
Llama 4 Maverick 17B 128E Instruct is a large-scale Mixture-of-Experts (MoE) model from Meta with 17 billion active parameters, 128 routed experts, and roughly 400 billion total parameters; each token activates a shared expert plus one routed expert. It delivers frontier-class performance on vision, code generation, and multilingual tasks across its 12 supported languages. Maverick is the high-capacity tier of the Llama 4 family, trading higher memory requirements for stronger benchmark results. It has a 1M-token context window and requires a multi-GPU setup at higher precisions, though quantizations down to Q2 substantially reduce the memory footprint.
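As a rough sanity check on the quantization sizes listed below, weight memory scales with total parameters times bits per weight. A minimal sketch, assuming ~400 billion total parameters (from the description above) and ~4.5 effective bits for a Q4_K-style quant (a typical GGUF figure, not from this page):

```python
# Rough weight-memory estimate: total params x bits-per-weight / 8 bytes.
# The 400e9 parameter count comes from the description; 4.5 effective bits
# for Q4_K is an assumed typical value, not an exact spec.
def weight_size_gb(total_params: float, bits_per_weight: float) -> float:
    return total_params * bits_per_weight / 8 / 1e9

print(weight_size_gb(400e9, 4.5))  # ~225 GB, in line with the Q4_K_M row
```

This ignores the KV cache and activation memory, which grow with context length and batch size on top of the weights.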
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 396.58 GB |
| Q8_K_XL | High | 428.4 GB |
| Q6_K | High | 306.2 GB |
| Q6_K_XL | High | 317.63 GB |
| Q5_K_M | Medium | 264.93 GB |
| Q5_K_S | Medium | 256.77 GB |
| Q5_K_XL | Medium | 267.29 GB |
| Q4_K_M | Medium | 226.1 GB |
| Q4_K_S | Medium | 212.16 GB |
| Q4_K_XL | Medium | 216.2 GB |
| Q4_0 | Medium | 211.19 GB |
| Q4_1 | Medium | 233.49 GB |
| Q3_K_M | Low | 177.95 GB |
| Q3_K_S | Low | 160.79 GB |
| Q3_K_XL | Low | 167.23 GB |
| Q2_K | Low | 135.64 GB |
| Q2_K_L | Low | 135.87 GB |
| Q2_K_XL | Low | 142.17 GB |
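To translate a row of the table into a GPU count, divide the file size by per-GPU memory with some headroom for the KV cache and activations. A minimal sketch, assuming 80 GB per accelerator and a 1.2x overhead factor (both are assumptions, not figures from this page):

```python
import math

GPU_MEM_GB = 80.0  # assumed per-GPU memory; adjust for your hardware

def gpus_needed(size_gb: float, gpu_mem_gb: float = GPU_MEM_GB,
                overhead: float = 1.2) -> int:
    # Reserve headroom for KV cache and activations via a rough
    # overhead factor, then round up to whole GPUs.
    return math.ceil(size_gb * overhead / gpu_mem_gb)

# Sizes taken from the quantization table above.
for name, size in [("Q8_0", 396.58), ("Q4_K_M", 226.1), ("Q2_K", 135.64)]:
    print(f"{name}: ~{gpus_needed(size)} GPUs")
```

For example, under these assumptions Q4_K_M lands on a 4-GPU node, while Q8_0 needs six 80 GB accelerators; actual requirements depend on context length, batch size, and the serving stack.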
Last updated: March 5, 2026