Llama 4 Maverick 17B 128E Instruct

Meta
Capabilities: Code, Multilingual, Tool Calls, Vision

Llama 4 Maverick 17B 128E Instruct is a large-scale Mixture-of-Experts (MoE) model from Meta with 17 billion active parameters and 128 experts, for a total of roughly 400 billion parameters; each token activates only a small fraction of the network rather than the full parameter count. It delivers frontier-class performance on vision, code generation, and multilingual tasks across 12 supported languages. Maverick is the high-capacity tier of the Llama 4 family, trading higher memory requirements for stronger benchmark results. With a 1M-token context window, it requires multi-GPU setups at higher precisions, though quantized builds down to the Q2 level substantially reduce the footprint.
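A rough rule of thumb connects the quantized file sizes listed below to the total parameter count: size scales with bits per weight. The sketch below assumes the ~400B total-parameter figure from the description; the bits-per-weight value is an approximate block-format average for GGUF Q8_0, not an official number.

```python
# Rough GGUF size estimate: size ≈ total_params * bits_per_weight / 8 bytes.
# TOTAL_PARAMS is the ~400B figure from the model description; the 8.5
# bits/weight for Q8_0 (8-bit values plus per-block scales) is an assumption.
TOTAL_PARAMS = 400e9

def est_size_gib(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Estimated on-disk size in GiB for a given average bits-per-weight."""
    return params * bits_per_weight / 8 / 2**30

print(f"Q8_0 ≈ {est_size_gib(8.5):.1f} GiB")
```

The result lands close to the Q8_0 entry in the table below, which suggests the listed sizes follow this simple scaling.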

Hardware Configuration

Quantization   Quality   Size
Q8_0           High      396.58 GB
Q8_K_XL        High      428.4 GB
Q6_K           High      306.2 GB
Q6_K_XL        High      317.63 GB
Q5_K_M         Medium    264.93 GB
Q5_K_S         Medium    256.77 GB
Q5_K_XL        Medium    267.29 GB
Q4_K_M         Medium    226.1 GB
Q4_K_S         Medium    212.16 GB
Q4_K_XL        Medium    216.2 GB
Q4_0           Medium    211.19 GB
Q4_1           Medium    233.49 GB
Q3_K_M         Low       177.95 GB
Q3_K_S         Low       160.79 GB
Q3_K_XL        Low       167.23 GB
Q2_K           Low       135.64 GB
Q2_K_L         Low       135.87 GB
Q2_K_XL        Low       142.17 GB
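To choose among these options, a simple approach is to pick the largest quantization that fits within the available GPU memory while leaving headroom for the KV cache and activations. This is a minimal sketch: the sizes are copied from the table above, and the 0.85 usable-memory fraction and the GPU memory figure in the example are illustrative assumptions, not recommendations.

```python
# Hypothetical helper: pick the largest (highest-quality) quantization from the
# table above that fits a memory budget. The 0.85 usable fraction is an assumed
# allowance for KV cache, activations, and runtime overhead.
QUANTS = [  # (name, quality, size_gb), sorted largest first
    ("Q8_K_XL", "High", 428.4), ("Q8_0", "High", 396.58),
    ("Q6_K_XL", "High", 317.63), ("Q6_K", "High", 306.2),
    ("Q5_K_XL", "Medium", 267.29), ("Q5_K_M", "Medium", 264.93),
    ("Q5_K_S", "Medium", 256.77), ("Q4_1", "Medium", 233.49),
    ("Q4_K_M", "Medium", 226.1), ("Q4_K_XL", "Medium", 216.2),
    ("Q4_K_S", "Medium", 212.16), ("Q4_0", "Medium", 211.19),
    ("Q3_K_M", "Low", 177.95), ("Q3_K_XL", "Low", 167.23),
    ("Q3_K_S", "Low", 160.79), ("Q2_K_XL", "Low", 142.17),
    ("Q2_K_L", "Low", 135.87), ("Q2_K", "Low", 135.64),
]

def best_fit(total_gpu_mem_gb: float, usable_fraction: float = 0.85):
    """Return the first (largest) quant that fits the budget, or None."""
    budget = total_gpu_mem_gb * usable_fraction
    for name, quality, size in QUANTS:
        if size <= budget:
            return name, quality, size
    return None

# Example: a node with 4 GPUs of 141 GB each (an assumed configuration).
print(best_fit(4 * 141))
```

Real deployments should verify fit empirically, since context length and batch size change the KV-cache overhead considerably.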
Last updated: March 5, 2026