Llama 4 Maverick 17B 128E Instruct
Meta
Code · Multilingual · Tool Calls · Vision
Llama 4 Maverick 17B 128E Instruct is a large-scale Mixture-of-Experts (MoE) model from Meta with 17 billion active parameters, 128 routed experts, and roughly 400 billion total parameters; each token activates a shared expert plus one routed expert. It delivers frontier-class performance on vision, code generation, and multilingual tasks across its 12 supported languages. Maverick is the high-capacity tier of the Llama 4 family, trading higher memory requirements for stronger benchmark results. It has a 1M-token context window and requires a multi-GPU setup at higher precisions, though quantizations down to Q2 substantially reduce the memory footprint.
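As a rough sanity check on the quantization sizes listed below, weight memory scales with total parameters times bits per weight. A minimal sketch, assuming ~400 billion total parameters (from the description above) and ~4.5 effective bits for a Q4_K-style quant (a typical GGUF figure, not from this page):

```python
# Rough weight-memory estimate: total params x bits-per-weight / 8 bytes.
# The 400e9 parameter count comes from the description; 4.5 effective bits
# for Q4_K is an assumed typical value, not an exact spec.
def weight_size_gb(total_params: float, bits_per_weight: float) -> float:
    return total_params * bits_per_weight / 8 / 1e9

print(weight_size_gb(400e9, 4.5))  # ~225 GB, in line with the Q4_K_M row
```

This ignores the KV cache and activation memory, which grow with context length and batch size on top of the weights.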
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 396.58 GB |
| Q8_K_XL | High | 428.4 GB |
| Q6_K | High | 306.2 GB |
| Q6_K_XL | High | 317.63 GB |
| Q5_K_M | Medium | 264.93 GB |
| Q5_K_S | Medium | 256.77 GB |
| Q5_K_XL | Medium | 267.29 GB |
| Q4_K_M | Medium | 226.1 GB |
| Q4_K_S | Medium | 212.16 GB |
| Q4_K_XL | Medium | 216.2 GB |
| Q4_0 | Medium | 211.19 GB |
| Q4_1 | Medium | 233.49 GB |
| Q3_K_M | Low | 177.95 GB |
| Q3_K_S | Low | 160.79 GB |
| Q3_K_XL | Low | 167.23 GB |
| Q2_K | Low | 135.64 GB |
| Q2_K_L | Low | 135.87 GB |
| Q2_K_XL | Low | 142.17 GB |
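To translate a row of the table into a GPU count, divide the file size by per-GPU memory with some headroom for the KV cache and activations. A minimal sketch, assuming 80 GB per accelerator and a 1.2x overhead factor (both are assumptions, not figures from this page):

```python
import math

GPU_MEM_GB = 80.0  # assumed per-GPU memory; adjust for your hardware

def gpus_needed(size_gb: float, gpu_mem_gb: float = GPU_MEM_GB,
                overhead: float = 1.2) -> int:
    # Reserve headroom for KV cache and activations via a rough
    # overhead factor, then round up to whole GPUs.
    return math.ceil(size_gb * overhead / gpu_mem_gb)

# Sizes taken from the quantization table above.
for name, size in [("Q8_0", 396.58), ("Q4_K_M", 226.1), ("Q2_K", 135.64)]:
    print(f"{name}: ~{gpus_needed(size)} GPUs")
```

For example, under these assumptions Q4_K_M lands on a 4-GPU node, while Q8_0 needs six 80 GB accelerators; actual requirements depend on context length, batch size, and the serving stack.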
Last updated: March 5, 2026