
Llama 4 Scout 17B 16E Instruct

Meta
Tags: Code, Multilingual, Tool Calls, Vision

Llama 4 Scout 17B 16E Instruct is a Mixture-of-Experts model from Meta with 17 billion active parameters and 16 experts; each token is routed to a single expert (alongside a shared expert), so only a fraction of the total parameter count is used per forward pass. It supports vision, code generation, tool calling, and 12 languages, making it one of the most versatile models in the Llama 4 family. Scout targets the efficiency-focused segment, offering multimodal capabilities at lower compute cost than dense models of comparable quality. Its 10M-token context window is among the largest available, and the model quantizes well for self-hosted multi-GPU deployments.
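At long context lengths, memory is dominated by the KV cache rather than the weights themselves. A minimal sketch of the standard KV-cache size estimate; the architecture numbers (layer count, KV heads, head dimension) are illustrative assumptions, not confirmed Scout specs, and real deployments with attention optimizations may need far less:

```python
def kv_cache_gib(tokens, n_layers=48, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Rough KV-cache size in GiB: keys and values stored for every
    layer and token. Defaults are illustrative assumptions, not
    confirmed Llama 4 Scout specs; bytes_per_elem=2 assumes an
    fp16/bf16 cache with no quantization.
    """
    # 2x for keys and values
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return tokens * per_token_bytes / 1024**3
```

Under these assumptions, even a fraction of the 10M-token window implies hundreds of GiB of cache, which is why long-context serving typically relies on cache quantization or offloading.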

Hardware Configuration

Quantization  Quality  Size
Q8_0          High     106.66 GB
Q8_K_XL       High     119.38 GB
Q6_K          High      82.36 GB
Q6_K_XL       High      87.61 GB
Q5_K_M        Medium    71.29 GB
Q5_K_S        Medium    69.16 GB
Q5_K_XL       Medium    73.71 GB
Q4_K_M        Medium    60.87 GB
Q4_K_S        Medium    57.23 GB
Q4_K_XL       Medium    57.74 GB
Q4_0          Medium    56.98 GB
Q4_1          Medium    62.94 GB
Q3_K_M        Low       48.20 GB
Q3_K_S        Low       43.53 GB
Q3_K_XL       Low       45.65 GB
Q2_K          Low       36.85 GB
Q2_K_L        Low       37.07 GB
Q2_K_XL       Low       39.47 GB
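Given the sizes in the table, choosing a quantization for a known VRAM budget is a simple lookup. A minimal sketch; the sizes are copied from the table above, while the headroom fraction reserved for KV cache and activations is an assumption, not a recommendation:

```python
# Quantization sizes in GB, taken from the table above.
QUANT_SIZES_GB = {
    "Q8_K_XL": 119.38, "Q8_0": 106.66, "Q6_K_XL": 87.61, "Q6_K": 82.36,
    "Q5_K_XL": 73.71, "Q5_K_M": 71.29, "Q5_K_S": 69.16, "Q4_1": 62.94,
    "Q4_K_M": 60.87, "Q4_K_XL": 57.74, "Q4_K_S": 57.23, "Q4_0": 56.98,
    "Q3_K_M": 48.20, "Q3_K_XL": 45.65, "Q3_K_S": 43.53,
    "Q2_K_XL": 39.47, "Q2_K_L": 37.07, "Q2_K": 36.85,
}

def best_fit(total_vram_gb, headroom=0.85):
    """Return the largest quant whose weights fit in headroom * VRAM.

    headroom is an assumed safety margin left for KV cache and
    activations; returns None if nothing fits.
    """
    budget = total_vram_gb * headroom
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget}
    return max(fitting, key=fitting.get) if fitting else None
```

For example, two 48 GB GPUs give a 96 GB pool, for which the largest fitting quant under this margin is Q5_K_XL; a single 24 GB card cannot hold any of the listed quants.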
Last updated: March 5, 2026