Llama 4 Scout 17B 16E Instruct
Meta
Code Multilingual Tool Calls Vision
Llama 4 Scout 17B 16E Instruct is a Mixture-of-Experts model from Meta with 17 billion active parameters and 16 experts (roughly 109 billion parameters in total), routing each token to a single expert. It supports vision, code generation, tool calling, and 12 languages, making it one of the most versatile models in the Llama 4 family. Scout targets the efficiency-focused segment, offering multimodal capabilities at lower compute cost than dense models of comparable quality. Its 10M-token context window is among the largest available, and it quantizes well for self-hosted multi-GPU deployments.
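The efficiency claim follows from how top-1 MoE routing works: per-token compute scales with the active parameters, not the total. A minimal sketch of that accounting, using a hypothetical shared/expert split (not Meta's published layer breakdown) chosen so the active count lands near 17B:

```python
# Illustrative MoE parameter accounting. The shared/expert split below is
# an assumption for illustration, not Meta's actual architecture numbers.

def moe_params(shared_b: float, expert_b: float, num_experts: int, top_k: int = 1):
    """Return (total, active) parameter counts in billions for a top-k MoE."""
    total = shared_b + num_experts * expert_b   # all experts stored in memory
    active = shared_b + top_k * expert_b        # only routed experts computed per token
    return total, active

# Hypothetical split: 11B shared + 16 experts of 6B each, top-1 routing.
total, active = moe_params(shared_b=11.0, expert_b=6.0, num_experts=16)
print(total, active)  # 107.0 17.0 (billions)
```

Note that memory cost still tracks the total parameter count, which is why the quantization sizes below are so much larger than 17B weights alone would suggest.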
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 106.66 GB |
| Q8_K_XL | High | 119.38 GB |
| Q6_K | High | 82.36 GB |
| Q6_K_XL | High | 87.61 GB |
| Q5_K_M | Medium | 71.29 GB |
| Q5_K_S | Medium | 69.16 GB |
| Q5_K_XL | Medium | 73.71 GB |
| Q4_K_M | Medium | 60.87 GB |
| Q4_K_S | Medium | 57.23 GB |
| Q4_K_XL | Medium | 57.74 GB |
| Q4_0 | Medium | 56.98 GB |
| Q4_1 | Medium | 62.94 GB |
| Q3_K_M | Low | 48.20 GB |
| Q3_K_S | Low | 43.53 GB |
| Q3_K_XL | Low | 45.65 GB |
| Q2_K | Low | 36.85 GB |
| Q2_K_L | Low | 37.07 GB |
| Q2_K_XL | Low | 39.47 GB |
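As a rough way to read this table against a multi-GPU setup, the weight file must fit in aggregate VRAM with headroom left for the KV cache and runtime overhead. A minimal sketch, where the 10% headroom factor is an illustrative assumption rather than a measured figure:

```python
# Sketch: check which quantizations from the table above fit a given
# multi-GPU setup. Sizes are the weight-file sizes from the table;
# the 10% headroom for KV cache and runtime overhead is an assumption.

QUANT_SIZES_GB = {
    "Q8_0": 106.66, "Q6_K": 82.36, "Q5_K_M": 71.29,
    "Q4_K_M": 60.87, "Q3_K_M": 48.20, "Q2_K": 36.85,
}

def fits(quant: str, gpu_vram_gb: float, num_gpus: int, headroom: float = 0.10) -> bool:
    """Return True if the quantized weights fit in total VRAM after headroom."""
    usable = gpu_vram_gb * num_gpus * (1.0 - headroom)
    return QUANT_SIZES_GB[quant] <= usable

# Example: two 48 GB GPUs leave 86.4 GB usable after 10% headroom,
# so Q6_K (82.36 GB) fits but Q8_0 (106.66 GB) does not.
for q, size in QUANT_SIZES_GB.items():
    print(q, size, fits(q, gpu_vram_gb=48, num_gpus=2))
```

Real deployments should budget KV cache explicitly, since at long context lengths it can dwarf this fixed headroom.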
Last updated: March 5, 2026