Llama 4 Scout 17B 16E Instruct
Meta
Code Multilingual Tool Calls Vision
Llama 4 Scout 17B 16E Instruct is a Mixture-of-Experts model from Meta with 17 billion active parameters and 16 experts (roughly 109 billion parameters in total), routing each token to a single expert. It supports vision, code generation, tool calling, and 12 languages, making it one of the most versatile models in the Llama 4 family. Scout targets the efficiency-focused segment, offering multimodal capabilities at lower compute cost than dense models of comparable quality. Its 10M-token context window is among the largest available, and it quantizes well for self-hosted multi-GPU deployments.
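The efficiency claim follows from how top-1 MoE routing works: per-token compute scales with the active parameters, not the total. A minimal sketch of that accounting, using a hypothetical shared/expert split (not Meta's published layer breakdown) chosen so the active count lands near 17B:

```python
# Illustrative MoE parameter accounting. The shared/expert split below is
# an assumption for illustration, not Meta's actual architecture numbers.

def moe_params(shared_b: float, expert_b: float, num_experts: int, top_k: int = 1):
    """Return (total, active) parameter counts in billions for a top-k MoE."""
    total = shared_b + num_experts * expert_b   # all experts stored in memory
    active = shared_b + top_k * expert_b        # only routed experts computed per token
    return total, active

# Hypothetical split: 11B shared + 16 experts of 6B each, top-1 routing.
total, active = moe_params(shared_b=11.0, expert_b=6.0, num_experts=16)
print(total, active)  # 107.0 17.0 (billions)
```

Note that memory cost still tracks the total parameter count, which is why the quantization sizes below are so much larger than 17B weights alone would suggest.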
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 106.66 GB |
| Q8_K_XL | High | 119.38 GB |
| Q6_K | High | 82.36 GB |
| Q6_K_XL | High | 87.61 GB |
| Q5_K_M | Medium | 71.29 GB |
| Q5_K_S | Medium | 69.16 GB |
| Q5_K_XL | Medium | 73.71 GB |
| Q4_K_M | Medium | 60.87 GB |
| Q4_K_S | Medium | 57.23 GB |
| Q4_K_XL | Medium | 57.74 GB |
| Q4_0 | Medium | 56.98 GB |
| Q4_1 | Medium | 62.94 GB |
| Q3_K_M | Low | 48.20 GB |
| Q3_K_S | Low | 43.53 GB |
| Q3_K_XL | Low | 45.65 GB |
| Q2_K | Low | 36.85 GB |
| Q2_K_L | Low | 37.07 GB |
| Q2_K_XL | Low | 39.47 GB |
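As a rough way to read this table against a multi-GPU setup, the weight file must fit in aggregate VRAM with headroom left for the KV cache and runtime overhead. A minimal sketch, where the 10% headroom factor is an illustrative assumption rather than a measured figure:

```python
# Sketch: check which quantizations from the table above fit a given
# multi-GPU setup. Sizes are the weight-file sizes from the table;
# the 10% headroom for KV cache and runtime overhead is an assumption.

QUANT_SIZES_GB = {
    "Q8_0": 106.66, "Q6_K": 82.36, "Q5_K_M": 71.29,
    "Q4_K_M": 60.87, "Q3_K_M": 48.20, "Q2_K": 36.85,
}

def fits(quant: str, gpu_vram_gb: float, num_gpus: int, headroom: float = 0.10) -> bool:
    """Return True if the quantized weights fit in total VRAM after headroom."""
    usable = gpu_vram_gb * num_gpus * (1.0 - headroom)
    return QUANT_SIZES_GB[quant] <= usable

# Example: two 48 GB GPUs leave 86.4 GB usable after 10% headroom,
# so Q6_K (82.36 GB) fits but Q8_0 (106.66 GB) does not.
for q, size in QUANT_SIZES_GB.items():
    print(q, size, fits(q, gpu_vram_gb=48, num_gpus=2))
```

Real deployments should budget KV cache explicitly, since at long context lengths it can dwarf this fixed headroom.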
Last updated: March 5, 2026