Qwen3 8B
Qwen
Tags: Code, Multilingual, Thinking, Tool Calls
Qwen3 8B is an 8-billion-parameter dense transformer from Alibaba's Qwen team. It has a built-in thinking mode alongside code generation, tool calling, and multilingual support. It improves on Qwen2.5's reasoning and supports chain-of-thought inference in a compact model. The model covers 14 languages, including English, Chinese, and Arabic. With a 40K-token context window and flash attention support, it fits on a single consumer GPU, and its quantized builds reduce the memory footprint further for cost-effective self-hosted reasoning workloads.
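Whether the model fits on a given GPU depends on the KV cache as well as the weights. The sketch below estimates KV cache size at full context. The architecture values (36 layers, 8 key/value heads, head dimension 128), the 16-bit cache precision, and the reading of "40K" as 40,960 tokens are assumptions not stated on this page; check them against the model's published configuration before relying on the result.

```python
def kv_cache_bytes(
    context_tokens: int,
    num_layers: int = 36,      # assumption: not stated on this page
    num_kv_heads: int = 8,     # assumption: grouped-query attention with 8 KV heads
    head_dim: int = 128,       # assumption
    bytes_per_value: int = 2,  # assumption: 16-bit cache (fp16/bf16)
) -> int:
    """Estimate KV cache size in bytes for one sequence of the given length."""
    # Factor of 2: each layer stores one key and one value vector per token.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


# Assumption: "40K" is read as 40 * 1024 = 40,960 tokens.
FULL_CONTEXT_TOKENS = 40 * 1024
full_context_gb = kv_cache_bytes(FULL_CONTEXT_TOKENS) / 1e9
print(f"Estimated KV cache at full context: {full_context_gb:.2f} GB")
```

Under these assumptions the cache alone is roughly 6 GB at full context, so shorter contexts or a quantized cache leave considerably more room for the weights.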
| Quantization | Quality | Size | Fit |
|---|---|---|---|
| Q8_0 | High | 8.11 GB | — |
| Q8_K_XL | High | 10.08 GB | — |
| Q6_K | High | 6.26 GB | — |
| Q6_K_XL | High | 6.98 GB | — |
| Q5_K_M | Medium | 5.45 GB | — |
| Q5_K_S | Medium | 5.33 GB | — |
| Q5_K_XL | Medium | 5.47 GB | — |
| Q4_K_M | Medium | 4.68 GB | — |
| Q4_K_S | Medium | 4.47 GB | — |
| Q4_K_XL | Medium | 4.78 GB | — |
| Q4_1 | Medium | 4.89 GB | — |
| Q3_K_M | Low | 3.84 GB | — |
| Q3_K_S | Low | 3.51 GB | — |
| Q3_K_XL | Low | 4.01 GB | — |
| Q2_K | Low | 3.06 GB | — |
| Q2_K_L | Low | 3.19 GB | — |
| Q2_K_XL | Low | 3.26 GB | — |
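The Fit column is blank when no hardware is specified. As a rough stand-in, the sketch below picks the largest listed quantization whose file size fits a VRAM budget. The function name `largest_fit` and the 2 GB default headroom (for KV cache and runtime overhead) are hypothetical choices for illustration, and file size is used only as a proxy for quality.

```python
from typing import Optional

# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q8_0": 8.11, "Q8_K_XL": 10.08,
    "Q6_K": 6.26, "Q6_K_XL": 6.98,
    "Q5_K_M": 5.45, "Q5_K_S": 5.33, "Q5_K_XL": 5.47,
    "Q4_K_M": 4.68, "Q4_K_S": 4.47, "Q4_K_XL": 4.78, "Q4_1": 4.89,
    "Q3_K_M": 3.84, "Q3_K_S": 3.51, "Q3_K_XL": 4.01,
    "Q2_K": 3.06, "Q2_K_L": 3.19, "Q2_K_XL": 3.26,
}


def largest_fit(vram_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest quantization that fits, or None if none does.

    headroom_gb is a hypothetical reserve for KV cache and runtime overhead.
    """
    budget = vram_gb - headroom_gb
    candidates = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not candidates:
        return None
    # Largest file first: a rough proxy for the least lossy option.
    return max(candidates, key=candidates.get)


for vram in (4.0, 8.0, 12.0, 24.0):
    print(f"{vram:>5.1f} GB VRAM -> {largest_fit(vram)}")
```

With the default headroom, an 8 GB card selects Q5_K_XL and a 12 GB card selects Q8_0. Long contexts need more headroom than 2 GB, so raise `headroom_gb` accordingly.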
Last updated: March 5, 2026