Qwen3.6 35B A3B
By Qwen (Alibaba) · Capabilities: Code, Multilingual, Thinking, Tool Calls, Vision
Qwen3.6 35B A3B is a Mixture-of-Experts model from Alibaba's Qwen team with 35.9 billion total parameters but only 3 billion active per token, routed across 256 experts using a novel hybrid Gated DeltaNet and Gated Attention mechanism. It is natively multimodal, processing text, images, and video, with built-in thinking and tool-calling capabilities across a 262K context window. The model supports over 200 languages and is released under the Apache 2.0 license. At Q4 quantization it requires roughly 20 GB of VRAM, making it highly practical for self-hosted deployment on consumer GPUs.
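The ~20 GB Q4 figure follows from the parameter count: 4-bit K-quants average a bit more than 4 bits per weight once the higher-precision embedding and attention layers are included. A quick sanity check in Python, assuming roughly 4.5 effective bits per weight (an assumed average, not an official number):

```python
# Back-of-the-envelope check of the ~20 GB Q4 figure quoted above.
# Assumption: a 4-bit K-quant averages roughly 4.5 bits per weight
# once higher-precision layers are factored in.
total_params = 35.9e9      # total parameters (35.9B)
bits_per_weight = 4.5      # assumed effective average for Q4 K-quants
size_gb = total_params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.1f} GB")  # ≈ 20.2 GB, close to the Q4_K_M file size below
```

This lands within half a gigabyte of the 20.61 GB Q4_K_M file size listed in the table, which suggests the quoted VRAM estimate is weights-dominated; KV cache and activations come on top of it.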
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| BF16 | Full precision | 64.62 GB |
| Q8_0 | High | 34.37 GB |
| Q8_K_XL | High | 35.81 GB |
| Q6_K | High | 27.06 GB |
| Q6_K_XL | High | 29.66 GB |
| Q5_K_M | Medium | 24.64 GB |
| Q5_K_S | Medium | 23.23 GB |
| Q5_K_XL | Medium | 24.77 GB |
| Q4_K_M | Medium | 20.61 GB |
| Q4_K_S | Medium | 19.46 GB |
| Q4_K_XL | Medium | 20.82 GB |
| MXFP4_MOE | Medium | 20.22 GB |
| IQ4_NL | Medium | 16.80 GB |
| IQ4_XS | Medium | 16.51 GB |
| Q3_K_M | Low | 15.46 GB |
| Q3_K_S | Low | 14.30 GB |
| Q3_K_XL | Low | 15.69 GB |
| IQ3_S | Low | 12.74 GB |
| IQ3_XXS | Low | 12.30 GB |
| Q2_K_XL | Low | 11.45 GB |
| IQ2_M | Low | 10.73 GB |
| IQ2_XXS | Low | 10.02 GB |
| IQ1_M | Low | 9.36 GB |
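To pick a quantization for a given GPU, compare the file sizes above against your VRAM minus some headroom for KV cache and activations. A minimal sketch, using a subset of the table's sizes and an assumed 2 GB overhead (the helper name and overhead value are illustrative, not part of any official tooling):

```python
# Hypothetical helper: pick the largest quantization of Qwen3.6 35B A3B
# whose weights fit in a given VRAM budget. Sizes (GB) are file sizes
# from the table above; real VRAM use is higher once the KV cache and
# activations are added, so we subtract a configurable overhead.
QUANT_SIZES_GB = {
    "Q8_0": 34.37, "Q6_K": 27.06, "Q5_K_M": 24.64, "Q4_K_M": 20.61,
    "IQ4_XS": 16.51, "Q3_K_M": 15.46, "IQ3_S": 12.74, "IQ2_M": 10.73,
}

def best_fit(vram_gb, overhead_gb=2.0):
    """Return the largest quant whose weights fit in vram_gb minus overhead."""
    budget = vram_gb - overhead_gb
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)

print(best_fit(24))  # 24 GB card -> Q4_K_M
print(best_fit(48))  # 48 GB card -> Q8_0
```

With longer contexts (toward the 262K maximum) the KV cache alone can exceed 2 GB, so treat the overhead as a parameter to tune, not a constant.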
Last updated: April 29, 2026