Qwen3.6 35B A3B

By Qwen
Capabilities: Code, Multilingual, Thinking, Tool Calls, Vision

Qwen3.6 35B A3B is a Mixture-of-Experts model from Alibaba's Qwen team with 35.9 billion total parameters but only 3 billion active per token, routed across 256 experts using a novel hybrid Gated DeltaNet and Gated Attention mechanism. It is natively multimodal, processing text, images, and video, with built-in thinking and tool-calling capabilities across a 262K context window. The model supports over 200 languages and is released under the Apache 2.0 license. At Q4 quantization it requires roughly 20 GB of VRAM, making it highly practical for self-hosted deployment on consumer GPUs.
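The quantized sizes listed below follow from a simple rule: footprint is roughly total parameter count times bits per weight. A minimal sketch of that arithmetic (the effective bits-per-weight values here are assumptions for illustration, not published figures; real files add metadata and keep some layers at higher precision):

```python
# Back-of-envelope footprint for a quantized checkpoint:
#   size ≈ total_params × bits_per_weight / 8 bytes, reported in GiB.
# Effective bits-per-weight per format below are rough assumptions.

TOTAL_PARAMS = 35.9e9  # total (not active) parameters, per the model card


def quant_size_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Approximate weight footprint in GiB for a given average bit width."""
    return params * bits_per_weight / 8 / 2**30


# Roughly tracks the published table (within a few percent):
for name, bpw in [("BF16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85), ("IQ2_M", 2.7)]:
    print(f"{name}: ~{quant_size_gb(bpw):.1f} GB")
```

Note that the sizes scale with total parameters, not active ones: all 35.9B weights must be resident for inference even though only 3B are active per token.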

Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| BF16 | Full precision | 64.62 GB |
| Q8_0 | High | 34.37 GB |
| Q8_K_XL | High | 35.81 GB |
| Q6_K | High | 27.06 GB |
| Q6_K_XL | High | 29.66 GB |
| Q5_K_M | Medium | 24.64 GB |
| Q5_K_S | Medium | 23.23 GB |
| Q5_K_XL | Medium | 24.77 GB |
| Q4_K_M | Medium | 20.61 GB |
| Q4_K_S | Medium | 19.46 GB |
| Q4_K_XL | Medium | 20.82 GB |
| MXFP4_MOE | Medium | 20.22 GB |
| IQ4_NL | Medium | 16.8 GB |
| IQ4_XS | Medium | 16.51 GB |
| Q3_K_M | Low | 15.46 GB |
| Q3_K_S | Low | 14.3 GB |
| Q3_K_XL | Low | 15.69 GB |
| IQ3_S | Low | 12.74 GB |
| IQ3_XXS | Low | 12.3 GB |
| Q2_K_XL | Low | 11.45 GB |
| IQ2_M | Low | 10.73 GB |
| IQ2_XXS | Low | 10.02 GB |
| IQ1_M | Low | 9.36 GB |
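A practical way to use this table is to pick the highest-quality quantization whose weights fit your GPU after reserving headroom for the KV cache and activations. A sketch using the sizes above (the 2 GB headroom default is an assumption; actual overhead grows with context length):

```python
# Largest-quant-that-fits selection over the table above.
# Entries are (name, size_gb), ordered from largest to smallest file.

QUANTS = [
    ("BF16", 64.62), ("Q8_K_XL", 35.81), ("Q8_0", 34.37),
    ("Q6_K_XL", 29.66), ("Q6_K", 27.06), ("Q5_K_XL", 24.77),
    ("Q5_K_M", 24.64), ("Q5_K_S", 23.23), ("Q4_K_XL", 20.82),
    ("Q4_K_M", 20.61), ("MXFP4_MOE", 20.22), ("Q4_K_S", 19.46),
    ("IQ4_NL", 16.8), ("IQ4_XS", 16.51), ("Q3_K_XL", 15.69),
    ("Q3_K_M", 15.46), ("Q3_K_S", 14.3), ("IQ3_S", 12.74),
    ("IQ3_XXS", 12.3), ("Q2_K_XL", 11.45), ("IQ2_M", 10.73),
    ("IQ2_XXS", 10.02), ("IQ1_M", 9.36),
]


def pick_quant(vram_gb: float, headroom_gb: float = 2.0):
    """Return (name, size_gb) of the largest quant fitting the budget, else None."""
    budget = vram_gb - headroom_gb
    for name, size in QUANTS:
        if size <= budget:
            return name, size
    return None


print(pick_quant(24))  # a 24 GB card lands on Q4_K_XL with ~2 GB to spare
```

On a 24 GB consumer GPU this selects Q4_K_XL (20.82 GB), consistent with the roughly 20 GB Q4 figure quoted above; anything below about 11 GB of VRAM cannot hold even IQ1_M with headroom.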
Last updated: April 29, 2026