Qwen3.6 35B A3B
By Qwen (Alibaba) · Capabilities: Code, Multilingual, Thinking, Tool Calls, Vision
Qwen3.6 35B A3B is a Mixture-of-Experts model from Alibaba's Qwen team with 35.9 billion total parameters but only 3 billion active per token, routed across 256 experts using a novel hybrid Gated DeltaNet and Gated Attention mechanism. It is natively multimodal, processing text, images, and video, with built-in thinking and tool-calling capabilities across a 262K context window. The model supports over 200 languages and is released under the Apache 2.0 license. At Q4 quantization it requires roughly 20 GB of VRAM, making it highly practical for self-hosted deployment on consumer GPUs.
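The ~20 GB Q4 figure follows from the parameter count: 4-bit K-quants average a bit more than 4 bits per weight once the higher-precision embedding and attention layers are included. A quick sanity check in Python, assuming roughly 4.5 effective bits per weight (an assumed average, not an official number):

```python
# Back-of-the-envelope check of the ~20 GB Q4 figure quoted above.
# Assumption: a 4-bit K-quant averages roughly 4.5 bits per weight
# once higher-precision layers are factored in.
total_params = 35.9e9      # total parameters (35.9B)
bits_per_weight = 4.5      # assumed effective average for Q4 K-quants
size_gb = total_params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.1f} GB")  # ≈ 20.2 GB, close to the Q4_K_M file size below
```

This lands within half a gigabyte of the 20.61 GB Q4_K_M file size listed in the table, which suggests the quoted VRAM estimate is weights-dominated; KV cache and activations come on top of it.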
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| BF16 | Full precision | 64.62 GB |
| Q8_0 | High | 34.37 GB |
| Q8_K_XL | High | 35.81 GB |
| Q6_K | High | 27.06 GB |
| Q6_K_XL | High | 29.66 GB |
| Q5_K_M | Medium | 24.64 GB |
| Q5_K_S | Medium | 23.23 GB |
| Q5_K_XL | Medium | 24.77 GB |
| Q4_K_M | Medium | 20.61 GB |
| Q4_K_S | Medium | 19.46 GB |
| Q4_K_XL | Medium | 20.82 GB |
| MXFP4_MOE | Medium | 20.22 GB |
| IQ4_NL | Medium | 16.80 GB |
| IQ4_XS | Medium | 16.51 GB |
| Q3_K_M | Low | 15.46 GB |
| Q3_K_S | Low | 14.30 GB |
| Q3_K_XL | Low | 15.69 GB |
| IQ3_S | Low | 12.74 GB |
| IQ3_XXS | Low | 12.30 GB |
| Q2_K_XL | Low | 11.45 GB |
| IQ2_M | Low | 10.73 GB |
| IQ2_XXS | Low | 10.02 GB |
| IQ1_M | Low | 9.36 GB |
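To pick a quantization for a given GPU, compare the file sizes above against your VRAM minus some headroom for KV cache and activations. A minimal sketch, using a subset of the table's sizes and an assumed 2 GB overhead (the helper name and overhead value are illustrative, not part of any official tooling):

```python
# Hypothetical helper: pick the largest quantization of Qwen3.6 35B A3B
# whose weights fit in a given VRAM budget. Sizes (GB) are file sizes
# from the table above; real VRAM use is higher once the KV cache and
# activations are added, so we subtract a configurable overhead.
QUANT_SIZES_GB = {
    "Q8_0": 34.37, "Q6_K": 27.06, "Q5_K_M": 24.64, "Q4_K_M": 20.61,
    "IQ4_XS": 16.51, "Q3_K_M": 15.46, "IQ3_S": 12.74, "IQ2_M": 10.73,
}

def best_fit(vram_gb, overhead_gb=2.0):
    """Return the largest quant whose weights fit in vram_gb minus overhead."""
    budget = vram_gb - overhead_gb
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)

print(best_fit(24))  # 24 GB card -> Q4_K_M
print(best_fit(48))  # 48 GB card -> Q8_0
```

With longer contexts (toward the 262K maximum) the KV cache alone can exceed 2 GB, so treat the overhead as a parameter to tune, not a constant.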
Last updated: April 29, 2026