Qwen3.6 27B
Vendor: Qwen
Capabilities: Code · Multilingual · Thinking · Tool Calls · Vision
Qwen3.6 27B is a dense large language model from Alibaba's Qwen team, with 27 billion parameters across 64 layers, built on the hybrid Gated DeltaNet and Gated Attention architecture shared with its larger Mixture-of-Experts siblings. It is natively multimodal, processing text, images, and video, and ships with built-in thinking and tool calling. The native context window is 262K tokens, extensible to roughly one million tokens via YaRN. The model is released under the Apache 2.0 license. At Q4 quantization it requires around 16 GB of VRAM, making it a strong fit for self-hosted deployment on a single high-end consumer GPU.
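The relationship between the native window and the YaRN-extended one can be sketched as simple arithmetic. This is an illustrative calculation only: it assumes "262K" means 262,144 tokens and that the ~1M target is reached with a uniform scaling factor; actual deployment settings may differ.

```python
# Illustrative sketch: YaRN context-extension factor for Qwen3.6 27B.
# Assumes the native 262K window is 262,144 tokens (an assumption, not
# an official figure) and a ~1M-token target after extension.
NATIVE_CTX = 262_144      # native context window ("262K")
TARGET_CTX = 1_048_576    # ~1M-token extended window

factor = TARGET_CTX / NATIVE_CTX
print(f"YaRN scaling factor: {factor:.1f}x")
```

Under these assumptions the extension corresponds to a 4x rope-scaling factor.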
Hardware Configuration
| Quantization | Quality | Size |
|---|---|---|
| BF16 | Full precision | 50.11 GB |
| Q8_0 | High | 26.63 GB |
| Q8_K_XL | High | 32.9 GB |
| Q6_K | High | 20.98 GB |
| Q6_K_XL | High | 23.88 GB |
| Q5_K_M | Medium | 18.17 GB |
| Q5_K_S | Medium | 17.66 GB |
| Q5_K_XL | Medium | 18.66 GB |
| Q4_K_M | Medium | 15.66 GB |
| Q4_K_S | Medium | 14.77 GB |
| Q4_K_XL | Medium | 16.4 GB |
| IQ4_NL | Medium | 14.97 GB |
| IQ4_XS | Medium | 14.38 GB |
| Q4_0 | Medium | 14.71 GB |
| Q4_1 | Medium | 16.07 GB |
| Q3_K_M | Low | 12.65 GB |
| Q3_K_S | Low | 11.51 GB |
| Q3_K_XL | Low | 13.48 GB |
| IQ3_XXS | Low | 11.17 GB |
| Q2_K_XL | Low | 11.04 GB |
| IQ2_M | Low | 10.1 GB |
| IQ2_XXS | Low | 8.74 GB |
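As a rough guide to reading the table, the sketch below checks which quantizations fit in a given VRAM budget. The file sizes are taken from a few rows above; `OVERHEAD_GB` is an assumed margin for the KV cache and runtime buffers, not an official figure, and real headroom grows with context length.

```python
# Minimal sketch: which Qwen3.6 27B quantizations fit in a VRAM budget.
# Sizes (GB) are a subset of the table above; OVERHEAD_GB is an assumed
# margin for KV cache and activations, not a measured value.
QUANT_SIZES_GB = {
    "Q8_0": 26.63, "Q6_K": 20.98, "Q5_K_M": 18.17,
    "Q4_K_M": 15.66, "Q4_K_S": 14.77, "IQ4_XS": 14.38,
    "Q3_K_M": 12.65, "IQ2_M": 10.1,
}
OVERHEAD_GB = 2.0  # assumed headroom for KV cache / runtime buffers

def fits(vram_gb: float) -> list[str]:
    """Return quantizations whose weights plus overhead fit in vram_gb."""
    return [q for q, size in QUANT_SIZES_GB.items()
            if size + OVERHEAD_GB <= vram_gb]

print(fits(24.0))  # candidates for a 24 GB GPU
print(fits(16.0))  # candidates for a 16 GB GPU
```

On a 24 GB card everything from Q6_K down clears this margin, while Q8_0 does not; on a 16 GB card only the Q3/IQ2-class quantizations leave room under these assumptions.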
Last updated: April 29, 2026