Qwen3 235B A22B
Qwen
Code · Multilingual · Thinking · Tool Calls
Qwen3 235B A22B is a 235.09-billion-parameter Mixture-of-Experts model from Alibaba's Qwen team, built for both thinking and non-thinking inference modes. Per token it activates 8 of its 128 experts, roughly 22 billion parameters (the "A22B" in the name), delivering frontier-level reasoning at a fraction of the compute cost of a comparable dense model. The model supports code generation, tool calling, and 14 languages, including English, Chinese, Japanese, and Arabic. With a 40K-token context window and flash attention, it targets multi-GPU server deployments and quantizes well to GGUF for self-hosted inference on high-end hardware.
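The GGUF sizes listed below follow almost directly from the parameter count: a quantization's on-disk footprint is roughly total parameters × bits-per-weight ÷ 8. A minimal sketch, assuming typical llama.cpp bits-per-weight figures (about 8.5 bpw for Q8_0 and about 4.85 bpw for Q4_K_M; real files vary slightly due to metadata and mixed-precision layers):

```python
# Rough GGUF size estimate from parameter count and bits per weight.
# The bpw values are typical llama.cpp figures (an assumption here),
# not exact per-file numbers.

PARAMS = 235.09e9  # total parameters, per the model card


def gguf_size_gib(params: float, bpw: float) -> float:
    """Approximate quantized file size in GiB."""
    return params * bpw / 8 / 2**30


print(f"Q8_0   ~ {gguf_size_gib(PARAMS, 8.5):.1f} GiB")   # table lists 232.76 GB
print(f"Q4_K_M ~ {gguf_size_gib(PARAMS, 4.85):.1f} GiB")  # table lists 132.39 GB
```

Both estimates land within a fraction of a percent of the table's figures, which is a quick sanity check that the listed sizes use GiB and cover the full 235B parameters, not just the 22B active per token.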
Available Quantizations
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 232.76 GB |
| Q8_K_XL | High | 246.89 GB |
| Q6_K | High | 179.76 GB |
| Q6_K_XL | High | 185.20 GB |
| Q5_K_M | Medium | 155.36 GB |
| Q5_K_S | Medium | 150.76 GB |
| Q5_K_XL | Medium | 155.43 GB |
| Q4_K_M | Medium | 132.39 GB |
| Q4_K_S | Medium | 124.51 GB |
| Q4_K_XL | Medium | 124.91 GB |
| Q4_1 | Medium | 137.12 GB |
| Q3_K_M | Low | 104.73 GB |
| Q3_K_S | Low | 94.47 GB |
| Q3_K_XL | Low | 96.61 GB |
| Q2_K | Low | 79.81 GB |
| Q2_K_L | Low | 79.94 GB |
| Q2_K_XL | Low | 81.97 GB |
Last updated: March 5, 2026