Qwen3 235B A22B
Qwen
Code · Multilingual · Thinking · Tool Calls
Qwen3 235B A22B is a 235.09-billion-parameter Mixture-of-Experts model from Alibaba's Qwen team, built for both thinking and non-thinking inference modes. Per token it activates 8 of its 128 experts, roughly 22 billion parameters (the "A22B" in the name), delivering frontier-level reasoning at a fraction of the compute cost of a comparable dense model. The model supports code generation, tool calling, and 14 languages, including English, Chinese, Japanese, and Arabic. With a 40K-token context window and flash attention, it targets multi-GPU server deployments and quantizes well to GGUF for self-hosted inference on high-end hardware.
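The GGUF sizes listed below follow almost directly from the parameter count: a quantization's on-disk footprint is roughly total parameters × bits-per-weight ÷ 8. A minimal sketch, assuming typical llama.cpp bits-per-weight figures (about 8.5 bpw for Q8_0 and about 4.85 bpw for Q4_K_M; real files vary slightly due to metadata and mixed-precision layers):

```python
# Rough GGUF size estimate from parameter count and bits per weight.
# The bpw values are typical llama.cpp figures (an assumption here),
# not exact per-file numbers.

PARAMS = 235.09e9  # total parameters, per the model card


def gguf_size_gib(params: float, bpw: float) -> float:
    """Approximate quantized file size in GiB."""
    return params * bpw / 8 / 2**30


print(f"Q8_0   ~ {gguf_size_gib(PARAMS, 8.5):.1f} GiB")   # table lists 232.76 GB
print(f"Q4_K_M ~ {gguf_size_gib(PARAMS, 4.85):.1f} GiB")  # table lists 132.39 GB
```

Both estimates land within a fraction of a percent of the table's figures, which is a quick sanity check that the listed sizes use GiB and cover the full 235B parameters, not just the 22B active per token.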
Available Quantizations
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 232.76 GB |
| Q8_K_XL | High | 246.89 GB |
| Q6_K | High | 179.76 GB |
| Q6_K_XL | High | 185.20 GB |
| Q5_K_M | Medium | 155.36 GB |
| Q5_K_S | Medium | 150.76 GB |
| Q5_K_XL | Medium | 155.43 GB |
| Q4_K_M | Medium | 132.39 GB |
| Q4_K_S | Medium | 124.51 GB |
| Q4_K_XL | Medium | 124.91 GB |
| Q4_1 | Medium | 137.12 GB |
| Q3_K_M | Low | 104.73 GB |
| Q3_K_S | Low | 94.47 GB |
| Q3_K_XL | Low | 96.61 GB |
| Q2_K | Low | 79.81 GB |
| Q2_K_L | Low | 79.94 GB |
| Q2_K_XL | Low | 81.97 GB |
Last updated: March 5, 2026