
GLM 4.7

Zai Org
Tags: Code, Thinking, Tool Calls

GLM-4.7 is a 358.34-billion-parameter Mixture-of-Experts model from the GLM team at Zai Org, built for advanced coding, agentic reasoning, and tool use. It routes each token through 8 of its 160 routed experts plus 1 always-active shared expert, delivering frontier-level performance on benchmarks such as SWE-bench and AIME while keeping per-token compute manageable. The model supports code generation, extended thinking with interleaved reasoning, and tool calling in English and Chinese. With a 198K-token context window and FlashAttention, it is designed for multi-step agentic workflows on high-end GPU deployments.
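The routing scheme described above (each token activates 8 of 160 routed experts, plus a shared expert that is always active) can be sketched in a few lines. The following is an illustrative top-k gating sketch with a made-up hidden size and a random router matrix, not GLM-4.7's actual implementation:

```python
import numpy as np

N_EXPERTS, TOP_K = 160, 8  # routed experts and per-token activation count, from the model card

def route_token(hidden, router_w):
    """Select the top-8 routed experts for one token.

    `router_w` is a hypothetical (hidden_dim, N_EXPERTS) gating matrix;
    the shared expert is applied unconditionally and is not routed here.
    """
    logits = hidden @ router_w                       # one gating logit per expert
    top = np.argsort(logits)[-TOP_K:]                # indices of the 8 highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                         # softmax over the selected experts only
    return top, weights

# demo with random data
rng = np.random.default_rng(0)
h = rng.standard_normal(64)                          # hypothetical hidden state
W = rng.standard_normal((64, N_EXPERTS))             # hypothetical router weights
experts, w = route_token(h, W)
print(len(experts), round(float(w.sum()), 6))        # 8 experts, weights sum to 1
```

Normalizing the softmax over only the selected experts (rather than all 160) is the common sparse-MoE convention; whether GLM-4.7 does exactly this is not stated on this page.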

Hardware Configuration

Choose a quantization based on your available GPU/CPU memory; approximate download sizes are listed below.
Quantization   Quality   Size
Q8_0           High      354.79 GB
Q8_K_XL        High      367.72 GB
Q6_K           High      274.17 GB
Q6_K_XL        High      280.43 GB
Q5_K_M         Medium    236.81 GB
Q5_K_S         Medium    230.04 GB
Q5_K_XL        Medium    236.19 GB
Q4_K_M         Medium    201.58 GB
Q4_K_S         Medium    189.71 GB
Q4_K_XL        Medium    190.51 GB
Q4_0           Medium    189.10 GB
Q4_1           Medium    209.19 GB
Q3_K_M         Low       159.50 GB
Q3_K_S         Low       144.39 GB
Q3_K_XL        Low       147.83 GB
Q2_K           Low       122.14 GB
Q2_K_L         Low       122.31 GB
Q2_K_XL        Low       125.92 GB
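As a rough sanity check on the table, dividing a file size by the parameter count gives the effective bits per weight (e.g. Q4_K_M: 201.58 GB × 8 bits ÷ 358.34B parameters ≈ 4.5 bits/weight). The sketch below also does a crude fit check; the 10% memory overhead for KV cache and runtime buffers is an illustrative assumption, not a vendor figure, and the sizes are treated as decimal gigabytes:

```python
PARAMS = 358.34e9  # total parameter count from the model card

# a few file sizes from the table above, in GB
SIZES_GB = {"Q8_0": 354.79, "Q6_K": 274.17, "Q4_K_M": 201.58, "Q2_K": 122.14}

def bits_per_weight(size_gb: float) -> float:
    """Effective bits stored per parameter, assuming decimal GB (1e9 bytes)."""
    return size_gb * 1e9 * 8 / PARAMS

def fits(size_gb: float, memory_gb: float, overhead: float = 1.10) -> bool:
    """Crude fit check: weights plus ~10% headroom (assumed, not measured)."""
    return size_gb * overhead <= memory_gb

print(round(bits_per_weight(SIZES_GB["Q4_K_M"]), 2))  # ≈ 4.5 bits/weight
print(fits(SIZES_GB["Q2_K"], 141.0))                  # Q2_K in ~141 GB of memory
```

Real memory requirements also scale with context length (KV cache grows with the 198K window), so treat this as a lower bound, not a deployment guarantee.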
Last updated: March 5, 2026