GLM 4.7
Zai Org
Code · Thinking · Tool Calls
GLM-4.7 is a 358.34-billion-parameter Mixture-of-Experts model from the GLM team at Zai Org, built for advanced coding, agentic reasoning, and tool use. Each token is routed through 8 of 160 routed experts plus 1 shared expert, so only a small fraction of the parameters is active per token while the model delivers frontier-level results on benchmarks such as SWE-bench and AIME. It supports code generation, extended thinking with interleaved reasoning, and tool calling in English and Chinese. With a 198K-token context window and flash attention, it is designed for multi-step agentic workflows on high-end GPU deployments.
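The expert routing described above (8 of 160 routed experts selected per token, with 1 shared expert always active) can be sketched as a standard top-k gate. This is a minimal illustration of the general MoE routing technique, not GLM-4.7's actual router; the function and variable names here are assumptions for the sketch.

```python
import math
import random

def route_token(logits, top_k=8):
    """Pick the top_k experts by router logit and softmax their scores.

    logits: one router score per routed expert (e.g. 160 values).
    Returns (expert_ids, mixture_weights); weights sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    m = max(logits[i] for i in top)                    # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return top, [e / z for e in exps]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(160)]  # one score per routed expert
expert_ids, weights = route_token(logits)
# The token's output is the weighted sum of these 8 experts' outputs,
# plus the shared expert, which runs for every token regardless of routing.
```

The top-k selection is what keeps per-token compute low: only the 8 chosen expert FFNs (plus the shared one) are evaluated, not all 160.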
Hardware Configuration
Optional: configure your hardware to populate the Fit column with precise deployment recommendations.
| Quantization | Quality | Size | Fit |
|---|---|---|---|
| Q8_0 | High | 354.79 GB | — |
| Q8_K_XL | High | 367.72 GB | — |
| Q6_K | High | 274.17 GB | — |
| Q6_K_XL | High | 280.43 GB | — |
| Q5_K_M | Medium | 236.81 GB | — |
| Q5_K_S | Medium | 230.04 GB | — |
| Q5_K_XL | Medium | 236.19 GB | — |
| Q4_K_M | Medium | 201.58 GB | — |
| Q4_K_S | Medium | 189.71 GB | — |
| Q4_K_XL | Medium | 190.51 GB | — |
| Q4_0 | Medium | 189.10 GB | — |
| Q4_1 | Medium | 209.19 GB | — |
| Q3_K_M | Low | 159.5 GB | — |
| Q3_K_S | Low | 144.39 GB | — |
| Q3_K_XL | Low | 147.83 GB | — |
| Q2_K | Low | 122.14 GB | — |
| Q2_K_L | Low | 122.31 GB | — |
| Q2_K_XL | Low | 125.92 GB | — |
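A quick way to use the table above is to check a quantization's file size against total available VRAM, leaving headroom for KV cache, activations, and runtime buffers. The sketch below is a rough heuristic, not a vendor recommendation: the 15% overhead fraction is an illustrative assumption and real headroom depends on context length, batch size, and serving stack.

```python
def fits_in_vram(model_size_gb, gpu_vram_gb, num_gpus=1, overhead_frac=0.15):
    """Rough fit check: weights plus an assumed overhead margin vs. total VRAM.

    overhead_frac (KV cache, activations, buffers) is an illustrative
    guess; tune it for your context length and serving framework.
    """
    total_vram = gpu_vram_gb * num_gpus
    needed = model_size_gb * (1 + overhead_frac)
    return needed <= total_vram

# Q4_K_M (201.58 GB) across four 80 GB GPUs (320 GB total):
print(fits_in_vram(201.58, 80, num_gpus=4))  # needs ~231.8 GB -> True
# Q8_0 (354.79 GB) on the same four GPUs:
print(fits_in_vram(354.79, 80, num_gpus=4))  # needs ~408.0 GB -> False
```

By this estimate, the Q4-class quantizations fit on a 4x80 GB node, while Q6 and above call for more aggregate memory or CPU offload.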
Last updated: March 5, 2026