GLM 4.7
Zai Org
Code · Thinking · Tool Calls
GLM-4.7 is a 358.34-billion-parameter Mixture-of-Experts model from the GLM team at Zai Org, built for advanced coding, agentic reasoning, and tool use. Each token is routed through 8 of 160 routed experts plus 1 shared expert, so only a small fraction of the parameters is active per token while the model delivers frontier-level results on benchmarks such as SWE-bench and AIME. It supports code generation, extended thinking with interleaved reasoning, and tool calling in English and Chinese. With a 198K-token context window and flash attention, it is designed for multi-step agentic workflows on high-end GPU deployments.
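The expert routing described above (8 of 160 routed experts selected per token, with 1 shared expert always active) can be sketched as a standard top-k gate. This is a minimal illustration of the general MoE routing technique, not GLM-4.7's actual router; the function and variable names here are assumptions for the sketch.

```python
import math
import random

def route_token(logits, top_k=8):
    """Pick the top_k experts by router logit and softmax their scores.

    logits: one router score per routed expert (e.g. 160 values).
    Returns (expert_ids, mixture_weights); weights sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    m = max(logits[i] for i in top)                    # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return top, [e / z for e in exps]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(160)]  # one score per routed expert
expert_ids, weights = route_token(logits)
# The token's output is the weighted sum of these 8 experts' outputs,
# plus the shared expert, which runs for every token regardless of routing.
```

The top-k selection is what keeps per-token compute low: only the 8 chosen expert FFNs (plus the shared one) are evaluated, not all 160.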
Hardware Configuration
Optional: configure your hardware to populate the Fit column with precise deployment recommendations.
| Quantization | Quality | Size | Fit |
|---|---|---|---|
| Q8_0 | High | 354.79 GB | — |
| Q8_K_XL | High | 367.72 GB | — |
| Q6_K | High | 274.17 GB | — |
| Q6_K_XL | High | 280.43 GB | — |
| Q5_K_M | Medium | 236.81 GB | — |
| Q5_K_S | Medium | 230.04 GB | — |
| Q5_K_XL | Medium | 236.19 GB | — |
| Q4_K_M | Medium | 201.58 GB | — |
| Q4_K_S | Medium | 189.71 GB | — |
| Q4_K_XL | Medium | 190.51 GB | — |
| Q4_0 | Medium | 189.10 GB | — |
| Q4_1 | Medium | 209.19 GB | — |
| Q3_K_M | Low | 159.5 GB | — |
| Q3_K_S | Low | 144.39 GB | — |
| Q3_K_XL | Low | 147.83 GB | — |
| Q2_K | Low | 122.14 GB | — |
| Q2_K_L | Low | 122.31 GB | — |
| Q2_K_XL | Low | 125.92 GB | — |
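A quick way to use the table above is to check a quantization's file size against total available VRAM, leaving headroom for KV cache, activations, and runtime buffers. The sketch below is a rough heuristic, not a vendor recommendation: the 15% overhead fraction is an illustrative assumption and real headroom depends on context length, batch size, and serving stack.

```python
def fits_in_vram(model_size_gb, gpu_vram_gb, num_gpus=1, overhead_frac=0.15):
    """Rough fit check: weights plus an assumed overhead margin vs. total VRAM.

    overhead_frac (KV cache, activations, buffers) is an illustrative
    guess; tune it for your context length and serving framework.
    """
    total_vram = gpu_vram_gb * num_gpus
    needed = model_size_gb * (1 + overhead_frac)
    return needed <= total_vram

# Q4_K_M (201.58 GB) across four 80 GB GPUs (320 GB total):
print(fits_in_vram(201.58, 80, num_gpus=4))  # needs ~231.8 GB -> True
# Q8_0 (354.79 GB) on the same four GPUs:
print(fits_in_vram(354.79, 80, num_gpus=4))  # needs ~408.0 GB -> False
```

By this estimate, the Q4-class quantizations fit on a 4x80 GB node, while Q6 and above call for more aggregate memory or CPU offload.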
Last updated: March 5, 2026