Kimi K2 Thinking
Moonshot AI
Tags: Code, Thinking, Tool Calls
Kimi K2 Thinking is a Mixture-of-Experts model from Moonshot AI with roughly 1 trillion total parameters, of which about 32 billion are active per token. It is trained end-to-end for extended chain-of-thought reasoning with integrated tool use. Each token is routed to 8 of 384 experts plus 1 shared expert. The model targets frontier performance on complex math, coding, and agentic benchmarks, and is designed to stay coherent across hundreds of consecutive tool invocations. It supports code generation, deep thinking, and tool calling in English and Chinese. With a 256K-token context window and flash attention support, it is suited to multi-step agentic workflows that require sustained goal-directed reasoning.
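To illustrate what "8 of 384 experts" means per token, here is a minimal sketch of top-k expert routing. It is not Moonshot AI's implementation: the function name `route_token`, the random router scores, and the softmax normalization over the selected experts are assumptions made for illustration, and the actual gating function may differ. The shared expert is not routed at all; it runs for every token.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts, per the description above
TOP_K = 8          # routed experts activated per token


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Hypothetical helper: the weights are a softmax over the selected logits
    only, which is an illustrative assumption about the gating function.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for one token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
active_fraction = TOP_K / NUM_EXPERTS  # share of routed experts used per token
```

Only about 2% of the routed experts run for any given token, which is why the per-token compute is far below what the total parameter count suggests, while the full set of weights still has to be stored.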
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 1016.07 GB |
| Q8_K_XL | High | 1108.35 GB |
| Q6_K | High | 785.01 GB |
| Q6_K_XL | High | 818.73 GB |
| Q5_K_M | Medium | 678.67 GB |
| Q5_K_S | Medium | 658.39 GB |
| Q5_K_XL | Medium | 679.68 GB |
| Q4_K_M | Medium | 578.60 GB |
| Q4_K_S | Medium | 543.23 GB |
| Q4_K_XL | Medium | 601.86 GB |
| Q4_0 | Medium | 541.24 GB |
| Q4_1 | Medium | 598.79 GB |
| Q3_K_M | Low | 456.33 GB |
| Q3_K_S | Low | 412.70 GB |
| Q3_K_XL | Low | 423.87 GB |
| Q2_K | Low | 348.40 GB |
| Q2_K_L | Low | 348.65 GB |
| Q2_K_XL | Low | 359.82 GB |
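
The sizes in the table can be turned into a rough fit check. The sketch below lists the quantizations whose file size is within a given memory budget, largest first. The helper name `fitting_quantizations` and the `headroom_gb` parameter are hypothetical, and the check is only a lower bound: it compares file sizes and does not model the KV cache, activations, or runtime overhead, which all need memory on top of the weights.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q8_0": 1016.07, "Q8_K_XL": 1108.35,
    "Q6_K": 785.01, "Q6_K_XL": 818.73,
    "Q5_K_M": 678.67, "Q5_K_S": 658.39, "Q5_K_XL": 679.68,
    "Q4_K_M": 578.6, "Q4_K_S": 543.23, "Q4_K_XL": 601.86,
    "Q4_0": 541.24, "Q4_1": 598.79,
    "Q3_K_M": 456.33, "Q3_K_S": 412.7, "Q3_K_XL": 423.87,
    "Q2_K": 348.4, "Q2_K_L": 348.65, "Q2_K_XL": 359.82,
}


def fitting_quantizations(memory_gb, headroom_gb=0.0):
    """Return (name, size_gb) pairs that fit in memory_gb, largest first.

    headroom_gb is a hypothetical reserve for context and runtime overhead;
    choose it based on your own deployment, not on this sketch.
    """
    budget = memory_gb - headroom_gb
    fits = [(name, size) for name, size in QUANT_SIZES_GB.items()
            if size <= budget]
    return sorted(fits, key=lambda item: item[1], reverse=True)


# Example: 640 GB of combined memory, with and without a 64 GB reserve.
no_reserve = fitting_quantizations(640)
with_reserve = fitting_quantizations(640, headroom_gb=64)
```

With these inputs the largest candidate is Q4_K_XL without a reserve and Q4_K_S once 64 GB is held back, which shows how much the choice depends on the headroom left for long contexts.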
Last updated: March 5, 2026