Kimi K2 Instruct
Moonshot AI
Code Tool Calls
Kimi K2 Instruct is a Mixture-of-Experts model from Moonshot AI with 1,026.47 billion total parameters (about 1 trillion), fine-tuned for instruction following, code generation, and autonomous tool use. For each token it activates 8 of 384 routed experts plus 1 shared expert, which keeps per-token compute comparable to a 32B dense model while delivering strong performance on coding and agentic benchmarks. It was trained with the MuonClip optimizer on 15.5 trillion tokens and supports code generation and tool calling in English and Chinese. With a 128K context window and flash attention support, it is suited to agentic deployments that require reliable tool orchestration.
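The sparsity figures in the description can be checked with a little arithmetic. The sketch below is illustrative only: it assumes that "equivalent to a 32B dense model" means roughly 32 billion active parameters per token, and all constant and function names are hypothetical.

```python
# Figures taken from the description above.
TOTAL_PARAMS_B = 1026.47   # total parameters, in billions
ACTIVE_PARAMS_B = 32.0     # ASSUMPTION: "32B dense equivalent" read as ~32B active params
ROUTED_EXPERTS = 384       # routed experts available per MoE layer
ROUTED_ACTIVE = 8          # routed experts selected per token
SHARED_EXPERTS = 1         # shared expert that is always active


def active_expert_count() -> int:
    """Experts that run for each token: selected routed experts plus the shared one."""
    return ROUTED_ACTIVE + SHARED_EXPERTS


def routed_fraction() -> float:
    """Share of the routed experts that a single token uses."""
    return ROUTED_ACTIVE / ROUTED_EXPERTS


def active_param_fraction() -> float:
    """Approximate share of all parameters touched per token."""
    return ACTIVE_PARAMS_B / TOTAL_PARAMS_B


if __name__ == "__main__":
    print(f"experts active per token: {active_expert_count()}")
    print(f"routed experts used:      {routed_fraction():.2%}")
    print(f"parameters active:        {active_param_fraction():.2%}")
```

Under these assumptions, each token uses about 2% of the routed experts and about 3% of the total parameters, which is why per-token compute stays far below what the total parameter count suggests.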
| Quantization | Quality | Size |
|---|---|---|
| Q8_0 | High | 1016.15 GB |
| Q8_K_XL | High | 1108.35 GB |
| Q6_K | High | 784.82 GB |
| Q6_K_XL | High | 818.73 GB |
| Q5_K_M | Medium | 678.37 GB |
| Q5_K_S | Medium | 658.07 GB |
| Q5_K_XL | Medium | 680.38 GB |
| Q4_K_M | Medium | 578.14 GB |
| Q4_K_S | Medium | 542.74 GB |
| Q4_K_XL | Medium | 546.77 GB |
| Q4_0 | Medium | 540.76 GB |
| Q4_1 | Medium | 598.41 GB |
| Q3_K_M | Low | 455.73 GB |
| Q3_K_S | Low | 412.01 GB |
| Q3_K_XL | Low | 421.03 GB |
| Q2_K | Low | 347.56 GB |
| Q2_K_L | Low | 347.82 GB |
| Q2_K_XL | Low | 355.65 GB |
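One way to use the table is to pick the largest quantization that fits a given memory budget. The following is a minimal sketch, not a deployment tool: the function name, the default 10% headroom for KV cache and runtime buffers, and the use of file size as a rough proxy for quality are all assumptions, and the budget must be expressed in the same unit as the table's sizes.

```python
from typing import Optional, Tuple

# (quantization, quality tier, size) copied from the table above.
QUANTS = [
    ("Q8_0", "High", 1016.15),
    ("Q8_K_XL", "High", 1108.35),
    ("Q6_K", "High", 784.82),
    ("Q6_K_XL", "High", 818.73),
    ("Q5_K_M", "Medium", 678.37),
    ("Q5_K_S", "Medium", 658.07),
    ("Q5_K_XL", "Medium", 680.38),
    ("Q4_K_M", "Medium", 578.14),
    ("Q4_K_S", "Medium", 542.74),
    ("Q4_K_XL", "Medium", 546.77),
    ("Q4_0", "Medium", 540.76),
    ("Q4_1", "Medium", 598.41),
    ("Q3_K_M", "Low", 455.73),
    ("Q3_K_S", "Low", 412.01),
    ("Q3_K_XL", "Low", 421.03),
    ("Q2_K", "Low", 347.56),
    ("Q2_K_L", "Low", 347.82),
    ("Q2_K_XL", "Low", 355.65),
]


def largest_fitting_quant(
    budget: float, headroom: float = 0.10
) -> Optional[Tuple[str, str, float]]:
    """Return the largest listed quantization whose size fits the budget.

    `headroom` is the fraction of the budget held back for KV cache and
    runtime buffers (ASSUMPTION: 10% is a placeholder, not a measured value).
    Returns None when nothing in the table fits.
    """
    usable = budget * (1.0 - headroom)
    candidates = [q for q in QUANTS if q[2] <= usable]
    if not candidates:
        return None
    # Size is used as a rough proxy for quality (ASSUMPTION).
    return max(candidates, key=lambda q: q[2])


if __name__ == "__main__":
    print(largest_fitting_quant(640))
```

With a budget of 640 and the default headroom, this sketch selects Q4_K_XL, because Q4_K_M at 578.14 slightly exceeds the 576 that remains after the reserve.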
Last updated: March 5, 2026