Kimi API Pricing
The lab behind Kimi — the long-context assistant — and the open-weight Kimi K2 models. Founded in 2023 by Yang Zhilin with Tsinghua schoolmates, Moonshot ships a 1-trillion-parameter open-weight MoE that rivals frontier US labs while staying downloadable on Hugging Face.
Kimi K2.6
Moonshot's current flagship — native multimodal (text+image+video), thinking + non-thinking modes, agent capabilities.
Kimi K2.5
Previous-gen multimodal Kimi.
Moonshot V1 128K Vision Preview
Vision-input preview of Moonshot V1 128K.
Moonshot V1 32K Vision Preview
Vision-input preview of Moonshot V1 32K.
Moonshot V1 8K Vision Preview
Vision-input preview of Moonshot V1 8K.
Kimi K2 (0905 Preview)
RETIRES 2026-05-25. Dated Sep-05 K2 preview snapshot.
Kimi K2 (0711 Preview)
RETIRES 2026-05-25. Dated Jul-11 K2 preview, smaller 131K context.
Kimi K2 Turbo Preview
RETIRES 2026-05-25. Turbo (low-latency) K2 preview.
Kimi K2 Thinking
RETIRES 2026-05-25. K2 reasoning-mode variant.
Kimi K2 Thinking Turbo
RETIRES 2026-05-25. Turbo reasoning variant.
Moonshot V1 (128K)
Legacy Moonshot V1 line, 128K context.
Moonshot V1 (32K)
Legacy Moonshot V1, 32K context tier.
Moonshot V1 (8K)
Legacy Moonshot V1, smallest 8K context.
| Model | Input /M | Output /M | Cached | Context | Max output | Vision | Tools | Tier |
|---|---|---|---|---|---|---|---|---|
| Kimi K2.6 FLAGSHIP | $0.95 | $4.00 | $0.16−83% | 262K | — | ✓ | ✓ | Frontier |
| Kimi K2.5 | $0.6 | $3.00 | $0.1−83% | 262K | — | ✓ | ✓ | Mid |
| Moonshot V1 128K Vision Preview PREVIEW | $2.00 | $5.00 | — | 131K | — | ✓ | ✓ | Preview |
| Moonshot V1 32K Vision Preview PREVIEW | $1.00 | $3.00 | — | 32K | — | ✓ | ✓ | Preview |
| Moonshot V1 8K Vision Preview PREVIEW | $0.2 | $2.00 | — | 8K | — | ✓ | ✓ | Preview |
| Kimi K2 (0905 Preview) DEPRECATED | $0.6 | $2.50 | $0.15−75% | 262K | — | ✗ | ✓ | Deprecated |
| Kimi K2 (0711 Preview) DEPRECATED | $0.6 | $2.50 | $0.15−75% | 131K | — | ✗ | ✓ | Deprecated |
| Kimi K2 Turbo Preview DEPRECATED | $1.15 | $8.00 | $0.15−87% | 262K | — | ✗ | ✓ | Deprecated |
| Kimi K2 Thinking DEPRECATED | $0.6 | $2.50 | $0.15−75% | 262K | — | ✗ | ✓ | Deprecated |
| Kimi K2 Thinking Turbo DEPRECATED | $1.15 | $8.00 | $0.15−87% | 262K | — | ✗ | ✓ | Deprecated |
| Moonshot V1 (128K) LEGACY | $2.00 | $5.00 | — | 131K | — | ✗ | ✓ | Legacy |
| Moonshot V1 (32K) LEGACY | $1.00 | $3.00 | — | 32K | — | ✗ | ✓ | Legacy |
| Moonshot V1 (8K) LEGACY | $0.2 | $2.00 | — | 8K | — | ✗ | ✓ | Legacy |
The Kimi story so far.
Releases and funding from Moonshot's Kimi line. Sourced from Moonshot's GitHub/model cards, TechCrunch, and Wikipedia — verified at publication.
platform.kimi.ai — direct API with an OpenAI-compatible endpoint and automatic context caching (cache-hit input is a fraction of the base rate). Billed per token.
OPEN-WEIGHT · SELF-HOSTKimi K2 weights — a 1-trillion-parameter MoE — are published on Hugging Face under a Modified MIT license, fully self-hostable with no per-token fee.
CONSUMER · APPThe Kimi assistant (kimi.com) — a free long-context chat app that made its name handling very long documents. For end-user usage; no API on this surface.
Moonshot AI (月之暗面) was founded in March 2023 in Beijing by Yang Zhilin (a former Meta AI and Google Brain researcher) with Tsinghua University schoolmates Zhou Xinyu and Wu Yuxin. Its consumer assistant, Kimi, launched in October 2023 and built its reputation on very long context — handling roughly 200,000 Chinese characters per conversation, well ahead of peers at the time.
Moonshot's technical signature is the Kimi K2 line: a 1-trillion-parameter mixture-of-experts model with ~32B active parameters, trained with the lab's own MuonClip optimizer and released as open weights (Modified MIT) on Hugging Face. K2.5 (January 2026) added native multimodality via a MoonViT vision encoder, and K2.6 (April 2026) is the current open-weight flagship, shipping variants from quick chat up to large multi-agent swarms.
Funding has been rapid: an Alibaba-led $1B round in February 2024 (at a $2.5B valuation), rising to a $2B raise at a ~$20B valuation in May 2026 led by Meituan's Long-Z, with Tencent, HongShan, China Mobile, and others among its backers — making Moonshot one of China's top-funded LLM startups.
Pricing on the Kimi Platform is competitive, and the API offers automatic context caching that drops cache-hit input to a fraction of the base rate — meaningful for long-context workloads. The catalog has consolidated: the dated K2 preview models retire on May 25, 2026, leaving K2.6 and K2.5 as the current generation.
The trade-offs: Kimi is China-based, so the direct API stores data under Chinese rules — though the open weights let you self-host anywhere. Versus DeepSeek it competes directly on open-weight frontier quality; versus Qwen it offers fewer model sizes but a distinctive 1T-parameter open MoE and long-context pedigree.
DeepSeek
MIT open weights at the lowest cost-per-token and frontier reasoning quality. The closest rival to Kimi K2's open-weight pitch — text-only, though, where K2 is multimodal.
CHINESE PEERAlibaba (Qwen)
Broad open-weight Qwen catalog with many sizes and modalities. Moonshot counters with a single, very large 1T-parameter open MoE and long-context heritage.
CHINESE PEERZhipu (GLM)
GLM family with free tiers and strong agentic models. Comparable China-residency profile; Kimi leads on raw parameter scale and long context.
PRIMARY RIVALOpenAI
Frontier benchmarks + global ecosystem (GPT-5, ChatGPT). Closed weights and several times pricier than Kimi K2, with no self-host option.
FRONTIER LEADERAnthropic
Leads coding + reasoning with Claude. No open weights and far higher prices — Kimi K2 targets the same agentic-coding use cases at a fraction of the cost.
Frequently asked.
Practical questions about Kimi K2, open weights, and long context.
Q · 01 Which Kimi model should I start with? +
$0.95/$4.00) — the open-weight flagship with multimodality, reasoning, and agentic tool use. For cheaper multimodal at near-flagship quality, use Kimi K2.5 ($0.60/$3.00). Both ship a 262K context. See the use case picker above.Q · 02 Are Kimi models really open-weight? +
Q · 03 What is Kimi's long-context heritage? +
262K-token window, and the API's automatic context caching cuts cache-hit input to a fraction of the base rate, which matters for long documents.Q · 04 Where is my data stored? +
Q · 05 What happened to the older K2 preview models? +
kimi-k2-0711, 0905, Turbo, and Thinking — retire on May 25, 2026. Moonshot's recommended replacements are K2.5 and K2.6, which are cheaper and stronger. New integrations should target the current generation.