GLM-5.2 API Pricing
GLM-5.2 is Z.AI's current flagship text model for long-horizon coding and project-scale engineering. The live vendor table lists $1.4/M input and $4.4/M output, with cached input at $0.26/M. The model doc lists 1M context and 128K max output.
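To make the listed rates concrete, here is a minimal sketch of a per-request cost estimate. It assumes that cache-hit input tokens are billed at the cached rate instead of the standard input rate, and that cached input storage is free. The function name `request_cost` and the example token counts are illustrative, not part of any vendor SDK.

```python
# List prices quoted above, in dollars per million tokens.
INPUT_RATE = 1.40
CACHED_INPUT_RATE = 0.26
OUTPUT_RATE = 4.40


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from cache
    and billed at the cached rate; the rest is billed at the full input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * INPUT_RATE
        + cached_tokens * CACHED_INPUT_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


# Hypothetical request: 200K-token prompt, 150K of it cached, 8K tokens of output.
print(f"${request_cost(200_000, 150_000, 8_000):.4f}")  # prints $0.1442
```

In this hypothetical request the 8K output tokens cost about as much as the 150K cached input tokens, which is why the output share and the cache-hit rate dominate the estimate.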
Run the numbers.
Live calculator pre-loaded with current GLM-5.2 rates. Tweak spend, output mix, or cache assumptions for 1M-context coding and agentic workloads.
Real-world presets.
Repo implementation
Pull request review
Knowledge base answer
Support assistant
Paste text. See tokens. See cost.
Counts use a characters-per-token calibration measured on the vendor's own published tokenizer (zai-org/GLM-5, 2026-06-10). For English prose the estimate is typically within a few percent of the true count; code and non-Latin scripts produce more tokens per character, so their estimates tend to run low. For billing-exact counts, use the vendor's count-tokens API.
| Model | Input ($/M) | Cached input ($/M) | Output ($/M) | Effective blended ($/M) | Context | Best for |
|---|---|---|---|---|---|---|
| GLM-5.2 (current) | $1.40 | $0.26 | $4.40 | $0.78 (agentic 92/8) | 1M | 1M-context engineering |
| GLM-5.1 | $1.40 | $0.26 | $4.40 | $0.78 (same price) | 200K | Predecessor flagship |
| GLM-5 | $1.00 | $0.20 | $3.20 | $0.57 (cheaper) | 200K | Cheaper GLM-5 base |
| GLM-5 Turbo | $1.20 | $0.24 | $4.00 | $0.70 (cheaper) | 200K | Latency-optimized GLM-5 |
| GLM-4.7 | $0.60 | $0.11 | $2.20 | $0.36 (cheaper) | 200K | Budget coding agent |
| DeepSeek V4 Pro | $0.43 | $0.00 | $0.87 | $0.24 (cheaper) | 128K | Low-cost reasoning |
| Qwen3 Max | $1.20 | $1.20 | $6.00 | $0.90 (Qwen flagship) | 1M | Alibaba flagship |
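The Effective blended figures for the GLM rows can be reproduced from the listed rates. The sketch below assumes the mix this page states in its FAQ (92% input, 8% output, 82% of input tokens hitting the cache) and treats the blended price as a simple weighted average. The function name `blended_price` and the `RATES` dictionary are illustrative.

```python
def blended_price(input_rate: float, cached_rate: float, output_rate: float,
                  input_share: float = 0.92, cache_hit: float = 0.82) -> float:
    """Return the blended price in dollars per million tokens.

    Assumption: cache_hit is the fraction of input tokens billed at the
    cached rate, and input_share is the fraction of all tokens that are input.
    """
    # Average price of an input token, mixing cached and uncached rates.
    effective_input = cache_hit * cached_rate + (1 - cache_hit) * input_rate
    # Weight input and output by their share of total tokens.
    return input_share * effective_input + (1 - input_share) * output_rate


# (input, cached input, output) in $/M tokens, copied from the GLM rows.
RATES = {
    "GLM-5.2": (1.40, 0.26, 4.40),
    "GLM-5": (1.00, 0.20, 3.20),
    "GLM-5 Turbo": (1.20, 0.24, 4.00),
    "GLM-4.7": (0.60, 0.11, 2.20),
}

for name, rates in RATES.items():
    print(f"{name}: ${blended_price(*rates):.2f}/M")
```

Passing a different `input_share` or `cache_hit` shows how sensitive the blended figure is to the workload: with no cache hits at all, the GLM-5.2 figure rises from about $0.78/M to about $1.64/M.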
Frequently asked.
Straight answers for teams estimating GLM-5.2 API bills, migration from GLM-5.1, and the effect of caching on long-context engineering workloads.
Q · 01 What is GLM-5.2 priced at?
GLM-5.2 is listed at $1.4/M input, $0.26/M cached input, and $4.4/M output. Cached input storage is marked as limited-time free on the pricing page.
Q · 02 How is the effective price calculated?
The blended figure assumes a token mix of 92% input and 8% output, with 82% of input tokens served from cache. For GLM-5.2, that lands at about $0.78/M blended tokens.
Q · 03 Is GLM-5.2 more expensive than GLM-5.1?
No. Both models are listed at $1.4/M input, $0.26/M cached input, and $4.4/M output.
Q · 04 What changed versus GLM-5.1?
GLM-5.2 is listed with 1M context (GLM-5.1 is listed at 200K), 128K maximum output, stronger coding benchmarks, deep thinking, function calling, structured output, context caching, MCP, and tool streaming support.
Q · 05 Is prompt caching priced separately?
Yes. Cache-hit input is billed at $0.26/M. Cached input storage is currently shown as limited-time free, so this page models cache-hit input but not storage charges.
Q · 06 How accurate is the tokenizer estimate?
The estimate assumes about 3.85 characters per token. Exact billing can vary by language, tool calls, hidden reasoning tokens, and cache boundaries.
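As a rough illustration of the estimate described above, the following sketch converts pasted text into an approximate token count and input cost. It assumes the 3.85 characters-per-token figure and the $1.40/M GLM-5.2 input rate from this page. The function names are illustrative, and `len()` counts Unicode code points, which is itself an approximation for non-Latin text.

```python
CHARS_PER_TOKEN = 3.85  # calibration figure quoted on this page
INPUT_RATE = 1.40       # GLM-5.2 list price, dollars per million input tokens


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length (not billing-exact)."""
    return round(len(text) / CHARS_PER_TOKEN)


def estimate_input_cost(text: str, rate_per_million: float = INPUT_RATE) -> float:
    """Approximate the dollar cost of sending the text as uncached input."""
    return estimate_tokens(text) * rate_per_million / 1_000_000


sample = "Explain how context caching changes the bill for a long coding session."
print(estimate_tokens(sample), "tokens (estimated)")
print(f"${estimate_input_cost(sample):.8f} as uncached input")
```

Because the ratio is calibrated on character counts, the result is only a planning number. Hidden reasoning tokens, tool-call payloads, and cache boundaries are not visible in the pasted text at all.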