Last verified
FLAGSHIP GLM · 1M CONTEXT · 128K OUTPUT · DEEP THINKING · TOOL STREAMING · TEXT + CODE

GLM-5.2 API Pricing

GLM-5.2 is Z.AI's current flagship text model for long-horizon coding and project-scale engineering. Z.AI's live pricing table lists $1.40/M input and $4.40/M output, with cached input at $0.26/M. The model documentation lists a 1M-token context window and a 128K maximum output.

Input (per 1M tokens): $1.40/M · Source: Z.AI · flat
Output (per 1M tokens): $4.40/M · Context: 1M · flat
Cached input (per 1M tokens): $0.26/M · Storage: limited-time free · 81% below uncached input
Effective (agentic blend): $0.78/M · 92/8 input/output split · 82% cache hits
§ 01 / TERMINAL

Run the numbers.

Live calculator pre-loaded with current GLM-5.2 rates. Tweak spend, output mix, or cache assumptions for 1M-context coding and agentic workloads.

Inputs: monthly spend ($/mo), workload split, and prompt cache hit rate.
Outputs: tokens you can process, English-words equivalent, and effective rate.
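A minimal Python sketch of the arithmetic such a calculator performs, assuming the same blend as the effective tile (input share times a cache-weighted input rate, plus output share times the output rate). The words figure uses a hypothetical planning ratio of 0.75 English words per token, roughly in line with the tokenizer sample on this page (88 words, about 117 tokens). The names `Rates`, `effective_rate`, and `budget_to_tokens` are illustrative and not part of any Z.AI SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-1M-token USD rates for one model."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


# GLM-5.2 rates as listed on this page.
GLM_5_2 = Rates(input_per_m=1.40, cached_input_per_m=0.26, output_per_m=4.40)

# Hypothetical planning ratio (assumption): English words per token.
WORDS_PER_TOKEN = 0.75


def effective_rate(rates: Rates, input_share: float = 0.92, cache_hit: float = 0.82) -> float:
    """Blended USD per 1M tokens for an input/output mix and an input cache-hit rate."""
    if not (0.0 <= input_share <= 1.0 and 0.0 <= cache_hit <= 1.0):
        raise ValueError("input_share and cache_hit must be between 0 and 1")
    # Input tokens are billed at the cached rate on hits and the full rate on misses.
    input_rate = cache_hit * rates.cached_input_per_m + (1 - cache_hit) * rates.input_per_m
    return input_share * input_rate + (1 - input_share) * rates.output_per_m


def budget_to_tokens(monthly_usd: float, rates: Rates,
                     input_share: float = 0.92, cache_hit: float = 0.82) -> dict:
    """How many tokens (and approximate English words) a monthly budget buys."""
    rate = effective_rate(rates, input_share, cache_hit)
    tokens = monthly_usd / rate * 1_000_000
    return {"effective_rate": rate, "tokens": tokens, "words": tokens * WORDS_PER_TOKEN}


if __name__ == "__main__":
    result = budget_to_tokens(500.0, GLM_5_2)
    print(f"Effective rate: ${result['effective_rate']:.2f}/M")
    print(f"Tokens per month: {result['tokens']:,.0f}")
    print(f"English words (approx.): {result['words']:,.0f}")
```

Changing `cache_hit` shows how sensitive the budget is to caching: with `cache_hit=0.0` the GLM-5.2 blend rises from about $0.78/M to $1.64/M.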
§ 02 / SCENARIOS

Real-world presets.

§ 03 / TOKENIZER

Paste text. See tokens. See cost.

Calibrated · measured on the vendor's tokenizer · 2026-06-10

Counts use a chars-per-token calibration measured on the vendor's own published tokenizer (zai-org/GLM-5, 2026-06-10). English prose is typically within a few percent; code and non-Latin scripts produce more tokens per character. For billing-exact counts, use the vendor's count-tokens API.

Characters: 614
Words: 88
Tokens (estimated): 117
Cost as uncached input: $0.000164 USD
Cost as output: $0.000515 USD
Cost as cached input: $0.000030 USD
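The cost lines above are the token count multiplied by each per-1M rate. A minimal Python sketch of that arithmetic, assuming a simple characters-divided-by-ratio estimate; the ratio is left as a parameter (the FAQ below quotes 3.85 characters per token as a GLM-family planning figure), and `estimate_tokens` and `token_costs` are illustrative names, not a vendor API.

```python
# GLM-5.2 rates from the pricing table, in USD per 1M tokens.
INPUT_PER_M = 1.40
OUTPUT_PER_M = 4.40
CACHED_INPUT_PER_M = 0.26


def estimate_tokens(text: str, chars_per_token: float) -> int:
    """Planning estimate: character count divided by a calibrated chars-per-token ratio."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return round(len(text) / chars_per_token)


def token_costs(tokens: int) -> dict:
    """USD cost of a token count billed as uncached input, output, or cached input."""
    return {
        "input": tokens * INPUT_PER_M / 1_000_000,
        "output": tokens * OUTPUT_PER_M / 1_000_000,
        "cached_input": tokens * CACHED_INPUT_PER_M / 1_000_000,
    }


if __name__ == "__main__":
    # Reprice the 117-token sample shown above.
    for label, usd in token_costs(117).items():
        print(f"{label:>12}: ${usd:.6f}")
```

Printed to six decimals, the three costs come out as $0.000164, $0.000515, and $0.000030, matching the sample.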
§ 04 / SHELF

Up against the shelf.

Model | Input /M | Cached input /M | Output /M | Effective blended /M | Note | Context | Best for
GLM-5.2 (current) | $1.40 | $0.26 | $4.40 | $0.78 | agentic 92/8 | 1M | 1M-context engineering
GLM-5.1 | $1.40 | $0.26 | $4.40 | $0.78 | same price | 200K | Predecessor flagship
GLM-5 | $1.00 | $0.20 | $3.20 | $0.57 | cheaper | 200K | Cheaper GLM-5 base
GLM-5 Turbo | $1.20 | $0.24 | $4.00 | $0.70 | cheaper | 200K | Latency-optimized GLM-5
GLM-4.7 | $0.60 | $0.11 | $2.20 | $0.36 | cheaper | 200K | Budget coding agent
DeepSeek V4 Pro | $0.43 | $0.00 | $0.87 | $0.24 | cheaper | 128K | Low-cost reasoning
Qwen3 Max | $1.20 | $1.20 | $6.00 | $0.90 | Qwen flagship | 1M | Alibaba flagship
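For the GLM rows, the blended column can be reproduced from the listed rates with the same 92/8 split and 82% cache-hit blend. This Python sketch does that and turns the blend into a monthly bill for a hypothetical 2B-token workload; the dictionary and function names are illustrative.

```python
# Listed rates for the GLM rows, USD per 1M tokens: (input, cached input, output).
GLM_RATES = {
    "GLM-5.2": (1.40, 0.26, 4.40),
    "GLM-5.1": (1.40, 0.26, 4.40),
    "GLM-5": (1.00, 0.20, 3.20),
    "GLM-5 Turbo": (1.20, 0.24, 4.00),
    "GLM-4.7": (0.60, 0.11, 2.20),
}


def blended(rates, input_share=0.92, cache_hit=0.82):
    """Blended USD per 1M tokens: cache-weighted input rate mixed with the output rate."""
    inp, cached, out = rates
    input_rate = cache_hit * cached + (1 - cache_hit) * inp
    return input_share * input_rate + (1 - input_share) * out


def monthly_bill(rates, tokens_per_month, **mix):
    """USD per month for a token volume under a given mix (defaults to the agentic blend)."""
    return blended(rates, **mix) * tokens_per_month / 1_000_000


if __name__ == "__main__":
    volume = 2_000_000_000  # hypothetical workload: 2B tokens per month
    for name, rates in sorted(GLM_RATES.items(), key=lambda kv: blended(kv[1])):
        print(f"{name:12s} ${blended(rates):.2f}/M  ${monthly_bill(rates, volume):,.0f}/mo")
```

Passing `cache_hit=0.0` shows the uncached blend; for GLM-5.2 it rises to $1.64/M, roughly double the cached figure.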

Frequently asked.

Straight answers for teams estimating GLM-5.2 API bills, migration from GLM-5.1, and the effect of caching on long-context engineering workloads.

Q · 01 What is GLM-5.2 priced at?
Z.AI lists GLM-5.2 at $1.40/M input, $0.26/M cached input, and $4.40/M output. Cached input storage is marked as limited-time free on the pricing page.
Q · 02 How is the effective price calculated?
The headline effective tile uses the site's standard agentic blend: 92% of tokens are input, 8% are output, and 82% of input tokens are cache hits. For GLM-5.2, that works out to about $0.78 per 1M blended tokens.
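Written out with GLM-5.2's rates, assuming the blend first weights input by cache hits and then mixes input with output (this reproduces the $0.78 tile):

$$
\text{effective} = 0.92\,\big(0.82 \times 0.26 + 0.18 \times 1.40\big) + 0.08 \times 4.40 = 0.92 \times 0.4652 + 0.352 \approx 0.428 + 0.352 \approx \$0.78/\text{M}
$$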
Q · 03 Is GLM-5.2 more expensive than GLM-5.1?
No. Z.AI's current pricing table lists GLM-5.2 and GLM-5.1 at the same API rates: $1.40/M input, $0.26/M cached input, and $4.40/M output.
Q · 04 What changed versus GLM-5.1?
The model page positions GLM-5.2 as the long-horizon flagship, with a 1M context window (GLM-5.1 is listed at 200K in the comparison table above), 128K maximum output, stronger coding benchmarks, deep thinking, function calling, structured output, context caching, MCP, and tool streaming support.
Q · 05 Is prompt caching priced separately?
Yes. The pricing page lists cached input at $0.26/M. Cached input storage is currently shown as limited-time free, so this page models cache-hit input but not storage charges.
Q · 06 How accurate is the tokenizer estimate?
The widget uses a GLM-family planning estimate of 3.85 characters per token. Exact billed counts can differ depending on language, tool calls, hidden reasoning tokens, and cache boundaries.