Last verified 2026-06-19

FLAGSHIP GLM1M CONTEXT128K OUTPUTDEEP THINKINGTOOL STREAMINGTEXT + CODE

GLM-5.2 API Pricing

Q: What is GLM-5.2 priced at?

Z.AI lists GLM-5.2 at $1.4/M input, $0.26/M cached input, and $4.4/M output. Cached input storage is marked as limited-time free on the pricing page.

Q: Is prompt caching priced separately?

Yes. The pricing page lists cached input at $0.26/M. Cached input storage is currently shown as limited-time free, so this page models cache-hit input but not storage charges.

Q: How accurate is the tokenizer estimate?

The widget uses a GLM-family planning estimate of 3.85 characters per token. Exact billing can vary by language, tool calls, hidden reasoning tokens, and cache boundaries.

GLM-5.2 is Z.AI's current flagship text model for long-horizon coding and project-scale engineering. The live vendor table lists $1.4/M input and $4.4/M output, with cached input at $0.26/M. The model doc lists 1M context and 128K max output.

Input - per 1M tokens

$1.40/M

Source Z.AI flat

Output - per 1M tokens

$4.40/M

Context 1M flat

Cached input - per 1M tokens

$0.26/M

Storage limited-time free -81%

Effective - agentic blend

$0.78/M

92/8 split - 82% cache

§ 01 / TERMINAL

Run the numbers.

Live calculator pre-loaded with current GLM-5.2 rates. Tweak spend, output mix, or cache assumptions for 1M-context coding and agentic workloads.

Spend

$ /mo

Workload split

Prompt cache hit rate

Tokens you can process

—

Words equivalent (English)

—

Effective rate

—

Open full calculator (all models · share URL · CSV) →

§ 02 / SCENARIOS

Real-world presets.

CODING AGENT

Repo implementation

$0.044/task

22,000 in - 3,000 out~2,273 units/$100

CODE REVIEW

Pull request review

$0.028/review

14,000 in - 1,800 out~3,636 units/$100

RAG

Knowledge base answer

$0.017/query

9,000 in - 1,000 out~5,882 units/$100

CHATBOT

Support assistant

$0.006/turn

2,500 in - 600 out~16,393 units/$100

§ 03 / TOKENIZER

Paste text. See tokens. See cost.

Your text · live count

Calibrated · measured on the vendor's tokenizer · 2026-06-10 Auto-counts as you type

Counts use a chars-per-token calibration measured on the vendor's own published tokenizer (zai-org/GLM-5, 2026-06-10). English prose is typically within a few percent; code and non-Latin scripts tokenize heavier. For billing-exact counts use the vendor's count-tokens API.

Characters 614

Words 88

Tokens (estimated) 117 tokens

Cost as input · uncached $0.000164 USD

Cost as output · uncached $0.000515 USD

Cost as cached input $0.000030 USD

§ 04 / SHELF

Up against the shelf.

All models →

Model	Input /M	Output /M	Effective blended	Context	Best for
GLM-5.2 Current	$1.40 cache $0.26	$4.40	$0.78 agentic 92/8	1M	1M-context engineering
GLM-5.1	$1.40 cache $0.26	$4.40	$0.78 same price	200K	Predecessor flagship
GLM-5	$1.00 cache $0.20	$3.20	$0.57 cheaper	200K	Cheaper GLM-5 base
GLM-5 Turbo	$1.20 cache $0.24	$4.00	$0.70 cheaper	200K	Latency-optimized GLM-5
GLM-4.7	$0.60 cache $0.11	$2.20	$0.36 cheaper	200K	Budget coding agent
DeepSeek V4 Pro	$0.43 cache $0.00	$0.87	$0.24 cheaper	128K	Low-cost reasoning
Qwen3 Max	$1.20 cache $1.20	$6.00	$0.90 Qwen flagship	1M	Alibaba flagship

Frequently asked.

Straight answers for teams estimating GLM-5.2 API bills, migration from GLM-5.1, and the effect of caching on long-context engineering workloads.

Q · 01 What is GLM-5.2 priced at? +

Z.AI lists GLM-5.2 at $1.4/M input, $0.26/M cached input, and $4.4/M output. Cached input storage is marked as limited-time free on the pricing page.

Q · 02 How is the effective price calculated? +

The headline effective tile uses the site standard agentic blend: 92% input, 8% output, and 82% input cache hits. For GLM-5.2, that lands at about $0.78/M blended tokens.

Q · 03 Is GLM-5.2 more expensive than GLM-5.1? +

No. The current Z.AI pricing table lists GLM-5.2 and GLM-5.1 at the same API rates: $1.4/M input, $0.26/M cached input, and $4.4/M output.

Q · 04 What changed versus GLM-5.1? +

The model page positions GLM-5.2 as the long-horizon flagship with 1M context, 128K maximum output, stronger coding benchmarks, deep thinking, function calling, structured output, context caching, MCP, and tool streaming support.

Q · 05 Is prompt caching priced separately? +

Yes. The pricing page lists cached input at $0.26/M. Cached input storage is currently shown as limited-time free, so this page models cache-hit input but not storage charges.

Q · 06 How accurate is the tokenizer estimate? +

The widget uses a GLM-family planning estimate of 3.85 characters per token. Exact billing can vary by language, tool calls, hidden reasoning tokens, and cache boundaries.

Reviewed by Yaroslav Vikhariev Founder - AI//COST - Pricing pulled daily from docs.z.ai - Last verified June 19, 2026

Methodology Report a correction More by Y.V.