Last verified
FLAGSHIP GLM200K CONTEXT128K OUTPUTPROMPT CACHINGTEXT + CODE

GLM-5.1 API Pricing

GLM-5.1 is Zhipu's current flagship text model for agentic coding and long-horizon work. The live vendor table lists $1.4/M input and $4.4/M output, with cached input at $0.26/M. Pulled directly from docs.z.ai daily.

Input - per 1M tokens
$1.40/M
Source Z.AI flat
Output - per 1M tokens
$4.40/M
Context 200K flat
Cached input - per 1M tokens
$0.26/M
Storage limited-time free -81%
Effective - agentic blend
$0.78/M
92/8 split - 82% cache
§ 01 / TERMINAL

Run the numbers.

Live calculator pre-loaded with current GLM-5.1 rates. Tweak spend, output mix, or cache assumptions and share the URL to share the calculation.

$ /mo
Workload split
Prompt cache hit rate
Tokens you can process
Words equivalent (English)
Effective rate
§ 02 / SCENARIOS

Real-world presets.

§ 03 / TAPE

Price history.

GLM-5.1 is listed at $1.4/M input and $4.4/M output on the live Z.AI pricing table.

Input · $1.4/M
Output · $4.4/M
Cached · $0.26/M
APR 03 Launch-wave list price stored at $1.4/M input and $4.4/M outputMAY 19 Live verification kept $1.4/M input and $4.4/M output
§ 04 / TOKENIZER

Paste text. See tokens. See cost.

Estimate · zhipu-tokenizer-estimate · ≈3.85 chars/token Auto-counts as you type

This is a chars-per-token approximation, not a real tokenizer. Actual tokens vary by language, code density, and tool-call overhead — counts are typically ±10–20% off for English prose, more for code or non-Latin scripts. For exact billing, use the vendor's official tokenizer.

Characters
Words
Tokens (estimated)
Cost as input · uncached
Cost as output · uncached
Cost as cached input
§ 05 / SHELF

Up against the shelf.

All models →
Model Input /M Output /M Effective blended Context Best for
GLM-5.1 Current $1.40 cache $0.26 $4.40 $0.78 agentic 92/8 200K Flagship agentic coding
GLM-5 $1.00 cache $0.20 $3.20 $0.57 cheaper 200K Cheaper GLM-5 family base
GLM-5 Turbo $1.20 cache $0.24 $4.00 $0.70 cheaper 200K Latency-optimized GLM-5
GLM-4.7 $0.60 cache $0.11 $2.20 $0.36 cheaper 200K Mid-tier GLM agents
Kimi K2.6 $0.95 cache $0.16 $4.00 $0.60 cheaper 262K Moonshot frontier alternative
DeepSeek V4 Pro $0.43 cache $0.00 $0.87 $0.14 cheaper 1M DeepSeek frontier discount tier
Qwen3 Coder Plus $1.00 $5.00 $1.32 pricier 1M Qwen code flagship

Frequently asked.

Practical pricing questions, separated from calculator assumptions and regional tiers.

Q · 01 What is GLM-5.1 priced at? +
GLM-5.1 is listed at $1.4/M input and $4.4/M output on the live Z.AI pricing table. This page stores USD per-million-token pricing.
Q · 02 How is the effective price calculated? +
AI//COST uses the same 92/8 agentic blend everywhere. With an 82% cache hit rate, GLM-5.1's effective blended cost is $0.78/M.
Q · 03 Is prompt caching priced separately? +
Yes. The vendor table lists cached input at $0.26/M versus $1.4/M fresh input. Cached input storage is listed as limited-time free on the Z.AI page.
Q · 04 Are regional prices different? +
Z.AI publishes the official developer pricing page in USD; Chinese BigModel pages may surface the same model catalog in Chinese. The quote tiles use the baseline row named in the source page, not a reseller or proxy price.
Q · 05 Is there a batch discount? +
No separate batch discount row is listed on the Z.AI pricing page for GLM text models. The quote tiles show real-time list pricing; batch economics should be treated as a separate calculator variant when supported.
Q · 06 How accurate is the tokenizer estimate? +
The browser widget uses a zhipu-tokenizer-estimate chars-per-token estimate for English text. It is useful for rough planning, but actual billing comes from the vendor API usage fields and can differ for Chinese, code, or mixed-language prompts.