FLAGSHIP GLM200K CONTEXT128K OUTPUTPROMPT CACHINGTEXT + CODE
GLM-5.1 API Pricing
GLM-5.1 is Zhipu's current flagship text model for agentic coding and long-horizon work. The live vendor table lists $1.4/M input and $4.4/M output, with cached input at $0.26/M. Pulled directly from docs.z.ai daily.
Input - per 1M tokens
$1.40/M
Source Z.AI flat
Output - per 1M tokens
$4.40/M
Context 200K flat
Cached input - per 1M tokens
$0.26/M
Storage limited-time free -81%
Effective - agentic blend
$0.78/M
92/8 split - 82% cache
§ 01 / TERMINAL
Run the numbers.
Live calculator pre-loaded with current GLM-5.1 rates. Tweak spend, output mix, or cache assumptions and share the URL to share the calculation.
$ /mo
Workload split
Prompt cache hit rate
Tokens you can process
—
Words equivalent (English)
—
Effective rate
—
§ 02 / SCENARIOS
Real-world presets.
CODING AGENT
Repo implementation
$0.044/task
CODE REVIEW
Pull request review
$0.028/review
RAG
Knowledge base answer
$0.017/query
CHATBOT
Support assistant
$0.006/turn
§ 03 / TAPE
Price history.
Input · $1.4/M
Output · $4.4/M
Cached · $0.26/M
APR 03 Launch-wave list price stored at $1.4/M input and $4.4/M outputMAY 19 Live verification kept $1.4/M input and $4.4/M output
§ 04 / TOKENIZER
Paste text. See tokens. See cost.
Estimate · zhipu-tokenizer-estimate · ≈3.85 chars/token Auto-counts as you type
This is a chars-per-token approximation, not a real tokenizer. Actual tokens vary by language, code density, and tool-call overhead — counts are typically ±10–20% off for English prose, more for code or non-Latin scripts. For exact billing, use the vendor's official tokenizer.
Characters —
Words —
Tokens (estimated) —
Cost as input · uncached —
Cost as output · uncached —
Cost as cached input —
| Model | Input /M | Output /M | Effective blended | Context | Best for |
|---|---|---|---|---|---|
| GLM-5.1 Current | $1.40 cache $0.26 | $4.40 | $0.78 agentic 92/8 | 200K | Flagship agentic coding |
| GLM-5 | $1.00 cache $0.20 | $3.20 | $0.57 cheaper | 200K | Cheaper GLM-5 family base |
| GLM-5 Turbo | $1.20 cache $0.24 | $4.00 | $0.70 cheaper | 200K | Latency-optimized GLM-5 |
| GLM-4.7 | $0.60 cache $0.11 | $2.20 | $0.36 cheaper | 200K | Mid-tier GLM agents |
| Kimi K2.6 | $0.95 cache $0.16 | $4.00 | $0.60 cheaper | 262K | Moonshot frontier alternative |
| DeepSeek V4 Pro | $0.43 cache $0.00 | $0.87 | $0.14 cheaper | 1M | DeepSeek frontier discount tier |
| Qwen3 Coder Plus | $1.00 | $5.00 | $1.32 pricier | 1M | Qwen code flagship |
Frequently asked.
Practical pricing questions, separated from calculator assumptions and regional tiers.
Q · 01 What is GLM-5.1 priced at? +
GLM-5.1 is listed at
$1.4/M input and $4.4/M output on the live Z.AI pricing table. This page stores USD per-million-token pricing.Q · 02 How is the effective price calculated? +
AI//COST uses the same 92/8 agentic blend everywhere. With an 82% cache hit rate, GLM-5.1's effective blended cost is
$0.78/M.Q · 03 Is prompt caching priced separately? +
Yes. The vendor table lists cached input at
$0.26/M versus $1.4/M fresh input. Cached input storage is listed as limited-time free on the Z.AI page.Q · 04 Are regional prices different? +
Z.AI publishes the official developer pricing page in USD; Chinese BigModel pages may surface the same model catalog in Chinese. The quote tiles use the baseline row named in the source page, not a reseller or proxy price.
Q · 05 Is there a batch discount? +
No separate batch discount row is listed on the Z.AI pricing page for GLM text models. The quote tiles show real-time list pricing; batch economics should be treated as a separate calculator variant when supported.
Q · 06 How accurate is the tokenizer estimate? +
The browser widget uses a
zhipu-tokenizer-estimate chars-per-token estimate for English text. It is useful for rough planning, but actual billing comes from the vendor API usage fields and can differ for Chinese, code, or mixed-language prompts.