OPENCLAW GLM · 200K CONTEXT · 128K OUTPUT · PROMPT CACHING · TEXT + CODE
GLM-5 Turbo API Pricing
GLM-5 Turbo is Zhipu's latency-optimized GLM-5 variant for OpenClaw-style long-chain agent tasks. The live vendor table lists $1.20/M input and $4.00/M output, with cached input at $0.24/M. Prices are pulled directly from docs.z.ai daily.
Input - per 1M tokens
$1.20/M
Source Z.AI flat
Output - per 1M tokens
$4.00/M
Context 200K flat
Cached input - per 1M tokens
$0.24/M
Storage limited-time free · -80% vs. fresh input
Effective - agentic blend
$0.70/M
92/8 split - 82% cache
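The $0.70/M tile can be reproduced from the listed rates. The sketch below assumes the 92/8 split means 92% input tokens and 8% output tokens, and that the 82% cache hit rate applies to input tokens only. The function name `effective_rate` and this reading of the blend are assumptions rather than a documented formula, although they do reproduce the figures shown on this page.

```python
def effective_rate(input_price, output_price, cached_price=None,
                   input_share=0.92, cache_hit=0.82):
    """Blended USD cost per 1M tokens (hypothetical helper, not a vendor API)."""
    if cached_price is None:
        # No cached-input price listed: bill all input at the fresh rate.
        blended_input = input_price
    else:
        # Cache hits pay the cached rate, misses pay the fresh rate.
        blended_input = cache_hit * cached_price + (1 - cache_hit) * input_price
    return input_share * blended_input + (1 - input_share) * output_price


glm5_turbo = effective_rate(1.20, 4.00, cached_price=0.24)
print(f"${glm5_turbo:.2f}/M")  # prints $0.70/M
```

Changing `cache_hit` or `input_share` shows how sensitive the blended figure is to the workload assumptions: with no cache hits at all, the same rates give a noticeably higher blend.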
§ 01 / TERMINAL
Run the numbers.
Live calculator pre-loaded with current GLM-5 Turbo rates. Adjust spend, output mix, or cache assumptions, then share the URL to pass the calculation along.
Inputs: monthly spend ($/mo), workload split, prompt cache hit rate
Outputs: tokens you can process, words equivalent (English), effective rate
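The calculator's conversion from a monthly budget to token volume can be sketched as a single division. This is a minimal illustration, not the page's actual implementation: the function name, the $100 budget, and the 0.75 words-per-token ratio (a common rule of thumb for English) are all assumptions.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough rule of thumb for English prose


def tokens_for_budget(monthly_usd, rate_per_million):
    """Tokens purchasable per month at a blended USD-per-1M-token rate."""
    if rate_per_million <= 0:
        raise ValueError("rate_per_million must be positive")
    return monthly_usd / rate_per_million * 1_000_000


budget = 100.0  # hypothetical monthly spend in USD
tokens = tokens_for_budget(budget, 0.70)  # 0.70 = effective blended rate above
words = tokens * WORDS_PER_TOKEN
print(f"{tokens:,.0f} tokens, roughly {words:,.0f} English words")
```

Passing the fresh input rate ($1.20) or the output rate ($4.00) instead of the blended rate gives the pessimistic bounds for the same budget.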
§ 02 / SCENARIOS
Real-world presets.
CODING AGENT
Repo implementation
$0.038/task
CODE REVIEW
Pull request review
$0.024/review
RAG
Knowledge base answer
$0.015/query
CHATBOT
Support assistant
$0.005/turn
§ 03 / TAPE
Price history.
Input · $1.2/M
Output · $4/M
Cached · $0.24/M
MAR 15 Launch-wave list price stored at $1.2/M input and $4/M output
MAY 19 Live verification kept $1.2/M input and $4/M output
§ 04 / TOKENIZER
Paste text. See tokens. See cost.
Estimate · zhipu-tokenizer-estimate · ≈3.85 chars/token · auto-counts as you type
This is a chars-per-token approximation, not a real tokenizer. Actual tokens vary by language, code density, and tool-call overhead — counts are typically ±10–20% off for English prose, more for code or non-Latin scripts. For exact billing, use the vendor's official tokenizer.
Outputs: characters, words, estimated tokens, cost as uncached input, cost as output, cost as cached input
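The chars-per-token approximation is simple enough to reproduce offline. The sketch below uses the 3.85 ratio and the rates quoted on this page. The function name, the whitespace-based word count, and the choice to round token counts up are assumptions, and the result carries the same margin of error as the widget.

```python
import math

CHARS_PER_TOKEN = 3.85  # ratio quoted by the widget
PRICES_PER_MILLION = {"input": 1.20, "output": 4.00, "cached_input": 0.24}


def estimate(text):
    """Rough token count and USD cost per billing category for `text`."""
    # Rounding up is an assumption; the widget may round differently.
    tokens = math.ceil(len(text) / CHARS_PER_TOKEN)
    costs = {kind: tokens * price / 1_000_000
             for kind, price in PRICES_PER_MILLION.items()}
    return {
        "characters": len(text),
        "words": len(text.split()),  # whitespace-delimited words
        "tokens": tokens,
        "costs_usd": costs,
    }


sample = "Paste text. See tokens. See cost."
print(estimate(sample))
```

For exact counts, the `usage` fields returned by the vendor API remain the authoritative source.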
| Model | Input /M | Cached input /M | Output /M | Effective blended /M (agentic 92/8) | Context | Best for |
|---|---|---|---|---|---|---|
| GLM-5 Turbo (current) | $1.20 | $0.24 | $4.00 | $0.70 | 200K | Latency-optimized GLM-5 |
| GLM-5.1 | $1.40 | $0.26 | $4.40 | $0.78 (pricier) | 200K | Flagship agentic coding |
| GLM-5 | $1.00 | $0.20 | $3.20 | $0.57 (cheaper) | 200K | Cheaper GLM-5 family base |
| GLM-4.7 | $0.60 | $0.11 | $2.20 | $0.36 (cheaper) | 200K | Mid-tier GLM agents |
| GLM-4.7 FlashX | $0.07 | $0.01 | $0.40 | $0.05 (cheaper) | 200K | Ultra-cheap GLM traffic |
| GLM-4.7 Flash | $0.00 | $0.00 | $0.00 | $0.00 (cheaper) | 200K | Free registered-user tier |
| DeepSeek V4 Pro | $0.43 | $0.00 | $0.87 | $0.14 (cheaper) | 1M | DeepSeek frontier discount tier |
| Qwen3 Coder Plus | $1.00 | not listed | $5.00 | $1.32 (pricier) | 1M | Qwen code flagship |
Frequently asked.
Practical pricing questions, separated from calculator assumptions and regional tiers.
Q · 01 What is GLM-5 Turbo priced at?
GLM-5 Turbo is listed at $1.20/M input and $4.00/M output on the live Z.AI pricing table. Cached input is listed at $0.24/M. All prices on this page are in USD per million tokens.
Q · 02 How is the effective price calculated?
AI//COST uses the same 92/8 agentic blend everywhere. With an 82% cache hit rate, GLM-5 Turbo's effective blended cost is $0.70/M.
Q · 03 Is prompt caching priced separately?
Yes. The vendor table lists cached input at $0.24/M versus $1.20/M fresh input. Cached input storage is listed as limited-time free on the Z.AI page.
Q · 04 Are regional prices different?
Z.AI publishes the official developer pricing page in USD. Chinese BigModel pages may surface overlapping model catalogs, but the quote tiles use the baseline row from the Z.AI developer pricing page, not a reseller or proxy price.
Q · 05 Is there a batch discount?
No separate batch discount row is listed on the Z.AI pricing page for GLM-5 Turbo. The quote tiles show real-time list pricing; batch economics should be treated as a separate calculator variant only when the vendor documents it.
Q · 06 How accurate is the tokenizer estimate?
The browser widget uses the zhipu-tokenizer-estimate chars-per-token approximation (≈3.85 chars/token) for English text. It is useful for rough planning, but actual billing comes from the vendor API usage fields and can differ for Chinese, code, or mixed-language prompts.