BUDGET GLM · 128K CONTEXT · 96K OUTPUT · PROMPT CACHING · TEXT + CODE
GLM-4.5 Air API Pricing
GLM-4.5 Air is Zhipu's budget Air tier for cost-sensitive coding, reasoning, and agent workloads. The live vendor table lists $0.20/M input and $1.10/M output, with cached input at $0.03/M. Rates are pulled from docs.z.ai daily.
Input - per 1M tokens
$0.20/M
Source Z.AI flat
Output - per 1M tokens
$1.10/M
Context 128K flat
Cached input - per 1M tokens
$0.03/M
Storage limited-time free · 85% below fresh input
Effective - agentic blend
$0.14/M
92/8 input/output split - 82% cache hit rate
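The effective figure above can be reproduced with a short calculation. The page does not publish its exact formula, so this is a minimal sketch under two assumptions: the 92/8 split is the share of input versus output tokens, and the 82% cache hit rate applies to input tokens only. The function name `effective_rate` and its parameters are illustrative.

```python
def effective_rate(input_price, output_price, cached_price=None,
                   input_share=0.92, cache_hit_rate=0.82):
    """Blended USD per 1M tokens for an input/output split and a cache hit rate.

    Assumption: cache hits only discount input tokens; output is always full price.
    """
    if cached_price is None:
        # No cached rate listed: every input token is billed at the fresh rate.
        blended_input = input_price
    else:
        blended_input = (cache_hit_rate * cached_price
                         + (1 - cache_hit_rate) * input_price)
    return input_share * blended_input + (1 - input_share) * output_price


# GLM-4.5 Air list prices from this page, in USD per 1M tokens.
glm_45_air = effective_rate(0.20, 1.10, cached_price=0.03)
print(f"${glm_45_air:.2f}/M")  # $0.14/M
```

Under these assumptions the blended input rate is about $0.06/M, and the output share contributes most of the final figure even at 8% of tokens.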
§ 01 / TERMINAL
Run the numbers.
Live calculator pre-loaded with current GLM-4.5 Air rates. Adjust spend, output mix, or cache assumptions, then share the URL to pass the calculation along.
Inputs: monthly spend ($/mo), workload split, prompt cache hit rate.
Outputs: tokens you can process, words equivalent (English), effective rate.
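The budget-to-volume step of the calculator can be approximated offline. This is a rough sketch, not the page's own code: `budget_to_volume` is a hypothetical helper, the $100 budget is an example value, and the 0.75 words-per-token ratio is a common rule of thumb for English that does not come from this page.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough English rule of thumb, not from the page


def budget_to_volume(monthly_usd, effective_usd_per_million):
    """Return (tokens, english_words) that a monthly budget buys at a blended rate."""
    if effective_usd_per_million <= 0:
        raise ValueError("effective rate must be positive")
    tokens = int(monthly_usd / effective_usd_per_million * 1_000_000)
    words = int(tokens * WORDS_PER_TOKEN)
    return tokens, words


# Hypothetical $100/mo budget at the page's $0.14/M effective rate.
tokens, words = budget_to_volume(100.0, 0.14)
print(f"{tokens:,} tokens, about {words:,} English words")
```

Changing the workload split or cache hit rate only changes the effective rate passed in, so the same helper covers every calculator setting.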
§ 02 / SCENARIOS
Real-world presets.
CODING AGENT
Repo implementation
$0.008/task
CODE REVIEW
Pull request review
$0.005/review
RAG
Knowledge base answer
$0.003/query
CHATBOT
Support assistant
$0.001/turn
§ 03 / TAPE
Price history.
Input · $0.20/M
Output · $1.10/M
Cached · $0.03/M
JUL 28 Launch-wave list price stored at $0.20/M input and $1.10/M output
MAY 19 Live verification kept $0.20/M input and $1.10/M output
§ 04 / TOKENIZER
Paste text. See tokens. See cost.
Estimate · zhipu-tokenizer-estimate · ≈3.85 chars/token
This is a chars-per-token approximation, not a real tokenizer. Actual token counts vary by language, code density, and tool-call overhead; estimates are typically off by 10–20% for English prose and by more for code or non-Latin scripts. For exact billing, use the vendor's official tokenizer.
Outputs: characters, words, tokens (estimated), cost as input (uncached), cost as output (uncached), cost as cached input.
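The same chars-per-token estimate is easy to reproduce locally. The sketch below assumes the 3.85 characters-per-token ratio and the list prices shown on this page; the function name `estimate_cost`, the whitespace-based word count, and rounding to the nearest token are illustrative choices, not the widget's actual behavior.

```python
CHARS_PER_TOKEN = 3.85  # ratio quoted on this page for English text

# GLM-4.5 Air list prices from this page, in USD per 1M tokens.
PRICES_PER_MILLION = {"input": 0.20, "output": 1.10, "cached_input": 0.03}


def estimate_cost(text):
    """Approximate token count and per-category cost for a piece of text."""
    characters = len(text)
    words = len(text.split())  # whitespace split; an assumption, not a real tokenizer
    tokens = round(characters / CHARS_PER_TOKEN)
    costs = {kind: tokens * price / 1_000_000
             for kind, price in PRICES_PER_MILLION.items()}
    return {"characters": characters, "words": words,
            "tokens": tokens, "costs_usd": costs}


sample = estimate_cost("Summarize the open pull requests in this repository.")
print(sample["tokens"], sample["costs_usd"])
```

For billing-grade numbers, the token counts reported in the vendor API usage fields should replace the `tokens` estimate.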
| Model | Input /M | Cached input /M | Output /M | Effective blended /M (agentic 92/8) | vs. GLM-4.5 Air | Context | Best for |
|---|---|---|---|---|---|---|---|
| GLM-4.5 Air (current) | $0.20 | $0.03 | $1.10 | $0.14 | baseline | 128K | Budget GLM agents |
| GLM-4.7 FlashX | $0.07 | $0.01 | $0.40 | $0.05 | cheaper | 200K | Ultra-cheap GLM traffic |
| GLM-4.7 Flash | $0.00 | $0.00 | $0.00 | $0.00 | cheaper | 200K | Free registered-user tier |
| GLM-4.6 | $0.60 | $0.11 | $2.20 | $0.36 | pricier | 200K | More capable GLM sibling |
| Qwen3 Coder Flash | $0.30 | not listed | $1.50 | $0.40 | pricier | 1M | Cheap Qwen coding |
| DeepSeek V4 Flash | $0.14 | $0.00 | $0.28 | $0.05 | cheaper | 1M | Budget reasoning and coding |
| Gemini 2.5 Flash | $0.30 | $0.03 | $2.50 | $0.27 | pricier | 1M | Google long-context budget |
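The effective column and the cheaper/pricier labels in the table can be cross-checked from the listed prices. This is a sketch under the same assumptions as the page's stated blend (92% input tokens, 8% output tokens, 82% of input served from cache); models with no cached rate listed are treated as paying the fresh input rate for every input token, and the $0.00 cached figure for DeepSeek V4 Flash is taken at face value even though it may be a rounded value.

```python
# (input, cached input or None, output) in USD per 1M tokens, as listed in the table.
MODELS = {
    "GLM-4.5 Air": (0.20, 0.03, 1.10),
    "GLM-4.7 FlashX": (0.07, 0.01, 0.40),
    "GLM-4.7 Flash": (0.00, 0.00, 0.00),
    "GLM-4.6": (0.60, 0.11, 2.20),
    "Qwen3 Coder Flash": (0.30, None, 1.50),
    "DeepSeek V4 Flash": (0.14, 0.00, 0.28),
    "Gemini 2.5 Flash": (0.30, 0.03, 2.50),
}


def blended(inp, cached, out, input_share=0.92, hit_rate=0.82):
    """Blended USD per 1M tokens; a missing cached rate falls back to the input rate."""
    if cached is None:
        cached = inp
    per_input = hit_rate * cached + (1 - hit_rate) * inp
    return input_share * per_input + (1 - input_share) * out


baseline = blended(*MODELS["GLM-4.5 Air"])
comparison = {}
for name, prices in MODELS.items():
    rate = blended(*prices)
    if name == "GLM-4.5 Air":
        label = "baseline"
    else:
        label = "cheaper" if rate < baseline else "pricier"
    comparison[name] = (round(rate, 2), label)
    print(f"{name:<18} ${rate:.2f}/M  {label}")
```

Rounded to cents, the recomputed rates match the table's effective column, which suggests the page applies one blend to every row.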
Frequently asked.
Practical pricing questions, separated from calculator assumptions and regional tiers.
Q · 01 What is GLM-4.5 Air priced at?
GLM-4.5 Air is listed at $0.20/M input and $1.10/M output on the live Z.AI pricing table. This page stores USD per-million-token pricing.
Q · 02 How is the effective price calculated?
AI//COST uses the same 92/8 agentic blend everywhere. With an 82% cache hit rate, GLM-4.5 Air's effective blended cost is $0.14/M.
Q · 03 Is prompt caching priced separately?
Yes. The vendor table lists cached input at $0.03/M versus $0.20/M fresh input. Cached input storage is listed as limited-time free on the Z.AI page.
Q · 04 Are regional prices different?
Z.AI publishes the official developer pricing page in USD; Chinese BigModel pages may surface the same model catalog in Chinese. The quote tiles use the baseline row named in the source page, not a reseller or proxy price.
Q · 05 Is there a batch discount?
No separate batch discount row is listed on the Z.AI pricing page for GLM text models. The quote tiles show real-time list pricing; batch economics should be treated as a separate calculator variant when supported.
Q · 06 How accurate is the tokenizer estimate?
The browser widget uses a zhipu-tokenizer-estimate chars-per-token estimate for English text. It is useful for rough planning, but actual billing comes from the vendor API usage fields and can differ for Chinese, code, or mixed-language prompts.