Last verified
BUDGET GLM128K CONTEXT96K OUTPUTPROMPT CACHINGTEXT + CODE

GLM-4.5 Air API Pricing

GLM-4.5 Air is Zhipu's budget Air tier for cost-sensitive coding, reasoning, and agent workloads. The live vendor table lists $0.2/M input and $1.1/M output, with cached input at $0.03/M. Pulled directly from docs.z.ai daily.

Input - per 1M tokens
$0.20/M
Source Z.AI flat
Output - per 1M tokens
$1.10/M
Context 128K flat
Cached input - per 1M tokens
$0.03/M
Storage limited-time free -85%
Effective - agentic blend
$0.14/M
92/8 split - 82% cache
§ 01 / TERMINAL

Run the numbers.

Live calculator pre-loaded with current GLM-4.5 Air rates. Tweak spend, output mix, or cache assumptions and share the URL to share the calculation.

$ /mo
Workload split
Prompt cache hit rate
Tokens you can process
Words equivalent (English)
Effective rate
§ 02 / SCENARIOS

Real-world presets.

§ 03 / TAPE

Price history.

GLM-4.5 Air is listed at $0.2/M input and $1.1/M output on the live Z.AI pricing table.

Input · $0.20/M
Output · $1.1/M
Cached · $0.03/M
JUL 28 Launch-wave list price stored at $0.2/M input and $1.1/M outputMAY 19 Live verification kept $0.2/M input and $1.1/M output
§ 04 / TOKENIZER

Paste text. See tokens. See cost.

Estimate · zhipu-tokenizer-estimate · ≈3.85 chars/token Auto-counts as you type

This is a chars-per-token approximation, not a real tokenizer. Actual tokens vary by language, code density, and tool-call overhead — counts are typically ±10–20% off for English prose, more for code or non-Latin scripts. For exact billing, use the vendor's official tokenizer.

Characters
Words
Tokens (estimated)
Cost as input · uncached
Cost as output · uncached
Cost as cached input
§ 05 / SHELF

Up against the shelf.

All models →
Model Input /M Output /M Effective blended Context Best for
GLM-4.5 Air Current $0.20 cache $0.03 $1.10 $0.14 agentic 92/8 128K Budget GLM agents
GLM-4.7 FlashX $0.07 cache $0.01 $0.40 $0.05 cheaper 200K Ultra-cheap GLM traffic
GLM-4.7 Flash $0.00 cache $0.00 $0.00 $0.00 cheaper 200K Free registered-user tier
GLM-4.6 $0.60 cache $0.11 $2.20 $0.36 pricier 200K More capable GLM sibling
Qwen3 Coder Flash $0.30 $1.50 $0.40 pricier 1M Cheap Qwen coding
DeepSeek V4 Flash $0.14 cache $0.00 $0.28 $0.05 cheaper 1M Budget reasoning and coding
Gemini 2.5 Flash $0.30 cache $0.03 $2.50 $0.27 pricier 1M Google long-context budget

Frequently asked.

Practical pricing questions, separated from calculator assumptions and regional tiers.

Q · 01 What is GLM-4.5 Air priced at? +
GLM-4.5 Air is listed at $0.2/M input and $1.1/M output on the live Z.AI pricing table. This page stores USD per-million-token pricing.
Q · 02 How is the effective price calculated? +
AI//COST uses the same 92/8 agentic blend everywhere. With an 82% cache hit rate, GLM-4.5 Air's effective blended cost is $0.14/M.
Q · 03 Is prompt caching priced separately? +
Yes. The vendor table lists cached input at $0.03/M versus $0.2/M fresh input. Cached input storage is listed as limited-time free on the Z.AI page.
Q · 04 Are regional prices different? +
Z.AI publishes the official developer pricing page in USD; Chinese BigModel pages may surface the same model catalog in Chinese. The quote tiles use the baseline row named in the source page, not a reseller or proxy price.
Q · 05 Is there a batch discount? +
No separate batch discount row is listed on the Z.AI pricing page for GLM text models. The quote tiles show real-time list pricing; batch economics should be treated as a separate calculator variant when supported.
Q · 06 How accurate is the tokenizer estimate? +
The browser widget uses a zhipu-tokenizer-estimate chars-per-token estimate for English text. It is useful for rough planning, but actual billing comes from the vendor API usage fields and can differ for Chinese, code, or mixed-language prompts.