Last verified 2026-05-19

BUDGET GLM · 128K CONTEXT · 96K OUTPUT · PROMPT CACHING · TEXT + CODE

GLM-4.5 Air API Pricing

GLM-4.5 Air is Zhipu's budget Air tier for cost-sensitive coding, reasoning, and agent workloads. The live vendor table lists $0.20/M input and $1.10/M output, with cached input at $0.03/M. Prices are pulled daily from docs.z.ai.

Input - per 1M tokens

$0.20/M

Source Z.AI flat

Output - per 1M tokens

$1.10/M

Context 128K flat

Cached input - per 1M tokens

$0.03/M

Storage: limited-time free · 85% below fresh input

Effective - agentic blend

$0.14/M

92/8 input/output split · 82% cache hit rate

§ 01 / TERMINAL

Run the numbers.

Live calculator preloaded with current GLM-4.5 Air rates. Adjust spend, output mix, or cache assumptions, then copy the URL to share the calculation.

Inputs: spend ($/mo), workload split, prompt cache hit rate.

Outputs: tokens you can process, words equivalent (English), effective rate.

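A minimal sketch of the arithmetic behind the calculator, assuming it divides monthly spend by a blended rate built from the GLM-4.5 Air list prices. Two things here are assumptions, not documented behavior: treating "workload split" as the input share of tokens, and using 71/93 words per token (the ratio from the tokenizer sample on this page). The function names are hypothetical.

```python
# Hypothetical sketch of the "Run the numbers" calculator.
# Assumes: spend / blended rate = tokens; "workload split" = input share.

INPUT_PER_M = 0.20         # USD per 1M fresh input tokens (GLM-4.5 Air list)
CACHED_INPUT_PER_M = 0.03  # USD per 1M cached input tokens
OUTPUT_PER_M = 1.10        # USD per 1M output tokens
WORDS_PER_TOKEN = 71 / 93  # assumption: ratio from this page's tokenizer sample


def effective_rate(input_share: float, cache_hit: float) -> float:
    """Blended USD per 1M tokens; the cache hit rate applies to input only."""
    input_rate = cache_hit * CACHED_INPUT_PER_M + (1 - cache_hit) * INPUT_PER_M
    return input_share * input_rate + (1 - input_share) * OUTPUT_PER_M


def tokens_for_spend(spend_usd: float, input_share: float = 0.92,
                     cache_hit: float = 0.82) -> dict:
    """Tokens (and rough English words) a monthly budget buys at the blend."""
    rate = effective_rate(input_share, cache_hit)
    tokens = spend_usd / rate * 1_000_000
    return {"effective_rate": rate, "tokens": tokens, "words": tokens * WORDS_PER_TOKEN}


if __name__ == "__main__":
    result = tokens_for_spend(100.0)
    print(f"effective rate: ${result['effective_rate']:.4f}/M")
    print(f"tokens: {result['tokens']:,.0f}")
    print(f"words:  {result['words']:,.0f}")
```

At $100/month with the default 92/8 split and an 82% cache hit rate, the blended rate comes out near $0.1438/M, or roughly 696 million tokens.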

§ 02 / SCENARIOS

Real-world presets.

CODING AGENT

Repo implementation

$0.008/task

22,000 in · 3,000 out · ~12,987 units per $100

CODE REVIEW

Pull request review

$0.005/review

14,000 in · 1,800 out · ~20,921 units per $100

RAG

Knowledge base answer

$0.003/query

9,000 in · 1,000 out · ~34,483 units per $100

CHATBOT

Support assistant

$0.001/turn

2,500 in · 600 out · ~86,207 units per $100
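The preset figures can be recomputed with a short script. This sketch assumes uncached list prices ($0.20/M in, $1.10/M out), which reproduce the rounded per-task costs shown ($0.008, $0.005, $0.003, $0.001). The scenario dictionary is just a convenient container, and "tasks per $100" is computed from the unrounded per-task cost.

```python
# Per-task cost for the preset scenarios at uncached GLM-4.5 Air list prices.

INPUT_PER_M = 0.20   # USD per 1M input tokens
OUTPUT_PER_M = 1.10  # USD per 1M output tokens

# (input tokens, output tokens) per task, copied from the presets
SCENARIOS = {
    "Repo implementation": (22_000, 3_000),
    "Pull request review": (14_000, 1_800),
    "Knowledge base answer": (9_000, 1_000),
    "Support assistant": (2_500, 600),
}


def cost_per_task(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one task, with no prompt-cache discount applied."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000


if __name__ == "__main__":
    for name, (tokens_in, tokens_out) in SCENARIOS.items():
        cost = cost_per_task(tokens_in, tokens_out)
        print(f"{name:<22} ${cost:.5f}/task  ~{100 / cost:,.0f} per $100")
```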

§ 03 / TOKENIZER

Paste text. See tokens. See cost.

Calibrated · measured on the vendor's tokenizer · 2026-06-10

Counts use a chars-per-token calibration measured on the vendor's own published tokenizer (zai-org/GLM-5, 2026-06-10). English prose is typically within a few percent; code and non-Latin scripts use more tokens per character. For billing-exact counts, use the vendor's count-tokens API.

Characters 489

Words 71

Tokens (estimated) 93 tokens

Cost as input · uncached $0.000019 USD

Cost as output · uncached $0.000102 USD

Cost as cached input $0.000003 USD
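A rough version of the counter's arithmetic, as a sketch. The widget's actual calibration constant is not given on this page, so CHARS_PER_TOKEN below is back-solved from the sample readout (489 characters, 93 tokens) and is only meaningful for English prose. The function names are hypothetical.

```python
# Estimate tokens from character count, then price them at GLM-4.5 Air rates.
# CHARS_PER_TOKEN is an assumption inferred from the sample readout above.

CHARS_PER_TOKEN = 489 / 93  # ~5.26 chars per token, English prose only

RATES_PER_M = {             # USD per 1M tokens
    "input_uncached": 0.20,
    "output_uncached": 1.10,
    "cached_input": 0.03,
}


def estimate_tokens(text: str) -> int:
    """Rough token count; any non-empty text counts as at least one token."""
    if not text:
        return 0
    return max(1, round(len(text) / CHARS_PER_TOKEN))


def estimate_costs(n_tokens: int) -> dict:
    """USD cost of n_tokens under each billing category."""
    return {label: n_tokens * rate / 1_000_000 for label, rate in RATES_PER_M.items()}


if __name__ == "__main__":
    sample = "Paste any English prompt here to get a rough estimate."
    n = estimate_tokens(sample)
    print(f"{n} tokens")
    for label, usd in estimate_costs(n).items():
        print(f"{label}: ${usd:.6f}")
```

For any 489-character string this returns 93 tokens and the same three costs shown above ($0.000019, $0.000102, $0.000003).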

§ 04 / SHELF

Up against the shelf.


Model	Input /M	Cached input /M	Output /M	Effective blended /M (92/8)	vs GLM-4.5 Air	Context	Best for
GLM-4.5 Air (current)	$0.20	$0.03	$1.10	$0.14	baseline	128K	Budget GLM agents
GLM-4.7 FlashX	$0.07	$0.01	$0.40	$0.05	cheaper	200K	Ultra-cheap GLM traffic
GLM-4.7 Flash	$0.00	$0.00	$0.00	$0.00	cheaper	200K	Free registered-user tier
GLM-4.6	$0.60	$0.11	$2.20	$0.36	pricier	200K	More capable GLM sibling
Qwen3 Coder Flash	$0.30	n/a	$1.50	$0.40	pricier	1M	Cheap Qwen coding
DeepSeek V4 Flash	$0.14	$0.00	$0.28	$0.05	cheaper	1M	Budget reasoning and coding
Gemini 2.5 Flash	$0.30	$0.03	$2.50	$0.27	pricier	1M	Google long-context budget
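To sanity-check the effective blended column, this sketch applies one blend to every row: 92% input and 8% output tokens, with 82% of input tokens hitting the cache. Table prices are rounded to the cent, so recomputed rates can drift slightly. Treating a missing cached price (Qwen3 Coder Flash) as "no cache discount" is an assumption.

```python
# Recompute the shelf's blended rates under a single 92/8, 82%-cache blend.
from typing import Optional

INPUT_SHARE = 0.92
CACHE_HIT = 0.82

# (input, cached input or None, output) in USD per 1M tokens, from the table
SHELF = {
    "GLM-4.5 Air": (0.20, 0.03, 1.10),
    "GLM-4.7 FlashX": (0.07, 0.01, 0.40),
    "GLM-4.7 Flash": (0.00, 0.00, 0.00),
    "GLM-4.6": (0.60, 0.11, 2.20),
    "Qwen3 Coder Flash": (0.30, None, 1.50),
    "DeepSeek V4 Flash": (0.14, 0.00, 0.28),
    "Gemini 2.5 Flash": (0.30, 0.03, 2.50),
}


def blended(inp: float, cached: Optional[float], out: float) -> float:
    """Effective USD per 1M tokens for one model under the shared blend."""
    if cached is None:
        input_rate = inp  # no cached price listed: bill all input as fresh
    else:
        input_rate = CACHE_HIT * cached + (1 - CACHE_HIT) * inp
    return INPUT_SHARE * input_rate + (1 - INPUT_SHARE) * out


if __name__ == "__main__":
    base = blended(*SHELF["GLM-4.5 Air"])
    for model, prices in sorted(SHELF.items(), key=lambda kv: blended(*kv[1])):
        rate = blended(*prices)
        if model == "GLM-4.5 Air":
            tag = "baseline"
        else:
            tag = "cheaper" if rate < base else "pricier"
        print(f"{model:<18} ${rate:.2f}/M  {tag}")
```

Rounded to two decimals, this reproduces the column, for example $0.14 for GLM-4.5 Air, $0.36 for GLM-4.6, and $0.40 for Qwen3 Coder Flash.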

Frequently asked.

Practical pricing questions, separated from calculator assumptions and regional tiers.

Q · 01 What is GLM-4.5 Air priced at?

GLM-4.5 Air is listed at $0.20/M input and $1.10/M output on the live Z.AI pricing table. This page lists prices in USD per million tokens.

Q · 02 How is the effective price calculated?

AI//COST uses the same agentic blend for every model: 92% of tokens are input and 8% are output, and 82% of input tokens are served from the prompt cache. On that basis, GLM-4.5 Air's effective blended cost is $0.14/M.
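Written out as a worked equation, with h = 0.82 the cache hit rate and p_in, p_cached, p_out the per-million list prices. Reading 92/8 as the input/output token share and applying the cache hit rate to input tokens only is an interpretation, but it is the one that reproduces the $0.14/M figure:

```latex
\begin{aligned}
r_{\text{in}}  &= h\,p_{\text{cached}} + (1-h)\,p_{\text{in}}
               = 0.82(0.03) + 0.18(0.20) = 0.0606 \\
r_{\text{eff}} &= 0.92\,r_{\text{in}} + 0.08\,p_{\text{out}}
               = 0.92(0.0606) + 0.08(1.10) = 0.055752 + 0.088 = 0.143752 \approx \$0.14/\text{M}
\end{aligned}
```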

Q · 03 Is prompt caching priced separately?

Yes. The vendor table lists cached input at $0.03/M versus $0.20/M for fresh input. Cached input storage is listed as limited-time free on the Z.AI page.
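Because the cached rate applies only to input tokens, the input-side cost scales linearly with the cache hit rate. A minimal sketch using the listed GLM-4.5 Air prices; it ignores cache storage because the page lists storage as limited-time free, which may not hold later. The function name is hypothetical.

```python
# Input-only GLM-4.5 Air cost at a given prompt-cache hit rate.

FRESH_PER_M = 0.20   # USD per 1M fresh input tokens
CACHED_PER_M = 0.03  # USD per 1M cached input tokens (storage ignored)


def input_rate(hit_rate: float) -> float:
    """Blended USD per 1M input tokens at a cache hit rate between 0 and 1."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * CACHED_PER_M + (1 - hit_rate) * FRESH_PER_M


# Discount of cached versus fresh input: 1 - 0.03 / 0.20 = 0.85
discount = 1 - CACHED_PER_M / FRESH_PER_M

if __name__ == "__main__":
    print(f"cached discount: {discount:.0%}")
    for hit in (0.0, 0.5, 0.82, 1.0):
        print(f"hit {hit:>4.0%}: ${input_rate(hit):.4f}/M input")
```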

Q · 04 Are regional prices different?

Z.AI publishes the official developer pricing page in USD; Chinese BigModel pages may surface the same model catalog in Chinese. The quote tiles use the baseline row named in the source page, not a reseller or proxy price.

Q · 05 Is there a batch discount?

No separate batch discount row is listed on the Z.AI pricing page for GLM text models. The quote tiles show real-time list pricing; batch economics should be treated as a separate calculator variant when supported.

Q · 06 How accurate is the tokenizer estimate?

The browser widget estimates tokens with a chars-per-token ratio (zhipu-tokenizer-estimate) calibrated for English text. It is useful for rough planning, but billing comes from the usage fields returned by the vendor API, and actual counts can differ for Chinese, code, or mixed-language prompts.
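Since billing follows the API's reported usage rather than the browser estimate, a cost check can be run on the usage object itself. This sketch assumes OpenAI-compatible field names (prompt_tokens, completion_tokens, prompt_tokens_details.cached_tokens) and that cached tokens are counted inside prompt_tokens; both are assumptions to verify against the Z.AI API reference. The example numbers are hypothetical.

```python
# Price a usage object at GLM-4.5 Air list rates.
# Field names are assumed (OpenAI-compatible shape), not confirmed here.

INPUT_PER_M = 0.20
CACHED_INPUT_PER_M = 0.03
OUTPUT_PER_M = 1.10


def cost_from_usage(usage: dict) -> float:
    """USD cost from reported usage; cached tokens assumed inside prompt_tokens."""
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    fresh = prompt - cached
    return (fresh * INPUT_PER_M
            + cached * CACHED_INPUT_PER_M
            + completion * OUTPUT_PER_M) / 1_000_000


if __name__ == "__main__":
    example_usage = {  # hypothetical numbers for illustration
        "prompt_tokens": 22_000,
        "completion_tokens": 3_000,
        "prompt_tokens_details": {"cached_tokens": 18_000},
    }
    print(f"${cost_from_usage(example_usage):.6f}")
```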

Reviewed by Yaroslav Vikhariev, Founder, AI//COST · Pricing pulled daily from docs.z.ai · Last verified May 19, 2026
