Last verified
OPEN-WEIGHT GLM128K CONTEXTNO CACHE ROWTEXT + CODEUSD PRICING

GLM-4 32B (0414, 128K) API Pricing

GLM-4 32B (0414, 128K) is the low-cost dated GLM-4 32B snapshot with symmetric input and output pricing. The live Z.AI pricing table lists $0.1/M input and $0.1/M output, with no separate cache-read row for this SKU. Pulled directly from docs.z.ai daily.

Input - per 1M tokens
$0.10/M
Source Z.AI flat
Output - per 1M tokens
$0.10/M
Context 128K flat
Cache N/A
$0.10/M
Cache not listed not listed
Effective - agentic blend
$0.10/M
92/8 split - 82% cache
§ 01 / TERMINAL

Run the numbers.

Live calculator pre-loaded with current GLM-4 32B (0414, 128K) rates. Tweak spend, output mix, or cache assumptions and share the URL to share the calculation.

$ /mo
Workload split
Prompt cache hit rate
Tokens you can process
Words equivalent (English)
Effective rate
§ 02 / SCENARIOS

Real-world presets.

§ 03 / TAPE

Price history.

GLM-4 32B (0414, 128K) is listed at $0.1/M input and $0.1/M output on the live Z.AI pricing table.

Input · $0.10/M
Output · $0.10/M
Cached · $0.10/M
APR 14 Launch-wave list price stored at $0.1/M input and $0.1/M outputMAY 19 Live verification kept $0.1/M input and $0.1/M output
§ 04 / TOKENIZER

Paste text. See tokens. See cost.

Estimate · zhipu-tokenizer-estimate · ≈3.85 chars/token Auto-counts as you type

This is a chars-per-token approximation, not a real tokenizer. Actual tokens vary by language, code density, and tool-call overhead — counts are typically ±10–20% off for English prose, more for code or non-Latin scripts. For exact billing, use the vendor's official tokenizer.

Characters
Words
Tokens (estimated)
Cost as input · uncached
Cost as output · uncached
Cost as cached input
§ 05 / SHELF

Up against the shelf.

All models →
Model Input /M Output /M Effective blended Context Best for
GLM-4 32B (0414, 128K) Current $0.10 $0.10 $0.10 agentic 92/8 128K Low-cost open-weight GLM
GLM-4.7 FlashX $0.07 cache $0.01 $0.40 $0.05 cheaper 200K Ultra-cheap GLM traffic
GLM-4.5 Air $0.20 cache $0.03 $1.10 $0.14 pricier 128K Budget GLM-4.5 production
GLM-4.7 $0.60 cache $0.11 $2.20 $0.36 pricier 200K Mid-tier GLM agents
GLM-5 $1.00 cache $0.20 $3.20 $0.57 pricier 200K Balanced GLM flagship
DeepSeek V4 Flash $0.14 cache $0.00 $0.28 $0.05 cheaper 1M Budget reasoning and coding
Qwen 3.5 Flash $0.10 $0.40 $0.12 pricier 1M Bulk Qwen long-context work

Frequently asked.

Practical pricing questions, separated from calculator assumptions and regional tiers.

Q · 01 What is GLM-4 32B (0414, 128K) priced at? +
GLM-4 32B (0414, 128K) is listed at $0.1/M input and $0.1/M output on the live Z.AI pricing table. No separate cached-input row is listed for this SKU. This page stores USD per-million-token pricing.
Q · 02 How is the effective price calculated? +
AI//COST uses the same 92/8 agentic blend everywhere. Because no cache price is listed, cached input is treated as the base input price. GLM-4 32B (0414, 128K)'s effective blended cost is $0.1/M.
Q · 03 Is prompt caching priced separately? +
No. The Z.AI table shows dashes for cached input on GLM-4 32B (0414, 128K), so the calculator treats cache hits as $0.1/M fresh input until the vendor publishes a separate cache row.
Q · 04 Are regional prices different? +
Z.AI publishes the official developer pricing page in USD. Chinese BigModel pages may surface overlapping model catalogs, but the quote tiles use the baseline row from the Z.AI developer pricing page, not a reseller or proxy price.
Q · 05 Is there a batch discount? +
No separate batch discount row is listed on the Z.AI pricing page for GLM-4 32B (0414, 128K). The quote tiles show real-time list pricing; batch economics should be treated as a separate calculator variant only when the vendor documents it.
Q · 06 How accurate is the tokenizer estimate? +
The browser widget uses a zhipu-tokenizer-estimate chars-per-token estimate for English text. It is useful for rough planning, but actual billing comes from the vendor API usage fields and can differ for Chinese, code, or mixed-language prompts.