GLM-4 32B (0414, 128K) API Pricing
GLM-4 32B (0414, 128K) is the low-cost dated GLM-4 32B snapshot with symmetric input and output pricing. The live Z.AI pricing table lists $0.1/M input and $0.1/M output, with no separate cache-read row for this SKU. Pulled directly from docs.z.ai daily.
Run the numbers.
Live calculator pre-loaded with current GLM-4 32B (0414, 128K) rates. Tweak spend, output mix, or cache assumptions and share the URL to share the calculation.
Real-world presets.
Small repo implementation
Pull request review
Knowledge base answer
Support assistant
Price history.
Paste text. See tokens. See cost.
This is a chars-per-token approximation, not a real tokenizer. Actual tokens vary by language, code density, and tool-call overhead — counts are typically ±10–20% off for English prose, more for code or non-Latin scripts. For exact billing, use the vendor's official tokenizer.
| Model | Input /M | Output /M | Effective blended | Context | Best for |
|---|---|---|---|---|---|
| GLM-4 32B (0414, 128K) Current | $0.10 | $0.10 | $0.10 agentic 92/8 | 128K | Low-cost open-weight GLM |
| GLM-4.7 FlashX | $0.07 cache $0.01 | $0.40 | $0.05 cheaper | 200K | Ultra-cheap GLM traffic |
| GLM-4.5 Air | $0.20 cache $0.03 | $1.10 | $0.14 pricier | 128K | Budget GLM-4.5 production |
| GLM-4.7 | $0.60 cache $0.11 | $2.20 | $0.36 pricier | 200K | Mid-tier GLM agents |
| GLM-5 | $1.00 cache $0.20 | $3.20 | $0.57 pricier | 200K | Balanced GLM flagship |
| DeepSeek V4 Flash | $0.14 cache $0.00 | $0.28 | $0.05 cheaper | 1M | Budget reasoning and coding |
| Qwen 3.5 Flash | $0.10 | $0.40 | $0.12 pricier | 1M | Bulk Qwen long-context work |
Frequently asked.
Practical pricing questions, separated from calculator assumptions and regional tiers.
Q · 01 What is GLM-4 32B (0414, 128K) priced at? +
$0.1/M input and $0.1/M output on the live Z.AI pricing table. No separate cached-input row is listed for this SKU. This page stores USD per-million-token pricing.Q · 02 How is the effective price calculated? +
$0.1/M.Q · 03 Is prompt caching priced separately? +
$0.1/M fresh input until the vendor publishes a separate cache row.Q · 04 Are regional prices different? +
Q · 05 Is there a batch discount? +
Q · 06 How accurate is the tokenizer estimate? +
zhipu-tokenizer-estimate chars-per-token estimate for English text. It is useful for rough planning, but actual billing comes from the vendor API usage fields and can differ for Chinese, code, or mixed-language prompts.