GLM-4.5 BASE · 128K CONTEXT · 96K OUTPUT · PROMPT CACHING · TEXT + CODE
GLM-4.5 API Pricing
GLM-4.5 is the base model in Z.AI's 4.5 series, a mixture-of-experts (MoE) model for reasoning, coding, and agent-oriented applications. The live vendor table lists $0.60/M input and $2.20/M output, with cached input at $0.11/M.
Input, per 1M tokens: $0.60/M (source: Z.AI, flat)
Output, per 1M tokens: $2.20/M (context: 128K, flat)
Cached input, per 1M tokens: $0.11/M (storage: limited-time free; -82% vs. uncached input)
Effective, agentic blend: $0.36/M (92/8 split, 82% cache)
§ 01 / TERMINAL
Run the numbers.
Live calculator pre-loaded with current GLM-4.5 rates. Tweak spend, output mix, or cache assumptions and share the URL to share the calculation.
Inputs: monthly spend ($/mo), workload split, prompt cache hit rate.
Outputs: tokens you can process, words equivalent (English), effective rate.
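The arithmetic behind a calculator like this can be sketched in a few lines of Python. This is a minimal sketch, not the page's actual implementation: the function names are hypothetical, the blend formula is our reading of the 92/8 split with an 82% cache hit rate, and the words-per-token ratio is an assumption taken from the tokenizer sample on this page (67 words for 88 tokens).

```python
# Rates listed on this page, in USD per 1M tokens.
INPUT_RATE = 0.60
CACHED_RATE = 0.11
OUTPUT_RATE = 2.20

# Assumption: ratio taken from the tokenizer sample on this page
# (67 words / 88 tokens); the live calculator may use a different one.
WORDS_PER_TOKEN = 67 / 88


def effective_rate(output_share: float, cache_hit: float) -> float:
    """Blended USD per 1M tokens for an output share and a cache hit rate."""
    input_share = 1.0 - output_share
    # Cached input is billed at the cached rate, the rest at the full input rate.
    input_cost = cache_hit * CACHED_RATE + (1.0 - cache_hit) * INPUT_RATE
    return input_share * input_cost + output_share * OUTPUT_RATE


def tokens_for_budget(monthly_usd: float,
                      output_share: float = 0.08,
                      cache_hit: float = 0.82) -> dict:
    """Tokens and approximate English words a monthly budget can cover."""
    rate = effective_rate(output_share, cache_hit)
    tokens = monthly_usd / rate * 1_000_000
    return {
        "effective_rate": rate,
        "tokens": tokens,
        "words": tokens * WORDS_PER_TOKEN,
    }


if __name__ == "__main__":
    result = tokens_for_budget(100.0)
    print(f"Effective rate: ${result['effective_rate']:.2f}/M")
    print(f"Tokens: {result['tokens']:,.0f}")
    print(f"Words:  {result['words']:,.0f}")
```

With the default 92/8 split and 82% cache hit rate, a $100 monthly budget works out to roughly 279 million tokens under these assumptions. Setting `cache_hit=0.0` shows how much of the blended rate depends on cache reuse.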
§ 02 / SCENARIOS
Real-world presets.
CODING AGENT
Repo implementation
$0.011/task
REASONING
Deep analysis
$0.008/problem
RAG
Knowledge base answer
$0.004/query
CHATBOT
Support assistant
$0.002/turn
§ 03 / TAPE
Price history.
Input · $0.60/M
Output · $2.20/M
Cached · $0.11/M
JUL 28 GLM-4.5 launch-wave price stored at $0.60/M input and $2.20/M output
JUN 09 Live verification kept $0.60/M input, $0.11/M cached input, and $2.20/M output
§ 04 / TOKENIZER
Paste text. See tokens. See cost.
Calibrated · measured on the vendor's tokenizer · 2026-06-10
Counts use a chars-per-token calibration measured on the vendor's own published tokenizer (zai-org/GLM-5, 2026-06-10). English prose is typically within a few percent; code and non-Latin scripts tokenize heavier. For billing-exact counts use the vendor's count-tokens API.
Characters 462
Words 67
Tokens (estimated) 88 tokens
Cost as input · uncached $0.000053 USD
Cost as output · uncached $0.000194 USD
Cost as cached input $0.000010 USD
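The estimate above can be reproduced with a simple chars-per-token calculation. This is a sketch only: the 5.25 chars-per-token figure is back-derived from the sample shown here (462 characters, 88 tokens) and is an assumption, not the page's published calibration constant, and the helper names are hypothetical. For billing-exact counts the vendor's count-tokens API remains the reference.

```python
# Assumption: calibration back-derived from the sample on this page
# (462 characters -> 88 tokens). Code and non-Latin text tokenize heavier.
CHARS_PER_TOKEN = 462 / 88  # 5.25

# Rates listed on this page, in USD per 1M tokens.
RATES = {"input": 0.60, "output": 2.20, "cached_input": 0.11}


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character count."""
    return round(len(text) / CHARS_PER_TOKEN)


def cost_usd(tokens: int, rate_per_million: float) -> float:
    """Cost in USD of `tokens` at a per-1M-token rate."""
    return tokens * rate_per_million / 1_000_000


if __name__ == "__main__":
    sample = "x" * 462  # stand-in for a 462-character passage
    n = estimate_tokens(sample)
    print(f"Tokens (estimated): {n}")
    for kind, rate in RATES.items():
        print(f"Cost as {kind}: ${cost_usd(n, rate):.6f}")
```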
| Model | Input /M | Cached input /M | Output /M | Effective blended /M | Context | Best for |
|---|---|---|---|---|---|---|
| GLM-4.5 (current) | $0.60 | $0.11 | $2.20 | $0.36 (agentic 92/8) | 128K | Open-weight GLM-4.5 base |
| GLM-4.5 X | $2.20 | $0.45 | $8.90 | $1.42 | 128K | Premium 4.5 reasoning |
| GLM-4.5 Air | $0.20 | $0.03 | $1.10 | $0.14 | 128K | Budget 4.5 agents |
| GLM-4.5 AirX | $1.10 | $0.22 | $4.50 | $0.71 | 128K | Faster Air tier |
| GLM-4.6 | $0.60 | $0.11 | $2.20 | $0.36 | 200K | Newer open GLM coding |
| GLM-4.7 | $0.60 | $0.11 | $2.20 | $0.36 | 200K | Current coding workhorse |
| GLM-5.1 | $1.40 | $0.26 | $4.40 | $0.78 | 200K | Current GLM flagship |
| GLM-4.7 FlashX | $0.07 | $0.01 | $0.40 | $0.05 | 200K | Ultra-cheap GLM traffic |
Frequently asked.
Short answers for teams comparing GLM-4.5 with GLM-4.5 Air, GLM-4.5 X, GLM-4.6, and GLM-5 tiers.
Q · 01 What is GLM-4.5's input price?
Z.AI lists GLM-4.5 at $0.60 per 1M input tokens.
Q · 02 What is GLM-4.5's output price?
The live Z.AI pricing table lists GLM-4.5 output at $2.20 per 1M output tokens.
Q · 03 Does GLM-4.5 support cached input pricing?
Yes. Z.AI lists cached input at $0.11/M, with cached-input storage marked as limited-time free.
Q · 04 What is the GLM-4.5 context window?
The GLM-4.5 model page lists a 128K context length and 96K maximum output tokens.
Q · 05 Is GLM-4.5 a text LLM?
Yes. The Z.AI GLM-4.5 overview lists text input and text output; this page treats it as a text/code model.
Q · 06 How is the effective price calculated?
The effective tile uses AI//COST's standard agentic blend: 92% input, 8% output, and 82% cached-input reuse where the vendor lists a cache rate.
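As a worked check of that blend for GLM-4.5, here is one way to write it out. The symbols are ours rather than AI//COST's, and the formula is an assumed reading of the blend: cached input is billed at the cached rate, the remaining input at the standard rate, and the two shares weight input against output. This reading reproduces the $0.36/M tile and the effective values in the comparison table.

```latex
\begin{aligned}
p_{\text{eff}} &= s_{\text{in}}\bigl(h\,p_{\text{cache}} + (1-h)\,p_{\text{in}}\bigr) + s_{\text{out}}\,p_{\text{out}} \\
               &= 0.92\,\bigl(0.82 \times 0.11 + 0.18 \times 0.60\bigr) + 0.08 \times 2.20 \\
               &= 0.92 \times 0.1982 + 0.176 \\
               &\approx 0.358 \ \text{USD per 1M tokens} \;\approx\; \$0.36/\text{M}
\end{aligned}
```

Here $s_{\text{in}} = 0.92$ and $s_{\text{out}} = 0.08$ are the input and output shares, $h = 0.82$ is the cache hit rate, and $p_{\text{in}}$, $p_{\text{cache}}$, $p_{\text{out}}$ are the listed per-1M-token rates.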