Last verified
MID-TIER GLM200K CONTEXT128K OUTPUTPROMPT CACHINGTEXT + CODE

GLM-4.7 API Pricing

GLM-4.7 is Zhipu's mid-tier GLM model for coding, tool use, and general agent workloads. The live vendor table lists $0.6/M input and $2.2/M output, with cached input at $0.11/M. Pulled directly from docs.z.ai daily.

Input - per 1M tokens
$0.60/M
Source Z.AI flat
Output - per 1M tokens
$2.20/M
Context 200K flat
Cached input - per 1M tokens
$0.11/M
Storage limited-time free -82%
Effective - agentic blend
$0.36/M
92/8 split - 82% cache
§ 01 / TERMINAL

Run the numbers.

Live calculator pre-loaded with current GLM-4.7 rates. Tweak spend, output mix, or cache assumptions and share the URL to share the calculation.

$ /mo
Workload split
Prompt cache hit rate
Tokens you can process
Words equivalent (English)
Effective rate
§ 02 / SCENARIOS

Real-world presets.

§ 03 / TAPE

Price history.

GLM-4.7 is listed at $0.6/M input and $2.2/M output on the live Z.AI pricing table.

Input · $0.60/M
Output · $2.2/M
Cached · $0.11/M
DEC 23 Launch-wave list price stored at $0.6/M input and $2.2/M outputMAY 19 Live verification kept $0.6/M input and $2.2/M output
§ 04 / TOKENIZER

Paste text. See tokens. See cost.

Estimate · zhipu-tokenizer-estimate · ≈3.85 chars/token Auto-counts as you type

This is a chars-per-token approximation, not a real tokenizer. Actual tokens vary by language, code density, and tool-call overhead — counts are typically ±10–20% off for English prose, more for code or non-Latin scripts. For exact billing, use the vendor's official tokenizer.

Characters
Words
Tokens (estimated)
Cost as input · uncached
Cost as output · uncached
Cost as cached input
§ 05 / SHELF

Up against the shelf.

All models →
Model Input /M Output /M Effective blended Context Best for
GLM-4.7 Current $0.60 cache $0.11 $2.20 $0.36 agentic 92/8 200K Mid-tier GLM agents
GLM-5.1 $1.40 cache $0.26 $4.40 $0.78 pricier 200K Flagship GLM coding
GLM-4.6 $0.60 cache $0.11 $2.20 $0.36 same blend 200K Previous mid-tier GLM
GLM-4.7 FlashX $0.07 cache $0.01 $0.40 $0.05 cheaper 200K Ultra-cheap GLM traffic
GLM-4.7 Flash $0.00 cache $0.00 $0.00 $0.00 cheaper 200K Free registered-user tier
Qwen3 Coder Flash $0.30 $1.50 $0.40 pricier 1M Cheap code-focused Qwen
DeepSeek V4 Flash $0.14 cache $0.00 $0.28 $0.05 cheaper 1M Budget reasoning and coding

Frequently asked.

Practical pricing questions, separated from calculator assumptions and regional tiers.

Q · 01 What is GLM-4.7 priced at? +
GLM-4.7 is listed at $0.6/M input and $2.2/M output on the live Z.AI pricing table. This page stores USD per-million-token pricing.
Q · 02 How is the effective price calculated? +
AI//COST uses the same 92/8 agentic blend everywhere. With an 82% cache hit rate, GLM-4.7's effective blended cost is $0.36/M.
Q · 03 Is prompt caching priced separately? +
Yes. The vendor table lists cached input at $0.11/M versus $0.6/M fresh input. Cached input storage is listed as limited-time free on the Z.AI page.
Q · 04 Are regional prices different? +
Z.AI publishes the official developer pricing page in USD; Chinese BigModel pages may surface the same model catalog in Chinese. The quote tiles use the baseline row named in the source page, not a reseller or proxy price.
Q · 05 Is there a batch discount? +
No separate batch discount row is listed on the Z.AI pricing page for GLM text models. The quote tiles show real-time list pricing; batch economics should be treated as a separate calculator variant when supported.
Q · 06 How accurate is the tokenizer estimate? +
The browser widget uses a zhipu-tokenizer-estimate chars-per-token estimate for English text. It is useful for rough planning, but actual billing comes from the vendor API usage fields and can differ for Chinese, code, or mixed-language prompts.