Last verified 2026-05-19

FLAGSHIP GLM · 200K CONTEXT · 128K OUTPUT · PROMPT CACHING · TEXT + CODE

GLM-5.1 API Pricing

GLM-5.1 is Zhipu's current flagship text model for agentic coding and long-horizon work. The live vendor table lists $1.40/M input and $4.40/M output, with cached input at $0.26/M. Pricing is pulled daily from docs.z.ai.

Input - per 1M tokens: $1.40/M (source: Z.AI, flat)

Output - per 1M tokens: $4.40/M (context: 200K, flat)

Cached input - per 1M tokens: $0.26/M (-81% versus fresh input; storage limited-time free)

Effective - agentic blend: $0.78/M (92/8 split, 82% cache)

§ 01 / TERMINAL

Run the numbers.

Live calculator preloaded with current GLM-5.1 rates. Adjust monthly spend, the input/output workload split, or the prompt cache hit rate to see how many tokens the budget covers, then share the URL to share the calculation.
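As a rough sketch of what such a calculator computes, the snippet below blends the three list rates from this page into an effective rate and converts a monthly budget into tokens. It assumes the 92/8 split means 92% input and 8% output tokens and that the cache hit rate applies to input tokens only; the function names and the $100 budget are illustrative, not taken from the site's own code.

```python
# List rates from this page, in USD per million tokens.
INPUT_RATE = 1.40
CACHED_INPUT_RATE = 0.26
OUTPUT_RATE = 4.40


def effective_rate(input_share=0.92, cache_hit_rate=0.82):
    """Blended USD per million tokens for a workload split and cache hit rate."""
    # Assumption: input tokens are billed at the cached rate on a hit,
    # and at the fresh rate otherwise.
    input_rate = cache_hit_rate * CACHED_INPUT_RATE + (1 - cache_hit_rate) * INPUT_RATE
    return input_share * input_rate + (1 - input_share) * OUTPUT_RATE


def tokens_for_budget(monthly_spend_usd, input_share=0.92, cache_hit_rate=0.82):
    """Total tokens (input plus output) a monthly budget covers at the blended rate."""
    return monthly_spend_usd / effective_rate(input_share, cache_hit_rate) * 1_000_000


print(f"${effective_rate():.2f}/M")                    # $0.78/M
print(f"{tokens_for_budget(100) / 1e6:.1f}M tokens")   # 128.2M tokens
```

With no cache hits at all (`cache_hit_rate=0`), the same split works out to $1.64/M, which shows how much of the effective rate depends on the cache assumption.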

§ 02 / SCENARIOS

Real-world presets.

CODING AGENT

Repo implementation

$0.044/task

22,000 in - 3,000 out · ~2,273 units/$100

CODE REVIEW

Pull request review

$0.028/review

14,000 in - 1,800 out · ~3,636 units/$100

RAG

Knowledge base answer

$0.017/query

9,000 in - 1,000 out · ~5,882 units/$100

CHATBOT

Support assistant

$0.006/turn

2,500 in - 600 out · ~16,393 units/$100
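The preset figures can be reproduced with a few lines of arithmetic. The sketch below assumes the presets use uncached list rates ($1.40/M input, $4.40/M output) and that units per $100 are derived from the unit cost rounded to four decimals; both are inferences from the published numbers, not documented behavior, and the names are illustrative.

```python
INPUT_RATE = 1.40   # USD per million input tokens (uncached)
OUTPUT_RATE = 4.40  # USD per million output tokens

# (input tokens, output tokens) per unit of work, as listed in the presets.
PRESETS = {
    "coding agent": (22_000, 3_000),
    "code review": (14_000, 1_800),
    "rag": (9_000, 1_000),
    "chatbot": (2_500, 600),
}


def cost_per_unit(tokens_in, tokens_out):
    """USD cost of one task, review, query, or turn at list rates."""
    return (tokens_in * INPUT_RATE + tokens_out * OUTPUT_RATE) / 1_000_000


def units_per_budget(tokens_in, tokens_out, budget_usd=100):
    """How many units a budget covers.

    Assumption: the unit cost is rounded to four decimals first,
    which matches the figures shown in the presets.
    """
    return round(budget_usd / round(cost_per_unit(tokens_in, tokens_out), 4))


for name, (tokens_in, tokens_out) in PRESETS.items():
    cost = cost_per_unit(tokens_in, tokens_out)
    units = units_per_budget(tokens_in, tokens_out)
    print(f"{name}: ${cost:.3f}/unit, ~{units:,} units/$100")
```

Because these presets ignore caching, a workload with a high cache hit rate would come in below the per-unit costs shown.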

§ 03 / TOKENIZER

Paste text. See tokens. See cost.

Counts use a chars-per-token calibration measured on the vendor's own published tokenizer (zai-org/GLM-5, 2026-06-10). English prose is typically within a few percent; code and non-Latin scripts produce more tokens per character, so the estimate runs low for them. For billing-exact counts, use the vendor's count-tokens API.

Characters 480

Words 70

Tokens (estimated) 91 tokens

Cost as input · uncached $0.000127 USD

Cost as output · uncached $0.000400 USD

Cost as cached input $0.000024 USD
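The sample figures above follow from a simple estimate. The sketch below is one way to reproduce them; the ratio of 5.27 characters per token is only what this one sample implies (480 characters / 91 tokens) and is a hypothetical stand-in for the widget's actual calibration, which the page does not publish.

```python
# List rates from this page, in USD per million tokens.
RATES = {"input": 1.40, "output": 4.40, "cached_input": 0.26}


def estimate_tokens(text, chars_per_token=5.27):
    """Rough token count from character length.

    The default ratio is an assumption derived from the sample on this
    page; it is not the vendor tokenizer.
    """
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def cost_usd(tokens, kind):
    """USD cost of `tokens` billed as 'input', 'output', or 'cached_input'."""
    return tokens * RATES[kind] / 1_000_000


sample = "x" * 480  # stand-in for a 480-character English passage
tokens = estimate_tokens(sample)
print(tokens)                                    # 91
print(f"{cost_usd(tokens, 'input'):.6f}")         # 0.000127
print(f"{cost_usd(tokens, 'output'):.6f}")        # 0.000400
print(f"{cost_usd(tokens, 'cached_input'):.6f}")  # 0.000024
```

For anything billing-sensitive, the usage fields returned by the vendor API remain the source of truth.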

§ 04 / SHELF

Up against the shelf.

Model	Input /M	Cached input /M	Output /M	Effective blended (agentic 92/8)	vs GLM-5.1	Context	Best for
GLM-5.1 (current)	$1.40	$0.26	$4.40	$0.78	baseline	200K	Flagship agentic coding
GLM-5	$1.00	$0.20	$3.20	$0.57	cheaper	200K	Cheaper GLM-5 family base
GLM-5 Turbo	$1.20	$0.24	$4.00	$0.70	cheaper	200K	Latency-optimized GLM-5
GLM-4.7	$0.60	$0.11	$2.20	$0.36	cheaper	200K	Mid-tier GLM agents
Kimi K2.6	$0.95	$0.16	$4.00	$0.60	cheaper	262K	Moonshot frontier alternative
DeepSeek V4 Pro	$0.43	$0.00	$0.87	$0.14	cheaper	1M	DeepSeek frontier discount tier
Qwen3 Coder Plus	$1.00	not listed	$5.00	$1.32	pricier	1M	Qwen code flagship

Frequently asked.

Practical pricing questions, separated from calculator assumptions and regional tiers.

Q · 01 What is GLM-5.1 priced at?

GLM-5.1 is listed at $1.40/M input and $4.40/M output on the live Z.AI pricing table. All prices on this page are in USD per million tokens.

Q · 02 How is the effective price calculated?

AI//COST applies the same 92/8 agentic blend (92% input tokens, 8% output tokens) to every model. With 82% of input tokens served from cache, GLM-5.1's effective blended cost is $0.78/M.
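Worked through with the rates on this page, the blend looks like the following. This reads 92/8 as the input/output token split and applies the 82% cache hit rate to input tokens only, which is an interpretation rather than a published formula, though it reproduces the $0.78/M figure.

```latex
r_{\text{in}} = 0.82 \times 0.26 + 0.18 \times 1.40 = 0.2132 + 0.2520 = 0.4652

r_{\text{eff}} = 0.92 \times r_{\text{in}} + 0.08 \times 4.40 = 0.4280 + 0.3520 \approx 0.78 \ \text{USD per million tokens}
```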

Q · 03 Is prompt caching priced separately?

Yes. The vendor table lists cached input at $0.26/M versus $1.40/M for fresh input. Cached input storage is listed as limited-time free on the Z.AI page.

Q · 04 Are regional prices different?

Z.AI publishes the official developer pricing page in USD; Chinese BigModel pages may surface the same model catalog in Chinese. The quote tiles use the baseline row named in the source page, not a reseller or proxy price.

Q · 05 Is there a batch discount?

No separate batch discount row is listed on the Z.AI pricing page for GLM text models. The quote tiles show real-time list pricing; batch economics should be treated as a separate calculator variant when supported.

Q · 06 How accurate is the tokenizer estimate?

The browser widget estimates tokens with a chars-per-token ratio (zhipu-tokenizer-estimate) tuned for English text. It is useful for rough planning, but actual billing comes from the usage fields in the vendor API response and can differ for Chinese, code, or mixed-language prompts.

Reviewed by Yaroslav Vikhariev, Founder, AI//COST. Pricing pulled daily from docs.z.ai. Last verified May 19, 2026.
