Inputs · USD · per-token list prices

Reasoning models bill hidden thinking tokens at the output rate. Cache misses are billed at the vendor's cache-write rate where one applies (e.g. Anthropic 5-min TTL, 1.25× input). Batch mode swaps in the vendor's batch input and output rates and turns cache modeling off.

Result · Claude Sonnet 4.6 · 1K req/mo
Monthly cost (API): $13.50/mo
Per request: $0.013
Annual projection: $162
Effective blended rate: $5.40 /1M tokens
Input cost: $6.00/mo
Output cost: $7.50/mo
Top 5 cheapest for this workload
  1. Leanstral $0 · free
  2. Hunyuan Lite $0 · free
  3. GLM-4.5 Flash $0 · free
  4. GLM-4.7 Flash $0 · free
  5. Doubao Seed 1.6 Flash $0.148
Try common scenarios

Pre-configured use cases

Click any scenario to load it into the calculator. Costs below are computed from the current snapshot — they move when vendors reprice.

How it works

The math behind the estimate

Each model is priced with its own published levers: list input/output rates, cached-input rates, cache-write premiums where the vendor charges them, and batch-tier rates. Reasoning tokens are billed at the output rate, because that's how vendors bill them.

/* per request */
input_rate  = cache_on ? hit × cached_price + (1 − hit) × write_or_base_price
                       : base_input_price        /* batch mode: batch rates, cache off */
cost        = in_tokens × input_rate / 1M
            + (out_tokens + reasoning_tokens) × output_rate / 1M

/* monthly */
monthly     = cost × requests_per_month
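
The formula above can be sketched in Python. This is a minimal illustration, not the site's implementation: the `Prices` class, its field names, and the convention that `hit=None` means "cache modeling off" are assumptions made for this example. The sample prices are the Claude Haiku 4.5 list rates from the comparison table, with the cache-write price taken as 1.25× base input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prices:
    """USD per 1M tokens. Field names are illustrative, not the site's schema."""
    base_input: float
    output: float
    cached_input: Optional[float] = None  # None: no published cached-input price
    cache_write: Optional[float] = None   # None: cache misses bill at base_input
    batch_input: Optional[float] = None
    batch_output: Optional[float] = None


def request_cost(p: Prices, in_tokens: int, out_tokens: int,
                 reasoning_tokens: int = 0, hit: Optional[float] = None,
                 batch: bool = False) -> float:
    """Cost of one request in USD, following the per-request formula."""
    if batch:
        # Batch mode: vendor batch rates for both directions, cache modeling off.
        if p.batch_input is None or p.batch_output is None:
            raise ValueError("no batch rates for this model")
        input_rate, output_rate = p.batch_input, p.batch_output
    else:
        output_rate = p.output
        if hit is not None and p.cached_input is not None:
            if not 0.0 <= hit <= 1.0:
                raise ValueError("hit must be between 0 and 1")
            # Misses bill at the cache-write price where one exists, else base input.
            miss_price = p.cache_write if p.cache_write is not None else p.base_input
            input_rate = hit * p.cached_input + (1 - hit) * miss_price
        else:
            # Cache off, or no published cache pricing: plain list input rate.
            input_rate = p.base_input
    # Reasoning tokens are billed at the output rate.
    billed_output = out_tokens + reasoning_tokens
    return (in_tokens * input_rate + billed_output * output_rate) / 1_000_000


def monthly_cost(per_request: float, requests_per_month: int) -> float:
    return per_request * requests_per_month


haiku = Prices(base_input=1.00, output=5.00, cached_input=0.10, cache_write=1.25)
uncached = monthly_cost(request_cost(haiku, 4_000, 600), 10_000)
cached = monthly_cost(request_cost(haiku, 4_000, 600, hit=0.8), 10_000)
print(f"uncached ${uncached:.2f}/mo, 80% hit ${cached:.2f}/mo")
```

Note that with caching on and a hit rate of 0, every input token bills at the cache-write price, which is why the sketch keeps "cache off" (`hit=None`) separate from "cache on with no hits" (`hit=0.0`).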

Prices are taken from each vendor's official pricing page and stamped with the date they were last verified (currently 2026-06-09). The comparison table runs your exact workload through all 124 models with each model's own discounts.

Full methodology lives at /methodology/. Found an error? Report it.

Worked examples

Three levers, real numbers

Computed from the current snapshot — recalculated on every data refresh.

Lever 01 · caching
A support bot, before and after caching
10,000 tickets/month on Claude Haiku 4.5 with a 4K-token system prompt + context. At an 80% cache hit rate the same workload costs about 38% less than the uncached bill.
Model: Claude Haiku 4.5
Workload: 10K req · 4,000 in / 600 out
Uncached: $70.00/mo
80% cache hit: $43.20/mo
−38%
caching pays for the integration in week one
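
The caching figures can be reproduced by hand. The working below is a sketch that assumes the Claude Haiku 4.5 list prices from the comparison table ($1.00 input, $5.00 output, $0.10 cached input per 1M tokens) and a cache-write price of 1.25 × $1.00 = $1.25 per 1M for misses, per the Anthropic rate quoted on this page.

```latex
\begin{align*}
\text{input rate} &= 0.8 \times 0.10 + 0.2 \times 1.25 = 0.33 \ \text{USD per 1M tokens} \\
\text{cached cost per request} &= \frac{4000 \times 0.33 + 600 \times 5.00}{10^{6}} = 0.00432 \ \text{USD} \\
\text{cached monthly} &= 0.00432 \times 10\,000 = 43.20 \ \text{USD} \\
\text{uncached monthly} &= \frac{4000 \times 1.00 + 600 \times 5.00}{10^{6}} \times 10\,000 = 70.00 \ \text{USD} \\
\text{saving} &= 1 - \frac{43.20}{70.00} \approx 38\%
\end{align*}
```

Output tokens are untouched by caching, so the $30.00 of monthly output cost is the same in both cases; the whole saving comes from input dropping from $40.00 to $13.20.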
Lever 02 · reasoning tokens
The thinking-token multiplier
5,000 requests/month on GPT-5.5, 2,000 input tokens and 500 visible output tokens each. Add a typical 6K-token thinking budget and the bill grows about eightfold for the same prompt and the same visible answer.
Model: GPT-5.5
Workload: 5K req · 2,000 in / 500 out
No reasoning: $125/mo
+6K thinking: $1,025/mo
×8.2
budget thinking tokens before you ship a reasoning model
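
The ×8.2 multiplier follows directly from billing thinking tokens at the output rate. A worked sketch, assuming the GPT-5.5 list prices from the comparison table ($5.00 input, $30.0 output per 1M tokens):

```latex
\begin{align*}
\text{no reasoning} &= \frac{2000 \times 5.00 + 500 \times 30.0}{10^{6}} \times 5000 = 125 \ \text{USD/mo} \\
\text{with 6K thinking} &= \frac{2000 \times 5.00 + (500 + 6000) \times 30.0}{10^{6}} \times 5000 = 1025 \ \text{USD/mo} \\
\text{multiplier} &= \frac{1025}{125} = 8.2
\end{align*}
```

The multiplier depends on the ratio of thinking tokens to visible output and on how much of the bill is input, so workloads with heavier prompts will see a smaller factor for the same thinking budget.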
Lever 03 · batch
Overnight jobs at batch rates
500K classification calls/month on Gemini 3.5 Flash. If the workload tolerates async turnaround, batch rates cut the bill roughly in half — a one-line change for most pipelines.
Model: Gemini 3.5 Flash
Workload: 500K req · 500 in / 60 out
Real-time: $645/mo
Batch API: $323/mo
−50%
same model, same output — half the rate
Optimization

Five ways to cut the bill

01
Turn on prompt caching

Cached input can be 90–99% cheaper than fresh input, depending on the vendor. For agents with a stable system prompt, caching alone often cuts the bill by half or more: set the cache hit rate above and watch the savings row. For the full math, see prompt caching cost math.

02
Route by task complexity

Classification and extraction don't need a frontier model. Routing simple tasks to a budget tier (10–50× cheaper per token) and reserving flagships for reasoning-heavy work is the single biggest lever after caching. Compare tiers in the pricing directory.
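
To get a feel for what routing is worth, the blended bill can be estimated from the share of traffic sent to each tier. The sketch below is illustrative only: the function names and the 70% budget share are assumptions, and the prices are the Gemini 2.5 Flash-Lite and Claude Sonnet 4.6 list rates from the comparison table, applied to a workload of 2,000 input and 500 output tokens per request.

```python
def per_request(in_price: float, out_price: float,
                in_tokens: int, out_tokens: int) -> float:
    """USD per request from per-1M-token list prices (no caching, no reasoning)."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000


def routed_monthly(requests: int, budget_share: float,
                   budget_cost: float, flagship_cost: float) -> float:
    """Monthly USD when budget_share of requests go to the budget model."""
    if not 0.0 <= budget_share <= 1.0:
        raise ValueError("budget_share must be between 0 and 1")
    blended = budget_share * budget_cost + (1 - budget_share) * flagship_cost
    return requests * blended


# List prices per 1M tokens from the comparison table.
budget = per_request(0.10, 0.40, 2_000, 500)     # Gemini 2.5 Flash-Lite
flagship = per_request(3.00, 15.00, 2_000, 500)  # Claude Sonnet 4.6

all_flagship = routed_monthly(1_000, 0.0, budget, flagship)
routed = routed_monthly(1_000, 0.7, budget, flagship)  # 70% share is an assumption
print(f"all flagship ${all_flagship:.2f}/mo, 70% routed ${routed:.2f}/mo")
```

The saving scales almost linearly with the share of traffic that can be routed, because the budget tier contributes very little to the blended cost. The sketch assumes both tiers see the same token counts per request, which may not hold if simple tasks are also shorter.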

03
Trim context aggressively

Every token in the window bills on every call. Summarize old turns, truncate retrieved chunks, and strip boilerplate — a 30% context cut is a 30% input-cost cut.

04
Batch the non-real-time work

Batch APIs run at roughly half price on OpenAI, Anthropic, and Google. If a workload tolerates async turnaround, flip the Batch toggle above and compare.

05
Watch the hidden line items

Reasoning tokens, cache writes, retries, and tool-call overhead don't show up in naive estimates — they show up on your invoice. We covered the common surprises in hidden LLM API costs.

All models

Your workload on every model

Per-request and monthly cost for the scenario configured above, across all 124 live-priced models. Click a column to sort; click a model for its full pricing hub.

| Model | Vendor | Context | In $/1M | Out $/1M | Cached | Per request | Monthly |
|---|---|---|---|---|---|---|---|
| Leanstral | Mistral | documented elsewhere | $0.00 | $0.00 | | $0 | $0 |
| Hunyuan Lite | Tencent (Hunyuan) | documented elsewhere | $0.00 | $0.00 | | $0 | $0 |
| GLM-4.5 Flash | Zhipu (Z.ai / GLM) | 128K | $0.00 | $0.00 | $0.00 | $0 | $0 |
| GLM-4.7 Flash | Zhipu (Z.ai / GLM) | 200K | $0.00 | $0.00 | $0.00 | $0 | $0 |
| Doubao Seed 1.6 Flash | ByteDance (Doubao) | 256K | $0.02 | $0.21 | $0.00 | $0.00015 | $0.148 |
| Doubao Seed 2.0 Mini | ByteDance (Doubao) | 256K | $0.03 | $0.28 | $0.01 | $0.00020 | $0.197 |
| Ministral 3 3B | Mistral | 128K | $0.10 | $0.10 | | $0.00025 | $0.250 |
| GLM-4 32B (0414, 128K) | Zhipu (Z.ai / GLM) | 128K | $0.10 | $0.10 | | $0.00025 | $0.250 |
| Doubao Seed 1.6 Lite | ByteDance (Doubao) | 256K | $0.04 | $0.34 | $0.01 | $0.00025 | $0.253 |
| Hunyuan A13B | Tencent (Hunyuan) | 224K | $0.07 | $0.28 | | $0.00028 | $0.281 |
| Qwen3 VL Flash | Alibaba (Qwen) | 256K | $0.05 | $0.40 | | $0.00030 | $0.300 |
| GLM-4.7 FlashX | Zhipu (Z.ai / GLM) | 200K | $0.07 | $0.40 | $0.01 | $0.00034 | $0.340 |
| Baichuan4 Air | Baichuan | 32K | $0.14 | $0.14 | | $0.00035 | $0.345 |
| Yi Lightning | 01.AI | 16K | $0.14 | $0.14 | | $0.00035 | $0.348 |
| Devstral Small 2 | Mistral | documented elsewhere | $0.10 | $0.30 | | $0.00035 | $0.350 |
| Mistral Small 3.2 | Mistral | documented elsewhere | $0.10 | $0.30 | | $0.00035 | $0.350 |
| Mistral Small 4 | Mistral | 256K | $0.10 | $0.30 | $0.10 | $0.00035 | $0.350 |
| Doubao Seed Character | ByteDance (Doubao) | 128K | $0.11 | $0.28 | $0.02 | $0.00037 | $0.367 |
| Hunyuan TurboS | Tencent (Hunyuan) | documented elsewhere | $0.11 | $0.28 | | $0.00037 | $0.367 |
| Ministral 3 8B | Mistral | 128K | $0.15 | $0.15 | | $0.00037 | $0.375 |
| Doubao Seed 1.6 Vision | ByteDance (Doubao) | 256K | $0.06 | $0.56 | $0.02 | $0.00039 | $0.393 |
| Qwen 3.5 Flash | Alibaba (Qwen) | 1M | $0.10 | $0.40 | | $0.00040 | $0.400 |
| Gemini 2.5 Flash-Lite | Google | 1M | $0.10 | $0.40 | $0.01 | $0.00040 | $0.400 |
| DeepSeek V4 Flash | DeepSeek | 1M | $0.14 | $0.28 | $0.00 | $0.00042 | $0.420 |
| Doubao Seed 2.0 Lite | ByteDance (Doubao) | 256K | $0.09 | $0.51 | $0.02 | $0.00042 | $0.423 |
| Hunyuan Translation Lite | Tencent (Hunyuan) | documented elsewhere | $0.14 | $0.42 | | $0.00049 | $0.493 |
| Ministral 3 14B | Mistral | 128K | $0.20 | $0.20 | | $0.00050 | $0.500 |
| Hunyuan T1 | Tencent (Hunyuan) | documented elsewhere | $0.14 | $0.56 | | $0.00056 | $0.564 |
| Doubao Seed Translation | ByteDance (Doubao) | documented elsewhere | $0.17 | $0.51 | | $0.00059 | $0.592 |
| Hunyuan Translation | Tencent (Hunyuan) | documented elsewhere | $0.17 | $0.51 | | $0.00059 | $0.592 |
| Qwen3 32B | Alibaba (Qwen) | 131K | $0.16 | $0.64 | | $0.00064 | $0.640 |
| Qwen3 8B | Alibaba (Qwen) | 128K | $0.20 | $0.76 | | $0.00077 | $0.769 |
| Doubao Seed 1.6 | ByteDance (Doubao) | 256K | $0.11 | $1.13 | $0.02 | $0.00079 | $0.790 |
| Doubao Seed 1.8 | ByteDance (Doubao) | 256K | $0.11 | $1.13 | $0.02 | $0.00079 | $0.790 |
| Qwen3 30B A3B | Alibaba (Qwen) | 128K | $0.22 | $0.87 | | $0.00087 | $0.867 |
| Qwen3 30B A3B Instruct 2507 | Alibaba (Qwen) | 262K | $0.22 | $0.87 | | $0.00087 | $0.867 |
| Doubao Seed Code | ByteDance (Doubao) | 128K | $0.17 | $1.13 | $0.03 | $0.00090 | $0.901 |
| GLM-4.5 Air | Zhipu (Z.ai / GLM) | 128K | $0.20 | $1.10 | $0.03 | $0.00095 | $0.950 |
| Qwen3 Next 80B A3B Instruct | Alibaba (Qwen) | 262K | $0.16 | $1.30 | | $0.00098 | $0.976 |
| Qwen3 Next 80B A3B Thinking | Alibaba (Qwen) | 262K | $0.16 | $1.30 | | $0.00098 | $0.976 |
| Qwen3 235B A22B Instruct 2507 | Alibaba (Qwen) | 262K | $0.25 | $1.00 | | $0.00100 | $0.997 |
| QwQ 32B | Alibaba (Qwen) | 131K | $0.29 | $0.86 | | $0.00100 | $1.00 |
| GPT-5.4 nano | OpenAI | 400K | $0.20 | $1.25 | $0.02 | $0.00103 | $1.03 |
| Codestral | Mistral | documented elsewhere (not on pricing page) | $0.30 | $0.90 | | $0.00105 | $1.05 |
| MiniMax M2 | MiniMax | 205K | $0.30 | $1.20 | $0.03 | $0.00120 | $1.20 |
| MiniMax M2-her | MiniMax | 64K | $0.30 | $1.20 | | $0.00120 | $1.20 |
| MiniMax M2.1 | MiniMax | 205K | $0.30 | $1.20 | $0.03 | $0.00120 | $1.20 |
| MiniMax M2.5 | MiniMax | 205K | $0.30 | $1.20 | $0.03 | $0.00120 | $1.20 |
| MiniMax M2.7 | MiniMax | 205K | $0.30 | $1.20 | $0.06 | $0.00120 | $1.20 |
| MiniMax M3 | MiniMax | 1M | $0.30 | $1.20 | $0.06 | $0.00120 | $1.20 |
| Qwen3 VL Plus | Alibaba (Qwen) | 256K | $0.20 | $1.60 | | $0.00120 | $1.20 |
| Gemini 3.1 Flash-Lite | Google | 1M | $0.25 | $1.50 | $0.03 | $0.00125 | $1.25 |
| DeepSeek V4 Pro | DeepSeek | 1M | $0.43 | $0.87 | $0.00 | $0.00130 | $1.30 |
| Qwen3 Coder Flash | Alibaba (Qwen) | 1M | $0.30 | $1.50 | | $0.00135 | $1.35 |
| Qwen3 14B | Alibaba (Qwen) | 131K | $0.35 | $1.40 | | $0.00140 | $1.40 |
| Hunyuan 2.0 Instruct | Tencent (Hunyuan) | 128K | $0.45 | $1.12 | | $0.00146 | $1.46 |
| Hunyuan T1 Vision | Tencent (Hunyuan) | 28K | $0.42 | $1.27 | | $0.00148 | $1.48 |
| Hunyuan TurboS Vision | Tencent (Hunyuan) | 32K | $0.42 | $1.27 | | $0.00148 | $1.48 |
| Hunyuan TurboS Vision Video | Tencent (Hunyuan) | 24K | $0.42 | $1.27 | | $0.00148 | $1.48 |
| Tencent HY Vision 1.5 Instruct | Tencent (Hunyuan) | 24K | $0.42 | $1.27 | | $0.00148 | $1.48 |
| Qwen3 30B A3B Thinking 2507 | Alibaba (Qwen) | 262K | $0.22 | $2.60 | | $0.00173 | $1.73 |
| Qwen3 235B A22B Thinking 2507 | Alibaba (Qwen) | 262K | $0.25 | $2.49 | | $0.00175 | $1.75 |
| Magistral Small | Mistral | documented elsewhere | $0.50 | $1.50 | | $0.00175 | $1.75 |
| Mistral Large 3 | Mistral | documented elsewhere (not on pricing page) | $0.50 | $1.50 | | $0.00175 | $1.75 |
| Qwen3.7 Plus | Alibaba (Qwen) | 1M | $0.44 | $1.77 | | $0.00177 | $1.77 |
| Devstral 2 | Mistral | documented elsewhere (not on pricing page) | $0.40 | $2.00 | | $0.00180 | $1.80 |
| Gemini 2.5 Flash | Google | 1M | $0.30 | $2.50 | $0.03 | $0.00185 | $1.85 |
| Baichuan-M2 | Baichuan | 32K | $0.28 | $2.82 | | $0.00197 | $1.97 |
| Qwen 3.5 Plus | Alibaba (Qwen) | 256K | $0.40 | $2.40 | | $0.00200 | $2.00 |
| Doubao Seed 2.0 Code | ByteDance (Doubao) | 256K | $0.45 | $2.25 | $0.09 | $0.00203 | $2.03 |
| Doubao Seed 2.0 Pro | ByteDance (Doubao) | 256K | $0.45 | $2.25 | $0.09 | $0.00203 | $2.03 |
| Baichuan-M3-Plus | Baichuan | 32K | $0.70 | $1.27 | | $0.00204 | $2.04 |
| Hunyuan 2.0 Think (HYThink) | Tencent (Hunyuan) | 128K | $0.56 | $2.24 | | $0.00224 | $2.24 |
| GLM-4.5 | Zhipu (Z.ai / GLM) | 128K | $0.60 | $2.20 | $0.11 | $0.00230 | $2.30 |
| GLM-4.6 | Zhipu (Z.ai / GLM) | 200K | $0.60 | $2.20 | $0.11 | $0.00230 | $2.30 |
| GLM-4.7 | Zhipu (Z.ai / GLM) | 200K | $0.60 | $2.20 | $0.11 | $0.00230 | $2.30 |
| MiniMax M2.1 Highspeed | MiniMax | 205K | $0.60 | $2.40 | $0.03 | $0.00240 | $2.40 |
| MiniMax M2.5 Highspeed | MiniMax | 205K | $0.60 | $2.40 | $0.03 | $0.00240 | $2.40 |
| MiniMax M2.7 Highspeed | MiniMax | 205K | $0.60 | $2.40 | $0.06 | $0.00240 | $2.40 |
| Qwen 3.5 122B A10B | Alibaba (Qwen) | 256K | $0.40 | $3.20 | | $0.00240 | $2.40 |
| Kimi K2.5 | Moonshot (Kimi) | 262K | $0.60 | $3.00 | $0.10 | $0.00270 | $2.70 |
| Qwen3 235B A22B | Alibaba (Qwen) | 131K | $0.70 | $2.80 | | $0.00280 | $2.80 |
| QwQ Plus | Alibaba (Qwen) | 131K | $0.80 | $2.40 | | $0.00280 | $2.80 |
| Qwen 3.5 397B A17B | Alibaba (Qwen) | 256K | $0.60 | $3.60 | | $0.00300 | $3.00 |
| GLM-5 | Zhipu (Z.ai / GLM) | 200K | $1.00 | $3.20 | $0.20 | $0.00360 | $3.60 |
| Grok 4.20 (0309) Non-Reasoning | xAI | 1M | $1.25 | $2.50 | $0.20 | $0.00375 | $3.75 |
| Grok 4.20 (0309) Reasoning | xAI | 1M | $1.25 | $2.50 | $0.20 | $0.00375 | $3.75 |
| Grok 4.20 Multi-Agent (0309) | xAI | 2M | $1.25 | $2.50 | $0.20 | $0.00375 | $3.75 |
| Grok 4.3 | xAI | 1M | $1.25 | $2.50 | $0.20 | $0.00375 | $3.75 |
| GPT-5.4 mini | OpenAI | 400K | $0.75 | $4.50 | $0.07 | $0.00375 | $3.75 |
| Kimi K2.6 | Moonshot (Kimi) | 262K | $0.95 | $4.00 | $0.16 | $0.00390 | $3.90 |
| Baichuan3-Turbo | Baichuan | 32K | $1.69 | $1.69 | | $0.00422 | $4.22 |
| GLM-5 Turbo | Zhipu (Z.ai / GLM) | 200K | $1.20 | $4.00 | $0.24 | $0.00440 | $4.40 |
| GLM-4.5 AirX | Zhipu (Z.ai / GLM) | 128K | $1.10 | $4.50 | $0.22 | $0.00445 | $4.45 |
| Qwen3 Coder Plus | Alibaba (Qwen) | 1M | $1.00 | $5.00 | | $0.00450 | $4.50 |
| Claude Haiku 4.5 | Anthropic | 200K | $1.00 | $5.00 | $0.10 | $0.00450 | $4.50 |
| Baichuan-M2-Plus | Baichuan | 32K | $1.41 | $4.22 | | $0.00493 | $4.93 |
| Baichuan-M3 | Baichuan | 32K | $1.41 | $4.22 | | $0.00493 | $4.93 |
| GLM-5.1 | Zhipu (Z.ai / GLM) | 200K | $1.40 | $4.40 | $0.26 | $0.00500 | $5.00 |
| Baichuan4 Turbo | Baichuan | 32K | $2.11 | $2.11 | | $0.00528 | $5.28 |
| Qwen3 Max | Alibaba (Qwen) | 252K | $1.20 | $6.00 | | $0.00540 | $5.40 |
| Magistral Medium | Mistral | documented elsewhere (not on pricing page) | $2.00 | $5.00 | | $0.00650 | $6.50 |
| Mistral Medium 3.1 | Mistral | documented elsewhere | $1.50 | $7.50 | | $0.00675 | $6.75 |
| Mistral Medium 3.5 | Mistral | documented elsewhere (not on pricing page) | $1.50 | $7.50 | | $0.00675 | $6.75 |
| Gemini 2.5 Pro | Google | 2M | $1.25 | $10.0 | $0.13 | $0.00750 | $7.50 |
| Gemini 3.5 Flash | Google | 1M | $1.50 | $9.00 | $0.15 | $0.00750 | $7.50 |
| Baichuan3-Turbo (128K) | Baichuan | 128K | $3.38 | $3.38 | | $0.00845 | $8.45 |
| GLM-4.5 X | Zhipu (Z.ai / GLM) | 128K | $2.20 | $8.90 | $0.45 | $0.00885 | $8.85 |
| Qwen3.7 Max | Alibaba (Qwen) | 1M | $2.77 | $8.31 | | $0.00969 | $9.69 |
| GPT-5.3-Codex | OpenAI | 400K | $1.75 | $14.0 | $0.17 | $0.011 | $10.50 |
| GPT-5.4 | OpenAI | 1.05M | $2.50 | $15.0 | $0.25 | $0.013 | $12.50 |
| Claude Sonnet 4.5 | Anthropic | 200K | $3.00 | $15.0 | $0.30 | $0.013 | $13.50 |
| Claude Sonnet 4.6 | Anthropic | 1M | $3.00 | $15.0 | $0.30 | $0.013 | $13.50 |
| Claude Opus 4.5 | Anthropic | 200K | $5.00 | $25.0 | $0.50 | $0.022 | $22.50 |
| Claude Opus 4.6 | Anthropic | 1M | $5.00 | $25.0 | $0.50 | $0.022 | $22.50 |
| Claude Opus 4.7 | Anthropic | 1M | $5.00 | $25.0 | $0.50 | $0.022 | $22.50 |
| Claude Opus 4.8 | Anthropic | 1M | $5.00 | $25.0 | $0.50 | $0.022 | $22.50 |
| chat-latest | OpenAI | 400K | $5.00 | $30.0 | $0.50 | $0.025 | $25.00 |
| GPT-5.5 | OpenAI | 1M | $5.00 | $30.0 | $0.50 | $0.025 | $25.00 |
| Baichuan4 | Baichuan | 32K | $14.1 | $14.1 | | $0.035 | $35.21 |
| Claude Fable 5 | Anthropic | 1M | $10.0 | $50.0 | $1.00 | $0.045 | $45.00 |
| Claude Opus 4.1 | Anthropic | 200K | $15.0 | $75.0 | $1.50 | $0.068 | $67.50 |
| GPT-5.4 Pro | OpenAI | 1.05M | $30.0 | $180 | | $0.150 | $150 |
| GPT-5.5 Pro | OpenAI | 1.05M | $30.0 | $180 | | $0.150 | $150 |

Common questions.

How the estimate works, where it can drift from your invoice, and what the levers mean.

Q · 01 How accurate are these estimates?
The calculator multiplies your token counts by each vendor's published list prices, verified against the vendor's own pricing page (date in the header). It models caching, cache-write premiums, batch tiers, and reasoning tokens — but not negotiated enterprise discounts, regional taxes, or provider-side failures and retries. Treat results as a tight planning estimate, then measure a production sample.
Q · 02 What are reasoning tokens and why do they matter?
Reasoning (thinking) models generate hidden chain-of-thought tokens before the visible answer, and vendors bill them at the output rate. A request that returns 500 visible tokens can quietly bill thousands of reasoning tokens on top. Most calculators ignore this entirely — it's the single most common reason real bills exceed estimates. Set the reasoning field to your model's typical thinking budget.
Q · 03 How does the cache hit-rate math work?
Input tokens served from cache bill at the model's cached-input price. Cache misses bill at the vendor's cache-write rate where one exists — Anthropic charges 1.25× base input to write a 5-minute cache entry — otherwise at the base input rate. Formula: input_rate = hit × cached_price + (1 − hit) × write_or_base_price. Models with no published cache pricing ignore the slider, and the hint under the field tells you.
Q · 04 Can I combine batch mode with caching?
Not in this calculator. Published cached-input prices are standard-tier prices; stacking the batch discount on top would double-count. When Batch is on we use the vendor's batch input/output rates and disable cache modeling — a conservative estimate. If a vendor documents combined batch+cache pricing, we'll model it.
Q · 05 How fresh is the pricing data?
Every model's rates are verified against the vendor's own pricing page and stamped — the date in the page header is the snapshot's last verification date. When a vendor reprices, the whole page (presets, examples, table) recomputes on rebuild. Spotted a stale number? Report it and we re-verify within a day.
Q · 06 Why might my actual bill still differ?
Variable context growth in agent loops, tool-call payloads, retries, and tokenizer differences between vendors (the same text can tokenize to different counts — some tokenizers run up to ~35% heavier on identical text). The calculator assumes fixed average tokens per request; measure a representative sample with your vendor's usage dashboard for the first weeks in production.
Q · 07 Can I share or export a scenario?
Yes — Share scenario copies a URL that encodes every input, so a teammate opens the calculator in exactly your state. Export CSV downloads the full 124-model comparison for your current workload. Both are free, no signup.
Q · 08 Are the $0 models really free?
The free-tier entries (e.g. GLM Flash tiers, Leanstral, Hunyuan Lite) publish a genuine $0 per-token API price, typically with rate limits, capacity queues, or data-use terms attached. They're real options for prototypes and low-volume tools — read the vendor's terms before betting production traffic on one.
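
One consequence of the hit-rate formula in Q · 03 is a break-even hit rate wherever a vendor charges a cache-write premium. The derivation below is a sketch under this calculator's model, in which every cache miss bills at the write price. The symbols are introduced here for illustration: b is the base input price, c the cached-input price, w the cache-write price, and h the hit rate. The numeric case assumes w = 1.25b (the Anthropic 5-minute rate quoted above) and c = 0.10b (the ratio shown for the Anthropic rows in the comparison table).

```latex
\begin{align*}
h\,c + (1 - h)\,w \le b
  \quad&\Longleftrightarrow\quad
  h \ge h^{*} = \frac{w - b}{w - c} \\
w = 1.25\,b,\ c = 0.10\,b
  \quad&\Longrightarrow\quad
  h^{*} = \frac{0.25\,b}{1.15\,b} \approx 0.217
\end{align*}
```

Under these assumptions, caching lowers the input bill only once the hit rate exceeds roughly 22%. Where a vendor has no write premium (w = b), the threshold is zero and any hit rate above zero reduces the input cost.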
Deep dive

Per-model pricing hubs

Embed

Put this calculator on your site

Free to embed — your readers always see current pricing. The widget links back here for the full comparison table.

<iframe
  src="https://aicost.tools/calculator/llm-api/embed/"
  width="100%" height="760" loading="lazy"
  style="border:1px solid #3d3a32"
  title="LLM API Cost Calculator by AI//COST">
</iframe>
More calculators

The cost toolbox

Every tool runs on the same live-pricing backbone. See the full index →
