LLM API cost calculator.
Estimate monthly spend across 124 live-priced models from 14 providers — including the levers most calculators skip: reasoning tokens, cache hit rates with write premiums, and batch tiers. Rates come straight from vendor pricing pages, stamped with their verification date.
Reasoning models bill hidden thinking tokens at the output rate. Cache misses are billed at the vendor's cache-write rate where one applies (e.g. Anthropic 5-min TTL, 1.25× input). Batch mode swaps both rates and turns cache modeling off.
1. Leanstral $0 · free
2. Hunyuan Lite $0 · free
3. GLM-4.5 Flash $0 · free
4. GLM-4.7 Flash $0 · free
5. Doubao Seed 1.6 Flash $0.148
Pre-configured use cases
Click any scenario to load it into the calculator. Costs below are computed from the current snapshot — they move when vendors reprice.
The math behind the estimate
Each model is priced with its own published levers: list input/output rates, cached-input rates, cache-write premiums where the vendor charges them, and batch-tier rates. Reasoning tokens are billed at the output rate, because that's how vendors bill them.
/* per request */
input_rate = cache_on ? hit × cached_price + (1 − hit) × write_or_base_price
: base_input_price /* batch mode: batch rates, cache off */
cost = in_tokens × input_rate / 1M
+ (out_tokens + reasoning_tokens) × output_rate / 1M
/* monthly */
monthly = cost × requests_per_month
Prices are taken from each vendor's official pricing page and stamped with the date they were last verified (currently 2026-06-09). The comparison table runs your exact workload through all 124 models with each model's own discounts.
Full methodology lives at /methodology/. Found an error? Report it.
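The formula above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual code: the `ModelPrice` fields, the `request_cost` signature, and the fallback to list rates when a model has no batch price are all assumptions. The sample rates are the Claude Sonnet 4.5 row from the comparison table, with the 1.25× cache-write premium mentioned earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelPrice:
    """USD per 1M tokens. Field names are illustrative, not the site's schema."""
    input_price: float
    output_price: float
    cached_price: Optional[float] = None        # None: no published cache pricing
    write_price: Optional[float] = None         # None: misses bill at the base input rate
    batch_input_price: Optional[float] = None   # None: assumed to fall back to list rate
    batch_output_price: Optional[float] = None


def request_cost(price, in_tokens, out_tokens, reasoning_tokens=0,
                 hit=0.0, cache_on=False, batch=False):
    """Per-request cost in USD, following the formula in the text."""
    if batch:
        # Batch mode: batch rates, cache modeling off.
        input_rate = (price.batch_input_price
                      if price.batch_input_price is not None else price.input_price)
        output_rate = (price.batch_output_price
                       if price.batch_output_price is not None else price.output_price)
    else:
        output_rate = price.output_price
        if cache_on and price.cached_price is not None:
            # Misses bill at the cache-write rate where one applies, else at base.
            miss_price = (price.write_price
                          if price.write_price is not None else price.input_price)
            input_rate = hit * price.cached_price + (1 - hit) * miss_price
        else:
            input_rate = price.input_price
    # Reasoning tokens are billed at the output rate.
    billed_output = out_tokens + reasoning_tokens
    return (in_tokens * input_rate + billed_output * output_rate) / 1_000_000


def monthly_cost(per_request, requests_per_month):
    return per_request * requests_per_month


sonnet = ModelPrice(3.00, 15.00, cached_price=0.30, write_price=3.75)
plain = request_cost(sonnet, 2_000, 500)
cached = request_cost(sonnet, 2_000, 500, hit=0.8, cache_on=True)
thinking = request_cost(sonnet, 2_000, 500, reasoning_tokens=1_000)

print(f"{plain:.5f}")     # 0.01350
print(f"{cached:.5f}")    # 0.00948
print(f"{thinking:.5f}")  # 0.02850
print(f"{monthly_cost(plain, 1_000):.2f}")  # 13.50
```

With 2,000 input tokens, 500 output tokens, and 1,000 requests per month, the uncached result reproduces the $13.50 monthly figure shown for that model in the table.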
Three levers, real numbers
Computed from the current snapshot — recalculated on every data refresh.
Five ways to cut the bill
On most vendors, cached input costs as much as 90–99% less than fresh input. For agents with a stable system prompt, caching alone often cuts the bill by half or more: set the cache hit rate above and watch the savings row. The full math: prompt caching cost math.
Classification and extraction don't need a frontier model. Routing simple tasks to a budget tier (10–50× cheaper per token) and reserving flagships for reasoning-heavy work is the single biggest lever after caching. Compare tiers in the pricing directory.
Every token in the window bills on every call. Summarize old turns, truncate retrieved chunks, and strip boilerplate — a 30% context cut is a 30% input-cost cut.
Batch APIs run at roughly half price on OpenAI, Anthropic, and Google. If a workload tolerates async turnaround, flip the Batch toggle above and compare.
Reasoning tokens, cache writes, retries, and tool-call overhead don't show up in naive estimates — they show up on your invoice. We covered the common surprises in hidden LLM API costs.
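To make the first four levers concrete, here is a rough Python comparison on one hypothetical workload. Everything in it is an assumption chosen for illustration: the workload size, the 90% cache hit rate, the flat 50% batch discount, and the two rate pairs (which happen to match the Claude Sonnet 4.5 and Gemini 2.5 Flash-Lite rows in the table). The fifth lever, hidden costs, depends on your own traffic and is not modeled.

```python
def monthly_cost(in_tokens, out_tokens, in_rate, out_rate, requests):
    """Monthly USD cost; rates are USD per 1M tokens."""
    per_request = (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
    return per_request * requests


# Hypothetical workload: tokens per request and requests per month.
IN, OUT, REQS = 2_000, 500, 1_000
FLAGSHIP = (3.00, 15.00)   # input, output rate
BUDGET = (0.10, 0.40)

baseline = monthly_cost(IN, OUT, *FLAGSHIP, REQS)

# Lever 1: caching at an assumed 90% hit rate, misses billed at a 1.25x write rate.
hit, cached_rate, write_rate = 0.9, 0.30, 3.75
blended_in = hit * cached_rate + (1 - hit) * write_rate
with_cache = monthly_cost(IN, OUT, blended_in, FLAGSHIP[1], REQS)

# Lever 2: route the same traffic to a budget tier.
routed = monthly_cost(IN, OUT, *BUDGET, REQS)

# Lever 3: trim 30% of the context (integer math avoids float rounding).
trimmed = monthly_cost(IN * 70 // 100, OUT, *FLAGSHIP, REQS)

# Lever 4: batch mode, assumed here to be a flat 50% off both rates.
batched = monthly_cost(IN, OUT, FLAGSHIP[0] * 0.5, FLAGSHIP[1] * 0.5, REQS)

for label, cost in [("baseline", baseline), ("cache 90%", with_cache),
                    ("budget tier", routed), ("trim 30%", trimmed),
                    ("batch", batched)]:
    saving = 1 - cost / baseline
    print(f"{label:<12} ${cost:>6.2f}  saves {saving:.0%}")
```

Note that the 30% context cut removes 30% of the input cost only. Because output tokens dominate this particular workload, the total bill falls by less than that.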
Your workload on every model
Per-request and monthly cost for the scenario configured above, across all 124 live-priced models. Click a column to sort; click a model for its full pricing hub.
| Model | Context | In $/1M | Out $/1M | Cached | Per request | Monthly | vs №1 |
|---|---|---|---|---|---|---|---|
| Leanstral Mistral | documented elsewhere | $0.00 | $0.00 | — | $0 | $0 | — |
| Hunyuan Lite Tencent (Hunyuan) | documented elsewhere | $0.00 | $0.00 | — | $0 | $0 | — |
| GLM-4.5 Flash Zhipu (Z.ai / GLM) | 128K | $0.00 | $0.00 | $0.00 | $0 | $0 | — |
| GLM-4.7 Flash Zhipu (Z.ai / GLM) | 200K | $0.00 | $0.00 | $0.00 | $0 | $0 | — |
| Doubao Seed 1.6 Flash ByteDance (Doubao) | 256K | $0.02 | $0.21 | $0.00 | $0.00015 | $0.148 | — |
| Doubao Seed 2.0 Mini ByteDance (Doubao) | 256K | $0.03 | $0.28 | $0.01 | $0.00020 | $0.197 | — |
| Ministral 3 3B Mistral | 128K | $0.10 | $0.10 | — | $0.00025 | $0.250 | — |
| GLM-4 32B (0414, 128K) Zhipu (Z.ai / GLM) | 128K | $0.10 | $0.10 | — | $0.00025 | $0.250 | — |
| Doubao Seed 1.6 Lite ByteDance (Doubao) | 256K | $0.04 | $0.34 | $0.01 | $0.00025 | $0.253 | — |
| Hunyuan A13B Tencent (Hunyuan) | 224K | $0.07 | $0.28 | — | $0.00028 | $0.281 | — |
| Qwen3 VL Flash Alibaba (Qwen) | 256K | $0.05 | $0.40 | — | $0.00030 | $0.300 | — |
| GLM-4.7 FlashX Zhipu (Z.ai / GLM) | 200K | $0.07 | $0.40 | $0.01 | $0.00034 | $0.340 | — |
| Baichuan4 Air Baichuan | 32K | $0.14 | $0.14 | — | $0.00035 | $0.345 | — |
| Yi Lightning 01.AI | 16K | $0.14 | $0.14 | — | $0.00035 | $0.348 | — |
| Devstral Small 2 Mistral | documented elsewhere | $0.10 | $0.30 | — | $0.00035 | $0.350 | — |
| Mistral Small 3.2 Mistral | documented elsewhere | $0.10 | $0.30 | — | $0.00035 | $0.350 | — |
| Mistral Small 4 Mistral | 256K | $0.10 | $0.30 | $0.10 | $0.00035 | $0.350 | — |
| Doubao Seed Character ByteDance (Doubao) | 128K | $0.11 | $0.28 | $0.02 | $0.00037 | $0.367 | — |
| Hunyuan TurboS Tencent (Hunyuan) | documented elsewhere | $0.11 | $0.28 | — | $0.00037 | $0.367 | — |
| Ministral 3 8B Mistral | 128K | $0.15 | $0.15 | — | $0.00037 | $0.375 | — |
| Doubao Seed 1.6 Vision ByteDance (Doubao) | 256K | $0.06 | $0.56 | $0.02 | $0.00039 | $0.393 | — |
| Qwen 3.5 Flash Alibaba (Qwen) | 1M | $0.10 | $0.40 | — | $0.00040 | $0.400 | — |
| Gemini 2.5 Flash-Lite Google | 1M | $0.10 | $0.40 | $0.01 | $0.00040 | $0.400 | — |
| DeepSeek V4 Flash DeepSeek | 1M | $0.14 | $0.28 | $0.00 | $0.00042 | $0.420 | — |
| Doubao Seed 2.0 Lite ByteDance (Doubao) | 256K | $0.09 | $0.51 | $0.02 | $0.00042 | $0.423 | — |
| Hunyuan Translation Lite Tencent (Hunyuan) | documented elsewhere | $0.14 | $0.42 | — | $0.00049 | $0.493 | — |
| Ministral 3 14B Mistral | 128K | $0.20 | $0.20 | — | $0.00050 | $0.500 | — |
| Hunyuan T1 Tencent (Hunyuan) | documented elsewhere | $0.14 | $0.56 | — | $0.00056 | $0.564 | — |
| Doubao Seed Translation ByteDance (Doubao) | documented elsewhere | $0.17 | $0.51 | — | $0.00059 | $0.592 | — |
| Hunyuan Translation Tencent (Hunyuan) | documented elsewhere | $0.17 | $0.51 | — | $0.00059 | $0.592 | — |
| Qwen3 32B Alibaba (Qwen) | 131K | $0.16 | $0.64 | — | $0.00064 | $0.640 | — |
| Qwen3 8B Alibaba (Qwen) | 128K | $0.20 | $0.76 | — | $0.00077 | $0.769 | — |
| Doubao Seed 1.6 ByteDance (Doubao) | 256K | $0.11 | $1.13 | $0.02 | $0.00079 | $0.790 | — |
| Doubao Seed 1.8 ByteDance (Doubao) | 256K | $0.11 | $1.13 | $0.02 | $0.00079 | $0.790 | — |
| Qwen3 30B A3B Alibaba (Qwen) | 128K | $0.22 | $0.87 | — | $0.00087 | $0.867 | — |
| Qwen3 30B A3B Instruct 2507 Alibaba (Qwen) | 262K | $0.22 | $0.87 | — | $0.00087 | $0.867 | — |
| Doubao Seed Code ByteDance (Doubao) | 128K | $0.17 | $1.13 | $0.03 | $0.00090 | $0.901 | — |
| GLM-4.5 Air Zhipu (Z.ai / GLM) | 128K | $0.20 | $1.10 | $0.03 | $0.00095 | $0.950 | — |
| Qwen3 Next 80B A3B Instruct Alibaba (Qwen) | 262K | $0.16 | $1.30 | — | $0.00098 | $0.976 | — |
| Qwen3 Next 80B A3B Thinking Alibaba (Qwen) | 262K | $0.16 | $1.30 | — | $0.00098 | $0.976 | — |
| Qwen3 235B A22B Instruct 2507 Alibaba (Qwen) | 262K | $0.25 | $1.00 | — | $0.00100 | $0.997 | — |
| QwQ 32B Alibaba (Qwen) | 131K | $0.29 | $0.86 | — | $0.00100 | $1.00 | — |
| GPT-5.4 nano OpenAI | 400K | $0.20 | $1.25 | $0.02 | $0.00103 | $1.03 | — |
| Codestral Mistral | documented elsewhere — not on pricing page | $0.30 | $0.90 | — | $0.00105 | $1.05 | — |
| MiniMax M2 MiniMax | 205K | $0.30 | $1.20 | $0.03 | $0.00120 | $1.20 | — |
| MiniMax M2-her MiniMax | 64K | $0.30 | $1.20 | — | $0.00120 | $1.20 | — |
| MiniMax M2.1 MiniMax | 205K | $0.30 | $1.20 | $0.03 | $0.00120 | $1.20 | — |
| MiniMax M2.5 MiniMax | 205K | $0.30 | $1.20 | $0.03 | $0.00120 | $1.20 | — |
| MiniMax M2.7 MiniMax | 205K | $0.30 | $1.20 | $0.06 | $0.00120 | $1.20 | — |
| MiniMax M3 MiniMax | 1M | $0.30 | $1.20 | $0.06 | $0.00120 | $1.20 | — |
| Qwen3 VL Plus Alibaba (Qwen) | 256K | $0.20 | $1.60 | — | $0.00120 | $1.20 | — |
| Gemini 3.1 Flash-Lite Google | 1M | $0.25 | $1.50 | $0.03 | $0.00125 | $1.25 | — |
| DeepSeek V4 Pro DeepSeek | 1M | $0.43 | $0.87 | $0.00 | $0.00130 | $1.30 | — |
| Qwen3 Coder Flash Alibaba (Qwen) | 1M | $0.30 | $1.50 | — | $0.00135 | $1.35 | — |
| Qwen3 14B Alibaba (Qwen) | 131K | $0.35 | $1.40 | — | $0.00140 | $1.40 | — |
| Hunyuan 2.0 Instruct Tencent (Hunyuan) | 128K | $0.45 | $1.12 | — | $0.00146 | $1.46 | — |
| Hunyuan T1 Vision Tencent (Hunyuan) | 28K | $0.42 | $1.27 | — | $0.00148 | $1.48 | — |
| Hunyuan TurboS Vision Tencent (Hunyuan) | 32K | $0.42 | $1.27 | — | $0.00148 | $1.48 | — |
| Hunyuan TurboS Vision Video Tencent (Hunyuan) | 24K | $0.42 | $1.27 | — | $0.00148 | $1.48 | — |
| Tencent HY Vision 1.5 Instruct Tencent (Hunyuan) | 24K | $0.42 | $1.27 | — | $0.00148 | $1.48 | — |
| Qwen3 30B A3B Thinking 2507 Alibaba (Qwen) | 262K | $0.22 | $2.60 | — | $0.00173 | $1.73 | — |
| Qwen3 235B A22B Thinking 2507 Alibaba (Qwen) | 262K | $0.25 | $2.49 | — | $0.00175 | $1.75 | — |
| Magistral Small Mistral | documented elsewhere | $0.50 | $1.50 | — | $0.00175 | $1.75 | — |
| Mistral Large 3 Mistral | documented elsewhere — not on pricing page | $0.50 | $1.50 | — | $0.00175 | $1.75 | — |
| Qwen3.7 Plus Alibaba (Qwen) | 1M | $0.44 | $1.77 | — | $0.00177 | $1.77 | — |
| Devstral 2 Mistral | documented elsewhere — not on pricing page | $0.40 | $2.00 | — | $0.00180 | $1.80 | — |
| Gemini 2.5 Flash Google | 1M | $0.30 | $2.50 | $0.03 | $0.00185 | $1.85 | — |
| Baichuan-M2 Baichuan | 32K | $0.28 | $2.82 | — | $0.00197 | $1.97 | — |
| Qwen 3.5 Plus Alibaba (Qwen) | 256K | $0.40 | $2.40 | — | $0.00200 | $2.00 | — |
| Doubao Seed 2.0 Code ByteDance (Doubao) | 256K | $0.45 | $2.25 | $0.09 | $0.00203 | $2.03 | — |
| Doubao Seed 2.0 Pro ByteDance (Doubao) | 256K | $0.45 | $2.25 | $0.09 | $0.00203 | $2.03 | — |
| Baichuan-M3-Plus Baichuan | 32K | $0.70 | $1.27 | — | $0.00204 | $2.04 | — |
| Hunyuan 2.0 Think (HYThink) Tencent (Hunyuan) | 128K | $0.56 | $2.24 | — | $0.00224 | $2.24 | — |
| GLM-4.5 Zhipu (Z.ai / GLM) | 128K | $0.60 | $2.20 | $0.11 | $0.00230 | $2.30 | — |
| GLM-4.6 Zhipu (Z.ai / GLM) | 200K | $0.60 | $2.20 | $0.11 | $0.00230 | $2.30 | — |
| GLM-4.7 Zhipu (Z.ai / GLM) | 200K | $0.60 | $2.20 | $0.11 | $0.00230 | $2.30 | — |
| MiniMax M2.1 Highspeed MiniMax | 205K | $0.60 | $2.40 | $0.03 | $0.00240 | $2.40 | — |
| MiniMax M2.5 Highspeed MiniMax | 205K | $0.60 | $2.40 | $0.03 | $0.00240 | $2.40 | — |
| MiniMax M2.7 Highspeed MiniMax | 205K | $0.60 | $2.40 | $0.06 | $0.00240 | $2.40 | — |
| Qwen 3.5 122B A10B Alibaba (Qwen) | 256K | $0.40 | $3.20 | — | $0.00240 | $2.40 | — |
| Kimi K2.5 Moonshot (Kimi) | 262K | $0.60 | $3.00 | $0.10 | $0.00270 | $2.70 | — |
| Qwen3 235B A22B Alibaba (Qwen) | 131K | $0.70 | $2.80 | — | $0.00280 | $2.80 | — |
| QwQ Plus Alibaba (Qwen) | 131K | $0.80 | $2.40 | — | $0.00280 | $2.80 | — |
| Qwen 3.5 397B A17B Alibaba (Qwen) | 256K | $0.60 | $3.60 | — | $0.00300 | $3.00 | — |
| GLM-5 Zhipu (Z.ai / GLM) | 200K | $1.00 | $3.20 | $0.20 | $0.00360 | $3.60 | — |
| Grok 4.20 (0309) Non-Reasoning xAI | 1M | $1.25 | $2.50 | $0.20 | $0.00375 | $3.75 | — |
| Grok 4.20 (0309) Reasoning xAI | 1M | $1.25 | $2.50 | $0.20 | $0.00375 | $3.75 | — |
| Grok 4.20 Multi-Agent (0309) xAI | 2M | $1.25 | $2.50 | $0.20 | $0.00375 | $3.75 | — |
| Grok 4.3 xAI | 1M | $1.25 | $2.50 | $0.20 | $0.00375 | $3.75 | — |
| GPT-5.4 mini OpenAI | 400K | $0.75 | $4.50 | $0.07 | $0.00375 | $3.75 | — |
| Kimi K2.6 Moonshot (Kimi) | 262K | $0.95 | $4.00 | $0.16 | $0.00390 | $3.90 | — |
| Baichuan3-Turbo Baichuan | 32K | $1.69 | $1.69 | — | $0.00422 | $4.22 | — |
| GLM-5 Turbo Zhipu (Z.ai / GLM) | 200K | $1.20 | $4.00 | $0.24 | $0.00440 | $4.40 | — |
| GLM-4.5 AirX Zhipu (Z.ai / GLM) | 128K | $1.10 | $4.50 | $0.22 | $0.00445 | $4.45 | — |
| Qwen3 Coder Plus Alibaba (Qwen) | 1M | $1.00 | $5.00 | — | $0.00450 | $4.50 | — |
| Claude Haiku 4.5 Anthropic | 200K | $1.00 | $5.00 | $0.10 | $0.00450 | $4.50 | — |
| Baichuan-M2-Plus Baichuan | 32K | $1.41 | $4.22 | — | $0.00493 | $4.93 | — |
| Baichuan-M3 Baichuan | 32K | $1.41 | $4.22 | — | $0.00493 | $4.93 | — |
| GLM-5.1 Zhipu (Z.ai / GLM) | 200K | $1.40 | $4.40 | $0.26 | $0.00500 | $5.00 | — |
| Baichuan4 Turbo Baichuan | 32K | $2.11 | $2.11 | — | $0.00528 | $5.28 | — |
| Qwen3 Max Alibaba (Qwen) | 252K | $1.20 | $6.00 | — | $0.00540 | $5.40 | — |
| Magistral Medium Mistral | documented elsewhere — not on pricing page | $2.00 | $5.00 | — | $0.00650 | $6.50 | — |
| Mistral Medium 3.1 Mistral | documented elsewhere | $1.50 | $7.50 | — | $0.00675 | $6.75 | — |
| Mistral Medium 3.5 Mistral | documented elsewhere — not on pricing page | $1.50 | $7.50 | — | $0.00675 | $6.75 | — |
| Gemini 2.5 Pro Google | 2M | $1.25 | $10.0 | $0.13 | $0.00750 | $7.50 | — |
| Gemini 3.5 Flash Google | 1M | $1.50 | $9.00 | $0.15 | $0.00750 | $7.50 | — |
| Baichuan3-Turbo (128K) Baichuan | 128K | $3.38 | $3.38 | — | $0.00845 | $8.45 | — |
| GLM-4.5 X Zhipu (Z.ai / GLM) | 128K | $2.20 | $8.90 | $0.45 | $0.00885 | $8.85 | — |
| Qwen3.7 Max Alibaba (Qwen) | 1M | $2.77 | $8.31 | — | $0.00969 | $9.69 | — |
| GPT-5.3-Codex OpenAI | 400K | $1.75 | $14.0 | $0.17 | $0.011 | $10.50 | — |
| GPT-5.4 OpenAI | 1.05M | $2.50 | $15.0 | $0.25 | $0.013 | $12.50 | — |
| Claude Sonnet 4.5 Anthropic | 200K | $3.00 | $15.0 | $0.30 | $0.013 | $13.50 | — |
| Claude Sonnet 4.6 Anthropic | 1M | $3.00 | $15.0 | $0.30 | $0.013 | $13.50 | — |
| Claude Opus 4.5 Anthropic | 200K | $5.00 | $25.0 | $0.50 | $0.022 | $22.50 | — |
| Claude Opus 4.6 Anthropic | 1M | $5.00 | $25.0 | $0.50 | $0.022 | $22.50 | — |
| Claude Opus 4.7 Anthropic | 1M | $5.00 | $25.0 | $0.50 | $0.022 | $22.50 | — |
| Claude Opus 4.8 Anthropic | 1M | $5.00 | $25.0 | $0.50 | $0.022 | $22.50 | — |
| chat-latest OpenAI | 400K | $5.00 | $30.0 | $0.50 | $0.025 | $25.00 | — |
| GPT-5.5 OpenAI | 1M | $5.00 | $30.0 | $0.50 | $0.025 | $25.00 | — |
| Baichuan4 Baichuan | 32K | $14.1 | $14.1 | — | $0.035 | $35.21 | — |
| Claude Fable 5 Anthropic | 1M | $10.0 | $50.0 | $1.00 | $0.045 | $45.00 | — |
| Claude Opus 4.1 Anthropic | 200K | $15.0 | $75.0 | $1.50 | $0.068 | $67.50 | — |
| GPT-5.4 Pro OpenAI | 1.05M | $30.0 | $180 | — | $0.150 | $150 | — |
| GPT-5.5 Pro OpenAI | 1.05M | $30.0 | $180 | — | $0.150 | $150 | — |
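The ranking in this table can be reproduced for any workload with a few lines of Python. The sketch below uses five rows copied from the table. The workload of 2,000 input tokens, 500 output tokens, and 1,000 requests per month is an inference: it is consistent with the per-request and monthly figures shown but is not stated on the page. The `rank` helper and its "multiple of the cheapest" column are hypothetical, one plausible reading of the "vs №1" column.

```python
# (model, input $/1M, output $/1M), copied from the table, deliberately unsorted.
ROWS = [
    ("GPT-5.4", 2.50, 15.0),
    ("Gemini 2.5 Flash-Lite", 0.10, 0.40),
    ("Claude Sonnet 4.5", 3.00, 15.0),
    ("GPT-5.4 nano", 0.20, 1.25),
    ("Claude Haiku 4.5", 1.00, 5.00),
]


def rank(rows, in_tokens, out_tokens, requests):
    """Return (model, per_request, monthly, multiple_of_cheapest), cheapest first."""
    priced = []
    for name, in_rate, out_rate in rows:
        per_request = (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
        priced.append((name, per_request, per_request * requests))
    priced.sort(key=lambda row: row[2])
    cheapest = priced[0][2]
    # A ratio against a $0 model is undefined, so report None in that case.
    return [(name, per_req, monthly, monthly / cheapest if cheapest > 0 else None)
            for name, per_req, monthly in priced]


ranked = rank(ROWS, in_tokens=2_000, out_tokens=500, requests=1_000)
for name, per_req, monthly, multiple in ranked:
    ratio = f"{multiple:.1f}x" if multiple is not None else "n/a"
    print(f"{name:<24} ${per_req:.5f}  ${monthly:>6.2f}  {ratio}")
```

List rates only are used here, so the output matches the table's figures for a scenario with caching and batch mode switched off.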
Common questions.
How the estimate works, where it can drift from your invoice, and what the levers mean.
Q · 01 How accurate are these estimates?
Q · 02 What are reasoning tokens and why do they matter?
Q · 03 How does the cache hit-rate math work?
input_rate = hit × cached_price + (1 − hit) × write_or_base_price. Models with no published cache pricing ignore the slider, and the hint under the field tells you.
Q · 04 Can I combine batch mode with caching?
Q · 05 How fresh is the pricing data?
Q · 06 Why might my actual bill still differ?
Q · 07 Can I share or export a scenario?
Q · 08 Are the $0 models really free?
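One consequence of the cache hit-rate formula in Q · 03 is that a write premium makes caching cost more than no caching when the hit rate is low. The Python sketch below solves the formula for the break-even hit rate. The function names are hypothetical, and the sample rates (base $3.00, cached $0.30, write 1.25× base) are illustrative values taken from the Anthropic figures quoted on this page.

```python
def blended_input_rate(hit, cached_price, miss_price):
    """input_rate = hit × cached_price + (1 − hit) × write_or_base_price"""
    return hit * cached_price + (1 - hit) * miss_price


def break_even_hit_rate(base_price, cached_price, write_price):
    """Hit rate at which the blended rate equals the uncached base rate.

    Solves hit × cached + (1 − hit) × write = base for hit.
    Assumes cached_price < base_price.
    """
    if write_price <= base_price:
        return 0.0  # no write premium: caching never costs more than base
    return (write_price - base_price) / (write_price - cached_price)


base, cached, write = 3.00, 0.30, 3.00 * 1.25
threshold = break_even_hit_rate(base, cached, write)

print(f"break-even hit rate: {threshold:.1%}")                           # 21.7%
print(f"blended at 10% hits: {blended_input_rate(0.1, cached, write):.3f}")  # 3.405
print(f"blended at 80% hits: {blended_input_rate(0.8, cached, write):.3f}")  # 0.990
```

Under these assumed rates, caching starts to pay off once more than about one request in five hits the cache. Below that, the write premium outweighs the cached-read discount.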
Per-model pricing hubs
Put this calculator on your site
Free to embed — your readers always see current pricing. The widget links back here for the full comparison table.
The cost toolbox
Every tool runs on the same live-pricing backbone. See the full index →