Prices verified 2026-07-28 · 169 models · 25 providers

LLM API cost calculator.


Estimate monthly spend across 169 live-priced models from 25 providers — including the levers most calculators skip: reasoning tokens, cache hit rates with write premiums, and batch tiers. Rates come straight from vendor pricing pages, stamped with their verification date.


Inputs · USD · per-token list prices

Model — 169 live-priced

Requests per month

Input tokens/request ≈ 1,500 words

Output tokens/request ≈ 375 words

Reasoning tokens/request — billed as output

Cache hit rate % — cached input billed cheaper

Batch API −50% input

Reasoning models bill hidden thinking tokens at the output rate. Cache misses are billed at the vendor's cache-write rate where one applies (e.g. Anthropic 5-min TTL, 1.25× input). Batch mode switches both input and output to the vendor's batch rates and turns cache modeling off.

Result · Claude Sonnet 4.6 · 1K req/mo

Monthly cost (API)

$13.50/mo

Per request: $0.013

Annual projection: $162

Effective blended rate: $5.40 / 1M tokens

Input cost: $6.00/mo

Output cost: $7.50/mo

Prompt caching saves $0/mo
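The derived rows in the result panel follow from the monthly figure with simple arithmetic. The sketch below assumes the blended rate is the monthly cost divided by every billed token (input, output, and reasoning), which matches the numbers shown; the function name and the 2,000-in / 500-out token counts are inferred from the input and output cost lines, not taken from the calculator's source.

```python
def summarize(monthly_usd, requests, in_tokens, out_tokens, reasoning_tokens=0):
    """Derive per-request, annual, and blended figures from a monthly cost."""
    tokens_per_request = in_tokens + out_tokens + reasoning_tokens
    total_tokens = requests * tokens_per_request
    return {
        "per_request": monthly_usd / requests,
        "annual": monthly_usd * 12,
        # Assumed definition: USD per 1M tokens across all billed tokens.
        "blended_per_1m": monthly_usd / total_tokens * 1_000_000,
    }


# Claude Sonnet 4.6 at 1K requests/month, 2,000 in / 500 out (inferred).
panel = summarize(13.50, requests=1_000, in_tokens=2_000, out_tokens=500)
for name, value in panel.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the blended rate comes out at $5.40 per 1M tokens and the annual projection at $162, in line with the panel.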

Top 5 cheapest for this workload

1. KAT-Coder-Air V1 $0 · free
2. KAT-Coder-Exp 72B 1010 $0 · free
3. Leanstral $0 · free
4. Hunyuan Lite $0 · free
5. GLM-4.5 Flash $0 · free

Embed this calculator ↓

Try common scenarios

Pre-configured use cases

Click any scenario to load it into the calculator. Costs below are computed from the current snapshot — they move when vendors reprice.

How it works

The math behind the estimate

Each model is priced with its own published levers: list input/output rates, cached-input rates, cache-write premiums where the vendor charges them, and batch-tier rates. Reasoning tokens are billed at the output rate, because that's how vendors bill them.

/* per request */
input_rate  = cache_on ? hit × cached_price + (1 − hit) × write_or_base_price
                       : base_input_price        /* batch mode: batch rates, cache off */
cost        = in_tokens × input_rate / 1M
            + (out_tokens + reasoning_tokens) × output_rate / 1M

/* monthly */
monthly     = cost × requests_per_month
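The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual source: the `Pricing` dataclass, its field names, and both functions are hypothetical, all rates are USD per 1M tokens, and the Sonnet cache-write rate is derived from the 1.25× premium mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pricing:
    """USD per 1M tokens for one model. Field names are illustrative."""
    input: float
    output: float
    cached_input: Optional[float] = None  # None: no published cache pricing
    cache_write: Optional[float] = None   # None: misses bill at base input
    batch_input: Optional[float] = None   # None: no published batch tier
    batch_output: Optional[float] = None


def request_cost(p: Pricing, in_tokens: int, out_tokens: int,
                 reasoning_tokens: int = 0, hit: float = 0.0,
                 batch: bool = False) -> float:
    """Cost of one request in USD, following the pseudocode above."""
    input_rate, output_rate = p.input, p.output
    if batch:
        # Batch mode: batch rates where published, cache modeling off.
        if p.batch_input is not None:
            input_rate = p.batch_input
        if p.batch_output is not None:
            output_rate = p.batch_output
    elif hit > 0 and p.cached_input is not None:
        # Assumption: a 0% hit rate means caching is off (base rate applies).
        miss_rate = p.cache_write if p.cache_write is not None else p.input
        input_rate = hit * p.cached_input + (1 - hit) * miss_rate
    # Reasoning tokens are billed at the output rate.
    billed_output = out_tokens + reasoning_tokens
    return (in_tokens * input_rate + billed_output * output_rate) / 1_000_000


def monthly_cost(p: Pricing, requests: int, **workload) -> float:
    """Monthly cost in USD: per-request cost times request volume."""
    return request_cost(p, **workload) * requests


sonnet = Pricing(input=3.00, output=15.00, cached_input=0.30, cache_write=3.75)
print(round(monthly_cost(sonnet, 1_000, in_tokens=2_000, out_tokens=500), 2))
```

With 1K requests at 2,000 input and 500 output tokens, this reproduces the $13.50/mo shown in the result panel.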

Prices are taken from each vendor's official pricing page and stamped with the date they were last verified (currently 2026-07-28). The comparison table runs your exact workload through all 169 models with each model's own discounts.

Full methodology lives at /methodology/. Found an error? Report it.

Worked examples

Three levers, real numbers

Computed from the current snapshot — recalculated on every data refresh.

Lever 01 · caching

A support bot, before and after caching

10,000 tickets/month on Claude Haiku 4.5 with a 4K-token system prompt + context. At an 80% cache hit rate the same workload costs a fraction of the uncached bill.

Model: Claude Haiku 4.5

Workload: 10K req · 4,000 in / 600 out

Uncached: $70.00/mo

80% cache hit: $43.20/mo

−38%

caching pays for the integration in week one
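The caching example can be reproduced by hand. A short sketch, using the Claude Haiku 4.5 rates from the comparison table and assuming every cache miss is billed at the 1.25× write rate (the variable names are illustrative):

```python
# Claude Haiku 4.5 rates, USD per 1M tokens.
BASE_IN, OUT, CACHED_IN = 1.00, 5.00, 0.10
CACHE_WRITE = 1.25 * BASE_IN  # Anthropic 5-minute cache-write premium

REQUESTS, IN_TOKENS, OUT_TOKENS = 10_000, 4_000, 600


def monthly(input_rate):
    """Monthly USD cost for the workload at a given effective input rate."""
    per_request = (IN_TOKENS * input_rate + OUT_TOKENS * OUT) / 1_000_000
    return per_request * REQUESTS


hit = 0.80
uncached = monthly(BASE_IN)
cached = monthly(hit * CACHED_IN + (1 - hit) * CACHE_WRITE)
saving_pct = (uncached - cached) / uncached * 100

print(f"uncached ${uncached:.2f}, cached ${cached:.2f}, saving {saving_pct:.0f}%")
# prints: uncached $70.00, cached $43.20, saving 38%
```

The output cost ($30/mo) is untouched by caching, which is why an 80% hit rate yields a 38% saving rather than something closer to 80%.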

Lever 02 · reasoning tokens

The thinking-token multiplier

5,000 requests/month on GPT-5.5, 500 visible output tokens each. Add a typical 6K-token thinking budget and the bill multiplies — same prompt, same visible answer.

Model: GPT-5.5

Workload: 5K req · 2,000 in / 500 out

No reasoning: $125/mo

+6K thinking: $1,025/mo

×8.2

budget thinking tokens before you ship a reasoning model
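The multiplier falls out of treating thinking tokens as extra output. A minimal sketch with the GPT-5.5 rates from the comparison table (the function name is illustrative):

```python
IN_RATE, OUT_RATE = 5.00, 30.00  # GPT-5.5, USD per 1M tokens


def monthly_cost(requests, in_tokens, out_tokens, reasoning_tokens=0):
    """Monthly USD cost; reasoning tokens are billed at the output rate."""
    billed_output = out_tokens + reasoning_tokens
    per_request = (in_tokens * IN_RATE + billed_output * OUT_RATE) / 1_000_000
    return per_request * requests


base = monthly_cost(5_000, 2_000, 500)
with_thinking = monthly_cost(5_000, 2_000, 500, reasoning_tokens=6_000)

print(f"${base:,.0f} -> ${with_thinking:,.0f} (x{with_thinking / base:.1f})")
# prints: $125 -> $1,025 (x8.2)
```

The 6,000 hidden tokens add $900/mo on their own, more than seven times the cost of the visible request.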

Lever 03 · batch

Overnight jobs at batch rates

500K classification calls/month on Gemini 3.5 Flash. If the workload tolerates async turnaround, batch rates cut the bill roughly in half — a one-line change for most pipelines.

Model: Gemini 3.5 Flash

Workload: 500K req · 500 in / 60 out

Real-time: $645/mo

Batch API: $323/mo

−50%

same model, same output — half the rate
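A quick check of the batch figures, using the Gemini 3.5 Flash list rates from the comparison table. The 50% discount on both input and output is an assumption that matches the numbers shown here; the vendor's actual batch rates should be read from its pricing page.

```python
LIST_IN, LIST_OUT = 1.50, 9.00  # Gemini 3.5 Flash, USD per 1M tokens
BATCH_DISCOUNT = 0.50           # assumed: half price on input and output

REQUESTS, IN_TOKENS, OUT_TOKENS = 500_000, 500, 60


def monthly(in_rate, out_rate):
    """Monthly USD cost for the classification workload."""
    per_request = (IN_TOKENS * in_rate + OUT_TOKENS * out_rate) / 1_000_000
    return per_request * REQUESTS


realtime = monthly(LIST_IN, LIST_OUT)
batch = monthly(LIST_IN * (1 - BATCH_DISCOUNT), LIST_OUT * (1 - BATCH_DISCOUNT))

print(f"real-time ${realtime:.2f}, batch ${batch:.2f}")
# prints: real-time $645.00, batch $322.50
```

The card above rounds $322.50 to $323.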

Optimization

Five ways to cut the bill

Turn on prompt caching

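Where a vendor publishes cached-input pricing, cached tokens commonly cost about 90% less than fresh input, and up to 99% less on some models (DeepSeek V4 Pro lists $0.0036 cached against $0.43 fresh). For agents with a stable system prompt, caching alone often cuts the bill by half or more: set the cache hit rate above and watch the savings row. The full math: prompt caching cost math.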

Route by task complexity

Classification and extraction don't need a frontier model. Routing simple tasks to a budget tier (10–50× cheaper per token) and reserving flagships for reasoning-heavy work is the single biggest lever after caching. Compare tiers in the pricing directory.
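To get a feel for the routing lever, the sketch below splits a workload between GPT-5.4 and GPT-5.4 nano at the rates in the comparison table. The 100K request volume and the 80% "simple task" share are hypothetical, as are the function names; the split you can actually achieve depends on your traffic.

```python
def per_request(in_rate, out_rate, in_tokens=2_000, out_tokens=500):
    """USD cost of one request at the given per-1M-token rates."""
    return (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000


FLAGSHIP = per_request(2.50, 15.00)  # GPT-5.4
BUDGET = per_request(0.20, 1.25)     # GPT-5.4 nano


def routed_monthly(requests, simple_share):
    """Monthly USD cost when `simple_share` of requests go to the budget tier."""
    simple = requests * simple_share
    return simple * BUDGET + (requests - simple) * FLAGSHIP


all_flagship = routed_monthly(100_000, 0.0)
routed = routed_monthly(100_000, 0.8)

print(f"all flagship ${all_flagship:,.2f}, 80% routed ${routed:,.2f}")
# prints: all flagship $1,250.00, 80% routed $332.00
```

In this example the flagship still accounts for $250 of the $332, so the remaining savings depend on how much traffic truly needs it.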

Trim context aggressively

Every token in the window bills on every call. Summarize old turns, truncate retrieved chunks, and strip boilerplate — a 30% context cut is a 30% input-cost cut.
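Note that the 30% applies to the input line only; the effect on the total bill depends on how input-heavy the workload is. A small sketch on the default Claude Sonnet 4.6 scenario (1K requests, 2,000 in / 500 out, rates from the comparison table; names are illustrative):

```python
IN_RATE, OUT_RATE = 3.00, 15.00  # Claude Sonnet 4.6, USD per 1M tokens
REQUESTS, OUT_TOKENS = 1_000, 500


def monthly(in_tokens):
    """Return (input_cost, output_cost) in USD per month."""
    input_cost = REQUESTS * in_tokens * IN_RATE / 1_000_000
    output_cost = REQUESTS * OUT_TOKENS * OUT_RATE / 1_000_000
    return input_cost, output_cost


before_in, out = monthly(2_000)
after_in, _ = monthly(1_400)  # 30% fewer input tokens

input_cut = (before_in - after_in) / before_in * 100
total_cut = (before_in - after_in) / (before_in + out) * 100

print(f"input -{input_cut:.0f}%, total -{total_cut:.1f}%")
# prints: input -30%, total -13.3%
```

For long-context agents where input dominates, the total cut approaches the full 30%.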

Batch the non-real-time work

Batch APIs run at roughly half price on OpenAI, Anthropic, and Google. If a workload tolerates async turnaround, flip the Batch toggle above and compare.

Watch the hidden line items

Reasoning tokens, cache writes, retries, and tool-call overhead don't show up in naive estimates — they show up on your invoice. We covered the common surprises in hidden LLM API costs.

All models

Your workload on every model

Per-request and monthly cost for the scenario configured above, across all 169 live-priced models. Click a column to sort; click a model for its full pricing hub.

Model and provider	Context	In $/1M	Out $/1M	Cached in $/1M	Per request	Monthly	vs №1
KAT-Coder-Air V1 Kwaipilot	not listed	Free	Free	—	$0	$0	—
KAT-Coder-Exp 72B 1010 Kwaipilot	not listed	Free	Free	—	$0	$0	—
Leanstral Mistral	documented elsewhere	Free	Free	—	$0	$0	—
Hunyuan Lite Tencent (Hunyuan)	documented elsewhere	Free	Free	—	$0	$0	—
GLM-4.5 Flash Zhipu (Z.ai / GLM)	128K	Free	Free	Free	$0	$0	—
GLM-4.7 Flash Zhipu (Z.ai / GLM)	200K	Free	Free	Free	$0	$0	—
Qwen3.7 Flash Alibaba (Qwen)	1M	$0.03	$0.13	—	$0.00013	$0.125	—
Command R7B Cohere	128K	$0.04	$0.15	—	$0.00015	$0.150	—
Doubao Seed 1.6 Flash ByteDance (Doubao)	256K	$0.02	$0.22	$0.0044	$0.00015	$0.155	—
Doubao Seed 2.0 Mini ByteDance (Doubao)	256K	$0.03	$0.28	$0.006	$0.0002	$0.197	—
Ministral 3 3B Mistral	128K	$0.10	$0.10	—	$0.00025	$0.250	—
Reka Edge Reka	not listed	$0.10	$0.10	—	$0.00025	$0.250	—
GLM-4 32B (0414, 128K) Zhipu (Z.ai / GLM)	128K	$0.10	$0.10	—	$0.00025	$0.250	—
Doubao Seed 1.6 Lite ByteDance (Doubao)	256K	$0.04	$0.34	$0.008	$0.00025	$0.253	—
Hunyuan A13B Tencent (Hunyuan)	224K	$0.07	$0.28	—	$0.00028	$0.281	—
Qwen3 VL Flash Alibaba (Qwen)	256K	$0.05	$0.40	—	$0.0003	$0.300	—
GLM-4.7 FlashX Zhipu (Z.ai / GLM)	200K	$0.07	$0.40	$0.01	$0.00034	$0.340	—
Baichuan4 Air Baichuan	32K	$0.14	$0.14	—	$0.00035	$0.345	—
Yi Lightning 01.AI	16K	$0.14	$0.14	—	$0.00035	$0.348	—
Devstral Small 2 Mistral	documented elsewhere	$0.10	$0.30	—	$0.00035	$0.350	—
Mistral Small 4 Mistral	256K	$0.10	$0.30	$0.10	$0.00035	$0.350	—
Step 3.5 Flash StepFun	256K	$0.10	$0.31	$0.02	$0.00036	$0.360	—
Doubao Seed Character ByteDance (Doubao)	128K	$0.11	$0.28	$0.02	$0.00037	$0.367	—
Hunyuan TurboS Tencent (Hunyuan)	documented elsewhere	$0.11	$0.28	—	$0.00037	$0.367	—
Ministral 3 8B Mistral	128K	$0.15	$0.15	—	$0.00037	$0.375	—
Qwen 3.5 Flash Alibaba (Qwen)	1M	$0.10	$0.40	—	$0.0004	$0.400	—
Gemini 2.5 Flash-Lite Google	1M	$0.10	$0.40	$0.01	$0.0004	$0.400	—
DeepSeek V4 Flash DeepSeek	1M	$0.14	$0.28	$0.0028	$0.00042	$0.420	—
MiMo-V2.5 Xiaomi MiMo	1M	$0.14	$0.28	$0.0028	$0.00042	$0.420	—
Doubao Seed 2.0 Lite ByteDance (Doubao)	256K	$0.09	$0.53	$0.02	$0.00044	$0.442	—
ERNIE 4.5 Turbo Baidu (ERNIE)	128K	$0.12	$0.47	$0.03	$0.00047	$0.472	—
Hunyuan Translation Lite Tencent (Hunyuan)	documented elsewhere	$0.14	$0.42	—	$0.00049	$0.493	—
Ministral 3 14B Mistral	128K	$0.20	$0.20	—	$0.0005	$0.500	—
KAT-Coder-Air V2.5 Kwaipilot	256K	$0.14	$0.56	$0.03	$0.00056	$0.563	—
Hunyuan Hy3 Tencent (Hunyuan)	256K	$0.14	$0.56	$0.04	$0.00056	$0.564	—
Hunyuan T1 Tencent (Hunyuan)	documented elsewhere	$0.14	$0.56	—	$0.00056	$0.564	—
Doubao Seed Translation ByteDance (Doubao)	documented elsewhere	$0.17	$0.51	—	$0.00059	$0.592	—
Hunyuan Translation Tencent (Hunyuan)	documented elsewhere	$0.17	$0.51	—	$0.00059	$0.592	—
Command R Cohere	128K	$0.15	$0.60	—	$0.0006	$0.600	—
Jamba Mini 1.7 AI21 Labs	256K	$0.20	$0.40	—	$0.0006	$0.600	—
Qwen3 32B Alibaba (Qwen)	131K	$0.16	$0.64	—	$0.00064	$0.640	—
Qwen3 8B Alibaba (Qwen)	128K	$0.18	$0.70	—	$0.00071	$0.710	—
Doubao Seed 1.6 ByteDance (Doubao)	256K	$0.11	$1.13	$0.02	$0.00079	$0.790	—
Doubao Seed 1.6 Vision ByteDance (Doubao)	256K	$0.11	$1.13	$0.02	$0.00079	$0.790	—
Doubao Seed 1.8 ByteDance (Doubao)	256K	$0.11	$1.13	$0.02	$0.00079	$0.790	—
Qwen3 30B A3B Alibaba (Qwen)	128K	$0.20	$0.80	—	$0.0008	$0.800	—
Qwen3 30B A3B Instruct 2507 Alibaba (Qwen)	262K	$0.20	$0.80	—	$0.0008	$0.800	—
Qwen3 Next 80B A3B Instruct Alibaba (Qwen)	262K	$0.15	$1.20	—	$0.0009	$0.900	—
Qwen3 Next 80B A3B Thinking Alibaba (Qwen)	262K	$0.15	$1.20	—	$0.0009	$0.900	—
Doubao Seed Code ByteDance (Doubao)	128K	$0.17	$1.13	$0.03	$0.0009	$0.901	—
Qwen3 235B A22B Instruct 2507 Alibaba (Qwen)	262K	$0.23	$0.92	—	$0.00092	$0.920	—
GLM-4.5 Air Zhipu (Z.ai / GLM)	128K	$0.20	$1.10	$0.03	$0.00095	$0.950	—
Step 3.7 Flash StepFun	256K	$0.20	$1.19	$0.04	$0.00099	$0.993	—
GPT-5.4 nano OpenAI	400K	$0.20	$1.25	$0.02	$0.00103	$1.03	—
Codestral Mistral	documented elsewhere — not on pricing page	$0.30	$0.90	—	$0.00105	$1.05	—
KAT-Coder-Pro V1 Kwaipilot	not listed	$0.30	$1.18	$0.06	$0.00118	$1.18	—
KAT-Coder-Pro V2 Kwaipilot	not listed	$0.30	$1.18	$0.06	$0.00118	$1.18	—
MiniMax M2 MiniMax	205K	$0.30	$1.20	$0.03	$0.0012	$1.20	—
MiniMax M2-her MiniMax	64K	$0.30	$1.20	—	$0.0012	$1.20	—
MiniMax M2.1 MiniMax	205K	$0.30	$1.20	$0.03	$0.0012	$1.20	—
MiniMax M2.5 MiniMax	205K	$0.30	$1.20	$0.03	$0.0012	$1.20	—
MiniMax M2.7 MiniMax	205K	$0.30	$1.20	$0.06	$0.0012	$1.20	—
MiniMax M3 MiniMax	1M	$0.30	$1.20	$0.06	$0.0012	$1.20	—
Qwen3 VL Plus Alibaba (Qwen)	256K	$0.20	$1.60	—	$0.0012	$1.20	—
Qwen3.6 Flash Alibaba (Qwen)	1M	$0.25	$1.50	$0.25	$0.00125	$1.25	—
Gemini 3.1 Flash-Lite Google	1M	$0.25	$1.50	$0.03	$0.00125	$1.25	—
DeepSeek V4 Pro DeepSeek	1M	$0.43	$0.87	$0.0036	$0.0013	$1.30	—
MiMo-V2.5-Pro Xiaomi MiMo	1M	$0.43	$0.87	$0.0036	$0.0013	$1.30	—
Qwen3 Coder Flash Alibaba (Qwen)	1M	$0.30	$1.50	—	$0.00135	$1.35	—
Qwen3 Coder Next Alibaba (Qwen)	262K	$0.30	$1.50	—	$0.00135	$1.35	—
Qwen3 14B Alibaba (Qwen)	131K	$0.35	$1.40	—	$0.0014	$1.40	—
Hunyuan 2.0 Instruct Tencent (Hunyuan)	128K	$0.45	$1.12	—	$0.00146	$1.46	—
Hunyuan T1 Vision Tencent (Hunyuan)	28K	$0.42	$1.27	—	$0.00148	$1.48	—
Hunyuan TurboS Vision Tencent (Hunyuan)	32K	$0.42	$1.27	—	$0.00148	$1.48	—
Hunyuan TurboS Vision Video Tencent (Hunyuan)	24K	$0.42	$1.27	—	$0.00148	$1.48	—
Tencent HY Vision 1.5 Instruct Tencent (Hunyuan)	24K	$0.42	$1.27	—	$0.00148	$1.48	—
Qwen3 30B A3B Thinking 2507 Alibaba (Qwen)	262K	$0.20	$2.40	—	$0.0016	$1.60	—
Qwen3.7 Plus Alibaba (Qwen)	1M	$0.40	$1.60	—	$0.0016	$1.60	—
Qwen3 235B A22B Thinking 2507 Alibaba (Qwen)	262K	$0.23	$2.30	—	$0.00161	$1.61	—
Aya Expanse 32B Cohere	128K	$0.50	$1.50	—	$0.00175	$1.75	—
Magistral Small Mistral	documented elsewhere	$0.50	$1.50	—	$0.00175	$1.75	—
Mistral Large 3 Mistral	documented elsewhere — not on pricing page	$0.50	$1.50	—	$0.00175	$1.75	—
Devstral 2 Mistral	documented elsewhere — not on pricing page	$0.40	$2.00	—	$0.0018	$1.80	—
Amazon Nova 2 Lite Amazon	1M	$0.30	$2.50	—	$0.00185	$1.85	—
Gemini 2.5 Flash Google	1M	$0.30	$2.50	$0.03	$0.00185	$1.85	—
Gemini 3.5 Flash-Lite Google	1M	$0.30	$2.50	$0.03	$0.00185	$1.85	—
Qwen3.6 35B A3B Alibaba (Qwen)	256K	$0.38	$2.25	$0.38	$0.00188	$1.88	—
Baichuan-M2 Baichuan	32K	$0.28	$2.82	—	$0.00197	$1.97	—
Doubao Seed 2.1 Turbo ByteDance (Doubao)	256K	$0.44	$2.21	$0.09	$0.00199	$1.99	—
Qwen 3.5 Plus Alibaba (Qwen)	256K	$0.40	$2.40	—	$0.002	$2.00	—
Doubao Seed 2.0 Code ByteDance (Doubao)	256K	$0.45	$2.25	$0.09	$0.00203	$2.03	—
Baichuan-M3-Plus Baichuan	32K	$0.70	$1.27	—	$0.00204	$2.04	—
Doubao Seed 2.0 Pro ByteDance (Doubao)	256K	$0.47	$2.36	$0.09	$0.00212	$2.12	—
Hunyuan 2.0 Think (HYThink) Tencent (Hunyuan)	128K	$0.56	$2.24	—	$0.00224	$2.24	—
GLM-4.5 Zhipu (Z.ai / GLM)	128K	$0.60	$2.20	$0.11	$0.0023	$2.30	—
GLM-4.6 Zhipu (Z.ai / GLM)	200K	$0.60	$2.20	$0.11	$0.0023	$2.30	—
GLM-4.7 Zhipu (Z.ai / GLM)	200K	$0.60	$2.20	$0.11	$0.0023	$2.30	—
MiniMax M2.1 Highspeed MiniMax	205K	$0.60	$2.40	$0.03	$0.0024	$2.40	—
MiniMax M2.5 Highspeed MiniMax	205K	$0.60	$2.40	$0.03	$0.0024	$2.40	—
MiniMax M2.7 Highspeed MiniMax	205K	$0.60	$2.40	$0.06	$0.0024	$2.40	—
Qwen 3.5 122B A10B Alibaba (Qwen)	256K	$0.40	$3.20	—	$0.0024	$2.40	—
Qwen3.6 Plus Alibaba (Qwen)	1M	$0.50	$3.00	—	$0.0025	$2.50	—
Sonar Perplexity	not listed	$1.00	$1.00	—	$0.0025	$2.50	—
ERNIE 5.1 Baidu (ERNIE)	128K	$0.59	$2.65	—	$0.00251	$2.51	—
Reka Flash Reka	not listed	$0.80	$2.00	—	$0.0026	$2.60	—
Kimi K2.5 Moonshot (Kimi)	262K	$0.60	$3.00	$0.10	$0.0027	$2.70	—
Qwen3 235B A22B Alibaba (Qwen)	131K	$0.70	$2.80	—	$0.0028	$2.80	—
QwQ Plus Alibaba (Qwen)	131K	$0.80	$2.40	—	$0.0028	$2.80	—
KAT-Coder-Pro V2.5 Kwaipilot	256K	$0.70	$2.82	$0.14	$0.00282	$2.82	—
LongCat-2.0 Meituan	1M	$0.70	$2.82	$0.01	$0.00282	$2.82	—
Qwen 3.5 397B A17B Alibaba (Qwen)	256K	$0.60	$3.60	—	$0.003	$3.00	—
Qwen3.6 27B Alibaba (Qwen)	256K	$0.60	$3.60	$0.60	$0.003	$3.00	—
Grok Build 0.1 SpaceXAI	256K	$1.00	$2.00	$0.20	$0.003	$3.00	—
GLM-5 Zhipu (Z.ai / GLM)	200K	$1.00	$3.20	$0.20	$0.0036	$3.60	—
Grok 4.20 (0309) Non-Reasoning SpaceXAI	1M	$1.25	$2.50	$0.20	$0.00375	$3.75	—
Grok 4.20 (0309) Reasoning SpaceXAI	1M	$1.25	$2.50	$0.20	$0.00375	$3.75	—
Grok 4.20 Multi-Agent (0309) SpaceXAI	2M	$1.25	$2.50	$0.20	$0.00375	$3.75	—
Grok 4.3 SpaceXAI	1M	$1.25	$2.50	$0.20	$0.00375	$3.75	—
GPT-5.4 mini OpenAI	400K	$0.75	$4.50	$0.07	$0.00375	$3.75	—
Kimi K2.6 Moonshot (Kimi)	262K	$0.95	$4.00	$0.16	$0.0039	$3.90	—
Kimi K2.7 Code Moonshot (Kimi)	262K	$0.95	$4.00	$0.19	$0.0039	$3.90	—
Doubao Seed 2.1 Pro ByteDance (Doubao)	256K	$0.88	$4.42	$0.18	$0.00398	$3.98	—
Baichuan3-Turbo Baichuan	32K	$1.69	$1.69	—	$0.00422	$4.22	—
GLM-5 Turbo Zhipu (Z.ai / GLM)	200K	$1.20	$4.00	$0.24	$0.0044	$4.40	—
GLM-4.5 AirX Zhipu (Z.ai / GLM)	128K	$1.10	$4.50	$0.22	$0.00445	$4.45	—
Qwen3 Coder Plus Alibaba (Qwen)	1M	$1.00	$5.00	—	$0.0045	$4.50	—
Claude Haiku 4.5 Anthropic	200K	$1.00	$5.00	$0.10	$0.0045	$4.50	—
Baichuan-M2-Plus Baichuan	32K	$1.41	$4.22	—	$0.00493	$4.93	—
Baichuan-M3 Baichuan	32K	$1.41	$4.22	—	$0.00493	$4.93	—
GPT-5.6 Luna OpenAI	1.05M	$1.00	$6.00	$0.10	$0.005	$5.00	—
GLM-5.1 Zhipu (Z.ai / GLM)	200K	$1.40	$4.40	$0.26	$0.005	$5.00	—
GLM-5.2 Zhipu (Z.ai / GLM)	1M	$1.40	$4.40	$0.26	$0.005	$5.00	—
Baichuan4 Turbo Baichuan	32K	$2.11	$2.11	—	$0.00528	$5.28	—
Qwen3 Max Alibaba (Qwen)	252K	$1.20	$6.00	—	$0.0054	$5.40	—
Magistral Medium Mistral	documented elsewhere — not on pricing page	$2.00	$5.00	—	$0.0065	$6.50	—
Gemini 3.6 Flash Google	1M	$1.50	$7.50	$0.15	$0.00675	$6.75	—
Mistral Medium 3.5 Mistral	documented elsewhere — not on pricing page	$1.50	$7.50	—	$0.00675	$6.75	—
Reka Core Reka	not listed	$2.00	$6.00	—	$0.007	$7.00	—
Grok 4.5 SpaceXAI	500K	$2.00	$6.00	$0.50	$0.007	$7.00	—
Gemini 2.5 Pro Google	2M	$1.25	$10.0	$0.13	$0.0075	$7.50	—
Gemini 3.5 Flash Google	1M	$1.50	$9.00	$0.15	$0.0075	$7.50	—
Jamba Large 1.7 AI21 Labs	256K	$2.00	$8.00	—	$0.008	$8.00	—
Sonar Deep Research Perplexity	not listed	$2.00	$8.00	—	$0.008	$8.00	—
Sonar Reasoning Pro Perplexity	not listed	$2.00	$8.00	—	$0.008	$8.00	—
Baichuan3-Turbo (128K) Baichuan	128K	$3.38	$3.38	—	$0.00845	$8.45	—
Qwen3.7 Max Alibaba (Qwen)	1M	$2.50	$7.50	—	$0.00875	$8.75	—
GLM-4.5 X Zhipu (Z.ai / GLM)	128K	$2.20	$8.90	$0.45	$0.00885	$8.85	—
Claude Sonnet 5 Anthropic	1M	$2.00	$10.0	$0.20	$0.009	$9.00	—
Command R+ Cohere	128K	$2.50	$10.0	—	$0.010	$10.00	—
GPT-5.3-Codex OpenAI	400K	$1.75	$14.0	$0.17	$0.011	$10.50	—
GPT-5.4 OpenAI	1.05M	$2.50	$15.0	$0.25	$0.013	$12.50	—
GPT-5.6 Terra OpenAI	1.05M	$2.50	$15.0	$0.25	$0.013	$12.50	—
Claude Sonnet 4.5 Anthropic	200K	$3.00	$15.0	$0.30	$0.013	$13.50	—
Claude Sonnet 4.6 Anthropic	1M	$3.00	$15.0	$0.30	$0.013	$13.50	—
Kimi K3 Moonshot (Kimi)	1M	$3.00	$15.0	$0.30	$0.013	$13.50	—
Sonar Pro Perplexity	not listed	$3.00	$15.0	—	$0.013	$13.50	—
Claude Opus 4.5 Anthropic	200K	$5.00	$25.0	$0.50	$0.022	$22.50	—
Claude Opus 4.6 Anthropic	1M	$5.00	$25.0	$0.50	$0.022	$22.50	—
Claude Opus 4.7 Anthropic	1M	$5.00	$25.0	$0.50	$0.022	$22.50	—
Claude Opus 4.8 Anthropic	1M	$5.00	$25.0	$0.50	$0.022	$22.50	—
Claude Opus 5 Anthropic	1M	$5.00	$25.0	$0.50	$0.022	$22.50	—
chat-latest OpenAI	400K	$5.00	$30.0	$0.50	$0.025	$25.00	—
GPT-5.5 OpenAI	1M	$5.00	$30.0	$0.50	$0.025	$25.00	—
GPT-5.6 Sol OpenAI	1.05M	$5.00	$30.0	$0.50	$0.025	$25.00	—
Fugu Ultra Sakana AI	272K	$5.00	$30.0	$0.50	$0.025	$25.00	—
Baichuan4 Baichuan	32K	$14.1	$14.1	—	$0.035	$35.21	—
Claude Fable 5 Anthropic	1M	$10.0	$50.0	$1.00	$0.045	$45.00	—
GPT-5.4 Pro OpenAI	1.05M	$30.0	$180	—	$0.150	$150	—
GPT-5.5 Pro OpenAI	1.05M	$30.0	$180	—	$0.150	$150	—
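The table's ordering can be reproduced for any subset of models. The sketch below takes five rows from the table and ranks them for the default scenario (1K requests, 2,000 in / 500 out, no caching or batch); the dictionary layout and function name are illustrative.

```python
# (input, output) list prices in USD per 1M tokens, copied from the table.
MODELS = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "GLM-4.5 Flash": (0.00, 0.00),
}


def monthly_cost(in_rate, out_rate, requests=1_000, in_tokens=2_000, out_tokens=500):
    """Monthly USD cost at list prices, no caching or batch discounts."""
    return requests * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000


ranked = sorted(MODELS, key=lambda name: monthly_cost(*MODELS[name]))
for name in ranked:
    print(f"{name:<20} ${monthly_cost(*MODELS[name]):.2f}/mo")
```

Changing the token counts can reorder the list, since models differ in their input-to-output price ratio.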

Common questions.

How the estimate works, where it can drift from your invoice, and what the levers mean.

Q · 01 How accurate are these estimates?

The calculator multiplies your token counts by each vendor's published list prices, verified against the vendor's own pricing page (date in the header). It models caching, cache-write premiums, batch tiers, and reasoning tokens — but not negotiated enterprise discounts, regional taxes, or provider-side failures and retries. Treat results as a tight planning estimate, then measure a production sample.

Q · 02 What are reasoning tokens and why do they matter?

Reasoning (thinking) models generate hidden chain-of-thought tokens before the visible answer, and vendors bill them at the output rate. A request that returns 500 visible tokens can quietly bill thousands of reasoning tokens on top. Most calculators ignore this entirely — it's the single most common reason real bills exceed estimates. Set the reasoning field to your model's typical thinking budget.

Q · 03 How does the cache hit-rate math work?

Input tokens served from cache bill at the model's cached-input price. Cache misses bill at the vendor's cache-write rate where one exists — Anthropic charges 1.25× base input to write a 5-minute cache entry — otherwise at the base input rate. Formula: input_rate = hit × cached_price + (1 − hit) × write_or_base_price. Models with no published cache pricing ignore the slider, and the hint under the field tells you.
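One consequence of this formula: where a vendor charges a write premium, a low hit rate can cost more than not caching at all. Setting the formula equal to the base input price and solving for the hit rate gives the break-even point. The sketch below does that under the calculator's model, in which every miss is billed at the write rate; the function name is illustrative.

```python
def break_even_hit_rate(base, cached, write):
    """Hit rate above which caching beats the plain base input rate.

    Solves hit * cached + (1 - hit) * write = base for hit.
    Assumes cached < base, as on models that publish a cache discount.
    """
    if write <= base:
        return 0.0  # no write premium: any hit rate above zero helps
    return (write - base) / (write - cached)


# Claude Haiku 4.5: $1.00 base, $0.10 cached, 1.25x write premium.
haiku = break_even_hit_rate(base=1.00, cached=0.10, write=1.25)
print(f"break-even hit rate: {haiku:.1%}")
# prints: break-even hit rate: 21.7%
```

With cached input at 10% of base and a 1.25× write premium, the break-even is the same 21.7% at any base price, because the three rates scale together.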

Q · 04 Can I combine batch mode with caching?

Not in this calculator. Published cached-input prices are standard-tier prices; stacking the batch discount on top would double-count. When Batch is on we use the vendor's batch input/output rates and disable cache modeling — a conservative estimate. If a vendor documents combined batch+cache pricing, we'll model it.

Q · 05 How fresh is the pricing data?

Every model's rates are verified against the vendor's own pricing page and stamped — the date in the page header is the snapshot's last verification date. When a vendor reprices, the whole page (presets, examples, table) recomputes on rebuild. Spotted a stale number? Report it and we re-verify within a day.

Q · 06 Why might my actual bill still differ?

The usual causes are variable context growth in agent loops, tool-call payloads, retries, and tokenizer differences between vendors: the same text can tokenize to different counts, and some tokenizers run up to ~35% heavier on identical text. The calculator assumes fixed average tokens per request; measure a representative sample with your vendor's usage dashboard for the first weeks in production.
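A simple sensitivity check shows how much tokenizer differences alone can move an estimate. The sketch assumes a heavier tokenizer inflates input and output counts by the same factor, which is a simplification; the `token_inflation` parameter and function name are hypothetical.

```python
def monthly_cost(requests, in_tokens, out_tokens, in_rate, out_rate,
                 token_inflation=1.0):
    """Monthly USD cost with token counts scaled by a tokenizer factor."""
    billed_in = in_tokens * token_inflation
    billed_out = out_tokens * token_inflation
    return requests * (billed_in * in_rate + billed_out * out_rate) / 1_000_000


# Default scenario on Claude Sonnet 4.6 rates ($3 in / $15 out per 1M).
estimate = monthly_cost(1_000, 2_000, 500, 3.00, 15.00)
worst_case = monthly_cost(1_000, 2_000, 500, 3.00, 15.00, token_inflation=1.35)

print(f"estimate ${estimate:.2f}, with 35% heavier tokenizer ${worst_case:.2f}")
```

Because cost is linear in token count, a 35% heavier tokenizer raises the bill by the same 35% before any other source of drift is counted.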

Q · 07 Can I share or export a scenario?

Yes — Share scenario copies a URL that encodes every input, so a teammate opens the calculator in exactly your state. Export CSV downloads the full 169-model comparison for your current workload. Both are free, no signup.
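Encoding every input in a URL is commonly done with query parameters. The sketch below shows one way that could work; it is not the calculator's actual URL format. The base URL is a placeholder and every parameter name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

BASE_URL = "https://example.com/calculator/"  # placeholder, not the real URL


def encode_scenario(scenario: dict) -> str:
    """Serialize calculator inputs into a shareable URL."""
    return f"{BASE_URL}?{urlencode(scenario)}"


def decode_scenario(url: str) -> dict:
    """Recover the inputs; everything except the model name is numeric."""
    pairs = parse_qsl(urlsplit(url).query)
    return {key: (value if key == "model" else float(value)) for key, value in pairs}


# Hypothetical parameter names for the inputs listed on this page.
scenario = {
    "model": "claude-sonnet-4.6",
    "requests": 1000,
    "in": 2000,
    "out": 500,
    "reasoning": 0,
    "cache_hit": 0,
    "batch": 0,
}

url = encode_scenario(scenario)
print(url)
assert decode_scenario(url) == scenario  # round trip preserves every input
```

Keeping the whole state in the query string means the link works without any server-side storage, which fits the "no signup" claim.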

Q · 08 Are the $0 models really free?

The free-tier entries (e.g. GLM Flash tiers, Leanstral, Hunyuan Lite) publish a genuine $0 per-token API price, typically with rate limits, capacity queues, or data-use terms attached. They're real options for prototypes and low-volume tools — read the vendor's terms before betting production traffic on one.

Deep dive

Per-model pricing hubs

01.AI: Yi Lightning
AI21 Labs: Jamba Large 1.7
Alibaba (Qwen): Qwen 3.5 122B A10B
Amazon: Amazon Nova 2 Lite
Anthropic: Claude Fable 5
Baichuan: Baichuan-M2
Baidu (ERNIE): ERNIE 4.5 Turbo
ByteDance (Doubao): Doubao Seed 1.6
Cohere: Aya Expanse 32B
DeepSeek: DeepSeek V4 Flash
Google: Gemini 2.5 Flash
Kwaipilot: KAT-Coder-Air V2.5
Meituan: LongCat-2.0
MiniMax: MiniMax M2
Mistral: Codestral
Moonshot (Kimi): Kimi K2.5
OpenAI: chat-latest
All providers: View all models →

Embed

Put this calculator on your site

Free to embed — your readers always see current pricing. The widget links back here for the full comparison table.

<iframe
  src="https://aicost.tools/calculator/llm-api/embed/"
  width="100%" height="760" loading="lazy"
  style="border:1px solid #3d3a32"
  title="LLM API Cost Calculator by AI//COST">
</iframe>

More calculators

The cost toolbox

Every tool runs on the same live-pricing backbone. See the full index →