How prompt caching actually changes the LLM cost math
Anthropic charges 10% of input for cache reads, with a 1.25x write fee. OpenAI auto-caches above 1024 tokens. The math changes which LLM is cheapest -- here is when.
1 post across analysis, opinion, how-to, research, and roundups.
Independent analysis, opinion, and how-to coverage of LLM tooling -- written for engineers who pay the bill. No paid placements, no vendor talking points, no benchmarks lifted from press kits. Every model price comes from the vendor's own page, verified the week of publication; every cost calculation shows its math. The blog sits next to the price desk and the comparisons -- prose covers the questions a table cannot answer (when does a 90% cache discount stop mattering? which workload class breaks GPT vs Claude pricing parity?). Cornerstone analysis ships first; opinion, tools how-to, research summaries, and roundups will join as the catalogue grows. Read for the math, stay for the disagreements.
Anthropic charges 10% of input for cache reads, with a 1.25x write fee. OpenAI auto-caches above 1024 tokens. The math changes which LLM is cheapest -- here is when.