How prompt caching actually changes the LLM cost math
Anthropic charges 10% of input for cache reads, with a 1.25x write fee. OpenAI auto-caches above 1024 tokens. The math changes which LLM is cheapest -- here is when.
1 post in this category.
Where the prose pulls apart pricing tables, agentic workload math, and cache economics that a single number on a vendor page cannot capture. Analysis posts here are anchored to verifiable data -- pricing snapshots from the canonical vendor page on the date of writing, worked examples whose arithmetic is auditable, and conclusions that move when the data moves. We publish these slowly. A single cornerstone analysis covers more ground than five thinkpieces, and we would rather correct one piece than retract three. Each entry carries its own verification date and the URLs of the sources it relied on, so the next person trying to repeat the math has a paper trail to follow.
Anthropic charges 10% of input for cache reads, with a 1.25x write fee. OpenAI auto-caches above 1024 tokens. The math changes which LLM is cheapest -- here is when.