RETIRED · 256K CONTEXT · ARCHIVE PRICE · REDIRECTS TO GROK-4-3
Grok 4.1 Fast API Pricing
Grok 4.1 Fast is now an archive row: historical pricing was $0.20/M input, $0.05/M cached input, and $0.50/M output. xAI's May 15, 2026 retirement guide says 4.1 Fast slugs redirect to Grok 4.3.
Input - per 1M tokens
$0.20/M
Historical 4.1 Fast archive
Output - per 1M tokens
$0.50/M
Historical 4.1 Fast archive
Cached input
$0.05/M
Prompt cache discount
Effective - agentic blend
$0.11/M
92/8 input/output split - 82% cache hits
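The $0.11/M effective figure is consistent with reading the 92/8 split as 92% input tokens and 8% output tokens, with 82% of the input tokens billed at the cached rate. A minimal Python sketch under that reading follows; the function and parameter names are illustrative and are not taken from the page's calculator.

```python
def effective_rate(input_price, cached_price, output_price,
                   input_share=0.92, cache_hit=0.82):
    """Blended $/1M tokens for a mixed input/output workload.

    Assumes `input_share` of all tokens are input tokens, and that
    `cache_hit` of those input tokens are billed at the cached rate.
    """
    # Average price of one million input tokens after cache hits
    blended_input = cache_hit * cached_price + (1 - cache_hit) * input_price
    # Weight input and output by their share of total tokens
    return input_share * blended_input + (1 - input_share) * output_price


# Grok 4.1 Fast archive rates from this page
rate = effective_rate(0.20, 0.05, 0.50)
print(f"${rate:.2f}/M")  # $0.11/M
```

The intermediate blended input price works out to $0.077/M, which matches the "Invoice replay" preset. Setting `cache_hit=0` models a fresh-input workload.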
§ 01 / TERMINAL
Run the numbers.
Calculator pre-loaded with Grok 4.1 Fast archive rates. After May 15, 2026, xAI routes retired 4.1 Fast slugs to Grok 4.3 pricing.
$ /mo
Workload split
Prompt cache hit rate
Tokens you can process
—
Words equivalent (English)
—
Effective rate
—
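The "Tokens you can process" output can be approximated by dividing the monthly budget by the effective rate. The sketch below assumes the unrounded archive blend of $0.11084/M (92/8 split, 82% cache hits); the function name is hypothetical, and the words-equivalent figure is left out because the page does not state a words-per-token ratio.

```python
def tokens_for_budget(budget_usd, rate_per_million):
    """Number of tokens a budget covers at a blended $/1M-token rate."""
    if rate_per_million <= 0:
        raise ValueError("rate_per_million must be positive")
    return budget_usd / rate_per_million * 1_000_000


# Example: a $100/mo budget at the archive blended rate
tokens = tokens_for_budget(100, 0.11084)
print(f"{tokens / 1e6:.0f}M tokens")
```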
§ 02 / SCENARIOS
Real-world presets.
ARCHIVE
Invoice replay
$0.077/1M in
AGENT
Legacy agent task
$0.010/task
MIGRATION
Migration cost comparison
$0.020/pack
RAG
Archived RAG answer
$0.018/answer
§ 03 / TAPE
Price history.
Input · $0.20/M
Output · $0.50/M
Cached · $0.05/M
NOV 15 · Launched at $0.20/M input and $0.50/M output
MAY 18 · Retirement verified; slug redirects to Grok 4.3 after May 15, 2026
§ 04 / TOKENIZER
Paste text. See tokens. See cost.
Estimate · grok-tokenizer-estimate · ≈3.85 chars/token Auto-counts as you type
This is a chars-per-token approximation, not a real tokenizer. Actual tokens vary by language, code density, and tool-call overhead — counts are typically ±10–20% off for English prose, more for code or non-Latin scripts. For exact billing, use the vendor's official tokenizer.
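A chars-per-token estimate of this kind can be sketched in a few lines of Python. This is an illustration, not the widget's actual code: it assumes the ≈3.85 chars/token ratio shown in the widget label, rounds up to a whole token (the widget's rounding is not documented), and uses the archive rates from this page. The function names are hypothetical.

```python
import math

CHARS_PER_TOKEN = 3.85  # ratio from the widget label; an approximation only


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Approximate token count from character length (not a real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(tokens, price_per_million):
    """Cost in USD for `tokens` tokens at a $/1M-token price."""
    return tokens * price_per_million / 1_000_000


text = "Grok 4.1 Fast is now an archive row."
n = estimate_tokens(text)
print("tokens (estimated):", n)
print("as input, uncached:", estimate_cost(n, 0.20))
print("as output:", estimate_cost(n, 0.50))
print("as cached input:", estimate_cost(n, 0.05))
```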
Characters —
Words —
Tokens (estimated) —
Cost as input · uncached —
Cost as output · uncached —
Cost as cached input —
| Model | Input /M (cached) | Output /M | Effective blended | Context | Best for |
|---|---|---|---|---|---|
| Grok 4.1 Fast (this page) | $0.20 (cached $0.05) | $0.50 | $0.11 (archive, 92/8) | 256K | Archive 4.1 Fast invoices |
| Grok 4.3 | $1.25 (cached $0.20) | $2.50 | $0.56 (current replacement) | 1M | Current Grok default |
| Gemini 2.5 Flash | $0.30 (cached $0.03) | $2.50 | $0.27 (budget Gemini) | 1M | Low-cost multimodal work |
| GPT-5.4 mini | $0.75 (cached $0.07) | $4.50 | $0.54 (OpenAI mini) | 400K | Subagents and lightweight coding |
| DeepSeek V4 Pro | $0.43 (cached $0.00) | $0.87 | $0.14 (budget reasoning) | 1M | Low-cost reasoning alternative |
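For migration math, the same rates can be applied to absolute token counts. The sketch below compares one hypothetical monthly workload (92M input tokens, 8M output tokens, 82% cache hits) at the Grok 4.1 Fast archive rates and the Grok 4.3 rates from the table. The dictionary keys are display labels rather than API slugs, and the workload size is an assumption chosen to mirror the 92/8 split.

```python
# Rates in $/1M tokens, copied from the comparison table
RATES = {
    "Grok 4.1 Fast (archive)": {"input": 0.20, "cached": 0.05, "output": 0.50},
    "Grok 4.3": {"input": 1.25, "cached": 0.20, "output": 2.50},
}


def workload_cost(rates, input_tokens, output_tokens, cache_hit=0.82):
    """USD cost of a workload, splitting input into cached and fresh tokens."""
    cached = input_tokens * cache_hit
    fresh = input_tokens - cached
    total = (fresh * rates["input"]
             + cached * rates["cached"]
             + output_tokens * rates["output"])
    return total / 1_000_000


# Hypothetical workload: 92M input and 8M output tokens per month
for name, rates in RATES.items():
    cost = workload_cost(rates, 92_000_000, 8_000_000)
    print(f"{name}: ${cost:.2f}")
```

Under these assumptions the same workload costs about $11.08 at the archive rates and about $55.79 at the Grok 4.3 rates, roughly a fivefold increase.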
Frequently asked.
Grok 4.1 Fast pricing questions, with archive status separated from token math.
Q · 01 What was Grok 4.1 Fast's API price?
This archive page uses $0.20/M input, $0.05/M cached input, and $0.50/M output. Use it for historical billing and migration math, not fresh production routing.
Q · 02 Is Grok 4.1 Fast still available?
xAI's May 15, 2026 retirement guide lists grok-4-1-fast-reasoning and grok-4-1-fast-non-reasoning as retired. After retirement, requests redirect to grok-4.3.
Q · 03 What should replace it?
Use grok-4.3. xAI's current pricing page lists Grok 4.3 at $1.25/M input, $0.20/M cached input, and $2.50/M output.
Q · 04 Does prompt caching apply?
The archive row includes cached input at $0.05/M. The effective blend assumes 82% cache hits; set cache to off in the calculator for fresh-input workloads.
Q · 05 Why keep an archive page?
Retired model prices still matter for old invoices, migration plans, benchmarks, and generation-to-generation price history. The page is marked as archive so it does not imply current endpoint availability.
Q · 06 How accurate is the tokenizer estimate?
The widget uses 4.875 characters per token as a planning estimate. Exact billing can vary with language, hidden reasoning, cached prompt boundaries, and tool usage.