LOW COST · 1M CONTEXT · BATCH -50% · TEXT + VISION
Qwen 3.5 Flash API Pricing
Qwen 3.5 Flash is the cheap long-context tier of Qwen 3.5. Baseline rates are $0.10/M input and $0.40/M output. Rates are pulled directly from alibabacloud.com and re-verified against the pricing page.
Input - per 1M tokens
$0.10/M
Source alibabacloud.com flat
Output - per 1M tokens
$0.40/M
Context 1M flat
Cache N/A
$0.10/M
Cache vendor row not listed
Effective - agentic blend
$0.12/M
92/8 split - 82% cache
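The effective figure can be approximated with a short sketch. It assumes the blend weights input at 92% and output at 8%, and bills 82% of input tokens at the cache-read price. That formula is inferred from the labels on this page, not published by the vendor, and the function name and parameters are illustrative.

```python
def effective_rate(input_rate, output_rate, cache_rate=None,
                   input_share=0.92, cache_hit=0.82):
    """Blended USD per 1M tokens under the assumed 92/8 agentic split."""
    if cache_rate is None:
        # No separate cache-read price listed: cached input bills at the input rate.
        cache_rate = input_rate
    # Average input price after applying the assumed cache hit rate.
    blended_input = cache_hit * cache_rate + (1 - cache_hit) * input_rate
    # Weight input and output by the assumed workload split.
    return input_share * blended_input + (1 - input_share) * output_rate


flash = effective_rate(0.10, 0.40)
print(f"${flash:.2f}/M")  # 0.92 * 0.10 + 0.08 * 0.40 = 0.124
```

Because no cache-read price is listed for this model, the cache hit rate has no effect on the result here; it only matters for models that publish a cheaper cached rate.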
§ 01 / TERMINAL
Run the numbers.
Live calculator pre-loaded with Qwen 3.5 Flash rates. Tweak spend, output mix, or cache assumptions to compare it with sibling models.
Inputs: monthly spend ($/mo), workload split, prompt cache hit rate.
Outputs: tokens you can process, words equivalent (English), effective rate.
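A rough sketch of the budget-to-tokens conversion, assuming the $0.12/M blended rate shown on this page. The words-per-token ratio is an assumption for English text and does not come from the source; the function names are illustrative.

```python
EFFECTIVE_RATE = 0.12    # USD per 1M tokens, blended figure from this page
WORDS_PER_TOKEN = 0.75   # assumption: rough English ratio, not from the source


def tokens_for_budget(monthly_usd, rate_per_million=EFFECTIVE_RATE):
    """Tokens that a monthly budget covers at a given blended rate."""
    return int(monthly_usd / rate_per_million * 1_000_000)


def words_for_tokens(tokens, words_per_token=WORDS_PER_TOKEN):
    """Approximate English word count for a token count."""
    return int(tokens * words_per_token)


tokens = tokens_for_budget(100)   # a hypothetical $100/mo budget
words = words_for_tokens(tokens)
print(f"{tokens:,} tokens, roughly {words:,} words")
```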
§ 02 / SCENARIOS
Real-world presets.
VISION
Invoice extraction
$0.001/doc
CHATBOT
Support assistant
$0.001/turn
RAG
Knowledge base answer
$0.001/query
BULK
Image QA review
$0.001/item
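The presets do not list their token counts, so the sketch below only shows how a per-request figure could be derived from the baseline rates. The token sizes are hypothetical and chosen purely for illustration.

```python
INPUT_RATE = 0.10    # USD per 1M input tokens (from this page)
OUTPUT_RATE = 0.40   # USD per 1M output tokens (from this page)


def request_cost(input_tokens, output_tokens):
    """Uncached cost in USD of one request at the baseline rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000


# Hypothetical sizes: a 6,000-token prompt and a 500-token answer.
cost = request_cost(input_tokens=6_000, output_tokens=500)
print(f"${cost:.4f} per request")  # about $0.0008, which rounds to $0.001
```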
§ 03 / TAPE
Price history.
Input · $0.10/M
Output · $0.40/M
Cached · $0.10/M
FEB 23 Launch price: $0.10/M input and $0.40/M output
MAY 18 Live verification kept $0.10/M and $0.40/M
§ 04 / TOKENIZER
Paste text. See tokens. See cost.
Estimate · qwen-tokenizer-estimate · ≈3.85 chars/token
This is a chars-per-token approximation, not a real tokenizer. Actual tokens vary by language, code density, and tool-call overhead — counts are typically ±10–20% off for English prose, more for code or non-Latin scripts. For exact billing, use the vendor's official tokenizer.
Reported values: characters, words, estimated tokens, cost as uncached input, cost as uncached output, and cost as cached input.
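The estimate can be reproduced with a minimal sketch, assuming the 3.85 chars/token ratio quoted above and rounding up to whole tokens. The rounding rule and function names are assumptions; the page does not say how it rounds.

```python
import math

CHARS_PER_TOKEN = 3.85   # approximation quoted on this page
INPUT_RATE = 0.10        # USD per 1M input tokens
OUTPUT_RATE = 0.40       # USD per 1M output tokens


def estimate_tokens(text):
    """Chars-per-token estimate; not a real tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(text, rate_per_million):
    """Estimated USD cost of the text at a per-million-token rate."""
    return estimate_tokens(text) * rate_per_million / 1_000_000


sample = "Summarize the attached invoice and list every line item."
n = estimate_tokens(sample)
print(n, estimate_cost(sample, INPUT_RATE), estimate_cost(sample, OUTPUT_RATE))
```

Since no separate cache-read price is listed, the cached-input cost equals the uncached input cost for this model.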
| Model | Input /M | Output /M | Effective blended | Context | Best for |
|---|---|---|---|---|---|
| Qwen 3.5 Flash Current | $0.10 | $0.40 | $0.12 agentic 92/8 | 1M | Bulk chat and long-context RAG |
| Qwen3 Max | $1.20 | $6.00 | $1.58 pricier | 252K | Frontier Qwen reasoning |
| Qwen 3.5 Plus | $0.40 | $2.40 | $0.56 pricier | 256K | General Qwen production workloads |
| Qwen3 VL Plus | $0.20 | $1.60 | $0.31 pricier | 256K | Vision and document understanding |
| Qwen3 VL Flash | $0.05 | $0.40 | $0.08 cheaper | 256K | Vision and document understanding |
| GPT-5.4 mini | $0.75 (cache $0.07) | $4.50 | $0.54 pricier | 400K | General production workloads |
| Gemini 2.5 Flash | $0.30 (cache $0.03) | $2.50 | $0.27 pricier | 1M | General production workloads |
Frequently asked.
Practical pricing questions, separated from calculator assumptions.
Q · 01 What is Qwen 3.5 Flash priced at?
Qwen 3.5 Flash is shown at $0.10/M input and $0.40/M output. The page stores USD per-million-token baseline pricing from alibabacloud.com.
Q · 02 Does this page include higher context pricing tiers?
Alibaba publishes tiered pricing for several Qwen models. This page uses the baseline Singapore / International tier from the queue and snapshot; higher-token tiers are noted in the source page and can be added as a variant later.
Q · 03 Is prompt caching priced separately?
No separate cache-read price is published for this row, so the calculator treats cached input as $0.10/M.
Q · 04 How is the effective price calculated?
AI//COST uses the same 92/8 agentic blend everywhere. For Qwen 3.5 Flash, that gives $0.12/M with only documented cache discounts included.
Q · 05 Is there a batch discount?
Alibaba lists Batch Invocation at 50% off for supported Qwen rows.
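As a quick illustration, the sketch below applies the 50% batch discount to the baseline rates. It assumes the discount applies equally to input and output tokens, which this page does not spell out; the function name is illustrative.

```python
BATCH_DISCOUNT = 0.50   # 50% off, as listed for supported Qwen rows


def batch_rate(rate_per_million, discount=BATCH_DISCOUNT):
    """Per-million-token rate after the batch discount."""
    return rate_per_million * (1 - discount)


batch_input = batch_rate(0.10)    # baseline input rate from this page
batch_output = batch_rate(0.40)   # baseline output rate from this page
print(f"${batch_input:.2f}/M input, ${batch_output:.2f}/M output")
```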
Q · 06 Are regional prices different?
Yes. Alibaba Cloud publishes separate International, Global, US, EU, China (Hong Kong), and Chinese Mainland deployment sections. AI//COST uses the International / Singapore baseline for this queue unless a page explicitly says otherwise.