Last verified 2026-05-19

VISION 72B · LEGACY QWEN 2.5 · 131K CONTEXT · TEXT + VISION · BATCH -50%

Qwen2.5 VL 72B Instruct API Pricing

Q: What is Qwen2.5 VL 72B Instruct priced at?

Qwen2.5 VL 72B Instruct is listed at $2.80/M input and $8.40/M output in the International/Singapore deployment section of Alibaba Cloud Model Studio. All prices on this page are in USD per million tokens.

Qwen2.5 VL 72B Instruct is Alibaba's legacy Qwen2.5 vision-language flagship, tracked here for compatibility and invoice checks. Alibaba lists the International/Singapore baseline at $2.80/M input and $8.40/M output; for new workloads, compare Qwen3 VL Plus or Qwen3 VL Flash. Prices are pulled daily from alibabacloud.com.

Input - per 1M tokens

$2.80/M

Source: Alibaba Cloud · flat pricing

Output - per 1M tokens

$8.40/M

Context: 131K · flat pricing

Cached input - per 1M tokens

$2.80/M

No cache-read price listed; calculated at the input rate

Effective - agentic blend

$3.25/M

92/8 input/output split - no cache

§ 01 / TERMINAL

Run the numbers.

Live calculator pre-loaded with current Qwen2.5 VL 72B Instruct rates. Tweak spend, output mix, or cache assumptions, then share the URL to pass the calculation along.
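As a rough illustration of the arithmetic behind such a calculator, the sketch below converts a monthly budget into a token count at this row's list prices. The function names, parameters, and the 8% default output share are assumptions for illustration, not AI//COST's actual implementation. Cached input is priced at the input rate because no cache-read price is listed for this row.

```python
INPUT_PER_M = 2.80   # USD per 1M input tokens (listed rate)
OUTPUT_PER_M = 8.40  # USD per 1M output tokens (listed rate)
CACHED_PER_M = 2.80  # assumption: no cache-read price is listed, so cached input bills as input


def effective_rate(output_share: float, cache_hit_rate: float = 0.0) -> float:
    """Blended USD per 1M tokens for a given output share and cache hit rate."""
    input_rate = cache_hit_rate * CACHED_PER_M + (1 - cache_hit_rate) * INPUT_PER_M
    return (1 - output_share) * input_rate + output_share * OUTPUT_PER_M


def tokens_for_budget(spend_usd: float, output_share: float = 0.08,
                      cache_hit_rate: float = 0.0) -> float:
    """Total tokens (input plus output) that a budget covers."""
    return spend_usd / effective_rate(output_share, cache_hit_rate) * 1_000_000


rate = effective_rate(0.08)
tokens = tokens_for_budget(100)
print(f"effective rate: ${rate:.2f}/M")
print(f"$100 covers about {tokens / 1e6:.1f}M tokens")
```

Because the cached and uncached input rates are identical here, changing `cache_hit_rate` does not move the result; it would only matter for a row with a published cache-read price.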


§ 02 / SCENARIOS

Real-world presets.

VISION

Invoice extraction

$0.024/doc

6,000 in - 800 out · ~4,255 units/$100

UI QA

Screenshot inspection

$0.042/screen

12,000 in - 1,000 out · ~2,380 units/$100

CHARTS

Dashboard explain

$0.020/image

5,000 in - 700 out · ~5,025 units/$100

MULTIMODAL RAG

Visual knowledge answer

$0.064/query

18,000 in - 1,600 out · ~1,567 units/$100
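The per-unit figures in these presets follow from the token counts and the list prices. A minimal sketch of that arithmetic, with the dictionary and function names chosen here for illustration:

```python
INPUT_PER_M = 2.80   # USD per 1M input tokens
OUTPUT_PER_M = 8.40  # USD per 1M output tokens

# (input tokens, output tokens) per unit, taken from the presets above
PRESETS = {
    "Invoice extraction": (6_000, 800),
    "Screenshot inspection": (12_000, 1_000),
    "Dashboard explain": (5_000, 700),
    "Visual knowledge answer": (18_000, 1_600),
}


def cost_per_unit(tokens_in: int, tokens_out: int) -> float:
    """USD cost of one unit of work at real-time list prices."""
    return (tokens_in * INPUT_PER_M + tokens_out * OUTPUT_PER_M) / 1_000_000


for name, (tokens_in, tokens_out) in PRESETS.items():
    cost = cost_per_unit(tokens_in, tokens_out)
    print(f"{name}: ${cost:.3f}/unit, ~{100 / cost:,.0f} units per $100")
```

Unit counts computed from unrounded costs can differ by a few units from the figures above, which appear to be derived from rounded per-unit costs.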

§ 03 / TOKENIZER

Paste text. See tokens. See cost.


Counts use a chars-per-token calibration measured on the vendor's own published tokenizer (Qwen/Qwen3.5-397B-A17B, 2026-06-10). English prose is typically within a few percent of the true count; code and non-Latin scripts produce more tokens per character. For billing-exact counts, use the vendor's count-tokens API.

Characters 484

Words 75

Tokens (estimated) 93

Cost as input · uncached $0.000260 USD

Cost as output · uncached $0.000781 USD

Cost as cached input $0.000260 USD
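The estimate above can be sketched as a division by a chars-per-token ratio followed by a price lookup. The ratio used here is back-derived from the sample figures (484 characters, 93 tokens) and is an assumption, not the page's published calibration constant; the function names are hypothetical.

```python
CHARS_PER_TOKEN = 484 / 93  # assumption: ratio implied by the sample above, about 5.2
INPUT_PER_M = 2.80          # USD per 1M input tokens
OUTPUT_PER_M = 8.40         # USD per 1M output tokens


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return round(len(text) / CHARS_PER_TOKEN)


def cost_usd(tokens: int, price_per_m: float) -> float:
    """USD cost of a token count at a per-million price."""
    return tokens * price_per_m / 1_000_000


sample = "x" * 484  # stand-in for a 484-character English passage
tokens = estimate_tokens(sample)
print(f"tokens: {tokens}")
print(f"as input:  ${cost_usd(tokens, INPUT_PER_M):.6f}")
print(f"as output: ${cost_usd(tokens, OUTPUT_PER_M):.6f}")
```

A length-based estimate like this is only as good as its calibration text, which is why the billing-exact count should come from the vendor's tokenizer.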

§ 04 / SHELF

Up against the shelf.


Model	Input /M	Output /M	Effective blended /M	Context	Best for
Qwen2.5 VL 72B Instruct (this page)	$2.80	$8.40	$3.25 (agentic 92/8)	131K	Legacy visual reasoning compatibility
Qwen3 VL Plus	$0.20	$1.60	$0.31 (cheaper)	256K	Current vision and document understanding
Qwen3 VL Flash	$0.05	$0.40	$0.08 (cheaper)	256K	Cheap high-volume vision tasks
Qwen VL Max	$0.80	$3.20	$0.99 (cheaper)	128K	Legacy Qwen vision flagship
Qwen VL Plus	$0.21	$0.63	$0.24 (cheaper)	128K	Legacy low-cost vision apps
Qwen3 235B A22B	$0.70	$2.80	$0.87 (cheaper)	131K	Current open MoE reasoning baseline
GPT-5.4 mini	$0.75 (cache $0.07)	$4.50	$0.54 (cheaper)	400K	OpenAI mini coding and CUA
Gemini 2.5 Flash	$0.30 (cache $0.03)	$2.50	$0.27 (cheaper)	1M	Google long-context Flash workloads
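To put the "cheaper" labels in proportion, the sketch below expresses each alternative's effective blended rate as a percentage saving against this row's $3.25/M. The rates are copied from the table above; the function name is an assumption, and the comparison covers price only, not output quality.

```python
CURRENT = 3.25  # USD per 1M, effective blended rate for Qwen2.5 VL 72B Instruct

# Effective blended rates copied from the table above (Qwen vision rows only)
ALTERNATIVES = {
    "Qwen3 VL Plus": 0.31,
    "Qwen3 VL Flash": 0.08,
    "Qwen VL Max": 0.99,
    "Qwen VL Plus": 0.24,
}


def savings_pct(alternative: float, baseline: float = CURRENT) -> float:
    """Percentage reduction in blended cost relative to the baseline."""
    return (1 - alternative / baseline) * 100


for model, rate in sorted(ALTERNATIVES.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${rate:.2f}/M, {savings_pct(rate):.0f}% below ${CURRENT:.2f}/M")
```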

Frequently asked.

Practical pricing questions, separated from calculator assumptions and regional tiers.

Q · 01 What is Qwen2.5 VL 72B Instruct priced at?

Qwen2.5 VL 72B Instruct is listed at $2.80/M input and $8.40/M output in the International/Singapore deployment section of Alibaba Cloud Model Studio. All prices on this page are in USD per million tokens.

Q · 02 What replaced Qwen2.5 VL 72B Instruct?

Qwen2.5 VL 72B Instruct is a legacy Qwen2.5 compatibility row. For new workloads, compare Qwen3 VL Plus, Qwen3 VL Flash, or the rest of the current Qwen3 family before staying on the older SKU.

Q · 03 Does this page use International or Global pricing?

This page uses Alibaba Cloud Model Studio's International deployment pricing, where the endpoint and data storage are in Singapore and inference resources are scheduled dynamically worldwide, excluding Chinese Mainland. The Global, US, EU, China (Hong Kong), and Chinese Mainland sections can list different prices.

Q · 04 Is prompt caching priced separately?

Alibaba marks context-cache support on some Qwen rows, but the pricing table does not publish a cache-read dollar price for this row. The calculator therefore treats cached input as the same $2.80/M baseline rather than inventing a discount.

Q · 05 How is the effective price calculated?

AI//COST applies the same 92/8 agentic blend (92% input tokens, 8% output tokens) to every model. With no separate cache-read price published for this row, the blend is 0.92 × $2.80 + 0.08 × $8.40 = $3.248, so Qwen2.5 VL 72B Instruct's effective blended cost rounds to $3.25/M.
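A minimal sketch of the 92/8 blend, checked against the rows on this page that list no cache price. The function name is hypothetical; rows with a cache price (GPT-5.4 mini, Gemini 2.5 Flash) are left out because their effective figures depend on cache assumptions not stated here.

```python
def blended_rate(input_per_m: float, output_per_m: float,
                 input_share: float = 0.92) -> float:
    """Weighted USD per 1M tokens for a given input/output token split."""
    return input_share * input_per_m + (1 - input_share) * output_per_m


# (input $/M, output $/M) for rows without a published cache-read price
ROWS = {
    "Qwen2.5 VL 72B Instruct": (2.80, 8.40),
    "Qwen3 VL Plus": (0.20, 1.60),
    "Qwen3 VL Flash": (0.05, 0.40),
    "Qwen VL Max": (0.80, 3.20),
    "Qwen VL Plus": (0.21, 0.63),
    "Qwen3 235B A22B": (0.70, 2.80),
}

for model, (price_in, price_out) in ROWS.items():
    print(f"{model}: ${blended_rate(price_in, price_out):.2f}/M")
```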

Q · 06 Is there a Batch Invocation discount?

Alibaba documents Batch Invocation as 50% off real-time input and output token prices for supported Qwen rows. The quote tiles show real-time list pricing, so batch economics should be modeled as a separate calculator variant.
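As a sketch of that separate variant, the snippet below applies the documented 50% discount to this row's real-time prices. It assumes the row is batch-eligible, as the BATCH -50% badge at the top of the page suggests; the function name and the 92/8 blend applied to batch prices are illustrative choices rather than figures published by Alibaba.

```python
BATCH_DISCOUNT = 0.50  # 50% off real-time input and output tokens, as documented above


def batch_price(realtime_per_m: float, discount: float = BATCH_DISCOUNT) -> float:
    """Batch price per 1M tokens derived from the real-time list price."""
    return realtime_per_m * (1 - discount)


batch_in = batch_price(2.80)   # real-time input is $2.80/M
batch_out = batch_price(8.40)  # real-time output is $8.40/M
batch_blend = 0.92 * batch_in + 0.08 * batch_out  # same 92/8 split as the real-time blend

print(f"batch input ${batch_in:.2f}/M, output ${batch_out:.2f}/M, "
      f"92/8 blend ${batch_blend:.2f}/M")
```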

Q · 07 Does Alibaba include a free quota?

Many International Model Studio rows include a free quota of 1 million tokens, valid for 90 days after Model Studio is activated. Free-quota eligibility is deployment- and model-specific, so production estimates should use the paid list prices shown here.

Reviewed by Yaroslav Vikhariev, Founder - AI//COST - Pricing pulled daily from alibabacloud.com - Last verified May 19, 2026
