Question 1

What is a token and how many tokens does a word have?

Accepted Answer

A token is the unit of text an AI model processes, usually between half a word and a full word. Rule of thumb: 1,000 tokens ≈ 750 English words. A 20-word sentence is about 27 tokens; a 300-word email is around 400. Providers bill input tokens (your prompt) and output tokens (the model's answer) separately.
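The rule of thumb can be turned into a quick estimator. This is a minimal sketch that assumes the 1,000 tokens ≈ 750 words ratio holds for your text; the function names are illustrative, and real counts depend on each model's tokenizer.

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text: 1,000 tokens per 750 words."""
    return round(word_count * 1000 / 750)


def estimate_tokens_for_text(text: str) -> int:
    """Count whitespace-separated words, then apply the same ratio."""
    return estimate_tokens(len(text.split()))


if __name__ == "__main__":
    # A 300-word email, as in the answer above.
    print(estimate_tokens(300))
```

For billing-grade numbers, use the token counts that the provider reports for each request rather than this estimate.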

Question 2

Why do cached tokens cost up to 90% less?

Accepted Answer

When your app sends the same system prompt on every request (e.g. "You are a customer support agent for Acme…"), providers store it once and reuse the computation. They only charge full price for the new part (the user's question). OpenAI, Anthropic and Google give 50% to 90% off cached tokens. The calculator above accounts for this via a cache-hit % slider.
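The effect of a cache-hit percentage on input cost can be sketched as a blended price. This is an illustrative calculation, not the page calculator's actual formula: it assumes the cache-hit share applies evenly to all input tokens, and the function and parameter names are made up for the example.

```python
def input_cost_usd(
    input_tokens: int,
    price_per_million: float,
    cached_price_per_million: float,
    cache_hit: float,
) -> float:
    """Input cost in USD when a share of the input tokens is served from cache.

    cache_hit is a fraction between 0.0 (nothing cached) and 1.0 (everything cached).
    """
    if not 0.0 <= cache_hit <= 1.0:
        raise ValueError("cache_hit must be between 0.0 and 1.0")
    cached_tokens = input_tokens * cache_hit
    fresh_tokens = input_tokens - cached_tokens
    total = fresh_tokens * price_per_million + cached_tokens * cached_price_per_million
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 1M input tokens at $0.45 per 1M, $0.225 per 1M cached, 80% cache hits.
    print(round(input_cost_usd(1_000_000, 0.45, 0.225, 0.8), 4))
```

With no cache hits the same million tokens cost the full $0.45, so the saving grows with both the discount and the share of the prompt that repeats.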

Question 3

Which model should I use for a chatbot, code generation or document analysis?

Accepted Answer

Chatbot: GPT-5.4 Mini or Claude Haiku 4.5 for latency; DeepSeek V3.2 for cost. Code: Claude Opus 4.6 and GPT-5.4 still lead the benchmarks, with o3 and DeepSeek R1 for hard reasoning. Document analysis: you need a large window — Gemini 3.1 Pro, Claude Opus or Llama 4 Maverick. Use the Model Finder above for a personalised pick.

Question 4

How do I cut AI costs in my application?

Accepted Answer

Five levers: (1) cache system prompts (saves 50-90% on the cached portion of the input), (2) use the batch API for non-urgent requests (50% off), (3) route simple tasks to a smaller model and call the flagship only when needed (complexity router), (4) trim prompts and cap output tokens, (5) fine-tune or use RAG instead of re-sending the same context on every call. At Letbrand we audit each lever and typically cut costs by 40-80%.
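Lever (3), the complexity router, could look like the sketch below. Everything in it is an assumption made for illustration: the model names are placeholders, and the keyword list and word limit are a toy heuristic, not a tested routing policy. It also applies lever (4) by giving each route an output-token cap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str              # placeholder name, not a real model ID
    max_output_tokens: int  # output cap for this route


SMALL = Route(model="small-model", max_output_tokens=256)
FLAGSHIP = Route(model="flagship-model", max_output_tokens=1024)

# Hypothetical hints that a request needs the stronger model.
HARD_HINTS = ("debug", "refactor", "prove", "analyze", "step by step")


def pick_route(prompt: str, word_limit: int = 150) -> Route:
    """Send long or hard-looking prompts to the flagship, everything else to the small model."""
    text = prompt.lower()
    is_long = len(text.split()) > word_limit
    looks_hard = any(hint in text for hint in HARD_HINTS)
    return FLAGSHIP if (is_long or looks_hard) else SMALL


if __name__ == "__main__":
    print(pick_route("What are your opening hours?").model)
    print(pick_route("Please debug this function for me.").model)
```

In practice the routing signal is often a small classifier or the cheaper model's own confidence rather than keywords; the point of the sketch is only that the decision happens before the expensive call.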

Feature	DeepSeek R1 0528	Qwen 3.5 Flash
Provider	DeepSeek	Qwen
Tier	Reasoning	Budget
Input per 1M tokens	$0.45	$0.065
Output per 1M tokens	$2.15	$0.26
Cached input per 1M	$0.225	$0.007
Context window	164K	1M
Speed	Slow	Fast
Vision (image input)	No	No
Function calling	No	Yes
Batch API	No	No
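As a worked example of the prices in the table, the sketch below estimates a monthly bill for both models. The prices come from the table; the traffic figures (100,000 requests, 1,000 input and 500 output tokens each) are assumptions chosen for illustration, and caching is left out to keep the arithmetic visible.

```python
# USD per 1M tokens, copied from the comparison table.
PRICES = {
    "DeepSeek R1 0528": {"input": 0.45, "output": 2.15},
    "Qwen 3.5 Flash": {"input": 0.065, "output": 0.26},
}


def monthly_cost_usd(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Monthly cost in USD for a given request volume, ignoring cache and batch discounts."""
    price = PRICES[model]
    total_input = requests * input_tokens
    total_output = requests * output_tokens
    usd = (total_input * price["input"] + total_output * price["output"]) / 1_000_000
    return round(usd, 2)


if __name__ == "__main__":
    # Assumed workload: 100,000 requests per month, 1,000 tokens in, 500 tokens out.
    for name in PRICES:
        print(name, monthly_cost_usd(name, 100_000, 1_000, 500))
```

Under these assumptions the bill is $152.50 for DeepSeek R1 0528 and $19.50 for Qwen 3.5 Flash. Reasoning models also tend to produce longer outputs, so holding output tokens equal probably understates the gap.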

DeepSeek R1 0528 vs Qwen 3.5 Flash

Quick verdict

Head-to-head

Estimated monthly cost

Which one should I pick?

Related comparisons

Frequently asked questions about AI models and pricing

What is a token and how many tokens does a word have?

Why do cached tokens cost up to 90% less?

Which model should I use for a chatbot, code generation or document analysis?

How do I cut AI costs in my application?
