Question 1

What is a token and how many tokens does a word have?

Accepted Answer

A token is the unit an AI model processes: usually between half a word and a full word. Rule of thumb: 1,000 tokens ≈ 750 English words. A 20-word sentence is about 26 tokens; a 300-word email is around 400. Models charge for input tokens (your prompt) and output tokens (their answer) separately.

Question 2

Why do cached tokens cost up to 90% less?

Accepted Answer

When your app sends the same system prompt on every request (e.g. "You are a customer support agent for Acme…"), providers store it once and reuse the computation. They only charge full price for the new part (the user's question). OpenAI, Anthropic and Google give 50% to 90% off cached tokens. The calculator above accounts for this via a cache-hit % slider.

Question 3

Which model should I use for a chatbot, code generation or document analysis?

Accepted Answer

Chatbot: GPT-5.4 Mini or Claude Haiku 4.5 for latency; DeepSeek V3.2 for cost. Code: Claude Opus 4.6 and GPT-5.4 still lead the benchmarks, with o3 and DeepSeek R1 for hard reasoning. Document analysis: you need a large window — Gemini 3.1 Pro, Claude Opus or Llama 4 Maverick. Use the Model Finder above for a personalised pick.

Question 4

How do I cut AI costs in my application?

Accepted Answer

Five levers: (1) cache system prompts (saves 50-90%), (2) batch API for non-urgent requests (50% off), (3) smaller model for simple tasks and flagship only when needed (complexity router), (4) trim prompts and cap output tokens, (5) fine-tune or use RAG instead of re-sending the same context every call. At Letbrand we audit each lever and typically cut costs by 40-80%.

Feature	GLM 4.7 Flash	o3
Provider	Z.AI	OpenAI
Tier	Budget	Reasoning
Input per 1M tokens	$0.06	$2
Output per 1M tokens	$0.4	$8
Cached input per 1M	$0.006	$0.5
Context window	202K	200K
Speed	Fast	Slow
Vision (image input)	Yes	Yes
Function calling	Yes	Yes
Batch API	No	Yes

Feature	GLM 4.7 Flash	o3
Provider	Z.AI	OpenAI
Tier	Budget	Reasoning
Input per 1M tokens	$0.06	$2
Output per 1M tokens	$0.4	$8
Cached input per 1M	$0.006	$0.5
Context window	202K	200K
Speed	Fast	Slow
Vision (image input)	Yes	Yes
Function calling	Yes	Yes
Batch API	No	Yes

GLM 4.7 Flash vs o3

Quick verdict

Head-to-head

Estimated monthly cost

Which one should I pick?

Related comparisons

Frequently asked questions about AI models and pricing

What is a token and how many tokens does a word have?

Why do cached tokens cost up to 90% less?

Which model should I use for a chatbot, code generation or document analysis?

How do I cut AI costs in my application?

GLM 4.7 Flash vs o3

Quick verdict

Head-to-head

Estimated monthly cost

Which one should I pick?

Related comparisons

Frequently asked questions about AI models and pricing

What is a token and how many tokens does a word have?

Why do cached tokens cost up to 90% less?

Which model should I use for a chatbot, code generation or document analysis?

How do I cut AI costs in my application?