Model Type Legend
| Badge | Meaning |
|---|---|
| Flagship | Maximum capability |
| Balanced | Best value/performance |
| Fast | Low latency, cost-efficient |
| Reasoning | Extended thinking mode |
| Open Weights | Self-hostable |
Anthropic Claude
| Model | Version | Context | Output | Input Price | Output Price | Best For |
|---|---|---|---|---|---|---|
| Claude Opus 4.5 | claude-opus-4-5-20251101 | 200K | 64K | $5/1M | $25/1M | Complex reasoning, agents, research |
| Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | 200K–1M | 64K | $3/1M | $15/1M | Coding (#1 SWE-bench), computer use |
| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | 200K | 64K | $1/1M | $5/1M | High-volume, real-time, classification |
Tip: Claude Sonnet 4.5 leads SWE-bench for coding tasks. Use Opus for complex multi-step reasoning.
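The per-token prices above translate directly into per-request costs. A minimal sketch of that arithmetic, using shortened model aliases (the pinned, dated IDs from the table work the same way) and the listed prices, which you should verify against Anthropic's current pricing page:

```python
# USD per 1M tokens (input, output), copied from the table above.
CLAUDE_PRICES = {
    "claude-opus-4-5":   (5.00, 25.00),
    "claude-sonnet-4-5": (3.00, 15.00),
    "claude-haiku-4-5":  (1.00, 5.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the listed list prices."""
    in_price, out_price = CLAUDE_PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, a Haiku call with 100K input tokens and 10K output tokens costs about $0.15, which is why the high-volume tier matters at scale.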
OpenAI GPT
| Model | Version | Context | Output | Input Price | Output Price | Best For |
|---|
| GPT-5.2 Pro | gpt-5.2-pro | 196K | 32K | $1.25/1M | $10/1M | Enterprise, finance, health |
| GPT-5.2 Thinking | gpt-5.2-thinking | 196K | 32K | $1.25/1M | $10/1M | Multi-step reasoning, math |
| GPT-5.2 Instant | gpt-5.2-instant | 128K | 16K | $0.50/1M | $2/1M | Quick lookups, drafting, high throughput |
Google Gemini
| Model | Version | Context | Output | Input Price | Output Price | Best For |
|---|
| Gemini 2.5 Pro | gemini-2.5-pro | 1M (2M soon) | 65K | $1.25/1M | $10/1M | Ultra-long context, multimodal |
| Gemini 2.5 Flash | gemini-2.5-flash | 1M | 65K | $0.15/1M | $0.60/1M | Best price/performance, documents |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1M | 32K | $0.10/1M | $0.40/1M | Lowest cost, classification, routing |
Tip: Gemini 2.5 Flash offers the best price/performance ratio for most production workloads.
Meta Llama (Open Weights)
| Model | Version | Context | Parameters | Architecture | Best For |
|---|---|---|---|---|---|
| Llama 4 Maverick | llama-4-maverick-17b-128e | 512K–1M | 400B (17B active) | MoE (128 experts) | Self-hosting, multimodal, multilingual |
| Llama 4 Scout | llama-4-scout-17b-16e | 10M | 109B (17B active) | MoE (16 experts) | Massive context, codebase analysis |
Tip: Llama 4 Scout’s 10M context window is the largest available — great for analyzing entire codebases.
DeepSeek (Open Weights)
| Model | Version | Context | Parameters | Input Price | Output Price | Best For |
|---|---|---|---|---|---|---|
| DeepSeek V3.2 | deepseek-v3.2 | 128K | 671B (37B active) | $0.27/1M | $1.10/1M | Cost-effective general use |
| DeepSeek R1 | deepseek-r1-0528 | 128K | 671B (37B active) | $0.55/1M | $2.19/1M | Math reasoning, chain-of-thought |
xAI Grok & Mistral
| Model | Version | Context | Price (In/Out) | Best For |
|---|---|---|---|---|
| Grok 4.1 | grok-4.1 | 256K–2M | $3/$15 per 1M | Real-time X data, agentic workflows |
| Mistral Large 3 | mistral-large-2412 | 256K | $2/$6 per 1M | Enterprise EU (GDPR), function calling |
Quick Comparison: LLMs
| Model | Provider | Context | Price (In/Out) | Best Use Case |
|---|---|---|---|---|
| Claude Opus 4.5 | Anthropic | 200K | $5/$25 | Complex reasoning, research |
| Claude Sonnet 4.5 | Anthropic | 200K–1M | $3/$15 | Coding (#1), production |
| Claude Haiku 4.5 | Anthropic | 200K | $1/$5 | High-volume, real-time |
| GPT-5.2 Pro | OpenAI | 196K | $1.25/$10 | Enterprise, finance |
| Gemini 2.5 Pro | Google | 1M (2M) | $1.25/$10 | Ultra-long context |
| Gemini 2.5 Flash | Google | 1M | $0.15/$0.60 | Best price/perf |
| Llama 4 Scout | Meta | 10M | Self-host | Largest context, open |
| DeepSeek R1 | DeepSeek | 128K | $0.55/$2.19 | Math reasoning, open |
| Grok 4.1 | xAI | 256K–2M | $3/$15 | Real-time X data |
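A common selection question is "cheapest model whose context window fits my input." A sketch of that filter over a subset of the comparison table (context sizes and input prices mirror the rows above; this ignores quality differences, which usually matter more):

```python
# (name, context window in tokens, input price USD per 1M tokens)
MODELS = [
    ("Claude Opus 4.5",     200_000, 5.00),
    ("Claude Sonnet 4.5",   200_000, 3.00),
    ("Claude Haiku 4.5",    200_000, 1.00),
    ("GPT-5.2 Pro",         196_000, 1.25),
    ("Gemini 2.5 Pro",    1_000_000, 1.25),
    ("Gemini 2.5 Flash",  1_000_000, 0.15),
    ("DeepSeek R1",         128_000, 0.55),
]

def cheapest_for_context(min_context: int) -> str:
    """Cheapest-input API model whose context window is at least min_context."""
    candidates = [(price, name) for name, ctx, price in MODELS if ctx >= min_context]
    return min(candidates)[1]
```

Note how often Gemini 2.5 Flash wins this filter, which is exactly the price/performance claim in the table.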
Image Generation Models
| Model | Provider | Max Resolution | Pricing | Best For |
|---|---|---|---|---|
| DALL·E 3 | OpenAI | 1792×1024 | $0.04–0.08/img | Text rendering, ChatGPT integration |
| Midjourney V7 | Midjourney | 2048×2048 | $10–120/mo | Artistic quality, concept art |
| Flux 1.1 Pro | Black Forest Labs | 4MP (4K Ultra) | $0.04–0.06/img | Photorealism (#1), speed |
| Stable Diffusion 3.5 | Stability AI | 1024×1024 (native) | ~$0.002/img | Self-hosting, LoRA, ControlNet |
| Imagen 3 | Google | 2048×2048 | Vertex AI | Google ecosystem, enterprise |
Tip: Flux 1.1 Pro leads photorealism benchmarks. Midjourney excels at artistic/stylized output.
Video Generation Models
| Model | Provider | Max Duration | Resolution | Native Audio | Pricing | Best For |
|---|---|---|---|---|---|---|
| Sora 2 | OpenAI | 20s | 1080p | Yes (lip-sync) | $0.10–0.15/s | Lip-sync, physical realism |
| Veo 3 | Google | 8s (60s ent.) | 4K | Yes (music/SFX) | $0.20–0.40/s | 4K cinematic, YouTube |
| Runway Gen-4 | Runway | 16s | 4K upscale | No | $12–95/mo | VFX, camera control |
| Kling AI 2.1 | Kuaishou | 2 minutes | 1080p | No | $10/mo | Longest duration, budget |
| Pika 2.0 | Pika Labs | 10s | 1080p | No | $8/mo | Rapid prototyping |
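For the per-second-priced models, clip cost scales linearly with duration. A back-of-envelope helper using the low end of each listed range (the subscription-priced models are excluded since their cost doesn't scale per second):

```python
# Low end of the per-second price ranges from the table above.
PER_SECOND_LOW = {"Sora 2": 0.10, "Veo 3": 0.20}

def clip_cost(model: str, seconds: float) -> float:
    """Minimum estimated USD cost for one clip of the given length."""
    return round(PER_SECOND_LOW[model] * seconds, 2)
```

So a maximum-length 20s Sora 2 clip starts around $2.00, and an 8s Veo 3 clip around $1.60.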
Audio & Voice Models
| Model | Provider | Type | Key Feature | Pricing | Best For |
|---|---|---|---|---|---|
| ElevenLabs TTS | ElevenLabs | Voice Synthesis | 29+ languages, cloning | $5–99/mo | Voice synthesis (#1), dubbing |
| Eleven Music | ElevenLabs | Music Generation | Commercially licensed | $5–99/mo | Commercial use, stems export |
| Suno v4 | Suno | Music + Vocals | 4-min full songs | $8–30/mo | Full songs with vocals |
| Udio | Udio | Music + Vocals | 15-min songs | $10–30/mo | Long-form music, remixing |
Decision Flowchart
| If you need… | Use… | Why |
|---|---|---|
| MAX INTELLIGENCE | Claude Opus 4.5 or GPT-5.2 Pro | Complex analysis, nuanced writing |
| FAST + CHEAP | Gemini 2.5 Flash or Claude Haiku | High volume, real-time, chat |
| CODE | Claude Sonnet 4.5 | #1 SWE-bench, agentic workflows |
| MATH/REASONING | GPT-5.2 Thinking or DeepSeek R1 | Proofs, competition math |
| LONG CONTEXT | Gemini 2.5 Pro (1-2M) or Llama Scout (10M) | Large codebases, book analysis |
| SELF-HOSTING | Llama 4, DeepSeek V3, Qwen3 | Air-gapped, compliance, cost control |
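The flowchart above is simple enough to express as a lookup, which is roughly how many production routers start out. The need keys and the default are illustrative choices, not part of any provider's API:

```python
# First-choice model per need, mirroring the decision table above.
ROUTES = {
    "max_intelligence": "Claude Opus 4.5",
    "fast_cheap":       "Gemini 2.5 Flash",
    "code":             "Claude Sonnet 4.5",
    "math":             "DeepSeek R1",
    "long_context":     "Llama 4 Scout",
    "self_host":        "Llama 4",
}

def pick_model(need: str) -> str:
    """Route a task category to a model; default to the best-value option."""
    return ROUTES.get(need, "Gemini 2.5 Flash")
```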
Key Selection Criteria
Data Privacy & Compliance
- Self-hostable (air-gapped): Llama 4, DeepSeek, Mistral, Qwen3
- SOC 2 compliant APIs: Anthropic, OpenAI, Google
- EU/GDPR focus: Mistral Large 3 (EU-based)
- No training on inputs: Claude API, enterprise tiers
Latency & Throughput
- Fastest TTFT (time to first token): Gemini Flash, Claude Haiku, GPT-5.2 Instant
- Highest tokens/sec: Groq (Llama), Cerebras, SambaNova
- Best for streaming: All Claude models, Gemini, GPT
- Batch processing: Anthropic Batch API (50% cheaper)
Tool Use & Agents
- Best function calling: Claude Sonnet 4.5, GPT-5.2
- Computer use: Claude (native), Operator (GPT)
- MCP support: Claude (native), growing ecosystem
- Parallel tool calls: All major providers
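Function calling across the major providers boils down to handing the model a JSON-Schema description of each tool. A sketch of building one in the shape Anthropic's Messages API expects (`name`, `description`, `input_schema`); the `get_weather` tool and its parameters are made up for illustration:

```python
def make_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Build a tool definition with a JSON-Schema input spec."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

# Hypothetical tool: look up weather by city.
weather_tool = make_tool(
    "get_weather",
    "Look up current weather for a city.",
    {"city": {"type": "string", "description": "City name"}},
    ["city"],
)
```

OpenAI's equivalent nests the same schema under a `function` key, so a thin adapter covers both.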
Cost Optimization
- Cheapest quality: DeepSeek ($0.27/1M in), Gemini Flash-Lite ($0.10)
- Best value flagship: Claude Sonnet 4.5 ($3/$15)
- Prompt caching: Anthropic (90% cheaper), Google
- Free tier: Gemini (generous), Mistral (limited)
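The batch (~50% off) and caching (~90% off cached input reads) discounts above compound. A back-of-envelope calculator using those headline figures, which vary by provider and should be checked against each pricing page:

```python
def discounted_input_cost(base_per_mtok: float, tokens: int,
                          cached_fraction: float = 0.0,
                          batch: bool = False) -> float:
    """Estimated USD input cost after caching and batch discounts."""
    cost = base_per_mtok * tokens / 1_000_000
    cost -= cost * cached_fraction * 0.90   # cached reads ~90% cheaper
    if batch:
        cost *= 0.50                        # batch API ~50% off
    return round(cost, 4)
```

At Sonnet's $3/1M input rate, 1M fully cached tokens sent through the batch API drop from $3.00 to about $0.15.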
Architecture Guide (Plain English)
Transformer
The foundation of virtually all modern generative AI. Like a smart reader that sees the whole page at once instead of word-by-word.
- Used by: GPT, Claude, Gemini, Llama — virtually all LLMs
Mixture of Experts (MoE)
Model has many “expert” sub-networks but only activates a few per task. Like a hospital with 100 specialists where each patient only sees 2-3 relevant doctors.
- “Active parameters”: The A in “671B-A37B” means only 37B used at once
- Used by: DeepSeek, Llama 4, Grok 4.1, Mistral Large 3
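The "only a few experts per task" idea can be shown in a few lines. A toy top-k gate: score every expert, keep the k best, and mix their outputs by normalized score. Real MoE layers do this per token with a learned router over neural sub-networks; the experts and scores here are made up:

```python
def moe_forward(x: float, experts, scores, k: int = 2) -> float:
    """Route input x to the top-k experts and blend their outputs."""
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return sum(scores[i] / total * experts[i](x) for i in top)

# Four toy "experts"; the router scores pick experts 1 and 3.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]
scores = [0.1, 0.6, 0.05, 0.3]
```

Only 2 of the 4 experts ever run, which is exactly why a 671B-A37B model computes like a 37B one.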
Context Window
How much text the model can “see” at once:
- 128K tokens: ~300 pages, good for most tasks
- 1M tokens: ~2,500 pages, entire codebases
- 10M tokens: ~25,000 pages (Llama 4 Scout)
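The page counts above come from two rough ratios: about 4 characters per token (for English text) and about 400 tokens per page. Both vary by language and tokenizer, but they're good enough for sizing a prompt:

```python
def tokens_from_chars(chars: int) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return chars // 4

def pages_from_tokens(tokens: int) -> float:
    """Rough page estimate: ~400 tokens per page."""
    return tokens / 400
```

By this rule of thumb, a 128K window holds about 320 pages and a 1M window about 2,500, matching the figures above.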
Reasoning/Thinking Models
Models that “think out loud” before answering. Like showing your work in math class.
- Chain-of-Thought (CoT): Basic “let me think step by step”
- Extended Thinking: Longer, structured reasoning (Claude)
- RL-trained reasoning: Learned via reinforcement learning (DeepSeek R1)
Diffusion (Images/Video)
Starts with random noise and gradually “denoises” into the final image. Like TV static slowly clearing to reveal a picture.
- Used by: Stable Diffusion, DALL·E 3, Midjourney, Flux, Sora, Veo
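The "static slowly clearing" intuition can be made concrete with a toy loop: start from pure noise and repeatedly nudge toward a target signal. Real diffusion models instead train a neural net to predict and subtract the noise at each step; this sketch only illustrates the iterative-refinement idea:

```python
import random

def toy_denoise(target, steps=50, step_size=0.2, seed=0):
    """Start from Gaussian noise and iteratively move toward target."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in target]              # pure noise
    for _ in range(steps):
        x = [xi + step_size * (ti - xi) for xi, ti in zip(x, target)]
    return x
```

After 50 small steps the noise is essentially gone; in a real model, the "target direction" at each step is what the network learned to predict.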
Open Weights vs. Closed
- Open weights: Download and run yourself (Llama, DeepSeek, Mistral)
- Closed/API-only: Access via API only (Claude, GPT, Gemini)
- Why it matters: Open = self-host for privacy, no API costs (but need hardware)
Key Takeaways
- Maximum capability: Claude Opus 4.5, GPT-5.2 Pro, or Grok 4.1
- Best coding model: Claude Sonnet 4.5 (SWE-bench #1)
- Longest context: Llama 4 Scout (10M), Gemini 2.5 Pro (1-2M)
- Best price/performance: Gemini 2.5 Flash, DeepSeek V3.2
- Self-hosting: Llama 4, DeepSeek, Mistral (open weights)
- Best image photorealism: Flux 1.1 Pro
- Best video quality: Veo 3 (4K), Sora 2 (lip-sync)
- Best voice synthesis: ElevenLabs TTS