Model Type Legend
| Badge | Meaning |
|---|---|
| Flagship | Maximum capability |
| Balanced | Best value/performance |
| Fast | Low latency, cost-efficient |
| Reasoning | Extended thinking mode |
| Open Weights | Self-hostable |
Anthropic Claude
| Model | Version | Context | Output | Input Price | Output Price | Best For |
|---|---|---|---|---|---|---|
| Claude Opus 4.5 | claude-opus-4-5-20251101 | 200K | 64K | $5/1M | $25/1M | Complex reasoning, agents, research |
| Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | 200K–1M | 64K | $3/1M | $15/1M | Coding (#1 SWE-bench), computer use |
| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | 200K | 64K | $1/1M | $5/1M | High-volume, real-time, classification |
Tip: Claude Sonnet 4.5 leads SWE-bench for coding tasks. Use Opus for complex multi-step reasoning.
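The per-token prices above translate directly into per-request costs. A minimal sketch of that arithmetic, using shortened model aliases (the pinned, dated IDs from the table work the same way) and the listed prices, which you should verify against Anthropic's current pricing page:

```python
# USD per 1M tokens (input, output), copied from the table above.
CLAUDE_PRICES = {
    "claude-opus-4-5":   (5.00, 25.00),
    "claude-sonnet-4-5": (3.00, 15.00),
    "claude-haiku-4-5":  (1.00, 5.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the listed list prices."""
    in_price, out_price = CLAUDE_PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, a Haiku call with 100K input tokens and 10K output tokens costs about $0.15, which is why the high-volume tier matters at scale.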
OpenAI GPT
| Model | Version | Context | Output | Input Price | Output Price | Best For |
|---|
| GPT-5.2 Pro | gpt-5.2-pro | 196K | 32K | $1.25/1M | $10/1M | Enterprise, finance, health |
| GPT-5.2 Thinking | gpt-5.2-thinking | 196K | 32K | $1.25/1M | $10/1M | Multi-step reasoning, math |
| GPT-5.2 Instant | gpt-5.2-instant | 128K | 16K | $0.50/1M | $2/1M | Quick lookups, drafting, high throughput |
Google Gemini
| Model | Version | Context | Output | Input Price | Output Price | Best For |
|---|
| Gemini 2.5 Pro | gemini-2.5-pro | 1M (2M soon) | 65K | $1.25/1M | $10/1M | Ultra-long context, multimodal |
| Gemini 2.5 Flash | gemini-2.5-flash | 1M | 65K | $0.15/1M | $0.60/1M | Best price/performance, documents |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1M | 32K | $0.10/1M | $0.40/1M | Lowest cost, classification, routing |
Tip: Gemini 2.5 Flash offers the best price/performance ratio for most production workloads.
Meta Llama (Open Weights)
| Model | Version | Context | Parameters | Architecture | Best For |
|---|---|---|---|---|---|
| Llama 4 Maverick | llama-4-maverick-17b-128e | 512K–1M | 400B (17B active) | MoE (128 experts) | Self-hosting, multimodal, multilingual |
| Llama 4 Scout | llama-4-scout-17b-16e | 10M | 109B (17B active) | MoE (16 experts) | Massive context, codebase analysis |
Tip: Llama 4 Scout’s 10M context window is the largest available — great for analyzing entire codebases.
DeepSeek (Open Weights)
| Model | Version | Context | Parameters | Input Price | Output Price | Best For |
|---|---|---|---|---|---|---|
| DeepSeek V3.2 | deepseek-v3.2 | 128K | 671B (37B active) | $0.27/1M | $1.10/1M | Cost-effective general use |
| DeepSeek R1 | deepseek-r1-0528 | 128K | 671B (37B active) | $0.55/1M | $2.19/1M | Math reasoning, chain-of-thought |
xAI Grok & Mistral
| Model | Version | Context | Price (In/Out) | Best For |
|---|---|---|---|---|
| Grok 4.1 | grok-4.1 | 256K–2M | $3/$15 per 1M | Real-time X data, agentic workflows |
| Mistral Large 3 | mistral-large-2412 | 256K | $2/$6 per 1M | Enterprise EU (GDPR), function calling |
Quick Comparison: LLMs
| Model | Provider | Context | Price (In/Out) | Best Use Case |
|---|---|---|---|---|
| Claude Opus 4.5 | Anthropic | 200K | $5/$25 | Complex reasoning, research |
| Claude Sonnet 4.5 | Anthropic | 200K–1M | $3/$15 | Coding (#1), production |
| Claude Haiku 4.5 | Anthropic | 200K | $1/$5 | High-volume, real-time |
| GPT-5.2 Pro | OpenAI | 196K | $1.25/$10 | Enterprise, finance |
| Gemini 2.5 Pro | Google | 1M (2M) | $1.25/$10 | Ultra-long context |
| Gemini 2.5 Flash | Google | 1M | $0.15/$0.60 | Best price/perf |
| Llama 4 Scout | Meta | 10M | Self-host | Largest context, open |
| DeepSeek R1 | DeepSeek | 128K | $0.55/$2.19 | Math reasoning, open |
| Grok 4.1 | xAI | 256K–2M | $3/$15 | Real-time X data |
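A common selection question is "cheapest model whose context window fits my input." A sketch of that filter over a subset of the comparison table (context sizes and input prices mirror the rows above; this ignores quality differences, which usually matter more):

```python
# (name, context window in tokens, input price USD per 1M tokens)
MODELS = [
    ("Claude Opus 4.5",     200_000, 5.00),
    ("Claude Sonnet 4.5",   200_000, 3.00),
    ("Claude Haiku 4.5",    200_000, 1.00),
    ("GPT-5.2 Pro",         196_000, 1.25),
    ("Gemini 2.5 Pro",    1_000_000, 1.25),
    ("Gemini 2.5 Flash",  1_000_000, 0.15),
    ("DeepSeek R1",         128_000, 0.55),
]

def cheapest_for_context(min_context: int) -> str:
    """Cheapest-input API model whose context window is at least min_context."""
    candidates = [(price, name) for name, ctx, price in MODELS if ctx >= min_context]
    return min(candidates)[1]
```

Note how often Gemini 2.5 Flash wins this filter, which is exactly the price/performance claim in the table.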
Image Generation Models
| Model | Provider | Max Resolution | Pricing | Best For |
|---|---|---|---|---|
| DALL·E 3 | OpenAI | 1792×1024 | $0.04–0.08/img | Text rendering, ChatGPT integration |
| Midjourney V7 | Midjourney | 2048×2048 | $10–120/mo | Artistic quality, concept art |
| Flux 1.1 Pro | Black Forest Labs | 4MP (4K Ultra) | $0.04–0.06/img | Photorealism (#1), speed |
| Stable Diffusion 3.5 | Stability AI | 1024×1024 (native) | ~$0.002/img | Self-hosting, LoRA, ControlNet |
| Imagen 3 | Google | 2048×2048 | Vertex AI | Google ecosystem, enterprise |
Tip: Flux 1.1 Pro leads photorealism benchmarks. Midjourney excels at artistic/stylized output.
Video Generation Models
| Model | Provider | Max Duration | Resolution | Native Audio | Pricing | Best For |
|---|---|---|---|---|---|---|
| Sora 2 | OpenAI | 20s | 1080p | Yes (lip-sync) | $0.10–0.15/s | Lip-sync, physical realism |
| Veo 3 | Google | 8s (60s ent.) | 4K | Yes (music/SFX) | $0.20–0.40/s | 4K cinematic, YouTube |
| Runway Gen-4 | Runway | 16s | 4K upscale | No | $12–95/mo | VFX, camera control |
| Kling AI 2.1 | Kuaishou | 2 minutes | 1080p | No | $10/mo | Longest duration, budget |
| Pika 2.0 | Pika Labs | 10s | 1080p | No | $8/mo | Rapid prototyping |
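For the per-second-priced models, clip cost scales linearly with duration. A back-of-envelope helper using the low end of each listed range (the subscription-priced models are excluded since their cost doesn't scale per second):

```python
# Low end of the per-second price ranges from the table above.
PER_SECOND_LOW = {"Sora 2": 0.10, "Veo 3": 0.20}

def clip_cost(model: str, seconds: float) -> float:
    """Minimum estimated USD cost for one clip of the given length."""
    return round(PER_SECOND_LOW[model] * seconds, 2)
```

So a maximum-length 20s Sora 2 clip starts around $2.00, and an 8s Veo 3 clip around $1.60.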
Audio & Voice Models
| Model | Provider | Type | Key Feature | Pricing | Best For |
|---|---|---|---|---|---|
| ElevenLabs TTS | ElevenLabs | Voice Synthesis | 29+ languages, cloning | $5–99/mo | Voice synthesis (#1), dubbing |
| Eleven Music | ElevenLabs | Music Generation | Commercially licensed | $5–99/mo | Commercial use, stems export |
| Suno v4 | Suno | Music + Vocals | 4-min full songs | $8–30/mo | Full songs with vocals |
| Udio | Udio | Music + Vocals | 15-min songs | $10–30/mo | Long-form music, remixing |
Decision Flowchart
| If you need… | Use… | Why |
|---|---|---|
| MAX INTELLIGENCE | Claude Opus 4.5 or GPT-5.2 Pro | Complex analysis, nuanced writing |
| FAST + CHEAP | Gemini 2.5 Flash or Claude Haiku | High volume, real-time, chat |
| CODE | Claude Sonnet 4.5 | #1 SWE-bench, agentic workflows |
| MATH/REASONING | GPT-5.2 Thinking or DeepSeek R1 | Proofs, competition math |
| LONG CONTEXT | Gemini 2.5 Pro (1-2M) or Llama Scout (10M) | Large codebases, book analysis |
| SELF-HOSTING | Llama 4, DeepSeek V3, Qwen3 | Air-gapped, compliance, cost control |
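The flowchart above is simple enough to express as a lookup, which is roughly how many production routers start out. The need keys and the default are illustrative choices, not part of any provider's API:

```python
# First-choice model per need, mirroring the decision table above.
ROUTES = {
    "max_intelligence": "Claude Opus 4.5",
    "fast_cheap":       "Gemini 2.5 Flash",
    "code":             "Claude Sonnet 4.5",
    "math":             "DeepSeek R1",
    "long_context":     "Llama 4 Scout",
    "self_host":        "Llama 4",
}

def pick_model(need: str) -> str:
    """Route a task category to a model; default to the best-value option."""
    return ROUTES.get(need, "Gemini 2.5 Flash")
```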
Key Selection Criteria
Data Privacy & Compliance
- Self-hostable (air-gapped): Llama 4, DeepSeek, Mistral, Qwen3
- SOC 2 compliant APIs: Anthropic, OpenAI, Google
- EU/GDPR focus: Mistral Large 3 (EU-based)
- No training on inputs: Claude API, enterprise tiers
Latency & Throughput
- Fastest TTFT (time to first token): Gemini Flash, Claude Haiku, GPT-5.2 Instant
- Highest tokens/sec: Groq (Llama), Cerebras, SambaNova
- Best for streaming: All Claude models, Gemini, GPT
- Batch processing: Anthropic Batch API (50% cheaper)
Tool Use & Agents
- Best function calling: Claude Sonnet 4.5, GPT-5.2
- Computer use: Claude (native), Operator (GPT)
- MCP support: Claude (native), growing ecosystem
- Parallel tool calls: All major providers
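Function calling across the major providers boils down to handing the model a JSON-Schema description of each tool. A sketch of building one in the shape Anthropic's Messages API expects (`name`, `description`, `input_schema`); the `get_weather` tool and its parameters are made up for illustration:

```python
def make_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Build a tool definition with a JSON-Schema input spec."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

# Hypothetical tool: look up weather by city.
weather_tool = make_tool(
    "get_weather",
    "Look up current weather for a city.",
    {"city": {"type": "string", "description": "City name"}},
    ["city"],
)
```

OpenAI's equivalent nests the same schema under a `function` key, so a thin adapter covers both.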
Cost Optimization
- Cheapest quality: DeepSeek ($0.27/1M in), Gemini Flash-Lite ($0.10)
- Best value flagship: Claude Sonnet 4.5 ($3/$15)
- Prompt caching: Anthropic (90% cheaper), Google
- Free tier: Gemini (generous), Mistral (limited)
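The batch (~50% off) and caching (~90% off cached input reads) discounts above compound. A back-of-envelope calculator using those headline figures, which vary by provider and should be checked against each pricing page:

```python
def discounted_input_cost(base_per_mtok: float, tokens: int,
                          cached_fraction: float = 0.0,
                          batch: bool = False) -> float:
    """Estimated USD input cost after caching and batch discounts."""
    cost = base_per_mtok * tokens / 1_000_000
    cost -= cost * cached_fraction * 0.90   # cached reads ~90% cheaper
    if batch:
        cost *= 0.50                        # batch API ~50% off
    return round(cost, 4)
```

At Sonnet's $3/1M input rate, 1M fully cached tokens sent through the batch API drop from $3.00 to about $0.15.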
Architecture Guide (Plain English)
Transformer
The foundation of virtually all modern generative AI. Like a smart reader that sees the whole page at once instead of word-by-word.
- Used by: GPT, Claude, Gemini, Llama — virtually all LLMs
Mixture of Experts (MoE)
Model has many “expert” sub-networks but only activates a few per task. Like a hospital with 100 specialists where each patient only sees 2-3 relevant doctors.
- “Active parameters”: The A in “671B-A37B” means only 37B used at once
- Used by: DeepSeek, Llama 4, Grok 4.1, Mistral Large 3
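The "only a few experts per task" idea can be shown in a few lines. A toy top-k gate: score every expert, keep the k best, and mix their outputs by normalized score. Real MoE layers do this per token with a learned router over neural sub-networks; the experts and scores here are made up:

```python
def moe_forward(x: float, experts, scores, k: int = 2) -> float:
    """Route input x to the top-k experts and blend their outputs."""
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return sum(scores[i] / total * experts[i](x) for i in top)

# Four toy "experts"; the router scores pick experts 1 and 3.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]
scores = [0.1, 0.6, 0.05, 0.3]
```

Only 2 of the 4 experts ever run, which is exactly why a 671B-A37B model computes like a 37B one.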
Context Window
How much text the model can “see” at once:
- 128K tokens: ~300 pages, good for most tasks
- 1M tokens: ~2,500 pages, entire codebases
- 10M tokens: ~25,000 pages (Llama 4 Scout)
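The page counts above come from two rough ratios: about 4 characters per token (for English text) and about 400 tokens per page. Both vary by language and tokenizer, but they're good enough for sizing a prompt:

```python
def tokens_from_chars(chars: int) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return chars // 4

def pages_from_tokens(tokens: int) -> float:
    """Rough page estimate: ~400 tokens per page."""
    return tokens / 400
```

By this rule of thumb, a 128K window holds about 320 pages and a 1M window about 2,500, matching the figures above.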
Reasoning/Thinking Models
Models that “think out loud” before answering. Like showing your work in math class.
- Chain-of-Thought (CoT): Basic “let me think step by step”
- Extended Thinking: Longer, structured reasoning (Claude)
- RL-trained reasoning: Learned via reinforcement learning (DeepSeek R1)
Diffusion (Images/Video)
Starts with random noise and gradually “denoises” into the final image. Like TV static slowly clearing to reveal a picture.
- Used by: Stable Diffusion, DALL·E 3, Midjourney, Flux, Sora, Veo
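The "static slowly clearing" intuition can be made concrete with a toy loop: start from pure noise and repeatedly nudge toward a target signal. Real diffusion models instead train a neural net to predict and subtract the noise at each step; this sketch only illustrates the iterative-refinement idea:

```python
import random

def toy_denoise(target, steps=50, step_size=0.2, seed=0):
    """Start from Gaussian noise and iteratively move toward target."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in target]              # pure noise
    for _ in range(steps):
        x = [xi + step_size * (ti - xi) for xi, ti in zip(x, target)]
    return x
```

After 50 small steps the noise is essentially gone; in a real model, the "target direction" at each step is what the network learned to predict.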
Open Weights vs. Closed
- Open weights: Download and run yourself (Llama, DeepSeek, Mistral)
- Closed/API-only: Access via API only (Claude, GPT, Gemini)
- Why it matters: Open = self-host for privacy, no API costs (but need hardware)
Key Takeaways
- Maximum capability: Claude Opus 4.5, GPT-5.2 Pro, or Grok 4.1
- Best coding model: Claude Sonnet 4.5 (SWE-bench #1)
- Longest context: Llama 4 Scout (10M), Gemini 2.5 Pro (1-2M)
- Best price/performance: Gemini 2.5 Flash, DeepSeek V3.2
- Self-hosting: Llama 4, DeepSeek, Mistral (open weights)
- Best image photorealism: Flux 1.1 Pro
- Best video quality: Veo 3 (4K), Sora 2 (lip-sync)
- Best voice synthesis: ElevenLabs TTS