Open Weight Models Cheatsheet 2026 | Joshua Schultz Ops Command Center

Why Open Weights?

Benefit	Description
Data Privacy	Your data never leaves your infrastructure
Cost Control	No per-token API fees after hardware investment
Customization	Fine-tune with LoRA/QLoRA for your domain
No Rate Limits	Scale throughput with your hardware

Model	Version	Total Params	Active	Context	License	Min Hardware
Llama 4 Maverick	`llama-4-maverick-17b-128e`	400B	17B	512K–1M	Community	4× H100
Llama 4 Scout	`llama-4-scout-17b-16e`	109B	17B	10M	Community	1× H100 (Q4)

Tip: Llama 4 Scout’s 10M context window is the largest available — analyze entire codebases at once.

Model	Version	Total Params	Active	Context	License	Best For
DeepSeek V3.2	`deepseek-v3.2`	671B	37B	128K	MIT	Cost-effective flagship
DeepSeek R1	`deepseek-r1-0528`	671B	37B	128K	MIT	Math & logic reasoning

Tip: DeepSeek R1 has distilled versions (1.5B–70B) that run on consumer hardware with reasoning capabilities.

Model	Version	Params	Context	License	Min Hardware
Qwen 3 235B-A22B	`qwen3-235b-a22b`	235B (22B active)	128K	Apache 2.0	4× A100 80GB
Qwen 3 72B	`qwen3-72b`	72B	128K	Apache 2.0	2× A100 80GB
Qwen 3 32B	`qwen3-32b`	32B	128K	Apache 2.0	1× A100 40GB

Tip: Qwen 3 family uses Apache 2.0 — fully permissive for commercial use with no restrictions.

Model	Version	Params	Context	Modality	Min Hardware
Gemma 3 27B	`gemma-3-27b-it`	27B	128K	Text + Vision	1× A100 40GB
Gemma 3 12B	`gemma-3-12b-it`	12B	128K	Text + Vision	RTX 4090
Gemma 3 4B	`gemma-3-4b-it`	4B	128K	Text + Vision	RTX 3060 12GB

Tip: Gemma 3 includes vision capabilities and supports 140+ languages.

Model	Version	Params	Context	License	Best For
Mixtral 8x22B	`mixtral-8x22b-instruct`	176B (44B active)	64K	Apache 2.0	EU-based (GDPR)
Mistral Nemo	`mistral-nemo-12b`	12B	128K	Apache 2.0	Drop-in replacement
Mistral 7B	`mistral-7b-instruct-v0.3`	7B	32K	Apache 2.0	Ollama default

Model	Version	Params	Context	License	Best For
Phi-4	`phi-4`	14B	16K	MIT	STEM reasoning, math
Phi-3.5 Mini	`phi-3.5-mini-128k`	3.8B	128K	MIT	Ultra-efficient, mobile

Tip: Phi-4 punches well above its weight class on GSM8K and coding benchmarks.

Model	Provider	Params (Active)	Context	License	Min Hardware
Llama 4 Maverick	Meta	400B (17B)	1M	Community	4× H100
Llama 4 Scout	Meta	109B (17B)	10M	Community	1× H100 (Q4)
DeepSeek V3.2	DeepSeek	671B (37B)	128K	MIT	8× H100
DeepSeek R1	DeepSeek	671B (37B)	128K	MIT	8× H100
Qwen 3 235B	Alibaba	235B (22B)	128K	Apache 2.0	4× A100 80GB
Qwen 3 72B	Alibaba	72B	128K	Apache 2.0	2× A100 80GB
Gemma 3 27B	Google	27B	128K	Gemma Terms	1× A100 40GB
Mixtral 8x22B	Mistral	176B (44B)	64K	Apache 2.0	4× A100 80GB
Phi-4	Microsoft	14B	16K	MIT	RTX 3090
Mistral 7B	Mistral	7B	32K	Apache 2.0	RTX 3080

Model	Provider	Params	License	Min VRAM	Best For
Stable Diffusion 3.5	Stability AI	8B	Stability Community	RTX 4090 24GB	Self-hosting, LoRA, ControlNet
Flux.1 Dev	Black Forest Labs	12B	NC License	RTX 4090 24GB	Photorealism, research
Flux.1 Schnell	Black Forest Labs	12B	Apache 2.0	RTX 3090 24GB	Commercial use, speed

Tip: Flux.1 Schnell uses Apache 2.0 license — fully commercial-ready and 10× faster than Dev.

Model	Provider	Type	Languages	License	Min VRAM
Whisper Large v3	OpenAI	Speech-to-Text	99	MIT	RTX 3060
Bark	Suno	Text-to-Speech	13+	MIT	12GB
Tortoise TTS	Community	Text-to-Speech	English	Apache 2.0	RTX 3080
MusicGen Large	Meta	Text-to-Music	N/A	CC-BY-NC	RTX 3090 24GB

RTX 3080 / 3090 / 4090

1-2× A100 / H100

4-8× H100 Cluster

Framework	Best For	Key Feature
vLLM	Production APIs	PagedAttention, high throughput
Ollama	Local inference	One-command setup
llama.cpp	CPU/low VRAM	Quantization, runs on laptops
TGI	HuggingFace	Docker-ready production
SGLang	Structured output	Fast JSON/code generation
ExLlamaV2	Consumer GPUs	Fastest quantized inference

License	Models	Commercial Use
MIT	DeepSeek, Phi, Bark, Whisper	✅ Fully permissive
Apache 2.0	Qwen, Mistral, Mixtral, Flux Schnell	✅ Fully permissive
Llama Community	Llama 4	✅ With restrictions
Gemma Terms	Gemma 3	✅ With restrictions
CC-BY-NC	MusicGen	❌ Non-commercial only
Custom NC	Flux.1 Dev	❌ Non-commercial only

Best overall open model: Llama 4 Maverick or DeepSeek V3.2 — compete with closed-source flagships
Best for reasoning: DeepSeek R1 with MIT license, distilled versions run on consumer hardware
Longest context: Llama 4 Scout (10M tokens) — process entire codebases
Best Apache 2.0 license: Qwen 3 family — fully permissive for commercial use
Best for consumer GPUs: Gemma 3, Mistral 7B, Phi-4 — run on RTX 3080/4090
Best image generation: Flux.1 Schnell (Apache 2.0) or SD 3.5 for commercial use