Why Open Weights?
| Benefit | Description |
|---|---|
| Data Privacy | Your data never leaves your infrastructure |
| Cost Control | No per-token API fees after hardware investment |
| Customization | Fine-tune with LoRA/QLoRA for your domain |
| No Rate Limits | Scale throughput with your hardware |
Meta Llama 4
| Model | Version | Total Params | Active | Context | License | Min Hardware |
|---|---|---|---|---|---|---|
| Llama 4 Maverick | llama-4-maverick-17b-128e | 400B | 17B | 1M | Community | 4× H100 |
| Llama 4 Scout | llama-4-scout-17b-16e | 109B | 17B | 10M | Community | 1× H100 (Q4) |
Tip: Llama 4 Scout’s 10M context window is the largest available — analyze entire codebases at once.
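A minimal vLLM sketch for long-context inference with Scout; the Hugging Face repo ID, the FP8 setting, and the 1M-token `max_model_len` are assumptions to verify against the model card, and at these lengths the KV cache, not the weights, is the practical ceiling:

```python
from vllm import LLM, SamplingParams

# Assumed repo ID and settings -- verify against the model card.
# At long contexts the KV cache, not the weights, dominates memory,
# so the usable window depends heavily on your GPU budget.
llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumption
    max_model_len=1_000_000,  # 1M tokens; the full 10M needs a multi-GPU KV cache
    quantization="fp8",       # on-the-fly FP8 to ease memory pressure
)

prompt = "Summarize the architecture of this codebase:\n" + open("repo_dump.txt").read()
print(llm.generate([prompt], SamplingParams(max_tokens=512))[0].outputs[0].text)
```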
DeepSeek
| Model | Version | Total Params | Active | Context | License | Best For |
|---|---|---|---|---|---|---|
| DeepSeek V3.2 | deepseek-v3.2 | 671B | 37B | 128K | MIT | Cost-effective flagship |
| DeepSeek R1 | deepseek-r1-0528 | 671B | 37B | 128K | MIT | Math & logic reasoning |
Tip: DeepSeek R1 has distilled versions (1.5B–70B) that run on consumer hardware with reasoning capabilities.
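A minimal sketch of calling a distilled R1 through Ollama's Python client; the `deepseek-r1:7b` tag is an assumption, so check the Ollama library for the distilled sizes you want:

```python
# pip install ollama -- assumes a local Ollama server is running
# and the model was pulled first: ollama pull deepseek-r1:7b (assumed tag)
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "What is 17 * 23? Think step by step."}],
)
# Distilled R1 models emit their reasoning in <think> tags before the answer.
print(response["message"]["content"])
```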
Alibaba Qwen 3
| Model | Version | Params | Context | License | Min Hardware |
|---|---|---|---|---|---|
| Qwen 3 235B-A22B | qwen3-235b-a22b | 235B (22B active) | 128K | Apache 2.0 | 4× A100 80GB |
| Qwen 3 72B | qwen3-72b | 72B | 128K | Apache 2.0 | 2× A100 80GB |
| Qwen 3 32B | qwen3-32b | 32B | 128K | Apache 2.0 | 1× A100 40GB |
Tip: Qwen 3 family uses Apache 2.0 — fully permissive for commercial use with no restrictions.
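Because the weights sit on Hugging Face under Apache 2.0, a plain transformers load is enough; a minimal sketch, assuming the `Qwen/Qwen3-32B` repo ID matches the version string above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-32B"  # assumed repo ID for qwen3-32b
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~64 GB in bf16; quantize for smaller cards
    device_map="auto",           # shard across whatever GPUs are visible
)

messages = [{"role": "user", "content": "Explain LoRA in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
# Qwen 3 may emit <think> reasoning before the final answer.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```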
Google Gemma 3
| Model | Version | Params | Context | Modality | Min Hardware |
|---|---|---|---|---|---|
| Gemma 3 27B | gemma-3-27b-it | 27B | 128K | Text + Vision | 1× A100 40GB |
| Gemma 3 12B | gemma-3-12b-it | 12B | 128K | Text + Vision | RTX 4090 |
| Gemma 3 4B | gemma-3-4b-it | 4B | 128K | Text + Vision | RTX 3060 12GB |
Tip: Gemma 3 includes vision capabilities and supports 140+ languages.
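A minimal sketch of the vision path through the transformers `image-text-to-text` pipeline; the message format follows recent transformers releases and the image URL is a placeholder:

```python
from transformers import pipeline

# Gemma 3 is multimodal: images and text share one chat interface.
pipe = pipeline("image-text-to-text", model="google/gemma-3-4b-it")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/diagram.png"},  # placeholder
        {"type": "text", "text": "Describe this diagram in one sentence."},
    ],
}]
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])
```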
Mistral AI
| Model | Version | Params | Context | License | Best For |
|---|---|---|---|---|---|
| Mixtral 8x22B | mixtral-8x22b-instruct | 176B (44B active) | 64K | Apache 2.0 | EU data residency (GDPR) |
| Mistral Nemo | mistral-nemo-12b | 12B | 128K | Apache 2.0 | Drop-in upgrade from Mistral 7B |
| Mistral 7B | mistral-7b-instruct-v0.3 | 7B | 32K | Apache 2.0 | Ollama default |
Microsoft Phi
| Model | Version | Params | Context | License | Best For |
|---|---|---|---|---|---|
| Phi-4 | phi-4 | 14B | 16K | MIT | STEM reasoning, math |
| Phi-3.5 Mini | phi-3.5-mini-128k | 3.8B | 128K | MIT | Ultra-efficient, mobile |
Tip: Phi-4 punches well above its weight class on GSM8K and coding benchmarks.
Quick Comparison: LLMs
| Model | Provider | Params (Active) | Context | License | Min Hardware |
|---|---|---|---|---|---|
| Llama 4 Maverick | Meta | 400B (17B) | 1M | Community | 4× H100 |
| Llama 4 Scout | Meta | 109B (17B) | 10M | Community | 1× H100 (Q4) |
| DeepSeek V3.2 | DeepSeek | 671B (37B) | 128K | MIT | 8× H100 |
| DeepSeek R1 | DeepSeek | 671B (37B) | 128K | MIT | 8× H100 |
| Qwen 3 235B | Alibaba | 235B (22B) | 128K | Apache 2.0 | 4× A100 80GB |
| Qwen 3 72B | Alibaba | 72B | 128K | Apache 2.0 | 2× A100 80GB |
| Gemma 3 27B | Google | 27B | 128K | Gemma Terms | 1× A100 40GB |
| Mixtral 8x22B | Mistral | 176B (44B) | 64K | Apache 2.0 | 4× A100 80GB |
| Phi-4 | Microsoft | 14B | 16K | MIT | RTX 3090 |
| Mistral 7B | Mistral | 7B | 32K | Apache 2.0 | RTX 3080 |
Image Generation (Open Weights)
| Model | Provider | Params | License | Min VRAM | Best For |
|---|---|---|---|---|---|
| Stable Diffusion 3.5 | Stability AI | 8B | Stability Community | RTX 4090 24GB | Self-hosting, LoRA, ControlNet |
| Flux.1 Dev | Black Forest Labs | 12B | NC License | RTX 4090 24GB | Photorealism, research |
| Flux.1 Schnell | Black Forest Labs | 12B | Apache 2.0 | RTX 3090 24GB | Commercial use, speed |
Tip: Flux.1 Schnell uses the Apache 2.0 license (fully commercial-ready) and is distilled to generate in roughly 4 steps, versus ~28+ for Dev.
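A minimal diffusers sketch for Schnell; `guidance_scale=0.0` because the distilled model runs without classifier-free guidance:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # stages layers on CPU so 24 GB cards cope

image = pipe(
    "a lighthouse at dusk, photorealistic",
    num_inference_steps=4,  # Schnell is distilled for ~4 steps
    guidance_scale=0.0,     # no CFG in the distilled model
).images[0]
image.save("lighthouse.png")
```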
Audio & Speech (Open Source)
| Model | Provider | Type | Languages | License | Min VRAM |
|---|---|---|---|---|---|
| Whisper Large v3 | OpenAI | Speech-to-Text | 99 | MIT | RTX 3060 |
| Bark | Suno | Text-to-Speech | 13+ | MIT | 12GB |
| Tortoise TTS | Community | Text-to-Speech | English | Apache 2.0 | RTX 3080 |
| MusicGen Large | Meta | Text-to-Music | N/A | CC-BY-NC | RTX 3090 24GB |
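For the speech-to-text row, a minimal sketch with the openai-whisper package; `audio.mp3` is a placeholder path:

```python
# pip install openai-whisper (needs ffmpeg installed on the system)
import whisper

model = whisper.load_model("large-v3")  # ~10 GB VRAM in FP16
result = model.transcribe("audio.mp3")  # placeholder path; language is auto-detected
print(result["text"])
```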
Hardware Tiers for Self-Hosting
Consumer (8-24GB VRAM)
RTX 3080 / 3090 / 4090
- Mistral 7B (FP16)
- Gemma 3 12B (Q8)
- Phi-4 14B (Q8)
- Qwen 3 32B (Q4)
- Llama 3.3 70B (Q4, partial CPU offload)
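On this tier, quantized GGUF inference is the usual route; a minimal llama-cpp-python sketch, with a placeholder filename for whichever Q4 file you download:

```python
# pip install llama-cpp-python (build with CUDA enabled for GPU offload)
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct-v0.3.Q4_K_M.gguf",  # placeholder filename
    n_gpu_layers=-1,  # offload every layer to the GPU; lower this on small cards
    n_ctx=8192,       # the KV cache grows with this, so size it to your VRAM
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me three uses for a local LLM."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```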
Prosumer (40-80GB VRAM)
1-2× A100 / H100
- Qwen 3 72B (FP16)
- Gemma 3 27B (FP16)
- Llama 4 Scout (Q4)
- DeepSeek R1 distilled 70B
Enterprise (320GB+ VRAM)
4-8× H100 Cluster
- Llama 4 Maverick (FP8)
- DeepSeek V3.2 (FP8)
- DeepSeek R1 (FP8)
- Qwen 3 235B (FP16)
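At this tier, tensor parallelism does the sharding; a minimal vLLM sketch for a 4-GPU node, with the repo ID assumed from the Qwen row above:

```python
from vllm import LLM, SamplingParams

# Shard one MoE checkpoint across 4 GPUs with tensor parallelism.
llm = LLM(
    model="Qwen/Qwen3-235B-A22B",  # assumed HF repo ID
    tensor_parallel_size=4,        # one shard per GPU
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=32))[0].outputs[0].text)
```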
Inference Frameworks
| Framework | Best For | Key Feature |
|---|---|---|
| vLLM | Production APIs | PagedAttention, high throughput |
| Ollama | Local inference | One-command setup |
| llama.cpp | CPU/low VRAM | Quantization, runs on laptops |
| TGI | Hugging Face stack | Docker-ready production serving |
| SGLang | Structured output | Fast JSON/code generation |
| ExLlamaV2 | Consumer GPUs | Fastest quantized inference |
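Most of these servers (vLLM, Ollama, TGI, llama.cpp's server mode) expose an OpenAI-compatible endpoint, so one client covers them all; the port and model name below depend on what you launched:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server; the key is
# ignored by most local backends but must be a non-empty string.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

resp = client.chat.completions.create(
    model="mistral-7b-instruct-v0.3",  # whatever name your server registered
    messages=[{"role": "user", "content": "Ping?"}],
)
print(resp.choices[0].message.content)
```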
License Quick Reference
| License | Models | Commercial Use |
|---|---|---|
| MIT | DeepSeek, Phi, Bark, Whisper | ✅ Fully permissive |
| Apache 2.0 | Qwen, Mistral, Mixtral, Flux Schnell | ✅ Fully permissive |
| Llama Community | Llama 4 | ✅ With restrictions |
| Gemma Terms | Gemma 3 | ✅ With restrictions |
| CC-BY-NC | MusicGen | ❌ Non-commercial only |
| Custom NC | Flux.1 Dev | ❌ Non-commercial only |
Key Takeaways
- Best overall open model: Llama 4 Maverick or DeepSeek V3.2 — compete with closed-source flagships
- Best for reasoning: DeepSeek R1 with MIT license, distilled versions run on consumer hardware
- Longest context: Llama 4 Scout (10M tokens) — process entire codebases
- Best Apache 2.0 license: Qwen 3 family — fully permissive for commercial use
- Best for consumer GPUs: Gemma 3, Mistral 7B, Phi-4 — run on RTX 3080/4090
- Best image generation: Flux.1 Schnell (Apache 2.0) or SD 3.5 for commercial use