TL;DR
"Open source" in AI is a legal minefield in 2026. Many popular open-weight models, including Llama 4, Gemma, and some Mistral and Qwen releases, ship under custom licenses with restrictions on commercial use, output training, or deployment scale, while others (DeepSeek-V3/R1, OLMo 2) use standard MIT or Apache 2.0 terms. This guide maps the license landscape from fully permissive (Apache 2.0) to behaviorally restricted (RAIL), provides a decision framework for production deployment, and covers the EU AI Act's requirements for open-weight model providers. Get it wrong, and you risk lawsuits, forced model takedowns, or EU AI Act fines of up to €35M.
Table of Contents
- Key Takeaways
- The 2026 Open Source AI License Landscape
- License Taxonomy and Comparison
- Deep Dive: RAIL (Responsible AI License)
- Deep Dive: Llama Community License
- Deep Dive: DeepSeek and Chinese Model Licenses
- EU AI Act and Open Source
- Compliance Decision Framework
- Production Compliance Checklist
- Common Compliance Traps
- Best Practices
- FAQ
- Summary
- Related Resources
Key Takeaways
- "Open source" ≠ "use however you want": Only Apache 2.0 and MIT models grant truly unrestricted commercial rights. Most AI "open" models use custom licenses with significant restrictions.
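- Scale triggers matter: Llama's 700M MAU threshold and similar clauses tie your license rights to organization-wide scale. Check how each clause measures it: Llama's text fixes the measurement at the model version's release date, while other threshold clauses may apply continuously, so what's free at Series A may require a license at scale.
- RAIL licenses restrict behavior, not code: You can modify the model freely, but you cannot deploy it for prohibited uses such as surveillance, disinformation, or discrimination, and you must pass those restrictions on to downstream users.
- EU AI Act creates new obligations: The open-source exemption is partial. Providers of models trained with more than 10^25 FLOPs face systemic-risk obligations regardless of license, and deployers in high-risk contexts keep their own obligations.
- Output training terms vary by license and version: Llama 2 and Llama 3 prohibited using outputs to improve other large language models, while Llama 3.1 and later allow it but require any distributed model trained on Llama outputs to have a name beginning with "Llama".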
The 2026 Open Source AI License Landscape
The term "open source" in AI has diverged dramatically from its meaning in traditional software. When Linux or PostgreSQL say "open source," they mean the OSI (Open Source Initiative) definition: free to use, modify, distribute, and commercialize without restriction. When Meta says Llama is "open source," they mean something fundamentally different.
In 2026, the AI model licensing landscape looks like this:
Released under OSI-approved licenses (Apache 2.0, MIT):
- OLMo 2 (Apache 2.0) — Allen AI
- Mistral 7B (Apache 2.0) — Mistral AI (early models only)
- DeepSeek-V3 / R1 (MIT) — DeepSeek
"Open weights" with restrictions:
- Llama 4 (Llama Community License) — Meta
- Gemma 2/3 (Gemma Terms of Use) — Google
- Qwen3 (Qwen License / Apache 2.0 depending on variant) — Alibaba
- Command R+ (CC-BY-NC) — Cohere
Responsible AI licensed:
- BLOOM (BigScience RAIL) — BigScience
- Stable Diffusion (CreativeML Open RAIL-M) — Stability AI
- StarCoder2 (BigCode Open RAIL-M) — BigCode
The OSI published its official Open Source AI Definition (OSAID) v1.0 in October 2024, which requires that models provide access to training data information, model architecture, and training code. By this definition, most "open" models—including Llama, Gemma, and Qwen—are not open source. They are "open weight" models with proprietary training processes.
This distinction matters legally. When a license says "open source," courts may interpret it through the OSI lens. When it says "community license" or "research license," different rules apply.
License Taxonomy and Comparison
The Complete License Comparison Matrix
| License | Commercial Use | Modification | Distribution | Use Restrictions | Patent Grant | Key Gotchas |
|---|---|---|---|---|---|---|
| Apache 2.0 | ✅ Unrestricted | ✅ Full | ✅ Full | ❌ None | ✅ Explicit | Must preserve the NOTICE file when one exists; patent grant terminates if you file patent litigation |
| MIT | ✅ Unrestricted | ✅ Full | ✅ Full | ❌ None | ⚠️ Implicit | No patent clause—potential risk for model architectures with patents |
| BSD-3-Clause | ✅ Unrestricted | ✅ Full | ✅ Full | ❌ None | ⚠️ Implicit | Cannot use contributor names for endorsement |
| GPL-3.0 | ✅ With copyleft | ✅ Copyleft | ✅ Copyleft | ❌ None | ✅ Explicit | Derivative works must also be GPL—rare in AI models |
| AGPL-3.0 | ✅ With copyleft | ✅ Copyleft | ✅ Network copyleft | ❌ None | ✅ Explicit | Network use triggers distribution—API serving counts |
| Llama Community | ⚠️ <700M MAU | ✅ Full | ⚠️ Conditional | ⚠️ Acceptable Use Policy; naming rule for models trained on outputs | ❌ None | Must request a license above the threshold; "Built with Llama" notice required when distributing |
| Gemma Terms | ⚠️ Conditional | ✅ Full | ⚠️ Conditional | ⚠️ Harm restrictions | ❌ None | Cannot use to create competing products; redistribution requires Gemma branding |
| Mistral Research | ❌ Research only | ✅ Research only | ⚠️ Non-commercial | ✅ Research-only | ❌ None | Production use requires commercial agreement |
| RAIL-M | ✅ With restrictions | ✅ Full | ✅ With attachment | ✅ Behavioral | ⚠️ Varies | Must propagate use restrictions to downstream users |
| RAIL-S | ✅ With restrictions | ✅ Full | ✅ With attachment | ✅ Behavioral (source) | ⚠️ Varies | Includes source code copyleft for model code |
| BigScience BLOOM | ✅ With restrictions | ✅ Full | ✅ With attachment | ✅ Behavioral | ❌ None | 13 prohibited use cases; must include model card |
| DeepSeek (MIT) | ✅ Unrestricted | ✅ Full | ✅ Full | ❌ None | ⚠️ Implicit | Genuinely permissive—one of the most commercially friendly |
| Qwen License | ⚠️ <100M MAU | ✅ Full | ⚠️ Conditional | ⚠️ Some restrictions | ❌ None | Threshold varies by model size; some variants use Apache 2.0 |
| Yi License | ⚠️ Conditional | ✅ Full | ⚠️ Must register | ⚠️ Some restrictions | ❌ None | Requires registration for commercial use above threshold |
| CC-BY-NC-4.0 | ❌ Non-commercial | ✅ Full | ✅ With attribution | ⚠️ Non-commercial only | ❌ None | Any commercial use requires separate agreement |
License Categories Explained
Fully Permissive (Apache 2.0, MIT, BSD-3)
These are the gold standard for commercial deployment. You can:
- Use the model in any product without restrictions
- Fine-tune and redistribute without obligation (beyond attribution)
- Build competing products
- Use outputs for training other models
Key models: DeepSeek-V3 (MIT), DeepSeek-R1 (MIT), OLMo 2 (Apache 2.0), early Mistral models (Apache 2.0)
Community/Threshold Licenses (Llama, Qwen, Yi)
These appear permissive but include scale-based triggers:
- Free below user/revenue thresholds
- Require commercial agreements above thresholds
- Some versions prohibit or attach conditions to using outputs to train other models
- May restrict certain deployment contexts
Responsible AI Licenses (RAIL-M, RAIL-S, BLOOM)
These maintain open access while prohibiting specific harmful uses:
- Allow commercial use and modification
- Behavioral restrictions propagate to derivatives
- Require model cards and documentation
- Specific prohibited use cases listed in appendix
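The three categories above can be encoded as a small lookup so that tooling treats unrecognized licenses as unknown rather than permissive. This is a minimal sketch: the `LicenseCategory` enum, the `CATEGORY_BY_LICENSE` table, and the `categorize` function are hypothetical names, and the keys are simply the license labels used in this guide, not official identifiers.

```python
from enum import Enum


class LicenseCategory(Enum):
    PERMISSIVE = "fully permissive"
    THRESHOLD = "community/threshold"
    RAIL = "responsible AI"
    UNKNOWN = "unknown"


# Hypothetical lookup table keyed by lowercase license label.
CATEGORY_BY_LICENSE = {
    "apache-2.0": LicenseCategory.PERMISSIVE,
    "mit": LicenseCategory.PERMISSIVE,
    "bsd-3-clause": LicenseCategory.PERMISSIVE,
    "llama community license": LicenseCategory.THRESHOLD,
    "qwen license": LicenseCategory.THRESHOLD,
    "yi license": LicenseCategory.THRESHOLD,
    "rail-m": LicenseCategory.RAIL,
    "rail-s": LicenseCategory.RAIL,
    "bigscience bloom": LicenseCategory.RAIL,
}


def categorize(license_name: str) -> LicenseCategory:
    """Map a license label to its category; anything unrecognized is UNKNOWN."""
    key = license_name.strip().lower()
    return CATEGORY_BY_LICENSE.get(key, LicenseCategory.UNKNOWN)


print(categorize("MIT").value)          # fully permissive
print(categorize("Gemma Terms").value)  # unknown
```

Returning `UNKNOWN` instead of guessing keeps custom licenses such as Gemma's in front of a human reviewer.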
Deep Dive: RAIL (Responsible AI License)
What is RAIL?
RAIL (Responsible AI License) is a license family developed by the BigScience project and RAIL Initiative, specifically designed for the unique challenges of AI artifacts. Traditional software licenses focus on code distribution rights. RAIL adds a new dimension: behavioral use restrictions that apply regardless of how you obtained or modified the model.
RAIL-M vs RAIL-S
| Aspect | RAIL-M (Model) | RAIL-S (Source) |
|---|---|---|
| Applies to | Model weights, configs | Model code, training scripts |
| Copyleft for code | ❌ No | ✅ Yes (source must remain open) |
| Behavioral restrictions | ✅ Yes | ✅ Yes |
| Downstream propagation | ✅ Must include restrictions | ✅ Must include restrictions |
| Commercial use | ✅ Allowed | ✅ Allowed |
The Behavioral Use Restrictions
The core innovation of RAIL is its Attachment A—a list of prohibited uses that licensees must agree to. The standard BigScience RAIL restrictions include:
- Surveillance and tracking — Using the model for mass surveillance, facial recognition without consent, or predictive policing
- Disinformation — Generating fake news, deepfakes for deception, or automated propaganda
- Discrimination — Using outputs for discriminatory decisions in housing, employment, credit, or criminal justice
- Military and weapons — Autonomous weapons systems, military targeting, or nuclear/biological/chemical weapon development
- Exploitation — Child sexual abuse material, non-consensual intimate imagery, or human trafficking
- Deception — Impersonating real individuals without consent, fraudulent communications
- Legal circumvention — Evading legal requirements or generating illegal content
Real Cases of RAIL Violations
Case 1: Stable Diffusion and CSAM (2024-2025)
Stability AI's Stable Diffusion uses a CreativeML Open RAIL-M license that explicitly prohibits generating exploitative content. In 2024, researchers demonstrated that fine-tuned versions could generate CSAM. The RAIL license allowed Stability AI to:
- Issue takedown notices to hosting platforms
- Require downstream distributors to implement safety filters
- Revoke access for violating parties
Under a plain Apache 2.0 license, which carries no use restrictions, there would have been no license-based recourse.
Case 2: StarCoder and Malware Generation (2025)
BigCode's StarCoder2, licensed under BigCode Open RAIL-M, includes restrictions against using the model to generate malicious code. When a commercial service was discovered using StarCoder2 to automate exploit generation, the license provided grounds for enforcement action—something a standard MIT or Apache license could not support.
How to Check RAIL Compliance Programmatically
import json


class RAILComplianceChecker:
    """Check if a use case complies with RAIL behavioral restrictions."""

    PROHIBITED_CATEGORIES = [
        "surveillance",
        "disinformation",
        "discrimination",
        "military_weapons",
        "exploitation",
        "deception",
        "legal_circumvention",
    ]

    def __init__(self, license_path: str):
        # Expects a JSON file with an "attachment_a" list of restrictions
        with open(license_path) as f:
            self.license_data = json.load(f)
        self.restrictions = self.license_data.get("attachment_a", [])

    def check_use_case(self, use_case_description: str, deployment_context: dict) -> dict:
        """
        Returns compliance assessment for a given use case.

        Args:
            use_case_description: Plain text description of intended use
            deployment_context: Dict with keys like 'domain', 'users', 'outputs'
        """
        violations = []
        warnings = []
        domain = deployment_context.get("domain", "")

        # Check against known restricted domains
        if domain in ["law_enforcement", "military", "immigration"]:
            violations.append(
                f"Domain '{domain}' likely conflicts with RAIL restrictions "
                f"on surveillance and discrimination"
            )
        if deployment_context.get("generates_human_likeness", False):
            warnings.append(
                "Generating human likenesses may conflict with RAIL "
                "deception restrictions if used without consent"
            )
        if deployment_context.get("automated_decisions", False):
            warnings.append(
                "Automated decision-making in protected domains "
                "requires human oversight per RAIL terms"
            )

        return {
            "compliant": len(violations) == 0,
            "violations": violations,
            "warnings": warnings,
            "recommendation": "Proceed with monitoring" if not violations
            else "Do not deploy; seek alternative model or license",
        }


# Usage (assumes a model_license.json file with an "attachment_a" key exists)
checker = RAILComplianceChecker("model_license.json")
result = checker.check_use_case(
    "Customer service chatbot for e-commerce",
    {"domain": "retail", "generates_human_likeness": False, "automated_decisions": False},
)
print(result)
# {'compliant': True, 'violations': [], 'warnings': [], 'recommendation': 'Proceed with monitoring'}
Deep Dive: Llama Community License
The 700 Million MAU Threshold
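Meta's Llama Community License (used for Llama 3.x and Llama 4) is short and reads simply, but Section 2 ("Additional Commercial Terms") contains the critical clause:
"If, on the Llama 4 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta."
Two details are easy to miss. First, the clause measures MAU as of the version release date, not continuously. Second, the license goes on to say that an organization over the threshold may not exercise any rights under the agreement unless and until Meta expressly grants them.
What counts toward 700M MAU (a conservative reading; the license does not define MAU in detail):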
- All products and services across your entire organization
- Products of your affiliates (companies you own >50% of)
- Users of products that incorporate any Llama output
What does NOT count:
- Users of other companies' products that happen to use your API
- Internal employee usage for development/testing
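The counting rules above can be approximated with a simple aggregation over products owned by the licensee and its affiliates. This is a sketch under stated assumptions: the `Product` record, the entity names, and both helper functions are hypothetical, and summing per-product MAU overcounts people who use several products, which makes the result a conservative upper bound rather than a legal measurement.

```python
from dataclasses import dataclass

LLAMA_MAU_THRESHOLD = 700_000_000  # threshold stated in Section 2


@dataclass
class Product:
    name: str
    owner: str  # legal entity that operates the product
    monthly_active_users: int


def aggregate_mau(products, licensee, affiliates):
    """Sum MAU over every product run by the licensee or one of its affiliates."""
    in_scope = {licensee, *affiliates}
    return sum(p.monthly_active_users for p in products if p.owner in in_scope)


def needs_meta_license(products, licensee, affiliates):
    """True when the aggregated MAU is greater than the 700M threshold."""
    return aggregate_mau(products, licensee, affiliates) > LLAMA_MAU_THRESHOLD


products = [
    Product("chat", "ParentCo", 400_000_000),
    Product("mail", "SubCo", 350_000_000),
    Product("partner-app", "OtherCo", 900_000_000),  # third party, out of scope
]
print(aggregate_mau(products, "ParentCo", {"SubCo"}))       # 750000000
print(needs_meta_license(products, "ParentCo", {"SubCo"}))  # True
```

Neither product crosses the threshold alone, but the aggregate does, which is exactly the case that per-product dashboards miss.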
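The Output Training Rules
Earlier Llama licenses prohibited output training outright: under the Llama 2 and Llama 3 licenses, you could not use the Llama Materials or their outputs to improve any other large language model, with an exception only for Llama itself and its derivatives.
Starting with Llama 3.1, Meta dropped that prohibition and replaced it with a naming requirement in Section 1(b)(i), which the Llama 4 license retains: if you use the Llama Materials or their outputs to create, train, fine-tune, or otherwise improve an AI model that is distributed or made available, that model's name must begin with "Llama".
For Llama 3.1 and later, this means:
- ✅ You can use Llama 4 outputs as synthetic training data, including for a non-Llama model
- ✅ You can distill Llama 4 into a smaller model
- ⚠️ If you distribute or make available a model trained this way, its name must begin with "Llama"
- ⚠️ Distributing Llama Materials or derivatives requires a copy of the license and a prominent "Built with Llama" notice
- ❌ If you still run Llama 2 or Llama 3 weights, the older prohibition applies to those versions
What Counts as "Improving" a Model?
The naming clause covers any "AI model," which is broader than the older "any other large language model" wording:
- Building a chatbot product with Llama → ✅ Allowed
- Fine-tuning Llama for your domain → ✅ Allowed (the derivative stays under the Llama license terms)
- Using Llama synthetic data to train a model you keep internal → ✅ Allowed; the naming duty arises only when the model is distributed or made available
- Using Llama to label data that trains a BERT classifier you ship → ⚠️ The naming requirement likely applies, since the clause says "AI model" (and whether serving a model over an API counts as making it available is a question for counsel)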
License Verification Script
#!/bin/bash
# check_llama_compliance.sh
# Quick self-assessment for a Llama deployment (not legal advice)
echo "=== Llama Community License Compliance Check ==="
echo ""

# Check 1: MAU threshold (Section 2)
read -r -p "Organization monthly active users (all products): " mau
if ! [[ "$mau" =~ ^[0-9]+$ ]]; then
    echo "❌ FAIL: MAU must be a whole number"
    exit 1
fi
if [ "$mau" -gt 700000000 ]; then
    echo "❌ FAIL: MAU exceeds 700M threshold. Contact Meta for commercial license."
    echo "   → https://llama.meta.com/llama-downloads/"
    exit 1
else
    echo "✅ PASS: MAU ($mau) below 700M threshold"
fi

# Check 2: Naming requirement for models trained on Llama outputs
read -r -p "Do you distribute a model trained on Llama outputs whose name does NOT start with 'Llama'? (y/n): " train_check
if [ "$train_check" = "y" ]; then
    echo "❌ FAIL: Naming requirement not met (Section 1(b)(i))"
    echo "   → Rename the model so its name begins with 'Llama'"
    exit 1
else
    echo "✅ PASS: No naming issue detected"
fi

# Check 3: Attribution (required when distributing Llama-based products)
read -r -p "Does your product display 'Built with Llama' attribution? (y/n): " attr_check
if [ "$attr_check" = "n" ]; then
    echo "❌ FAIL: 'Built with Llama' notice is required (Section 1(b)(i))"
    echo "   → Add the notice to your website, UI, or product documentation"
    exit 1
else
    echo "✅ PASS: Attribution included"
fi

# Check 4: Acceptable Use Policy
read -r -p "Have you reviewed Meta's Acceptable Use Policy? (y/n): " aup_check
if [ "$aup_check" = "n" ]; then
    echo "❌ FAIL: Must comply with Meta's Acceptable Use Policy"
    echo "   → Review: https://llama.meta.com/use-policy/"
    exit 1
else
    echo "✅ PASS: AUP reviewed and acknowledged"
fi

echo ""
echo "=== All checks passed. Deployment approved. ==="
Deep Dive: DeepSeek and Chinese Model Licenses
DeepSeek: The MIT Exception
In a landscape of increasingly restrictive licenses, DeepSeek made a bold choice: releasing both DeepSeek-V3 and DeepSeek-R1 under the MIT License. This is genuinely remarkable—it means:
- ✅ Full commercial use with zero restrictions
- ✅ Fine-tune, distill, or merge without limitation
- ✅ Use outputs to train other models (including competitors)
- ✅ No user threshold or scale limitation
- ✅ Redistribute modified versions under any license
The only requirement is preserving the copyright notice. For teams concerned about license compliance, DeepSeek models represent the lowest-risk option in the frontier model class.
Why did DeepSeek choose MIT? The strategic rationale appears to be:
- Maximizing adoption to establish DeepSeek as the default open model
- Avoiding enforcement costs (Meta reportedly has a team dedicated to Llama license compliance)
- Benefiting from community fine-tunes and tooling built on their base
Qwen: The Multi-License Strategy
Alibaba's Qwen series uses a more complex approach:
| Model | License | Commercial Threshold | Key Restrictions |
|---|---|---|---|
| Qwen2.5-72B | Qwen License | 100M MAU | Must request a license above threshold |
| Qwen3-235B-A22B | Apache 2.0 | None | Fully permissive |
| Qwen3-32B | Apache 2.0 | None | Fully permissive |
| Qwen3-8B | Apache 2.0 | None | Fully permissive |
| Qwen3-0.6B | Apache 2.0 | None | Fully permissive |
| Qwen2.5-VL-72B | Qwen License | 100M MAU | Must request a license above threshold |
| QwQ-32B | Apache 2.0 | None | Fully permissive |
Key insight: Alibaba released the Qwen3 open-weight family, including the flagship Qwen3-235B-A22B, under Apache 2.0, while some earlier large checkpoints (such as Qwen2.5-72B) remain under the Qwen License with its MAU threshold. Check the license per checkpoint, not per model family.
The Qwen License itself is relatively permissive compared to Llama:
- 100M MAU threshold (vs. Llama's 700M)
- No explicit output training prohibition
- Registration required but generally granted automatically
Yi: Registration-Based Licensing
01.AI's Yi series uses a model-specific license that requires:
- Registration on 01.AI's platform for commercial use
- Compliance with usage guidelines (similar to RAIL but less formal)
- No revenue-sharing or fees for organizations under 100M MAU
Practical Comparison for Production Teams
# license_selector.py
# Help teams choose the right model based on license constraints

LICENSE_PROFILES = {
    "deepseek-v3": {
        "license": "MIT",
        "commercial": True,
        "mau_limit": None,
        "output_training_ok": True,
        "distillation_ok": True,
        "risk_level": "minimal",
    },
    "llama-4-maverick": {
        "license": "Llama Community License",
        "commercial": True,
        "mau_limit": 700_000_000,
        # Allowed since Llama 3.1, but a distributed model trained on
        # Llama outputs must have a name beginning with "Llama"
        "output_training_ok": True,
        "distillation_ok": True,
        "risk_level": "moderate",
    },
    "qwen2.5-72b": {
        "license": "Qwen License",
        "commercial": True,
        "mau_limit": 100_000_000,
        "output_training_ok": True,  # no explicit prohibition
        "distillation_ok": True,
        "risk_level": "low",
    },
    "qwen3-32b": {
        "license": "Apache 2.0",
        "commercial": True,
        "mau_limit": None,
        "output_training_ok": True,
        "distillation_ok": True,
        "risk_level": "minimal",
    },
}


def evaluate_license_fit(model_id: str, org_mau: int, needs_distillation: bool,
                         needs_output_training: bool) -> dict:
    """Evaluate if a model's license fits your deployment needs."""
    profile = LICENSE_PROFILES.get(model_id)
    if not profile:
        return {"error": f"Unknown model: {model_id}"}

    issues = []
    # A mau_limit of None means the license has no scale threshold
    if profile["mau_limit"] and org_mau > profile["mau_limit"]:
        issues.append(
            f"MAU ({org_mau:,}) exceeds limit ({profile['mau_limit']:,}). "
            f"Commercial license required."
        )
    if needs_distillation and not profile["distillation_ok"]:
        issues.append("Distillation into non-derivative models prohibited.")
    if needs_output_training and not profile["output_training_ok"]:
        issues.append("Using outputs to train other models prohibited.")

    return {
        "model": model_id,
        "license": profile["license"],
        "approved": len(issues) == 0,
        "issues": issues,
        "risk_level": profile["risk_level"],
    }


# Example: Series B startup with 5M MAU wanting to distill
result = evaluate_license_fit("llama-4-maverick", org_mau=5_000_000,
                              needs_distillation=True, needs_output_training=False)
print(result)
# {'model': 'llama-4-maverick', 'license': 'Llama Community License',
#  'approved': True, 'issues': [], 'risk_level': 'moderate'}
EU AI Act and Open Source
The EU AI Act creates obligations for providers of general-purpose AI models, including open-weight ones, with the GPAI provisions applying from August 2025. Understanding where open-source exemptions apply, and where they don't, is critical.
The GPAI Model Framework
Under Article 53(1), providers of General Purpose AI (GPAI) models must:
- Maintain and provide technical documentation (training methodology, evaluations)
- Provide information for downstream providers integrating the model
- Establish a policy to comply with EU copyright law
- Publish a sufficiently detailed summary of training data
The Open Source Exemption (Article 53(2))
The EU AI Act provides a limited exemption for open-source GPAI models. In summary, Article 53(2) says that providers of models released under a free and open-source licence that allows access, usage, modification, and distribution, and whose parameters (including weights), architecture information, and usage information are publicly available, are exempt from the requirements of Article 53(1)(a) and (b). The exemption does not apply to models with systemic risk.
What the exemption covers:
- ✅ No obligation to maintain and provide the technical documentation (Article 53(1)(a))
- ✅ No mandatory downstream provider information package (Article 53(1)(b))
What the exemption does NOT cover:
- ❌ Must still put in place a copyright compliance policy (Article 53(1)(c))
- ❌ Must still publish training data summary (Article 53(1)(d))
- ❌ Must still comply with GPAI obligations if classified as "systemic risk"
- ❌ Deployers still bear full responsibility for their use of the model
The Systemic Risk Threshold
A GPAI model is classified as posing systemic risk if:
- Training compute exceeds 10^25 FLOPs (approximately GPT-4 class or higher)
- OR the European Commission designates it based on capabilities assessment
Models classified as systemic risk must—regardless of license:
- Perform model evaluations including adversarial testing
- Assess and mitigate systemic risks
- Track and report serious incidents
- Ensure adequate cybersecurity protections
Which open models hit the systemic risk threshold?
| Model | Estimated Training FLOPs | Systemic Risk? |
|---|---|---|
| Llama 4 Maverick (400B MoE) | ~10^25 | ⚠️ Borderline |
| Llama 4 Behemoth (2T MoE) | >10^26 | ✅ Yes |
| DeepSeek-V3 (671B MoE) | ~5×10^24 | ❌ Likely below |
| Qwen3-235B | ~3×10^24 | ❌ Likely below |
| Mistral Large 2 (123B) | ~10^24 | ❌ No |
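Estimates like those in the table are usually back-of-the-envelope figures. A common rule of thumb approximates dense-transformer training compute as 6 × parameters × training tokens, using active parameters for mixture-of-experts models. The sketch below applies that approximation; it is not the Act's official measurement method, the function names are hypothetical, and the DeepSeek-V3 inputs (about 37B activated parameters, 14.8T tokens) are the figures reported in DeepSeek's technical report.

```python
SYSTEMIC_RISK_FLOPS = 1e25  # presumption threshold in the EU AI Act


def estimate_training_flops(active_params: float, training_tokens: float) -> float:
    """Rule-of-thumb estimate: roughly 6 FLOPs per active parameter per token."""
    return 6.0 * active_params * training_tokens


def presumed_systemic_risk(training_flops: float) -> bool:
    """True when estimated compute is above the 10^25 FLOPs presumption."""
    return training_flops > SYSTEMIC_RISK_FLOPS


# DeepSeek-V3: ~37e9 activated parameters, ~14.8e12 training tokens (as reported)
flops = estimate_training_flops(37e9, 14.8e12)
print(f"{flops:.2e}")                 # 3.29e+24
print(presumed_systemic_risk(flops))  # False
```

Because the approximation ignores details such as auxiliary training stages, treat results near the threshold as inconclusive and rely on the provider's own disclosures.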
What Counts as "Open Source" Under EU AI Act?
The Act does not defer to the OSI definition. Article 53(2) and the related recitals set their own criteria:
- The licence must allow access, usage, modification, and distribution of the model
- Model parameters, including weights, must be publicly available
- Information on the model architecture must be publicly available
- Information on model usage must be publicly available
Critical implication: The Llama Community License restricts usage (MAU threshold, Acceptable Use Policy) and is not OSI-approved. Therefore, Llama models may not qualify for the open-source exemption under the EU AI Act, in which case Meta must meet the full GPAI provider obligations for models placed on the EU market.
Compliance Decision Framework
Use the matrix below to determine your compliance path:
Quick Decision Matrix
| Your Situation | Recommended Models | Avoid |
|---|---|---|
| Startup, <1M MAU, need flexibility | DeepSeek-V3/R1 (MIT), Qwen3-32B (Apache 2.0) | RAIL models if use case is edge-case |
| Enterprise, >100M MAU | DeepSeek (MIT), negotiate Llama/Qwen commercial license | Any threshold-based license without agreement |
| Need to distill into custom model | DeepSeek (MIT), Apache 2.0 models | Llama, unless you accept the "Llama" naming and attribution requirements for distributed models |
| EU deployment, high-risk context | Any model—but must comply with deployer obligations | Relying on open-source exemption for high-risk uses |
| Safety-critical domain | RAIL models (built-in safety framework) | Fully permissive models without your own safety layer |
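The matrix above can also be expressed as a small rule function for use in internal tooling. This is an illustrative sketch only: the `recommend` function, its parameters, and the message constants are hypothetical, the 1M and 100M cut-offs are taken from the matrix rows, and the output is guidance for a human reviewer, not a legal determination.

```python
PREFER_PERMISSIVE_FOR_DISTILLATION = "Prefer MIT or Apache 2.0 models for distillation pipelines."
NEGOTIATE_AGREEMENT = "Negotiate a commercial agreement before using threshold-based licenses."
KEEP_FLEXIBILITY = "Permissive models (MIT, Apache 2.0) keep the most flexibility."
DEPLOYER_OBLIGATIONS = "EU deployer obligations apply regardless of model license."


def recommend(org_mau: int, needs_distillation: bool, eu_high_risk: bool) -> list:
    """Return the matrix rows that apply to a given situation, as notes."""
    notes = []
    if needs_distillation:
        notes.append(PREFER_PERMISSIVE_FOR_DISTILLATION)
    if org_mau > 100_000_000:
        notes.append(NEGOTIATE_AGREEMENT)
    elif org_mau < 1_000_000:
        notes.append(KEEP_FLEXIBILITY)
    if eu_high_risk:
        notes.append(DEPLOYER_OBLIGATIONS)
    return notes


for note in recommend(org_mau=200_000_000, needs_distillation=False, eu_high_risk=True):
    print("-", note)
```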
Production Compliance Checklist
Use this checklist before deploying any open-weight model to production:
Phase 1: License Identification
- [ ] Download and read the complete license text (not just the summary)
- [ ] Identify the license family (permissive, community, RAIL, custom)
- [ ] Document all restrictions, thresholds, and obligations
- [ ] Check if the license is OSI-approved (matters for EU AI Act)
- [ ] Verify license applies to weights, code, AND outputs separately
Phase 2: Use Case Assessment
- [ ] Map your use case against any behavioral restrictions
- [ ] Calculate organization-wide MAU (all products, all affiliates)
- [ ] Determine if outputs will be used for training other models
- [ ] Identify if deployment context is "high-risk" under EU AI Act
- [ ] Check if model's training compute triggers systemic risk threshold
Phase 3: Technical Compliance
- [ ] Implement attribution requirements (NOTICE files, model cards)
- [ ] Set up MAU monitoring with alerts at 80% of threshold
- [ ] Ensure license propagation in redistributed/fine-tuned versions
- [ ] Document your fine-tuning data sources (EU AI Act requirement)
- [ ] Implement content safety filters if required by license
Phase 4: Ongoing Monitoring
- [ ] Schedule quarterly license review (licenses change with model updates)
- [ ] Monitor MAU growth against thresholds
- [ ] Track regulatory updates (EU AI Act enforcement guidance)
- [ ] Maintain audit trail of license compliance decisions
- [ ] Set up alerts for upstream license changes
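One lightweight way to detect upstream license changes is to store a fingerprint of the license text at review time and compare it on a schedule. The sketch below assumes you fetch the current license text yourself; `license_fingerprint` and `license_changed` are hypothetical helper names, and whitespace is normalized so that pure reformatting does not raise an alert.

```python
import hashlib


def license_fingerprint(license_text: str) -> str:
    """SHA-256 of the license text with runs of whitespace collapsed."""
    normalized = " ".join(license_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def license_changed(recorded_fingerprint: str, current_text: str) -> bool:
    """True when the current text no longer matches the reviewed version."""
    return license_fingerprint(current_text) != recorded_fingerprint


reviewed = license_fingerprint("MIT License\n\nPermission is hereby granted")
print(license_changed(reviewed, "MIT  License Permission is hereby   granted"))  # False
print(license_changed(reviewed, "MIT License Permission is hereby revoked"))     # True
```

Storing the fingerprint next to the `last_reviewed` date in your license records gives the audit trail a concrete artifact to point to.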
Common Compliance Traps
Trap 1: The Synthetic Data Pipeline
Scenario: Your team uses Llama 4 to generate synthetic training data for your proprietary model, then ships that model under your own brand.
The violation: For Llama 3.1 and later, the license allows training on outputs but requires any distributed model trained this way to have a name beginning with "Llama". Under the Llama 2 and Llama 3 licenses, using outputs to improve a non-Llama large language model was prohibited outright.
The fix: Use an MIT-licensed model (DeepSeek-R1) for synthetic data generation, or follow the naming and attribution requirements when you release the model.
Trap 2: The Growth Surprise
Scenario: You deploy with Qwen2.5-72B (Qwen License) at 10M MAU. Your product goes viral and hits 150M MAU within 3 months.
The violation: You exceeded the 100M MAU threshold without obtaining a commercial license.
The fix: Implement automated MAU monitoring with alerts at 50M, 75M, and 90M. Begin license negotiations proactively. Consider deploying an Apache 2.0 model such as Qwen3-32B as a fallback.
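The staged alerts described in the fix can be sketched as a function that reports which alert levels were crossed between two measurements. The alert levels and the 100M threshold come from the scenario above; the function names and the idea of comparing consecutive readings are illustrative assumptions.

```python
ALERT_LEVELS = (50_000_000, 75_000_000, 90_000_000)
LICENSE_THRESHOLD = 100_000_000


def crossed_alerts(previous_mau: int, current_mau: int) -> list:
    """Alert levels passed since the previous reading (each fires once)."""
    return [level for level in ALERT_LEVELS if previous_mau < level <= current_mau]


def over_threshold(current_mau: int) -> bool:
    """True when MAU is above the license threshold itself."""
    return current_mau > LICENSE_THRESHOLD


print(crossed_alerts(10_000_000, 80_000_000))  # [50000000, 75000000]
print(over_threshold(150_000_000))             # True
```

Comparing against the previous reading means a fast-growing product that jumps several levels in one period still produces every alert it skipped.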
Trap 3: The Affiliate Aggregation
Scenario: Your company has 200M MAU across 5 products. You deploy Llama in one product with 50M MAU, thinking you're below the 700M threshold.
The violation: None yet—but the threshold counts ALL products across the licensee and affiliates. If your parent company's total reaches 700M, you need a license for any Llama deployment.
The fix: Track aggregate MAU across your entire corporate structure, not just per-product.
Trap 4: The Downstream Propagation Failure
Scenario: You fine-tune a RAIL-licensed model, strip the license restrictions, and publish on Hugging Face as "Apache 2.0."
The violation: RAIL licenses explicitly require propagating behavioral use restrictions to all downstream recipients. Relicensing without restrictions violates the original terms.
The fix: Always include the RAIL Attachment A (use restrictions) with any redistributed derivative. Your model card must reference the original restrictions.
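A pre-publication check can catch this trap before a derivative is uploaded. The sketch below is a heuristic under stated assumptions: it detects RAIL parents by looking for "rail" in the license name, it only knows three permissive license labels, and `propagation_issues` is a hypothetical helper, so it complements a human review rather than replacing one.

```python
PERMISSIVE_LABELS = {"apache-2.0", "mit", "bsd-3-clause"}


def propagation_issues(parent_license: str, derivative_license: str,
                       includes_attachment_a: bool) -> list:
    """List propagation problems for a derivative of a RAIL-licensed model."""
    issues = []
    if "rail" not in parent_license.lower():
        return issues  # parent carries no RAIL use restrictions to propagate
    if derivative_license.strip().lower() in PERMISSIVE_LABELS:
        issues.append("Derivative is relicensed permissively, dropping use restrictions.")
    if not includes_attachment_a:
        issues.append("Attachment A (use restrictions) is missing from the derivative.")
    return issues


print(propagation_issues("CreativeML Open RAIL-M", "Apache-2.0", False))
print(propagation_issues("BigCode Open RAIL-M", "BigCode Open RAIL-M", True))  # []
```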
Trap 5: The EU Deployment Assumption
Scenario: You assume your MIT-licensed DeepSeek deployment is exempt from EU AI Act because it's "open source."
The reality: The open-source exemption reduces the MODEL PROVIDER's obligations. As a DEPLOYER in a high-risk context (hiring, credit, healthcare), you still carry the deployer obligations in Article 26, including using the system according to its instructions, assigning human oversight, monitoring its operation, and keeping logs.
Best Practices
1. Maintain a License Bill of Materials (LBOM)
Just as you maintain a software bill of materials (SBOM) for dependencies, create a dedicated license tracking document for every AI model in your stack:
# ai-license-bom.yaml
models:
  - id: "deepseek-r1-distilled-qwen-32b"
    license: "MIT"
    source: "https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
    commercial_use: true
    restrictions: []
    mau_threshold: null
    last_reviewed: "2026-06-01"
    deployed_in: ["reasoning-service", "code-review-agent"]
  - id: "llama-4-maverick-17b-128e"
    license: "Llama Community License"
    source: "https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E"
    commercial_use: true
    restrictions:
      - "MAU < 700M (measured at version release date)"
      - "Distributed models trained on Llama outputs must be named 'Llama...'"
      - "Must comply with Meta Acceptable Use Policy"
    mau_threshold: 700000000
    last_reviewed: "2026-06-01"
    deployed_in: ["customer-chatbot"]
2. Implement Automated License Scanning in CI/CD
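# check_model_licenses.py: run from a CI job, for example a GitHub Actions workflow step
import yaml  # provided by the PyYAML package


def get_org_mau() -> int:
    """Placeholder: replace with a call to your MAU tracking API."""
    raise NotImplementedError("Connect this to your analytics source")


def check_model_licenses():
    """Run as part of CI/CD before model deployment."""
    with open("ai-license-bom.yaml") as f:
        bom = yaml.safe_load(f)

    current_mau = get_org_mau()
    for model in bom["models"]:
        # Models without a threshold (null in the BOM) are skipped
        if model.get("mau_threshold"):
            # Fail the build once MAU passes 80% of the license threshold
            if current_mau > model["mau_threshold"] * 0.8:
                raise RuntimeError(
                    f"WARNING: MAU ({current_mau:,}) approaching threshold "
                    f"({model['mau_threshold']:,}) for {model['id']}"
                )
    print("✅ All model license checks passed")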
3. Separate Model Selection from Product Decisions
Establish a lightweight "model governance" review for any new model adoption:
- Legal review of license terms (once per license type, not per model)
- Engineering review of compliance mechanisms needed
- Quarterly audit of deployed models vs. current license terms
4. Use Permissive Models for Sensitive Pipelines
For any pipeline where you might:
- Generate training data
- Distill knowledge
- Create derivative models
- Exceed user thresholds
...default to MIT/Apache 2.0 models (DeepSeek, OLMo, small Qwen variants). Reserve restricted-license models for direct user-facing inference only.
5. Document Everything for EU AI Act Compliance
Even if your model is MIT-licensed, if you deploy in the EU:
- Maintain records of fine-tuning data sources
- Document evaluation results and known limitations
- Implement transparency notices when users interact with AI
- Keep records of risk assessments for high-risk deployments
FAQ
Can I fine-tune a RAIL model and release it under MIT?
No. RAIL licenses require that behavioral use restrictions propagate to all derivative works. If you fine-tune BLOOM or Stable Diffusion, your derivative must include the same (or more restrictive) use limitations. You can add restrictions but cannot remove them.
What happens if my organization crosses Llama's 700M MAU threshold?
The license says you must request a license from Meta and that you may not exercise the license rights until Meta grants it. There is no grace period in the license text. Note that the clause measures MAU as of the model version's release date, so a new Llama version resets the measurement date. If you are anywhere near the threshold (say, around 500M MAU), get legal advice and contact Meta before adopting a new version, so an agreement is in place before you need it.
Does using a model's API (not weights) trigger the same license?
Generally no. If you call DeepSeek's API, you're subject to their API Terms of Service, not the model's MIT license. The MIT license applies to the weights themselves. However, some API ToS may have their own restrictions on output usage. Always read both documents.
Can I use open-source models in the EU's "high-risk" categories?
Yes, but the open-source exemption does NOT apply to the deployer's obligations. If you deploy any model (regardless of license) in high-risk contexts like hiring, healthcare, or law enforcement, you must comply with the Article 26 deployer requirements, such as using the system according to its instructions, assigning human oversight, monitoring its operation, and keeping logs.
How do I handle models with unclear licenses?
Some models on Hugging Face have missing, conflicting, or ambiguous license information. Best practice:
- Check the model card, README, and LICENSE file in the repo
- Check the parent model's license (fine-tunes inherit restrictions)
- If genuinely unclear, contact the model author
- Default to the most restrictive interpretation until clarified
- Never assume permissive—document your license determination
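The "most restrictive interpretation" rule can be sketched as a function that takes every license declaration you found (model card, LICENSE file, parent model) and returns the strictest one. The ranking below is an assumption made for illustration, not an established ordering, and `most_restrictive` is a hypothetical helper; a missing or unrecognized declaration always outranks a known license.

```python
from typing import List, Optional

# Assumed ordering for illustration: higher rank means more restrictive.
RESTRICTIVENESS = {
    "mit": 0,
    "apache-2.0": 0,
    "rail-m": 1,
    "llama community license": 2,
    "cc-by-nc-4.0": 3,
}
UNKNOWN_RANK = 4  # missing or unrecognized declarations rank highest


def _rank(name: Optional[str]) -> int:
    if not name:
        return UNKNOWN_RANK
    return RESTRICTIVENESS.get(name.strip().lower(), UNKNOWN_RANK)


def most_restrictive(declarations: List[Optional[str]]) -> str:
    """Return the strictest declared license, or 'UNRESOLVED' if any is missing."""
    if not declarations:
        return "UNRESOLVED"
    worst = max(declarations, key=_rank)
    return worst if worst else "UNRESOLVED"


print(most_restrictive(["MIT", "CC-BY-NC-4.0"]))  # CC-BY-NC-4.0
print(most_restrictive(["MIT", None]))            # UNRESOLVED
```

An `UNRESOLVED` result is the cue to contact the model author and record the determination before any deployment.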
Summary
The 2026 open-source AI licensing landscape demands active compliance management, not passive assumptions. The key principles:
- Read the actual license, not the Hugging Face tag. A model labeled "open" may have significant commercial restrictions.
- Track your scale: MAU-based thresholds turn free models into licensed ones as you grow.
- Separate your pipelines: Use permissive models for training data generation and model development. Use restricted models only for direct inference.
- Prepare for EU AI Act: The open-source exemption is narrower than most teams assume. Deployer obligations apply regardless of model license.
- Automate compliance: Build license checks into your CI/CD pipeline and model deployment process.
The good news: genuinely permissive options exist. DeepSeek's MIT-licensed frontier models and Apache 2.0 alternatives from Qwen provide a compliance-minimal path for teams that want maximum flexibility. Choose wisely, monitor continuously, and when in doubt—ask a lawyer.
Related Resources
External Resources
- Open Source AI Definition (OSAID) v1.0 — OSI's official definition
- EU AI Act Full Text (Regulation 2024/1689) — Official regulatory text
- RAIL Initiative — Home of the Responsible AI License framework
- Llama Community License Agreement — Meta's official license text