TL;DR

The EU AI Act (Regulation 2024/1689) is the world's first comprehensive AI law, and its high-risk AI system obligations take effect on August 2, 2026. If your AI product serves users in the EU—even from servers in Singapore or Virginia—you are in scope. This article provides a concrete engineering checklist: how to classify your system's risk tier, build compliant audit logging, implement bias testing pipelines, generate Annex IV technical documentation, and prepare for conformity assessment. With penalties up to €35M or 7% of global annual turnover, this is not a "nice-to-have"—it is a build-or-block decision for any AI product going global.

Key Takeaways

  • Long-arm jurisdiction: The EU AI Act applies to any AI system whose output is used within the EU, regardless of where the company or servers are located.
  • Four-tier risk: Systems are classified as Unacceptable, High-risk, Limited-risk, or Minimal-risk—each tier carries different obligations.
  • Annex IV documentation: High-risk systems require 9 categories of technical documentation, from training data lineage to performance metrics.
  • Audit logging is non-negotiable: Article 12 mandates automatic event logging with traceability across the entire AI lifecycle.
  • August 2, 2026 deadline: High-risk system obligations are enforceable in roughly 11 weeks. Start building infrastructure now.

Related reading: If you are building LLM-powered products, review our earlier guide on LLM Guardrails Engineering for runtime safety patterns that complement compliance infrastructure.


Why This Matters Now

The EU AI Act is not a distant regulation. Here is the current enforcement state as of May 2026:

| Milestone | Date | Status |
| --- | --- | --- |
| Regulation enters into force | Aug 1, 2024 | Done |
| Prohibited practices banned | Feb 2, 2025 | Enforceable |
| AI literacy obligations | Feb 2, 2025 | Enforceable |
| GPAI model obligations | Aug 2, 2025 | Enforceable |
| High-risk AI system obligations | Aug 2, 2026 | 11 weeks away |
| Product-embedded AI systems | Aug 2, 2027 | Upcoming |

The "long-arm jurisdiction" clause (Article 2) is critical for developers building products for global markets. If you run an AI-powered SaaS from Tokyo, deploy models on AWS us-east-1, or offer an API consumed by EU customers—you are in scope. This mirrors GDPR's extraterritorial reach, and regulators have shown they enforce it. For a deeper look at how GDPR intersects with AI memory systems, see The Privacy Dilemma of AI Agents.

The penalty structure is designed to hurt:

  • Prohibited practices: Up to €35M or 7% of global annual turnover, whichever is higher
  • High-risk non-compliance: Up to €15M or 3% of global annual turnover, whichever is higher
  • Incorrect information to authorities: Up to €7.5M or 1% of global annual turnover, whichever is higher
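
To make the arithmetic concrete, here is a minimal Python sketch of the upper bound for each tier. The function name `max_fine_eur` and the tier keys are hypothetical, and the sketch assumes the higher of the two amounts applies to large companies while the lower applies to SMEs and start-ups. It computes a ceiling only; the actual fine is set by the authority.

```python
# Fixed caps in EUR and turnover percentages for the three tiers listed above.
PENALTY_TIERS = {
    "prohibited_practice": (35_000_000, 7),
    "high_risk_non_compliance": (15_000_000, 3),
    "incorrect_information": (7_500_000, 1),
}


def max_fine_eur(violation: str, global_turnover_eur: int, is_sme: bool = False) -> float:
    """Return the ceiling of the fine for one violation tier.

    Assumption: large companies face the higher of the two amounts,
    SMEs and start-ups the lower of the two.
    """
    fixed_cap, percent = PENALTY_TIERS[violation]
    turnover_cap = global_turnover_eur * percent / 100
    if is_sme:
        return min(fixed_cap, turnover_cap)
    return max(fixed_cap, turnover_cap)
```

For a company with €2B in global turnover, the prohibited-practice ceiling is the turnover-based amount (€140M) rather than the €35M fixed cap.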

Enforcement Timeline

The following Mermaid Gantt chart shows the full enforcement timeline. The critical insight: multiple obligations are already active, and the high-risk deadline is imminent.

gantt
    title EU AI Act Enforcement Timeline
    dateFormat YYYY-MM-DD
    axisFormat %b %Y
    section Already Active
    Prohibited Practices Banned :done, pp, 2025-02-02, 2025-02-02
    AI Literacy Obligations :done, al, 2025-02-02, 2025-02-02
    GPAI Model Obligations :done, gp, 2025-08-02, 2025-08-02
    section Upcoming Deadlines
    High-Risk System Obligations :crit, hr, 2026-08-02, 2026-08-02
    Codes of Practice Finalized :active, cp, 2026-05-02, 2026-08-02
    Product-Embedded AI Systems :pe, 2027-08-02, 2027-08-02
    Full Enforcement - All Provisions :fe, 2027-08-02, 2027-08-02

Risk Classification Decision Tree

The foundation of EU AI Act compliance is determining your system's risk tier. This decision tree codifies the classification logic from Articles 5, 6, and Annex III.

flowchart TD
    A["AI System Assessment"] --> B{"Does it use subliminal - manipulative - or exploitative techniques?"}
    B -- Yes --> C["UNACCEPTABLE RISK - Article 5 - System is PROHIBITED"]
    B -- No --> D{"Does it perform real-time biometric identification in public spaces?"}
    D -- Yes --> C
    D -- No --> E{"Does it perform social scoring by public authorities?"}
    E -- Yes --> C
    E -- No --> F{"Is it listed in Annex III high-risk categories?"}
    F -- Yes --> G{"Does it perform profiling of natural persons?"}
    G -- Yes --> H["HIGH-RISK - Full compliance required"]
    G -- No --> I{"Is it a safety component of a regulated product?"}
    F -- No --> I
    I -- Yes --> H
    I -- No --> J{"Significant risk to health - safety - or fundamental rights?"}
    J -- Yes --> H
    J -- No --> K{"Does it interact directly with natural persons?"}
    K -- Yes --> L["LIMITED RISK - Transparency obligations only"]
    K -- No --> M["MINIMAL RISK - No mandatory obligations"]
    style C fill:#ff6b6b,color:#fff
    style H fill:#ffa94d,color:#fff
    style L fill:#74c0fc,color:#fff
    style M fill:#69db7c,color:#fff

Annex III high-risk categories include AI systems used in:

  1. Biometric identification and categorization
  2. Critical infrastructure management
  3. Education and vocational training access
  4. Employment, worker management, and recruitment
  5. Access to essential services (credit scoring, insurance)
  6. Law enforcement
  7. Migration, asylum, and border control
  8. Administration of justice and democratic processes

If your product uses prompt engineering to build interview screening tools, resume parsers, or credit risk assessors—you are almost certainly in the high-risk category.
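
The decision tree can also be kept in code so that the classification is versioned alongside the product. The following is a minimal Python sketch of one reading of the tree; `SystemProfile`, its field names, and `classify` are hypothetical, and the answers to each question still need legal review. It is a triage aid, not a legal determination.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


@dataclass(frozen=True)
class SystemProfile:
    # Hypothetical self-assessment answers, one per question in the tree.
    uses_manipulative_techniques: bool = False
    realtime_public_biometric_id: bool = False
    social_scoring: bool = False
    listed_in_annex_iii: bool = False
    profiles_natural_persons: bool = False
    safety_component_of_regulated_product: bool = False
    significant_risk_to_rights: bool = False
    interacts_with_natural_persons: bool = False


def classify(profile: SystemProfile) -> RiskTier:
    # Prohibited practices are checked first; any one of them ends the assessment.
    if (profile.uses_manipulative_techniques
            or profile.realtime_public_biometric_id
            or profile.social_scoring):
        return RiskTier.UNACCEPTABLE
    # An Annex III system that profiles natural persons is high-risk.
    if profile.listed_in_annex_iii and profile.profiles_natural_persons:
        return RiskTier.HIGH
    # Safety components and systems posing significant risk are also high-risk.
    if profile.safety_component_of_regulated_product or profile.significant_risk_to_rights:
        return RiskTier.HIGH
    # Systems that talk to people directly carry transparency obligations.
    if profile.interacts_with_natural_persons:
        return RiskTier.LIMITED
    return RiskTier.MINIMAL
```

Storing the `SystemProfile` answers next to the result gives you the documented reasoning that an auditor will ask for.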


Engineering Checklist Overview

For high-risk AI systems, the EU AI Act mandates a comprehensive compliance framework. Here is the engineering checklist distilled from Articles 8-15, 43, and 72 together with Annex IV:

| Requirement | Article | Engineering Deliverable |
| --- | --- | --- |
| Risk Management System | Art. 9 | Continuous risk identification and mitigation pipeline |
| Data Governance | Art. 10 | Training data lineage, bias analysis, data quality metrics |
| Technical Documentation | Art. 11 + Annex IV | 9-category documentation package |
| Automatic Logging | Art. 12 | Immutable audit trail for all system events |
| Transparency | Art. 13 | User-facing documentation and instructions for use |
| Human Oversight | Art. 14 | Kill switches, override mechanisms, confidence thresholds |
| Accuracy and Robustness | Art. 15 | Bias testing, adversarial testing, performance benchmarks |
| Conformity Assessment | Art. 43 | Self-assessment or third-party audit + CE marking |
| Post-Market Monitoring | Art. 72 | Continuous performance monitoring after deployment |

The following sections implement each requirement with code.


Audit Logging Infrastructure

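Article 12 requires that high-risk AI systems "shall technically allow for the automatic recording of events (logs) over the lifetime of the system." These logs must support traceability of the system's functioning, and Articles 19 and 26(6) set a retention floor of at least six months. Tamper-evident storage is not spelled out in the Act, but it is what makes the logs credible in an audit.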

Logging Middleware in TypeScript

The following middleware captures every AI inference request with the metadata required by Article 12:

typescript
import { v4 as uuidv4 } from "uuid";
import crypto from "crypto";

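// Assumed shapes for the inputs used by createAuditLogEntry; adapt them to your service.
interface AIRequest {
  input: unknown;
  userId: string;
  geolocation?: string;
}

interface AIResponse {
  output: unknown;
  confidenceScore: number;
  processingTimeMs: number;
}

interface SystemMetadata {
  systemId: string;
  version: string;
  modelId: string;
  modelVersion: string;
  riskTier: "high" | "limited" | "minimal";
  confidenceThreshold: number;
}

interface AuditLogEntry {
  traceId: string;
  timestamp: string;
  systemId: string;
  systemVersion: string;
  inputHash: string;
  outputHash: string;
  modelId: string;
  modelVersion: string;
  userId: string;
  riskTier: "high" | "limited" | "minimal";
  humanOversightTriggered: boolean;
  confidenceScore: number;
  processingTimeMs: number;
  geolocation: string;
  dataRetentionDays: number;
}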

function createAuditLogEntry(
  request: AIRequest,
  response: AIResponse,
  metadata: SystemMetadata
): AuditLogEntry {
  return {
    traceId: uuidv4(),
    timestamp: new Date().toISOString(),
    systemId: metadata.systemId,
    systemVersion: metadata.version,
    inputHash: crypto
      .createHash("sha256")
      .update(JSON.stringify(request.input))
      .digest("hex"),
    outputHash: crypto
      .createHash("sha256")
      .update(JSON.stringify(response.output))
      .digest("hex"),
    modelId: metadata.modelId,
    modelVersion: metadata.modelVersion,
    userId: request.userId,
    riskTier: metadata.riskTier,
    humanOversightTriggered: response.confidenceScore < metadata.confidenceThreshold,
    confidenceScore: response.confidenceScore,
    processingTimeMs: response.processingTimeMs,
    geolocation: request.geolocation ?? "unknown",
    dataRetentionDays: 365 * 5,
  };
}

Use tools like UUID Generator to verify trace ID formats during development, and Hash Generator to validate your SHA-256 hashing implementation against known test vectors.

Immutable Log Storage

Logs should be tamper-evident. A common pattern is append-only storage with cryptographic chaining:

typescript
import crypto from "crypto";

interface ChainedLogBlock {
  sequence: number;
  previousHash: string;
  entry: AuditLogEntry;
  blockHash: string;
}

function appendToChain(
  chain: ChainedLogBlock[],
  entry: AuditLogEntry
): ChainedLogBlock {
  const previousHash =
    chain.length > 0 ? chain[chain.length - 1].blockHash : "GENESIS";
  const payload = JSON.stringify({ previousHash, entry });
  const blockHash = crypto
    .createHash("sha256")
    .update(payload)
    .digest("hex");

  const block: ChainedLogBlock = {
    sequence: chain.length,
    previousHash,
    entry,
    blockHash,
  };
  chain.push(block);
  return block;
}

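This blockchain-like pattern makes retroactive tampering with audit logs cryptographically detectable, provided the latest block hash is also anchored somewhere an attacker cannot rewrite (for example, a separate write-once store). That property carries real weight in regulatory audits.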
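
Detection only happens if something actually re-walks the chain. Below is a minimal Python sketch of that verification step, using the same `previousHash`/`blockHash` field names as the TypeScript block. The helper names (`block_hash`, `append_block`, `first_invalid_block`) are hypothetical, and the sketch uses sorted-key JSON for a stable serialization, which is an assumption rather than something the original code does.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "GENESIS"


def block_hash(previous_hash: str, entry: Dict[str, Any]) -> str:
    # sort_keys gives a stable serialization, so equal entries always hash equally.
    payload = json.dumps({"previousHash": previous_hash, "entry": entry}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], entry: Dict[str, Any]) -> Dict[str, Any]:
    previous_hash = chain[-1]["blockHash"] if chain else GENESIS
    block = {
        "sequence": len(chain),
        "previousHash": previous_hash,
        "entry": entry,
        "blockHash": block_hash(previous_hash, entry),
    }
    chain.append(block)
    return block


def first_invalid_block(chain: List[Dict[str, Any]]) -> int:
    """Return the index of the first block that fails verification, or -1 if intact."""
    expected_previous = GENESIS
    for index, block in enumerate(chain):
        recomputed = block_hash(block["previousHash"], block["entry"])
        if (block["sequence"] != index
                or block["previousHash"] != expected_previous
                or block["blockHash"] != recomputed):
            return index
        expected_previous = block["blockHash"]
    return -1
```

Running `first_invalid_block` on a schedule, and comparing the final `blockHash` against a copy held outside the log store, covers the case where an attacker rewrites the tail of the chain and recomputes its hashes.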


Bias Testing Pipeline

Article 10 (Data Governance) and Article 15 (Accuracy, Robustness, and Cybersecurity) together require that training data be examined for possible biases and that the system perform consistently, which in practice means testing outcomes across protected demographic groups. Treat this as more than a one-time check: run it in your CI/CD pipeline on every model change.

Bias Metrics Computation in Python

python
from dataclasses import dataclass
from typing import Dict, List
import numpy as np


@dataclass
class BiasReport:
    metric: str
    group_a: str
    group_b: str
    group_a_rate: float
    group_b_rate: float
    disparity_ratio: float
    passes_threshold: bool
    threshold: float


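def compute_demographic_parity(
    predictions: List[int],
    sensitive_attr: List[str],
    positive_label: int = 1,
    threshold: float = 0.8,
) -> List[BiasReport]:
    """Compare positive-prediction rates for every pair of groups.

    The 0.8 default follows the "four-fifths" rule of thumb; the Act itself
    does not prescribe a numeric threshold.
    """
    # Bucket predictions by the value of the sensitive attribute.
    groups: Dict[str, List[int]] = {}
    for pred, attr in zip(predictions, sensitive_attr):
        groups.setdefault(attr, []).append(pred)

    # Share of positive predictions per group, stored as plain floats.
    group_rates = {
        g: float(np.mean([p == positive_label for p in preds]))
        for g, preds in groups.items()
    }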

    reports = []
    group_names = sorted(group_rates.keys())
    for i, g_a in enumerate(group_names):
        for g_b in group_names[i + 1 :]:
            rate_a = group_rates[g_a]
            rate_b = group_rates[g_b]
            ratio = min(rate_a, rate_b) / max(rate_a, rate_b) if max(rate_a, rate_b) > 0 else 1.0
            reports.append(
                BiasReport(
                    metric="demographic_parity",
                    group_a=g_a,
                    group_b=g_b,
                    group_a_rate=round(rate_a, 4),
                    group_b_rate=round(rate_b, 4),
                    disparity_ratio=round(ratio, 4),
                    passes_threshold=ratio >= threshold,
                    threshold=threshold,
                )
            )
    return reports

CI/CD Integration

Add bias testing as a required pipeline step. Here is a GitHub Actions workflow:

yaml
name: EU AI Act Bias Testing
on:
  pull_request:
    paths:
      - "models/**"
      - "training/**"
      - "config/model-*.yaml"

jobs:
  bias-audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install -r requirements-bias-testing.txt

      - name: Run demographic parity tests
        run: |
          python -m pytest tests/bias/ \
            --tb=short \
            --junitxml=reports/bias-report.xml \
            -v

      - name: Run equalized odds tests
        run: |
          python scripts/bias_audit.py \
            --model-path models/latest/ \
            --test-data data/bias-test-set.csv \
            --threshold 0.8 \
            --output reports/bias-audit.json

      - name: Upload bias report
        # Upload even when an earlier test step fails, so the report is available for review.
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: bias-audit-report
          path: reports/

      - name: Fail on bias threshold violation
        run: |
          python scripts/check_bias_results.py \
            --report reports/bias-audit.json \
            --fail-on-violation

This pipeline ensures that no model update ships without passing bias thresholds. For teams building RAG-based systems, bias testing should also cover retrieval fairness, meaning whether the vector search returns equitable results across demographic groups. See our Harness Engineering Practical Guide for evaluation framework patterns.
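
The workflow above runs an equalized odds step through `scripts/bias_audit.py`, whose contents are not shown here. As a rough illustration of the metric itself, here is a minimal Python sketch for binary labels. The function names are hypothetical, and the sketch reports a difference-based gap (0.0 means parity), which is not the same scale as the ratio-style `--threshold 0.8` flag in the workflow.

```python
from typing import Dict, Sequence, Tuple


def group_rates(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Return {group: (true_positive_rate, false_positive_rate)} for 0/1 labels."""
    counts: Dict[str, Dict[str, int]] = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts.setdefault(group, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if truth == 1:
            c["tp" if pred == 1 else "fn"] += 1
        else:
            c["fp" if pred == 1 else "tn"] += 1

    rates: Dict[str, Tuple[float, float]] = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        # A group with no positives (or negatives) gets a rate of 0.0 by convention here.
        tpr = c["tp"] / positives if positives else 0.0
        fpr = c["fp"] / negatives if negatives else 0.0
        rates[group] = (tpr, fpr)
    return rates


def equalized_odds_gap(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> float:
    """Largest TPR or FPR difference between any two groups."""
    rates = group_rates(y_true, y_pred, groups)
    if not rates:
        return 0.0
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

Unlike demographic parity, this metric needs ground-truth labels, so it can only run on a labeled test set such as the one the workflow passes via `--test-data`.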


Technical Documentation Generator

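Annex IV of the EU AI Act specifies 9 categories of technical documentation that must be maintained for high-risk systems. Here is a generator that produces a skeleton; its field grouping is a working structure for engineering teams, so map each section to the Annex's numbered points before submission: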

typescript
interface AnnexIVDocumentation {
  generalDescription: {
    intendedPurpose: string;
    systemArchitecture: string;
    interactionWithHardware: string;
    versionsOfRelevantSoftware: string[];
  };
  detailedDescription: {
    developmentMethodology: string;
    designSpecifications: string;
    systemArchitectureDiagram: string;
    computationalResources: string;
  };
  dataGovernance: {
    trainingDataDescription: string;
    dataSources: string[];
    dataPreparationSteps: string[];
    dataLabelingMethodology: string;
    dataBiasAssessment: string;
  };
  monitoringAndTesting: {
    performanceMetrics: Record<string, number>;
    testingProcedures: string[];
    validationDatasets: string[];
    knownLimitations: string[];
  };
  riskManagement: {
    identifiedRisks: RiskEntry[];
    mitigationMeasures: string[];
    residualRisks: string[];
  };
  humanOversight: {
    oversightMeasures: string[];
    interfaceDescription: string;
    overrideCapabilities: string[];
  };
  accuracyAndRobustness: {
    accuracyMetrics: Record<string, number>;
    robustnessTests: string[];
    cybersecurityMeasures: string[];
  };
  changeLog: ChangeEntry[];
  previousVersions: string[];
}

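interface RiskEntry {
  riskId: string;
  description: string;
  likelihood: "low" | "medium" | "high";
  impact: "low" | "medium" | "high";
  mitigation: string;
  status: "open" | "mitigated" | "accepted";
}

// Assumed shape for change-log entries; the fields are illustrative.
interface ChangeEntry {
  version: string;
  date: string;
  description: string;
  author: string;
}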

function generateAnnexIVSkeleton(
  systemId: string,
  systemName: string
): AnnexIVDocumentation {
  return {
    generalDescription: {
      intendedPurpose: `[REQUIRED] Describe the intended purpose of ${systemName}`,
      systemArchitecture: "[REQUIRED] High-level architecture overview",
      interactionWithHardware: "[REQUIRED] Hardware dependencies and requirements",
      versionsOfRelevantSoftware: ["[REQUIRED] List all software dependencies"],
    },
    detailedDescription: {
      developmentMethodology: "[REQUIRED] Training methodology and approach",
      designSpecifications: "[REQUIRED] Model architecture specifications",
      systemArchitectureDiagram: "[REQUIRED] Link to architecture diagram",
      computationalResources: "[REQUIRED] GPU/TPU requirements and training cost",
    },
    dataGovernance: {
      trainingDataDescription: "[REQUIRED] Statistical summary of training data",
      dataSources: ["[REQUIRED] List all training data sources with provenance"],
      dataPreparationSteps: ["[REQUIRED] Data cleaning and preprocessing pipeline"],
      dataLabelingMethodology: "[REQUIRED] Annotation guidelines and inter-annotator agreement",
      dataBiasAssessment: "[REQUIRED] Demographic bias analysis results",
    },
    monitoringAndTesting: {
      performanceMetrics: {},
      testingProcedures: ["[REQUIRED] Unit tests, integration tests, adversarial tests"],
      validationDatasets: ["[REQUIRED] Hold-out validation set descriptions"],
      knownLimitations: ["[REQUIRED] Known failure modes and edge cases"],
    },
    riskManagement: {
      identifiedRisks: [],
      mitigationMeasures: [],
      residualRisks: [],
    },
    humanOversight: {
      oversightMeasures: ["[REQUIRED] Human review triggers and procedures"],
      interfaceDescription: "[REQUIRED] Operator dashboard and override UI",
      overrideCapabilities: ["[REQUIRED] Kill switch, confidence gating, manual approval"],
    },
    accuracyAndRobustness: {
      accuracyMetrics: {},
      robustnessTests: ["[REQUIRED] Adversarial input testing results"],
      cybersecurityMeasures: ["[REQUIRED] Model extraction, data poisoning protections"],
    },
    changeLog: [],
    previousVersions: [],
  };
}

You can validate the generated JSON structure with a JSON Formatter to ensure it is well-formed before submitting to your compliance management system.
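
Well-formed JSON is not the same as a completed document. Assuming the skeleton has been exported to JSON and loaded into a Python dict, a minimal sketch like the following lists every field that still carries the `[REQUIRED]` placeholder. The function name `find_unfilled` is hypothetical.

```python
from typing import Any, List

PLACEHOLDER = "[REQUIRED]"


def find_unfilled(doc: Any, path: str = "") -> List[str]:
    """Return the dotted path of every string value that still contains the placeholder."""
    found: List[str] = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            child_path = f"{path}.{key}" if path else key
            found.extend(find_unfilled(value, child_path))
    elif isinstance(doc, list):
        for index, value in enumerate(doc):
            found.extend(find_unfilled(value, f"{path}[{index}]"))
    elif isinstance(doc, str) and PLACEHOLDER in doc:
        found.append(path)
    return found
```

Note that empty collections such as `identifiedRisks: []` pass this check, so they need a separate rule if you want to block on them.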


Human-in-the-Loop Override Systems

Article 14 mandates that high-risk AI systems include human oversight mechanisms. This is not a suggestion—it is a technical requirement that must be architecturally enforced.

Confidence-Gated Override Pattern

The most practical pattern is confidence gating: when the model's confidence falls below a threshold, the decision is automatically routed to a human operator.

python
from enum import Enum
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


class DecisionRoute(Enum):
    AUTO_APPROVE = "auto_approve"
    HUMAN_REVIEW = "human_review"
    AUTO_REJECT = "auto_reject"


@dataclass
class OversightDecision:
    request_id: str
    model_output: str
    confidence: float
    route: DecisionRoute
    # The human_* fields stay empty until a reviewer acts on the decision.
    human_reviewer: Optional[str] = None
    human_decision: Optional[str] = None
    human_rationale: Optional[str] = None
    override_timestamp: Optional[str] = None
    audit_fields: dict = field(default_factory=dict)


def route_decision(
    request_id: str,
    model_output: str,
    confidence: float,
    auto_approve_threshold: float = 0.95,
    auto_reject_threshold: float = 0.3,
) -> OversightDecision:
    """Route a model output by confidence; mid-range scores go to a human."""
    if confidence >= auto_approve_threshold:
        route = DecisionRoute.AUTO_APPROVE
    elif confidence <= auto_reject_threshold:
        route = DecisionRoute.AUTO_REJECT
    else:
        route = DecisionRoute.HUMAN_REVIEW

    return OversightDecision(
        request_id=request_id,
        model_output=model_output,
        confidence=confidence,
        route=route,
        audit_fields={
            # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12.
            "routed_at": datetime.now(timezone.utc).isoformat(),
            "auto_approve_threshold": auto_approve_threshold,
            "auto_reject_threshold": auto_reject_threshold,
        },
    )

Kill Switch Implementation

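Article 14(4)(e) explicitly requires that overseers be able to "intervene in the operation of the high-risk AI system or interrupt the system through a 'stop' button or a similar procedure that allows the system to come to a halt in a safe state." Here is a minimal implementation: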

typescript
interface KillSwitchState {
  systemId: string;
  isActive: boolean;
  deactivatedBy: string | null;
  deactivatedAt: string | null;
  reason: string | null;
  fallbackBehavior: "reject_all" | "human_only" | "cached_responses";
}

class AIKillSwitch {
  private state: KillSwitchState;

  constructor(systemId: string) {
    this.state = {
      systemId,
      isActive: true,
      deactivatedBy: null,
      deactivatedAt: null,
      reason: null,
      fallbackBehavior: "reject_all",
    };
  }

  deactivate(operatorId: string, reason: string): void {
    this.state.isActive = false;
    this.state.deactivatedBy = operatorId;
    this.state.deactivatedAt = new Date().toISOString();
    this.state.reason = reason;
    this.emitAuditEvent("SYSTEM_DEACTIVATED", this.state);
  }

  checkActive(): boolean {
    return this.state.isActive;
  }

  private emitAuditEvent(eventType: string, payload: unknown): void {
    console.log(JSON.stringify({ eventType, payload, timestamp: new Date().toISOString() }));
  }
}

These patterns are directly related to the jailbreak defense mechanisms covered in LLM Jailbreak Analysis and Defense—when a jailbreak is detected, the kill switch can immediately halt the compromised system.
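
A kill switch only halts the system if every inference path checks it. The TypeScript class above exposes `checkActive()` but does not show the call site, so here is a minimal Python sketch of the guard under that assumption. `KillSwitch`, `SystemHaltedError`, and `guarded_inference` are hypothetical names, and `infer` stands in for whatever function calls your model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class KillSwitch:
    is_active: bool = True
    reason: Optional[str] = None

    def deactivate(self, reason: str) -> None:
        self.is_active = False
        self.reason = reason


class SystemHaltedError(RuntimeError):
    """Raised when inference is attempted while the system is halted."""


def guarded_inference(switch: KillSwitch, infer: Callable[[str], str], prompt: str) -> str:
    # Check the switch on every call so a deactivation applies to the very next request.
    if not switch.is_active:
        raise SystemHaltedError(f"system halted: {switch.reason}")
    return infer(prompt)
```

In a multi-instance deployment the switch state would need to live in shared storage rather than process memory; this sketch keeps it in memory only to stay self-contained.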


Fundamental Rights Impact Assessment

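Article 27 requires certain deployers of high-risk AI systems to conduct a Fundamental Rights Impact Assessment (FRIA) before first use: bodies governed by public law, private entities providing public services, and deployers of credit-scoring or life and health insurance risk-assessment systems. A FRIA is similar to a Data Protection Impact Assessment (DPIA) under GDPR but focused on AI-specific fundamental rights.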

FRIA Template Structure

yaml
fria:
  metadata:
    system_name: "AI Resume Screening System v2.1"
    system_id: "HRS-2026-001"
    deployer: "TechCorp International"
    assessment_date: "2026-05-16"
    assessor: "Compliance Engineering Team"
    review_frequency: "quarterly"

  scope:
    intended_purpose: "Automated screening of job applications"
    target_population: "Job applicants in EU member states"
    geographic_scope: ["DE", "FR", "NL", "ES", "IT"]
    estimated_affected_persons: 50000
    deployment_timeline: "2026-Q3"

  rights_impact_analysis:
    - right: "Non-discrimination (Article 21 EU Charter)"
      risk_level: "high"
      description: "Model may exhibit bias based on gender or ethnicity in resume language"
      affected_groups: ["gender_minorities", "ethnic_minorities", "age_groups"]
      mitigation:
        - "Demographic parity testing with 0.8 threshold"
        - "Blind resume processing - PII stripped before model inference"
        - "Quarterly bias audit by external assessor"
      residual_risk: "medium"

    - right: "Right to an effective remedy (Article 47 EU Charter)"
      risk_level: "medium"
      description: "Rejected candidates must be able to contest AI-assisted decisions"
      affected_groups: ["all_applicants"]
      mitigation:
        - "Human review mandatory for all rejections"
        - "Explainability report generated per decision"
        - "Appeals process with 30-day SLA"
      residual_risk: "low"

    - right: "Right to privacy and data protection (Articles 7 and 8 EU Charter)"
      risk_level: "medium"
      description: "Processing of personal data in resumes and application materials"
      affected_groups: ["all_applicants"]
      mitigation:
        - "Data minimization - only relevant fields extracted"
        - "Auto-deletion after 180 days per GDPR Article 5(1)(e) storage limitation"
        - "Encryption at rest and in transit"
      residual_risk: "low"

  oversight_measures:
    human_review_trigger: "All negative decisions"
    operator_training: "40-hour AI oversight certification"
    escalation_path: "Operator > Team Lead > DPO > Legal"

  conclusion:
    overall_risk: "medium"
    deployment_approved: true
    conditions: "Subject to quarterly bias audit and annual FRIA review"

You can validate YAML configuration files like this with YAML to JSON to ensure structural correctness, or use Text Diff to compare FRIA versions across quarterly reviews.
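
Structural validation can be paired with a few content checks. The following Python sketch assumes the `rights_impact_analysis` list has already been parsed into dicts, flags entries without mitigations, and derives an overall rating as the worst residual risk. That aggregation rule is an assumption; the template does not say how `overall_risk` is determined. `review_fria` is a hypothetical name.

```python
from typing import Any, Dict, List

RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def review_fria(rights_impact_analysis: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Flag incomplete entries and report the worst residual risk found."""
    problems: List[str] = []
    worst_residual = "low"
    for item in rights_impact_analysis:
        right = item.get("right", "<unnamed right>")
        if not item.get("mitigation"):
            problems.append(f"{right}: no mitigation listed")
        residual = item.get("residual_risk")
        if residual not in RISK_ORDER:
            problems.append(f"{right}: residual_risk missing or invalid")
            continue
        if RISK_ORDER[residual] > RISK_ORDER[worst_residual]:
            worst_residual = residual
    return {"overall_risk": worst_residual, "problems": problems}
```

Applied to the three entries in the template (residual risks of medium, low, and low), this yields an overall rating of medium, which matches the template's conclusion.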


Conformity Assessment and CE Marking

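Article 43 defines the conformity assessment procedure. For most Annex III high-risk AI systems (points 2 to 8, which include critical infrastructure), this is a self-assessment based on internal control. Biometric systems (Annex III, point 1) need third-party assessment by a Notified Body when the provider has not fully applied harmonised standards or common specifications, and AI embedded in products covered by Annex I follows that product's own sectoral procedure.

Self-Assessment Decision Logic

typescript
type AssessmentRoute = "self_assessment" | "notified_body_required";

interface ConformityResult {
  systemId: string;
  route: AssessmentRoute;
  annexIVComplete: boolean;
  riskManagementComplete: boolean;
  biasTestingPassed: boolean;
  humanOversightImplemented: boolean;
  auditLoggingActive: boolean;
  ceMarkingEligible: boolean;
  blockers: string[];
}

// Assumed shape of the readiness flags consumed below.
interface ComplianceChecks {
  annexIVDocumentation: boolean;
  riskManagementSystem: boolean;
  biasTestingPassed: boolean;
  humanOversightMechanism: boolean;
  auditLogging: boolean;
  friaCompleted: boolean;
}

function evaluateConformityReadiness(
  systemId: string,
  category: string,
  checks: ComplianceChecks
): ConformityResult {
  // Simplification: treat every biometric system as needing a Notified Body.
  // Providers that fully apply harmonised standards may be able to self-assess.
  const needsNotifiedBody = category === "biometric_identification";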

  const blockers: string[] = [];
  if (!checks.annexIVDocumentation) blockers.push("Annex IV documentation incomplete");
  if (!checks.riskManagementSystem) blockers.push("Risk management system not established");
  if (!checks.biasTestingPassed) blockers.push("Bias testing thresholds not met");
  if (!checks.humanOversightMechanism) blockers.push("Human oversight not implemented");
  if (!checks.auditLogging) blockers.push("Audit logging not active");
  if (!checks.friaCompleted) blockers.push("FRIA not completed");

  return {
    systemId,
    route: needsNotifiedBody ? "notified_body_required" : "self_assessment",
    annexIVComplete: checks.annexIVDocumentation,
    riskManagementComplete: checks.riskManagementSystem,
    biasTestingPassed: checks.biasTestingPassed,
    humanOversightImplemented: checks.humanOversightMechanism,
    auditLoggingActive: checks.auditLogging,
    ceMarkingEligible: blockers.length === 0,
    blockers,
  };
}

The output of this function is a readiness check that feeds the EU Declaration of Conformity (Article 47) rather than replacing it. Once all blockers are resolved and the assessment on the applicable route is complete, the provider can draw up the declaration and affix the CE marking (Article 48), which is what permits placing the system on the EU market.


AI Literacy Program

Article 4 introduced AI literacy obligations effective February 2, 2025, so they already apply. Providers and deployers must take measures to ensure, to their best extent, "a sufficient level of AI literacy" among their staff and anyone else operating or using AI systems on their behalf, appropriate to the context.

What AI Literacy Means in Practice

This is not about making everyone an ML engineer. It means:

  1. Operators understand the system's capabilities, limitations, and known failure modes
  2. Developers understand the regulatory obligations their code must satisfy
  3. Business stakeholders understand the risk classification and its implications
  4. End users are informed when they interact with an AI system (transparency obligation)

Minimum Training Curriculum

| Role | Required Knowledge | Hours |
| --- | --- | --- |
| ML Engineers | Full Annex IV documentation, bias testing, logging requirements | 16 |
| Backend Engineers | Audit logging, kill switch, human oversight API integration | 8 |
| Product Managers | Risk classification, FRIA process, post-market monitoring | 8 |
| Operators | System limitations, override procedures, escalation protocols | 12 |
| Customer Support | Transparency scripts, user rights, complaint handling | 4 |

For engineering teams working with AI Agents, literacy training should also cover autonomous decision-making risks and the additional oversight requirements for agentic systems. Our guide on Harness Engineering provides a framework for evaluating and constraining agent behavior.


Post-Market Monitoring

Article 72 requires a post-market monitoring system proportionate to the nature and risks of the AI system. This must be established before deployment and maintained throughout the system's lifecycle.

Monitoring Dashboard Schema

yaml
post_market_monitoring:
  system_id: "HRS-2026-001"
  monitoring_plan:
    frequency: "continuous"
    review_cadence: "monthly"
    responsible_team: "ML Platform - Compliance"

  metrics:
    performance:
      - name: "accuracy"
        baseline: 0.94
        threshold: 0.90
        current: null
        alert_on: "below_threshold"
      - name: "false_positive_rate"
        baseline: 0.03
        threshold: 0.05
        current: null
        alert_on: "above_threshold"

    fairness:
      - name: "demographic_parity_ratio"
        baseline: 0.92
        threshold: 0.80
        current: null
        alert_on: "below_threshold"
        protected_attributes: ["gender", "ethnicity", "age_group"]

    operational:
      - name: "human_override_rate"
        baseline: 0.08
        alert_on: "above_0.15_or_below_0.02"
        description: "Anomalous rates may indicate model drift or miscalibrated thresholds"
      - name: "mean_confidence_score"
        baseline: 0.87
        alert_on: "below_0.75"

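  incident_reporting:
    serious_incident_sla: "15 days to national authority; 10 days for a death; 2 days for widespread or critical-infrastructure incidents"
    internal_escalation: "24 hours to compliance team"
    root_cause_analysis: "required within 30 days"

  data_retention:
    audit_logs: "5 years"
    monitoring_reports: "10 years"
    incident_records: "10 years"

When model performance degrades or bias metrics cross thresholds, the monitoring system should trigger an internal incident review. Article 73 requires providers to report serious incidents to the market surveillance authority immediately after establishing a causal link, and no later than 15 days after becoming aware of them. The limit shortens to 10 days where a person has died and to 2 days for a widespread infringement or a serious disruption of critical infrastructure. These windows differ from GDPR's 72-hour breach notification, so track the two clocks separately.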

For teams using vector databases in production RAG systems, post-market monitoring should also track retrieval quality metrics and embedding drift. If your retrieval pipeline degrades, the downstream AI system's compliance posture is directly affected.
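
The `threshold` and `alert_on` fields in the schema above only matter if something evaluates them. Here is a minimal Python sketch of that evaluation, assuming each metric has been loaded as a dict and that `current` is filled in by your metrics pipeline. It handles only the `below_threshold` and `above_threshold` rules; the free-form rules on the operational metrics would need their own parser. The function names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def breached(metric: Dict[str, Any]) -> Optional[bool]:
    """True if the metric crosses its threshold, False if not, None if there is no reading yet."""
    current = metric.get("current")
    if current is None:
        return None
    rule = metric["alert_on"]
    if rule == "below_threshold":
        return current < metric["threshold"]
    if rule == "above_threshold":
        return current > metric["threshold"]
    raise ValueError(f"unsupported alert_on rule: {rule}")


def alerts(metrics: List[Dict[str, Any]]) -> List[str]:
    """Names of all metrics currently in breach; metrics without a reading are skipped."""
    return [m["name"] for m in metrics if breached(m)]
```

Metrics with no reading are skipped rather than alerted on, which is a design choice: a missing reading is better surfaced as a separate data-freshness alert than mixed into performance breaches.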


Compliance Tip: When implementing compliance logging, you will likely store audit trails in JSON format. Use a JSON Formatter to ensure your audit logs are properly structured and easily readable during compliance reviews.

FAQ

Does the EU AI Act apply to companies outside the EU?

Yes. The EU AI Act has "long-arm jurisdiction" similar to GDPR. If the output of your AI system is used within the EU—regardless of where your company is headquartered or where the model is hosted—you are subject to the regulation. This applies to SaaS products, API services, and embedded AI features alike.

What is the difference between high-risk and limited-risk AI systems?

High-risk systems (Annex III) include AI used in employment, credit scoring, law enforcement, and critical infrastructure. They require full conformity assessment, technical documentation, and CE marking. Limited-risk systems (e.g., chatbots, deepfake generators) only need transparency obligations—users must be informed they are interacting with AI.

How much does EU AI Act non-compliance cost?

Penalties scale by violation type: up to €35M or 7% of global annual turnover for prohibited practices, up to €15M or 3% for high-risk non-compliance, and up to €7.5M or 1% for providing incorrect information to authorities. For most companies the higher of the two amounts is the cap; for SMEs and start-ups, it is whichever of the two is lower.

When should I start preparing for EU AI Act compliance?

Immediately. Prohibited AI practices and AI literacy obligations have been enforceable since February 2, 2025. GPAI model obligations took effect August 2, 2025. The major deadline for high-risk AI systems is August 2, 2026, roughly 11 weeks away. Building audit infrastructure, bias testing pipelines, and documentation systems takes months.

Can I use open-source models and still be compliant?

Yes, but with caveats. Open-source GPAI models have reduced obligations under Article 53, but if you fine-tune one or build it into a high-risk system, the high-risk obligations attach to you as the provider or deployer of that system. You must still maintain technical documentation covering your fine-tuning data, evaluation results, and risk mitigations.


Summary

The EU AI Act is not abstract policy—it is a concrete set of engineering requirements with hard deadlines and significant penalties. For any AI product team targeting the EU market, the compliance checklist comes down to six infrastructure pillars:

  1. Risk classification: Know your tier. Use the decision tree above and document your reasoning.
  2. Audit logging: Implement immutable, traceable logging from day one. Article 12 is non-negotiable.
  3. Bias testing: Integrate demographic parity and equalized odds testing into your CI/CD pipeline.
  4. Human oversight: Build confidence-gated routing and kill switches into your inference architecture.
  5. Technical documentation: Generate and maintain Annex IV documentation as living artifacts, not one-time PDFs.
  6. Post-market monitoring: Deploy continuous monitoring with automated alerting and incident response SLAs.

The August 2, 2026 deadline is 11 weeks away. The good news: most of these requirements align with what responsible AI engineering teams should be building anyway. The difference is that now it is law.

For deeper exploration of the security and evaluation patterns that underpin compliance, continue with the Harness Engineering series—particularly the guides on LLM jailbreak defense and AI agent privacy.