
AI Bias in Security: Audit Duty for KRITIS Operators

AI bias in patrol robots: how plant managers test person detection for fairness, meet the EU AI Act and avoid fines.

Dr. Raphael Nagel (LL.M.)
Investor & Author · Founding Partner

Bias in person detection is not an academic problem. It produces missed intruders, escalated false alarms against the operator's own workforce, and fine risk under the EU AI Act and NIS-2. Plant managers and security leads at KRITIS sites must measure bias, document it, and correct it verifiably. This article describes the damage classes, the legal framework, the audit procedure, and the TCO consequences.

AI Bias in Security: Definition and Operational Damage

AI bias has three main sources: unbalanced training data, incomplete sensor coverage, and faulty labeling pipelines. A detection model trained mostly on Central European inner-city datasets, for example, performs poorly at a port terminal at night.

The operational damage falls into three classes. False negatives are missed intruders at the perimeter; this is the most expensive failure mode. False positives produce unnecessary alerts and erode the guard team's trust in the system. Drift is the slow degradation of model quality over months of real-world operation, and it often goes unnoticed.

Bias is measurable. A confusion matrix per demographic or environmental cohort yields hard numbers: true positive rate, false positive rate, accuracy. Operators who do not collect these values cannot demonstrate conformity under the EU AI Act.
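A minimal Python sketch of these per-cohort numbers, assuming binary outcomes (person detected or not) and counts already tallied per cohort. The class and function names and the example counts are illustrative, not measured values or a vendor API.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary person-detection outcomes for one test cohort."""
    tp: int  # person present and detected
    fn: int  # person present but missed
    fp: int  # no person present, alarm raised anyway
    tn: int  # no person present, no alarm

    def tpr(self) -> float:
        # True positive rate: share of real persons that were detected.
        positives = self.tp + self.fn
        return self.tp / positives if positives else float("nan")

    def fpr(self) -> float:
        # False positive rate: share of person-free test cases that raised an alarm.
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else float("nan")

    def accuracy(self) -> float:
        # Share of all test cases classified correctly.
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total if total else float("nan")


def cohort_metrics(results: dict[str, ConfusionCounts]) -> dict[str, dict[str, float]]:
    """Compute TPR, FPR and accuracy for every cohort."""
    return {
        name: {"tpr": c.tpr(), "fpr": c.fpr(), "accuracy": c.accuracy()}
        for name, c in results.items()
    }


# Illustrative counts only, not measured data.
results = {
    "PPE / night / fog": ConfusionCounts(tp=430, fn=70, fp=12, tn=488),
    "civilian / day / clear": ConfusionCounts(tp=485, fn=15, fp=5, tn=495),
}
for name, m in cohort_metrics(results).items():
    print(f"{name}: TPR={m['tpr']:.3f} FPR={m['fpr']:.3f} accuracy={m['accuracy']:.3f}")
```

Keeping the raw counts rather than only the rates lets an auditor recompute every figure later.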

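Responsibility: as soon as the operator continues to train the model on site or feeds in its own sensor data for fine-tuning, the operator can assume provider duties. Under EU AI Act Art. 25, a deployer who makes a substantial modification to a high-risk AI system is considered its provider, and on-site retraining can constitute such a modification. The manufacturer then no longer carries sole liability.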

Typical Distortions in Person Detection

Skin tones: in published benchmarks, RGB-based detectors show error rates up to 12 percent higher on darker skin tones under low light. [insert source] This is not a design flaw but the consequence of unbalanced training corpora.

Body size: models trained exclusively on adult datasets systematically miss persons under 1.50 meters. This matters at sites with external visitor groups, training participants, or service partners of varying stature.

Workwear: high-visibility vests, PPE protective suits, respirators, and welding helmets alter the silhouette so strongly that the classifier no longer recognizes the person as human. The very personnel who are expected on site fall outside the detection window.

Pose and motion: persons lying or crouching go unclassified in 18 to 30 percent of cases. [insert source] In an emergency where an employee has fallen, this is the most critical gap.

Weather: rain, fog, snow, and backlight shift the detection threshold systematically. A model validated in spring can lose two thirds of its accuracy in November fog. [insert source]

Person Detection Bias: Thermal and Audio

Thermal detection, as used in the QR-2 for person detection, is more robust against skin tone but sensitive to ambient temperature and clothing insulation. At an ambient temperature of 35 degrees Celsius in summer, the thermal contrast between person and environment drops below 2 Kelvin. [insert source] The classifier loses discriminating power.

Thermal misclassification also occurs with heavily insulated protective clothing. An employee in cold-storage workwear emits almost no heat outward and is barely distinguishable from the background thermally.

In many commercial models, audio classification for glass breakage, screams, or calls for help has been trained predominantly on English-language datasets. Calls in German, Turkish, or Polish are recognized with lower accuracy. In addition, machine noise in industrial parks masks acoustic events. Both effects distort classifier outputs systematically.

Consequence: a bias review must document each sensor modality separately. A conformity proof for RGB does not replace the proof for thermal or audio.
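As a sketch of what separate documentation per modality can mean in practice: list the cohorts each modality must be tested on and flag every pair that lacks a documented measurement. The modality keys and cohort labels below are hypothetical; audio cohorts (language, background noise) naturally differ from the visual ones.

```python
def missing_evidence(
    required: dict[str, list[str]],
    measured: set[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return every (modality, cohort) pair that still lacks a documented measurement."""
    return [
        (modality, cohort)
        for modality, cohorts in required.items()
        for cohort in cohorts
        if (modality, cohort) not in measured
    ]


# Hypothetical test plan: each modality has its own relevant cohorts.
required = {
    "rgb": ["day / clear", "night / fog"],
    "thermal": ["day / clear", "summer heat"],
    "audio": ["German call / machine noise"],
}
measured = {("rgb", "day / clear"), ("rgb", "night / fog"), ("thermal", "day / clear")}
print(missing_evidence(required, measured))
```

Here a complete RGB record does not hide the open thermal and audio gaps.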

Legal Framework: EU AI Act and NIS-2

The EU AI Act classifies security AI in public spaces and critical infrastructure as high-risk (Annex III), and autonomous security robotics falls into this category. The duties cover a risk management system, data governance, technical documentation, human oversight, and accuracy logging across the full lifecycle.

NIS-2 supplements these duties by sector. The directive requires KRITIS operators to demonstrate risk management across all technical and organizational measures, including AI-based detection [NIS-2 Directive]. An AI component without a documented effectiveness review is not defensible under NIS-2.

EN ISO 13482 defines safety requirements for personal care robots and serves as the reference standard for mobile patrol units. It covers mechanical and functional safety but does not replace a bias review of the detection algorithms in use.

The EU Machinery Regulation 2023/1230 prescribes conformity assessment for autonomous machines with self-learning behavior in their safety functions. This is relevant for every system that continues to train on site.

Before commissioning, the operator must present a conformity proof that documents demographic and environmental robustness. Sectoral thresholds are covered in detail in the article KRITIS requirements for operators.

KRITIS AI Audit: How Plant Managers Get Bias Measured

A workable audit procedure follows five steps.

Step 1: define test cohorts. Attributes: gender, stature, skin tone, clothing (civilian, PPE, high-visibility, cold storage), time of day (day, dusk, night), and weather (clear, rain, fog, snow). The result is a cohort matrix with typically 24 to 48 cells.
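The cohort matrix is a cross product of the chosen attribute values. A minimal sketch, assuming only clothing, time of day, and weather are crossed (4 × 3 × 4 = 48 cells, the upper end of the range above) while gender, stature, and skin tone are balanced within each cell. That split is our assumption; crossing all six attributes in full would produce far more than 48 cells.

```python
from itertools import product

# Attribute values from Step 1; which dimensions are crossed is an assumption.
ATTRIBUTES = {
    "clothing": ["civilian", "PPE", "high-visibility", "cold storage"],
    "time_of_day": ["day", "dusk", "night"],
    "weather": ["clear", "rain", "fog", "snow"],
}


def cohort_matrix(attributes: dict[str, list[str]]) -> list[dict[str, str]]:
    """Full cross product of the selected dimensions: one dict per cohort cell."""
    keys = list(attributes)
    return [dict(zip(keys, values)) for values in product(*(attributes[k] for k in keys))]


cells = cohort_matrix(ATTRIBUTES)
print(len(cells))  # 4 * 3 * 4 = 48
print(cells[0])
```

Dropping one dimension, or merging dusk into night, brings the count toward the lower end of 24 cells.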

Step 2: collect at least 500 annotated test cases per cohort over three months of real-world operation. Synthetic data alone is insufficient; the BBK does not accept it as the sole proof. The BBK requires documented technical protection measures for KRITIS facilities, subject to regular effectiveness reviews.
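Why several hundred cases per cohort matter: the statistical uncertainty of a measured true positive rate shrinks with sample size. A sketch using the Wilson score interval, a standard binomial confidence interval chosen here for illustration; the 450-of-500 example is invented.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95 % confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


low, high = wilson_interval(450, 500)  # illustrative: TPR 0.90 on 500 test cases
print(f"{low:.3f} to {high:.3f}")
```

With 450 detections out of 500, the 95 percent interval spans roughly 0.87 to 0.92, about 2.6 percentage points either side. Comparing two cohorts combines both uncertainties, so a measured gap close to a threshold can still be sampling noise.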

Step 3: compute a confusion matrix per cohort. Threshold: the difference in true positive rate between the worst and the best cohort must not exceed 5 percentage points. [insert source / normative basis] An operator who exceeds this threshold carries a documented bias risk.
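A minimal check of this threshold, treating it as an absolute gap of 5 percentage points in TPR (a relative reading of 5 percent would be stricter). Cohort names and rates are illustrative.

```python
def tpr_gap(tpr_by_cohort: dict[str, float]) -> tuple[float, str, str]:
    """Return (gap, worst cohort, best cohort) for the given true positive rates."""
    if not tpr_by_cohort:
        raise ValueError("at least one cohort is required")
    worst = min(tpr_by_cohort, key=tpr_by_cohort.__getitem__)
    best = max(tpr_by_cohort, key=tpr_by_cohort.__getitem__)
    return tpr_by_cohort[best] - tpr_by_cohort[worst], worst, best


def passes_gap_threshold(tpr_by_cohort: dict[str, float], max_gap_pp: float = 5.0) -> bool:
    """True if best-minus-worst TPR stays within max_gap_pp percentage points."""
    gap, _, _ = tpr_gap(tpr_by_cohort)
    # Round to absorb floating-point noise right at the boundary.
    return round(gap * 100, 6) <= max_gap_pp


rates = {  # illustrative values, not measurements
    "civilian / day / clear": 0.97,
    "high-vis / dusk / rain": 0.93,
    "PPE / night / fog": 0.86,
}
gap, worst, best = tpr_gap(rates)
print(f"gap {gap * 100:.1f} pp ({worst} vs {best}), pass: {passes_gap_threshold(rates)}")
```

The check works on point estimates; near the threshold, the sampling uncertainty of each cohort's rate should be reported alongside the gap.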

Step 4: monthly drift monitoring. Automatic alerting on statistically significant shifts in detection rates. Drift is the most common cause of late false negatives.
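One common way to make "statistically significant" concrete is a two-proportion z-test between the baseline detection rate and the current month's rate for the same cohort. The sketch assumes that test and a two-sided 95 percent level (|z| above 1.96); the procedure above does not prescribe a specific test, and the counts are invented.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic: rate 2 (current) versus rate 1 (baseline)."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se if se else 0.0


def drift_alert(baseline: tuple[int, int], current: tuple[int, int], z_crit: float = 1.96) -> bool:
    """Alert if the current detection rate differs significantly from baseline (two-sided)."""
    return abs(two_proportion_z(*baseline, *current)) > z_crit


# Illustrative: (detections, test cases) for one cohort.
print(drift_alert((460, 500), (420, 500)))  # 92 % -> 84 %
print(drift_alert((460, 500), (455, 500)))  # 92 % -> 91 %
```

A drop from 460 to 420 detections out of 500 gives z of about -3.9 and triggers an alert; a drop to 455 does not. With dozens of cohorts tested every month some alerts fire by chance, so a correction such as Bonferroni (dividing the significance level by the number of cohorts) may be worth considering.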

Step 5: documentation in the conformity file. This file must be available for BBK audits, NIS-2 evidence, and insurance reviews.

Quarero delivers this audit protocol in standardized form for the QR-1, QR-2, and QR-3 at KRITIS sites. Plant managers receive the confusion matrices per cohort as a signed PDF and as machine-readable JSON.
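The structure of the delivered JSON is not described here. As a purely hypothetical sketch, a machine-readable record per site, robot model, and sensor modality could carry the raw confusion matrix next to the derived rates, so auditors can recompute them. All field names, the site label, and the date are assumptions.

```python
import json
from datetime import date


def audit_record(site: str, robot_model: str, modality: str,
                 counts: dict[str, dict[str, int]], audit_date: date) -> str:
    """Serialise per-cohort confusion matrices into JSON (hypothetical schema)."""
    cohorts = []
    for name, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        cohorts.append({
            "cohort": name,
            "confusion_matrix": {k: c[k] for k in ("tp", "fn", "fp", "tn")},
            # Derived rates stored for convenience; None where undefined.
            "tpr": c["tp"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
        })
    return json.dumps({
        "site": site,
        "robot_model": robot_model,
        "modality": modality,
        "audit_date": audit_date.isoformat(),
        "cohorts": cohorts,
    }, indent=2, ensure_ascii=False)


doc = audit_record(
    "example-site", "QR-2", "thermal",
    {"PPE / night / fog": {"tp": 430, "fn": 70, "fp": 12, "tn": 488}},
    date(2025, 1, 31),
)
print(doc)
```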

Human in the Loop as a Bias Corrective

The autonomous patrol generates detections; escalation runs through the control room and requires human confirmation. This two-stage model reduces false-positive damage without lowering the detection rate.

Control room operators receive a structured feedback form to reclassify false alarms. For each incident, cohort attributes (time of day, weather, clothing, sensor modality) are captured. The feedback feeds into the monthly model update cycle.
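A minimal sketch of such a feedback record and its monthly aggregation, assuming the form captures exactly the attributes listed above. Field names, incident IDs, and the aggregation key are hypothetical; counting reclassified false alarms per cohort shows where the next model update should focus.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorFeedback:
    """One control-room verdict on a robot alert (hypothetical field names)."""
    incident_id: str
    confirmed: bool  # False = operator reclassified the alert as a false alarm
    time_of_day: str
    weather: str
    clothing: str
    modality: str


def false_alarms_by_cohort(feedback: list[OperatorFeedback]) -> Counter:
    """Count reclassified false alarms per (modality, time of day, weather, clothing)."""
    return Counter(
        (f.modality, f.time_of_day, f.weather, f.clothing)
        for f in feedback
        if not f.confirmed
    )


month = [
    OperatorFeedback("INC-001", False, "night", "fog", "cold storage", "thermal"),
    OperatorFeedback("INC-002", False, "night", "fog", "cold storage", "thermal"),
    OperatorFeedback("INC-003", True, "day", "clear", "civilian", "rgb"),
]
for cohort, n in false_alarms_by_cohort(month).most_common():
    print(cohort, n)
```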

Human oversight is not marketing; it is a duty under AI Act Art. 14. A fully autonomous security system without a human escalation stage is not conformant, which follows directly from the regulatory text. For the board-level dimension, see also the article NIS-2 board liability in detail.

TCO Consequences: What Bias Audits Cost and Save

The initial audit costs a one-time 8,000 to 12,000 euros, depending on site size and cohort matrix; in the Robotics-as-a-Service model it is included. Ongoing monitoring is a flat component of the monthly service fee, around 3,500 euros for the QR-2.

A conventional 24/7 guard post costs 15,000 to 25,000 euros per month for a position staffed around the clock, depending on the Manteltarifvertrag (collective framework agreement) and region. There is no audit duty, but also no data consistency and no reproducible detection rates. The full cost comparison is in the article TCO compared to classic guard service.

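EU AI Act fine risk: violations of the obligations for high-risk systems can be fined up to 15 million euros or 3 percent of global annual turnover, whichever is higher; the top tier of 35 million euros or 7 percent applies to prohibited AI practices (EU AI Act Art. 99). That is the order of magnitude that overrides every TCO comparison.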
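Art. 99 sets each fine ceiling as the higher of a fixed amount and a share of total worldwide annual turnover for the preceding financial year; for SMEs and start-ups, Art. 99(6) applies the lower of the two. A small sketch of that arithmetic; the turnover figures are invented, and the result is a statutory maximum, not a predicted fine.

```python
def fine_cap_eur(turnover_eur: float, fixed_eur: float, share: float, sme: bool = False) -> float:
    """Statutory ceiling of one EU AI Act fine tier.

    Art. 99 takes the higher of the fixed amount and the turnover share;
    for SMEs and start-ups, Art. 99(6) takes the lower of the two.
    """
    by_turnover = share * turnover_eur
    return min(fixed_eur, by_turnover) if sme else max(fixed_eur, by_turnover)


# Fixed amount and turnover share per tier, as set out in Art. 99(3) and 99(4).
TIERS = {
    "Art. 99(3)": (35_000_000, 0.07),
    "Art. 99(4)": (15_000_000, 0.03),
}

for turnover in (100_000_000, 2_000_000_000):  # illustrative annual turnovers in euros
    for tier, (fixed, share) in TIERS.items():
        cap = fine_cap_eur(turnover, fixed, share)
        print(f"turnover {turnover:,} EUR, {tier}: cap {cap:,.0f} EUR")
```

For an operator with 2 billion euros turnover, the turnover share dominates (140 and 60 million euros); at 100 million euros turnover, the fixed amounts are higher and therefore set the ceiling.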

A documented bias audit can also lower premiums with specialized cyber and property insurers. Some insurers already offer discounts of 5 to 12 percent on the property insurance premium when the AI audit report is on file. [insert source]

Thermal Misclassification in the Pilot Phase: 90 Days of Operation

The pilot phase follows a fixed pattern.

Day 0 to 14: baseline measurement. Synthetic test persons (mannequins with defined thermal signatures, calibrated audio sources) and real test persons from the guard team and external auditors walk defined routes. Result: initial values of the confusion matrix per cohort.

Day 15 to 60: shift operation with a shadow human patrol. The robot patrols autonomously while a human guard walks in parallel and logs every deviation. This phase of roughly 45 days delivers reliable detection data under real conditions, including weather variation.
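How the shadow-patrol log can be turned into confusion-matrix counts: a minimal sketch, assuming the guard logs every person sighting with time and zone (not only deviations) and that a robot detection in the same zone within a tolerance window counts as a match. The 30-second tolerance, the event format, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t: float   # seconds since shift start
    zone: str  # patrol zone label


def reconcile(guard_log: list[Event], robot_log: list[Event],
              tolerance_s: float = 30.0) -> dict[str, int]:
    """Greedy one-to-one matching in time order.

    A robot detection in the same zone within tolerance_s of a guard-logged
    person counts as TP; unmatched guard entries are FN; unmatched robot
    detections are FP candidates for manual review.
    """
    unmatched_robot = sorted(robot_log, key=lambda e: e.t)
    tp = fn = 0
    for g in sorted(guard_log, key=lambda e: e.t):
        match = next(
            (r for r in unmatched_robot if r.zone == g.zone and abs(r.t - g.t) <= tolerance_s),
            None,
        )
        if match is None:
            fn += 1
        else:
            tp += 1
            unmatched_robot.remove(match)
    return {"tp": tp, "fn": fn, "fp": len(unmatched_robot)}


guard = [Event(100, "Sektor B"), Event(500, "Sektor B"), Event(900, "Gate 4")]
robot = [Event(110, "Sektor B"), Event(905, "Gate 4"), Event(1500, "Sektor B")]
print(reconcile(guard, robot))
```

Greedy matching keeps the sketch short; with dense traffic an optimal assignment would be more robust. True negatives cannot be derived from event logs alone and need separately defined person-free test intervals.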

Day 61 to 90: evaluation of the confusion matrices per cohort and handover of the audit report to plant management and security leadership. Weekly status reports to security leadership are standard; a monthly board meeting keeps executive management informed.

After 90 days, the decision on full operation with documented bias conformity is made. Decide earlier and the dataset is not statistically reliable. Decide later and the double costs of running the pilot alongside the existing guard service continue.

Plant managers and security leads planning a 90-day pilot with full bias audit protocol submit a pilot request for their site. Quarero responds within two working days with site review and quote.
