AI Training Data Security: What CISOs Must Audit

AI training data security for patrol robots: data sources, GDPR, EU AI Act, edge inference, site adaptation. Concrete figures for KRITIS operators.

Dr. Raphael Nagel (LL.M.)

May 25, 2026 · Investor & Author · Founding Partner


When a security robot misses a person, that is a detection error. When it falsely flags a person as a threat, that is a process error. Both errors share the same root cause: the training data. This article addresses CISOs and security managers in KRITIS operations. It explains where the data comes from, how it is audited, and which contractual points must be settled before signature.

AI Training Data Security: Why the Data Base Determines Detection Quality

Models for person and vehicle detection only reach a false-negative rate below 2 percent under real conditions when at least 50,000 annotated frames per object class are available. Below that threshold, the error rate rises disproportionately rather than linearly under edge conditions: dusk, rain, partial occlusion.
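The threshold above can be checked mechanically before training starts. The following is a minimal sketch, assuming frame counts per object class are already available as a dictionary. The function name and the example counts are illustrative and not taken from any vendor tooling.

```python
def underrepresented_classes(frame_counts: dict[str, int], minimum: int = 50_000) -> list[str]:
    """Return the object classes whose annotated frame count is below the minimum."""
    return sorted(name for name, count in frame_counts.items() if count < minimum)


# Illustrative counts only; real figures would come from the dataset manifest.
counts = {"person": 62_000, "vehicle": 48_000, "drone": 12_000}
print(underrepresented_classes(counts))
```

A class list returned by this check is a reason to collect or generate more data before release, not a detail to fix afterwards.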

Poor training data also produces the inverse problem: false alarms. Each false alarm consumes on average 18 minutes of control room time for review, assessment, escalation and documentation. At a false-alarm rate of 12 percent on a site with 200 nightly detection events, that is 24 false alarms and 7.2 person-hours of control room load per night. That is the actual cost driver, not the hardware.
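The control room load can be recalculated for any site with a few lines. This sketch uses the figures from the paragraph above as defaults. The function name and parameters are illustrative.

```python
def control_room_hours(events_per_night: int, false_alarm_rate: float,
                       minutes_per_false_alarm: float = 18.0) -> float:
    """Return the nightly control room load in person-hours caused by false alarms."""
    if not 0.0 <= false_alarm_rate <= 1.0:
        raise ValueError("false_alarm_rate must be between 0 and 1")
    false_alarms = events_per_night * false_alarm_rate
    return false_alarms * minutes_per_false_alarm / 60.0


# 200 events per night at a 12 percent false-alarm rate, 18 minutes each
print(f"{control_room_hours(200, 0.12):.1f} person-hours per night")
```

Replacing the rate with a site's own measured value shows directly how much control room time a calibrated model saves.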

Quarero therefore strictly separates pre-training at the factory from site adaptation at the deployment location. Pre-training provides the object-class base. Site adaptation over four weeks calibrates anomaly detection to the site-specific normality. Both are versioned and audited separately.

Training data provenance is a documentation obligation under EU AI Act Article 10 for high-risk AI systems. Sources, annotation methods, bias tests and representativeness must be verifiable. No model leaves our factory without a signed data sheet on these points. Anyone who does not receive one should not buy.
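A data sheet of this kind can be reviewed for completeness before legal or technical review begins. The following is a minimal sketch, assuming the sheet is captured as a structured record. All field names are hypothetical and only mirror the points named above. They are not a prescribed EU AI Act format.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingDataSheet:
    # Hypothetical field names mirroring the documentation points named in the text.
    model_version: str
    sources: list[str] = field(default_factory=list)
    annotation_method: str = ""
    bias_test_summary: str = ""
    representativeness_note: str = ""
    signed_by: str = ""


def missing_fields(sheet: TrainingDataSheet) -> list[str]:
    """Return the names of fields that are empty and should block acceptance."""
    return [name for name, value in vars(sheet).items() if not value]


sheet = TrainingDataSheet(model_version="1.0", sources=["COCO"], annotation_method="four-eyes")
print(missing_fields(sheet))
```

An empty result means only that every point is addressed. Whether the content is plausible remains a matter for manual audit.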

Next step: technical specification QR-2 outdoor patrol with thermal person detection.

Data Sources: Public Datasets, Synthetic Generation and Customer Data

The base models draw on three sources for general object classes: COCO for persons and vehicles, BDD100K for outdoor scenes with time-of-day and weather annotation, ImageNet for fine classes. Thermal datasets are licensed from FLIR. These public sources cover about 40 percent of the training base.

The remaining 60 percent come from synthetic generation. Unity and Unreal pipelines produce night-vision, rain, fog and backlight scenarios in a volume and variability that real recordings could not achieve. Synthetic data offer pixel-accurate labels and configurable edge cases. They carry the disadvantage of a reality gap that must be closed by real validation data.

Customer data do not flow into base training. If a customer explicitly consents, site data are pseudonymised and used exclusively for the model of that one site. They do not enter a cross-sector pool. Pooling across customers is the most common objection from CISOs, and it is valid. For that reason we do not operate federated learning across customer boundaries.

Drone signatures for the QR-3 (the KRITIS model with LiDAR and drone detection) come from a dedicated 14-month measurement campaign with 47 drone models. Audio training data for shot detection and glass break are generated through Foley recordings in acoustically controlled rooms, not from real incidents. That is a deliberate decision against real data, because real shot and burglary signatures cannot be obtained ethically or legally.

GDPR and AI Regulation: Legal Framework for Security Robot Training Data

Person detection is not biometric identification under GDPR Art. 9 as long as no recognition across sessions takes place. Quarero models produce person bounding boxes with classification person/non-person, without identity attributes, without facial features, without gait recognition. There is no facial recognition in the system. That is an architecture decision, not a configuration option.

The EU AI Act classifies perimeter surveillance as a high-risk system when it covers public spaces. For fenced industrial sites, business parks and KRITIS locations, this classification does not apply automatically because there is no public accessibility. The risk assessment must be documented per site. On mixed premises with publicly accessible areas (reception, visitor parking), high-risk obligations apply to those zones.

The data processing agreement with Quarero regulates any use of site data for model improvement explicitly as opt-in. The default is: data remain on site, nothing migrates into our training pipeline. Customers who want model improvement with their own data activate this contractually and can revoke it at any time.

Retention periods are technically enforced: raw video data 72 hours on the robot, metadata on alarm events 90 days, anonymised statistical data 24 months. Model updates are removed from the active training pipeline after 30 days. The Machinery Regulation (EU) 2023/1230 additionally governs safety requirements for AI-supported autonomous systems. EN ISO 13482 serves as a reference framework for the safety requirements of mobile service robotics.
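The retention periods named above can be expressed as a small policy table with an expiry check. This is a sketch only. The category keys are hypothetical names, and 24 months is approximated as 730 days, which is an assumption of this example.

```python
from datetime import datetime, timedelta, timezone

# Retention periods as stated in the text; the category keys are hypothetical.
RETENTION = {
    "raw_video": timedelta(hours=72),
    "alarm_metadata": timedelta(days=90),
    "anonymised_statistics": timedelta(days=730),  # 24 months, approximated
}


def is_expired(category: str, created_at: datetime, now: datetime) -> bool:
    """Return True if a record of this category has outlived its retention period."""
    return now - created_at > RETENTION[category]


now = datetime(2026, 5, 25, 12, 0, tzinfo=timezone.utc)
print(is_expired("raw_video", now - timedelta(hours=80), now))
```

During an audit, the interesting question is less the table itself than the evidence that a deletion job actually runs against it.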

Further reading on obligations: KRITIS-Dachgesetz checklist 2026.

Annotation and Quality Assurance: How Quarero Labels Training Data

Annotation takes place in Germany by security-vetted personnel. We do not use crowdsourcing platforms such as Mechanical Turk or Scale AI for training data in security contexts. The reason is simple: anyone who sees images of industrial sites, patrol routes and security infrastructure should be vetted. This costs roughly four times as much as offshore annotation and is not negotiable.

Each annotation runs under the four-eyes principle. Edge cases (persons with unusual luggage, maintenance staff in PPE, construction workers with tools) go to a third review instance, which defines new edge-case categories weekly. The annotation error rate is measured weekly, with a target value below 0.8 percent. If the target is exceeded, the pipeline halts until root-cause analysis is complete.
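The weekly quality gate can be sketched as a simple threshold check. The function names and the sample counts are illustrative. Only the 0.8 percent target comes from the text.

```python
def annotation_error_rate(reviewed: int, errors: int) -> float:
    """Return the share of reviewed annotations found to be wrong."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return errors / reviewed


def pipeline_may_continue(reviewed: int, errors: int, target: float = 0.008) -> bool:
    """Return True while the weekly error rate stays below the target value."""
    return annotation_error_rate(reviewed, errors) < target


# Illustrative weekly sample: 5,000 re-checked annotations, 45 errors found
print(pipeline_may_continue(5000, 45))
```

A result of False corresponds to the halt described above: no new annotations enter training until the root cause is documented.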

Bias tests are mandatory before every model release. Detection rate must lie within 3 percent across all demographic classes (gender, estimated age group, skin tone on the Fitzpatrick scale). If the spread is higher, the dataset is reweighted and training is repeated. This is expensive and delays releases, but it is required both legally (AI Act Art. 10) and operationally.
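The release gate for bias can be written down in a few lines. This sketch assumes that "within 3 percent" means the gap between the best and worst detection rate across groups may not exceed 3 percentage points. That reading, the function names and the sample rates are assumptions of this example.

```python
def detection_rate_spread(rates: dict[str, float]) -> float:
    """Return the gap between the highest and lowest detection rate."""
    if not rates:
        raise ValueError("at least one group is required")
    return max(rates.values()) - min(rates.values())


def passes_bias_gate(rates: dict[str, float], max_spread: float = 0.03) -> bool:
    """Return True if the spread across all groups stays within the limit."""
    return detection_rate_spread(rates) <= max_spread


# Illustrative detection rates per Fitzpatrick group, not measured values
rates = {"I-II": 0.98, "III-IV": 0.97, "V-VI": 0.94}
print(passes_bias_gate(rates))
```

The same check applies per attribute (gender, age group, skin tone). A failed gate triggers the reweighting and retraining described above.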

Adversarial testing checks synthetically generated camouflage and occlusion scenarios before production release. This includes anti-detection patterns on clothing, multi-layer occlusion and weather effects that distort human contours.

Edge Inference Instead of Cloud: Data Minimisation as Architectural Principle

Inference runs entirely on the robot. NVIDIA Jetson Orin with 275 TOPS in QR-2 and QR-3 is sufficient for simultaneous RGB detection, thermal evaluation and anomaly classification in real time. No video stream leaves the site. Only metadata (classification, timestamp, position) and alarm frames are transmitted to the customer control room.

This architecture decision has consequences. It prevents cloud-based analytics with continuous streams, which some competitors use as a selling point. In return, it reduces the data-protection attack surface considerably: what is not uploaded cannot be intercepted. For KRITIS operators under NIS-2 compliance, this is the simpler risk assessment.

Model updates are distributed as signed delta updates. The robot verifies the signature chain before applying them. Unsigned or manipulated updates are rejected and reported. There is no federated learning; each site remains isolated on the data side. Local storage is encrypted with AES-256, with automatic key rotation every 90 days.
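The verify-before-apply principle can be illustrated with the Python standard library. This sketch uses HMAC-SHA256 as a stand-in and is not the mechanism described above: HMAC is symmetric, whereas a production system would typically use asymmetric signatures with the private key held in a hardware security module. All names are hypothetical.

```python
import hashlib
import hmac


def verify_update(payload: bytes, signature: bytes, key: bytes) -> bool:
    """Recompute the authentication tag and compare it in constant time."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


def apply_update(payload: bytes, signature: bytes, key: bytes, installed: list) -> bool:
    """Install the payload only if verification succeeds; otherwise reject it."""
    if not verify_update(payload, signature, key):
        return False  # a real system would also report the rejected update
    installed.append(payload)
    return True


key = b"demo-key-for-illustration-only"
delta = b"model-delta-bytes"
tag = hmac.new(key, delta, hashlib.sha256).digest()
installed = []
print(apply_update(delta, tag, key, installed), apply_update(b"tampered", tag, key, installed))
```

The point to audit is the order of operations: nothing is written to the model store before the check has passed.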

Site Adaptation: Four Weeks of Site Training After Delivery

Phase 1 (week 1) is rule-based, not ML training. Geofencing is configured, patrol routes are taught, exclusion zones are defined. In this phase the robot operates with factory settings and logs all detections for phase 2.

Phase 2 (weeks 2 to 3) is the actual site training. The system learns expected movement patterns: forklift routes, employee shift changes, regular delivery traffic, cleaning staff, external service providers with their time windows. This is not new training of the person detector, but calibration of the anomaly classifier on the basis of existing detections.
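The idea of calibrating on existing detections can be illustrated with a toy frequency baseline. This is not the anomaly classifier described above, only a sketch of the principle: patterns seen often enough during site training count as normal. The class name, the keys and the threshold are assumptions.

```python
from collections import Counter


class SiteBaseline:
    """Toy baseline that counts detections per zone, hour and object class."""

    def __init__(self, min_observations: int = 3):
        self.counts = Counter()
        self.min_observations = min_observations

    def observe(self, zone: str, hour: int, object_class: str) -> None:
        """Record one detection logged during site training."""
        self.counts[(zone, hour, object_class)] += 1

    def is_anomalous(self, zone: str, hour: int, object_class: str) -> bool:
        """Flag patterns that were rarely or never seen during site training."""
        return self.counts[(zone, hour, object_class)] < self.min_observations


baseline = SiteBaseline()
for _ in range(5):
    baseline.observe("loading_dock", 6, "forklift")  # regular early shift traffic
print(baseline.is_anomalous("loading_dock", 6, "forklift"),
      baseline.is_anomalous("loading_dock", 2, "person"))
```

The person detector itself is untouched in this sketch. Only the decision about which detections deserve an alarm changes, which matches the separation described above.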

Phase 3 (week 4) finalises anomaly detection. The false-alarm rate typically drops from 12 percent (factory delivery) to 1.8 percent (calibrated). That is the main reason why a pilot of less than 30 days delivers no reliable statement on detection quality. Anyone offering a 7-day pilot is showing the uncalibrated machine.

Site data remain physically on the robot and on the customer NAS. They never enter a Quarero cloud. At contract end, all site-specific model weights are demonstrably deleted, and the deletion log goes to the customer. In the Robotics-as-a-Service model, this clause is part of the contract, not a negotiation point.

Supplementary reading: perimeter protection in industrial parks.

Risks: Data Poisoning, Model Theft and Adversarial Attacks

Data poisoning is the attempt to deliberately degrade model behaviour through manipulated training or adaptation data. Protection: the update server signs each model with a hardware security module. The robot rejects unsigned updates or updates with invalid signatures. Training data input into the central pipeline runs through several validation stages with statistical drift tests.
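One common form of statistical drift test compares the class distribution of an incoming batch with a trusted reference. The sketch below uses the population stability index (PSI) for this. The choice of PSI and the threshold of 0.25 are assumptions of this example, since the text does not name the tests used.

```python
import math


def population_stability_index(reference: dict[str, int], batch: dict[str, int],
                               eps: float = 1e-6) -> float:
    """Return the PSI between two class-count distributions."""
    ref_total = sum(reference.values())
    batch_total = sum(batch.values())
    if ref_total <= 0 or batch_total <= 0:
        raise ValueError("both distributions need at least one sample")
    psi = 0.0
    for name in set(reference) | set(batch):
        # Clamp shares to eps so that classes missing on one side stay finite.
        p = max(reference.get(name, 0) / ref_total, eps)
        q = max(batch.get(name, 0) / batch_total, eps)
        psi += (q - p) * math.log(q / p)
    return psi


# Illustrative counts: the batch contains far more "person" labels than the reference
reference = {"person": 500, "vehicle": 500}
batch = {"person": 900, "vehicle": 100}
print(population_stability_index(reference, batch) > 0.25)
```

A high value does not prove poisoning. It marks a batch for manual review before it reaches training.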

Model theft is addressed by a Trusted Execution Environment on the Jetson Orin. Model weights are kept in an encrypted area and are not unpacked into normal main memory. Physical access to the robot does not allow extraction of the weights without breaking the TEE, which by current standards is commercially uneconomical.

Adversarial patches are the most practically relevant attack scenario. T-shirts with anti-detection patterns can fool certain model generations. We test this attack class every quarter with commercially available and academically published patches. Countermeasures (detection layer for the patches themselves, ensemble classification) are rolled out in regular model updates.

The training data pipeline is audited under BSI IT-Grundschutz with annual external review (BSI/BBK). Incident response in case of suspected data poisoning is defined: rollback to the last verified model within 4 hours, with parallel forensic analysis of the training pipeline over the past 30 days.

What Security Managers Should Audit Before Signature

Six points belong in every due diligence before signing a RaaS contract for security robots.

First: request the training data sheet. It must contain sources, annotation location, annotation procedure and bias test results. A vendor who does not deliver a data sheet either does not have one or does not want to show it. Both are exclusion criteria.

Second: review the DPA annex. Is the use of site data for model improvement opt-in or opt-out? With opt-out, the question is what happens on silence. The standard should be opt-in.

Third: clarify whether raw video data ever leave the site. The answer should be no. If the answer is yes (for support cases or model diagnostics), ask under what conditions, with what retention period and with what anonymisation.

Fourth: contractually secure proof of deletion at contract end. The deletion log should cover all site-specific model weights, site data and configurations.

Fifth: agree on a pilot with defined false-positive and false-negative measurement over 30 days. Shorter pilots measure the uncalibrated machine and produce better or worse numbers than later regular operation. An honest measurement requires full site adaptation plus two weeks of regular operation.

Sixth: ask reference customers in the same sector about detection quality, not just about the sales process. The operationally relevant question is: how many false alarms per night after month 3? How many confirmed detections per month? Which edge cases caused problems?
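For the pilot measurement in the fifth point, both rates should be defined in writing before the pilot starts. This sketch assumes the false-alarm rate is the share of raised alarms that turned out to be false, and the miss rate is the share of real events that were not detected. These definitions, the names and the sample counts are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PilotCounts:
    true_positives: int   # confirmed detections
    false_positives: int  # alarms without a real event
    false_negatives: int  # real events the robot missed


def false_alarm_rate(c: PilotCounts) -> float:
    """Return the share of raised alarms that were false."""
    alarms = c.true_positives + c.false_positives
    return c.false_positives / alarms if alarms else 0.0


def miss_rate(c: PilotCounts) -> float:
    """Return the share of real events that were not detected."""
    events = c.true_positives + c.false_negatives
    return c.false_negatives / events if events else 0.0


# Illustrative pilot counts, not measured values
counts = PilotCounts(true_positives=982, false_positives=18, false_negatives=10)
print(f"{false_alarm_rate(counts):.3f} {miss_rate(counts):.3f}")
```

Missed events can only be counted against an independent ground truth, for example staged intrusion tests or a manual review of recordings. That source should be part of the pilot agreement.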

For economic evaluation against classic personnel guarding, the guard service TCO comparison helps. For a concrete specification of your site and a data sheet template, arrange a technical pre-meeting via the contact page. We send the standard DPA annex and a sample training data sheet in advance, so your data protection and legal departments can review them in parallel before the meeting.

