Reality Perception Gap | Evaluating AI-Manipulated Video Detectors under Real-World Distribution Shifts

sec 01

The reality gap

Reality Perception Gap (RPG) exposes the deployment gap in AI-video forensics: detectors tuned on clean clips break down when videos are re-encoded by platforms, post-processed, captured with camera artifacts, locally edited, or generated by frontier models. RPG makes these shifts explicit, so reviewers can see where synthetic-video detection fails outside the lab.

sec 02

What is new in Reality Perception Gap

i.

Socially Fragile Content

Crime
Natural Disaster
CCTV
Traffic
War

ii.

Coupled Distribution Shifts

Social Media Compression
Retiming
Center Crop
Camera Optical Artifacts
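
The first three shifts can be composed on a decoded clip. The NumPy sketch below is illustrative only: it assumes clips are `(T, H, W, C)` uint8 arrays, all function names are hypothetical, and uniform pixel quantization is a crude stand-in for real platform codecs, not the benchmark's actual re-encoding step. Camera optical artifacts are left out here.

```python
import numpy as np


def retime(frames: np.ndarray, speed: float) -> np.ndarray:
    """Nearest-frame resampling of a (T, H, W, C) clip; speed > 1 drops frames."""
    t = frames.shape[0]
    n_out = max(1, int(round(t / speed)))
    idx = np.clip(np.round(np.arange(n_out) * speed).astype(int), 0, t - 1)
    return frames[idx]


def center_crop(frames: np.ndarray, frac: float) -> np.ndarray:
    """Keep the central `frac` fraction of height and width."""
    _, h, w, _ = frames.shape
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    return frames[:, top:top + ch, left:left + cw]


def requantize(frames: np.ndarray, step: int) -> np.ndarray:
    """Crude stand-in for lossy re-encoding: coarse uniform quantization of pixel values."""
    q = (frames.astype(np.int32) // step) * step + step // 2
    return np.clip(q, 0, 255).astype(np.uint8)


def coupled_shift(frames: np.ndarray, speed=1.5, crop=0.8, step=16) -> np.ndarray:
    """Apply the shifts jointly: retime, then crop, then 're-encode' (order is assumed)."""
    return requantize(center_crop(retime(frames, speed), crop), step)
```

Applying the shifts jointly rather than one at a time mirrors the "coupled" framing above; the retime, crop, re-encode order is an assumption of this sketch.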

iii.

Hybrid Manipulation

Character Replacement
Video Extension
Relighting

sec 03

Video demonstrations

Group 01

Socially fragile categories

High-impact domains where manipulated video can cause social harm: combat, traffic, natural disasters, and crime.

Combat footage

Traffic incident

Natural disaster

Crime scene

Group 02

Localized AI edits in real footage

Mixed-source clips in which the surrounding scene and camera statistics remain real while a region, an attribute, or a temporal continuation is synthesized.

Character replacement

Video extension

Relit footage
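
One way to represent such mixed-source clips is with explicit spatial and temporal labels. The sketch below assumes aligned real and synthesized clips given as `(T, H, W, C)` float arrays in [0, 1]; the helpers are hypothetical and cover only region replacement and temporal extension (relighting is not modeled here).

```python
import numpy as np


def splice_region(real: np.ndarray, synth: np.ndarray, mask: np.ndarray):
    """Composite synthesized pixels into real frames inside a soft mask.

    real, synth: aligned (T, H, W, C) float arrays in [0, 1]; mask: (T, H, W) in [0, 1].
    Returns the mixed clip and a binary per-pixel label map (1 = synthesized).
    """
    m = mask[..., None]  # broadcast the mask over color channels
    clip = m * synth + (1.0 - m) * real
    return clip, (mask > 0.5).astype(np.uint8)


def extend_clip(real: np.ndarray, synth_tail: np.ndarray):
    """Append a synthesized continuation; returns the clip and per-frame labels (1 = synthesized)."""
    clip = np.concatenate([real, synth_tail], axis=0)
    labels = np.concatenate([np.zeros(len(real), np.uint8), np.ones(len(synth_tail), np.uint8)])
    return clip, labels
```

The mean of the label map gives the synthesized fraction of a clip, which can be small for localized edits; a clip-level detector has to find that fraction amid mostly real pixels.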

Group 03

Camera optical artifacts

Camera-side and camera-style artifact cases, including generated high-ISO noise, that test whether detectors mistake optical imperfections, sensor noise, and focus shifts for synthesis residuals.

Rolling shutter

Chromatic aberration

High-ISO noise

Autofocus hunting
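
Rolling shutter is a capture artifact with a simple geometric model: rows are read out one after another, so moving objects skew. The sketch below simulates it from a higher-frame-rate clip under an assumed linear readout (one input frame per `rows_per_step` rows); it is a simplified model with a hypothetical function name, not how RPG's rolling-shutter cases were produced.

```python
import numpy as np


def rolling_shutter(frames: np.ndarray, rows_per_step: int) -> np.ndarray:
    """Simulate sequential row readout from a high-frame-rate (T, H, W, C) clip.

    Output frame k takes row y from input frame k + y // rows_per_step, so lower
    rows are sampled later than upper rows (linear readout is an assumption).
    """
    t, h = frames.shape[:2]
    lag = (h - 1) // rows_per_step  # extra input frames the bottom row needs
    n_out = t - lag
    if n_out <= 0:
        raise ValueError("clip too short for this readout delay")
    rows = np.arange(h)
    out = np.empty((n_out,) + frames.shape[1:], dtype=frames.dtype)
    for k in range(n_out):
        src = k + rows // rows_per_step  # source frame index for every row
        out[k] = frames[src, rows]  # pick row y from frame src[y]
    return out
```

A vertical bar that moves horizontally comes out slanted, a geometric distortion that a detector should attribute to capture rather than synthesis.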

sec 04

Detector performance on RPG

Detector	ACC	Macro F1	AUC ↑	TPR (fake)	TNR (real)	TPR @ 1% FPR
D3	0.685	0.637	0.689	0.748	0.537	0.012
WaveRep	0.591	0.591	0.803	0.438	0.945	0.237
GenD	0.330	0.287	0.583	0.060	0.955	0.016
VideoFACT	0.562	0.479	0.583	0.562	0.561	0.025
FreqNet	0.404	0.404	0.548	0.285	0.679	0.025
AIGVDet	0.179	0.167	0.546	0.035	0.956	0.008
RINE	0.318	0.269	0.516	0.042	0.958	0.013
UFD	0.351	0.326	0.510	0.113	0.903	0.016
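
The page names these metrics without defining them, so the sketch below uses common conventions as assumptions: label 1 is fake, higher scores mean "more likely fake", the fixed decision threshold is 0.5, and TPR at 1% FPR is the best fake-detection rate over thresholds that flag at most 1% of real clips as fake. It is not the benchmark's official evaluation code.

```python
import numpy as np


def roc_auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """AUC as the Mann-Whitney probability that a fake outscores a real clip (ties count 1/2)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def tpr_at_fpr(scores: np.ndarray, labels: np.ndarray, max_fpr: float = 0.01) -> float:
    """Best TPR over thresholds that flag at most `max_fpr` of real clips as fake."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    best = 0.0
    for thr in np.unique(scores):
        if np.mean(neg >= thr) <= max_fpr:
            best = max(best, float(np.mean(pos >= thr)))
    return best


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_metrics(scores: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> dict:
    """Metrics in the style of the table; label 1 = fake, higher score = more likely fake."""
    pred = scores >= threshold
    fake = labels == 1
    tp = int(np.sum(pred & fake))
    fn = int(np.sum(~pred & fake))
    tn = int(np.sum(~pred & ~fake))
    fp = int(np.sum(pred & ~fake))
    return {
        "ACC": (tp + tn) / len(labels),
        # Macro F1 averages the F1 of the fake class and of the real class.
        "Macro F1": (_f1(tp, fp, fn) + _f1(tn, fn, fp)) / 2,
        "AUC": roc_auc(scores, labels),
        "TPR (fake)": tp / (tp + fn),
        "TNR (real)": tn / (tn + fp),
        "TPR @ 1% FPR": tpr_at_fpr(scores, labels, 0.01),
    }
```

The last metric shows why ranking quality and deployability can diverge: WaveRep has the highest AUC in the table (0.803) but detects only 23.7% of fakes when false alarms on real clips are capped at 1%.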

sec 05

Key findings

Compression as adversary

Platform encoding erases many of the frequency cues that detectors are trained to read.

Deployment-grade compression removes fragile high-frequency residuals, exposing detectors that perform well only under clean laboratory conditions.
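
The mechanism can be illustrated with a JPEG-like proxy: an 8×8 block DCT whose quantization step grows with spatial frequency. In the sketch below, a smooth image stands in for scene content and low-amplitude Gaussian noise stands in for a generator's high-frequency residual. The quantization table and all names are hypothetical, and real platform codecs (motion-compensated, rate-controlled) are far more complex.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def _blocks(img: np.ndarray, n: int):
    h, w = img.shape
    if h % n or w % n:
        raise ValueError("image sides must be multiples of the block size")
    for i in range(0, h, n):
        for j in range(0, w, n):
            yield i, j


def block_quantize(img: np.ndarray, q_scale: float = 4.0, n: int = 8) -> np.ndarray:
    """JPEG-like proxy: per-block DCT, rounding with a step that grows with frequency."""
    c = dct_matrix(n)
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    q = q_scale * (1 + u + v)  # hypothetical table: coarser steps at high frequency
    out = np.empty(img.shape, dtype=float)
    for i, j in _blocks(img, n):
        coef = c @ img[i:i + n, j:j + n] @ c.T  # forward 2D DCT
        out[i:i + n, j:j + n] = c.T @ (np.round(coef / q) * q) @ c  # quantize, invert
    return out


def high_band_energy(img: np.ndarray, min_index_sum: int = 8, n: int = 8) -> float:
    """Energy in block-DCT coefficients with u + v >= min_index_sum (fine detail)."""
    c = dct_matrix(n)
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    band = (u + v) >= min_index_sum
    total = 0.0
    for i, j in _blocks(img, n):
        coef = c @ img[i:i + n, j:j + n] @ c.T
        total += float(np.sum(coef[band] ** 2))
    return total


def demo(seed: int = 0):
    """Smooth 'scene' plus a small high-frequency 'residual'; band energy before and after."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, 64)
    scene = 100 + 50 * np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x))
    img = scene + rng.normal(0.0, 2.0, scene.shape)
    return high_band_energy(img), high_band_energy(block_quantize(img))
```

With these settings the fine-detail band is wiped out almost entirely while the smooth scene survives, which is the failure mode described above for detectors that rely on such residuals.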

Generator coverage

Frontier generators, not data volume alone, set the binding constraint.

Per-generator behavior varies substantially, suggesting that future benchmarks need deliberate coverage of modern synthesis pipelines.
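
Per-generator behavior only shows up when results are broken out by source model. A minimal sketch, assuming each fake clip carries a generator name and a detector score; the generator names, the 0.5 threshold, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_generator_tpr(
    records: Iterable[Tuple[str, float]], threshold: float = 0.5
) -> Dict[str, Tuple[float, int]]:
    """records: (generator_name, score) pairs for fake clips only.
    Returns {generator: (TPR, number_of_clips)}."""
    hits: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for gen, score in records:
        counts[gen] += 1
        hits[gen] += int(score >= threshold)  # clip detected as fake
    return {g: (hits[g] / counts[g], counts[g]) for g in counts}


def pooled_and_worst(table: Dict[str, Tuple[float, int]]) -> Tuple[float, float]:
    """Clip-weighted pooled TPR versus the TPR on the hardest generator."""
    total = sum(n for _, n in table.values())
    pooled = sum(tpr * n for tpr, n in table.values()) / total
    worst = min(tpr for tpr, _ in table.values())
    return pooled, worst
```

Reporting the worst generator alongside the pooled rate keeps a well-represented generator with many clips from masking one that the detector barely catches.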

Threshold calibration

Default thresholds are unstable across realistic deployment shifts.

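Detector operating points can drift toward always-real or always-fake behavior when clean evaluation assumptions are removed. In the sec 04 table, GenD, AIGVDet, and RINE sit near the always-real extreme, flagging at most 6% of fake clips while passing over 95% of real ones.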
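
Two simple quantities make this drift concrete. Accuracy is a prevalence-weighted average, ACC = p · TPR + (1 − p) · TNR, where p is the fraction of fake clips, so a detector collapsed toward always-real scores roughly 1 − p regardless of its forensic ability. The sketch below computes that and labels an operating point from its TPR and TNR; the function names are hypothetical and the 0.1 collapse cutoff is an arbitrary assumption.

```python
def accuracy(tpr: float, tnr: float, fake_fraction: float) -> float:
    """ACC = p * TPR + (1 - p) * TNR, where p is the fraction of fake clips."""
    return fake_fraction * tpr + (1.0 - fake_fraction) * tnr


def operating_regime(tpr: float, tnr: float, collapse: float = 0.1) -> str:
    """Classify an operating point; the `collapse` cutoff is an illustrative choice."""
    if tpr <= collapse and tnr >= 1.0 - collapse:
        return "near always-real"  # almost every clip is called real
    if tnr <= collapse and tpr >= 1.0 - collapse:
        return "near always-fake"  # almost every clip is called fake
    return "mixed"
```

For example, `operating_regime(0.060, 0.955)` (the GenD row) returns "near always-real", while `operating_regime(0.748, 0.537)` (the D3 row) returns "mixed"; and `accuracy(0.0, 1.0, p)` equals 1 − p for any fake fraction p.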

Optical decoys

Camera-style artifacts can look like generative artifacts.

Rolling shutter, chromatic aberration, generated high-ISO noise, and autofocus hunting test whether models can distinguish capture physics from synthesis traces.
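
Decoys of this kind can be produced on real frames with simple capture models. The sketch below shows two, both simplified assumptions rather than RPG's generation procedure: lateral chromatic aberration approximated as opposite horizontal shifts of the red and blue channels (real lateral CA is radial and grows toward the frame edges), and high-ISO noise approximated as heteroscedastic Gaussian noise with variance `a · signal + b`, a common Gaussian approximation of Poisson-Gaussian sensor noise. Function names and parameter values are hypothetical.

```python
import numpy as np


def lateral_ca(frame: np.ndarray, shift_px: int = 1) -> np.ndarray:
    """Crude lateral chromatic aberration on an (H, W, 3) RGB frame: shift red and
    blue horizontally in opposite directions (real lateral CA is radial)."""
    out = frame.copy()
    out[..., 0] = np.roll(frame[..., 0], shift_px, axis=1)   # red moves right
    out[..., 2] = np.roll(frame[..., 2], -shift_px, axis=1)  # blue moves left
    return out


def high_iso_noise(frame: np.ndarray, a: float = 0.5, b: float = 4.0, seed: int = 0) -> np.ndarray:
    """Heteroscedastic Gaussian approximation of Poisson-Gaussian sensor noise:
    per-pixel variance = a * signal + b. Larger a and b mimic higher ISO."""
    rng = np.random.default_rng(seed)
    x = frame.astype(np.float64)
    noisy = x + rng.standard_normal(x.shape) * np.sqrt(a * x + b)
    return np.clip(np.round(noisy), 0, 255).astype(np.uint8)
```

Because both operations only perturb real pixels, a detector that flags their output as synthetic is responding to capture physics rather than generation traces.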