VisionLabs

Synthetic-Data Augmentation for Face Recognition - VisionLabs

Face-recognition models trained only on real data plateau on the long tail of identities: few images per identity, demographic skew, and limited pose and lighting coverage. Adding synthetic identities is appealing, but it only works if the generator is conditioned on a representative descriptor of each real identity.

2024

Approach

Built the descriptor-extraction stage of a synthetic-augmentation pipeline. Trained an IR-50 + ArcFace baseline on MS1M-RetinaFace-T1 (5M faces, 90K identities), then used IR-101 + CurricularFace to extract descriptors and compute per-identity class centers. Those centers conditioned an Arc2Face diffusion generator to produce per-identity synthetic faces; retraining on mixed real+synthetic data improved accuracy by 30% across 7 benchmarks.
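The class-center step can be sketched as below. This is a minimal illustration, not the pipeline's actual code: the function names (`l2_normalize`, `class_centers`) and the choice to L2-normalize descriptors before and after averaging are assumptions. It uses plain Python lists to stay dependency-free; a real pipeline would more likely use an array library.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def class_centers(descriptors, identity_ids):
    """Return {identity_id: unit-length mean descriptor}.

    Assumption: each descriptor is L2-normalized first so every image
    contributes equally, and the mean is re-normalized afterwards.
    """
    if len(descriptors) != len(identity_ids):
        raise ValueError("descriptors and identity_ids must have equal length")

    # Group normalized descriptors by the identity they belong to.
    grouped = defaultdict(list)
    for vec, ident in zip(descriptors, identity_ids):
        grouped[ident].append(l2_normalize(vec))

    centers = {}
    for ident, vecs in grouped.items():
        dim = len(vecs[0])
        mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        centers[ident] = l2_normalize(mean)
    return centers


# Toy usage: two images of identity "a", one image of identity "b".
toy_centers = class_centers(
    [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]],
    ["a", "a", "b"],
)
print(toy_centers)
```

An identity with a single image gets that image's normalized descriptor as its center, so the averaging only helps where several real images exist.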

Why class centers, not single descriptors

A single descriptor over-specifies a synthetic identity: it ties the identity to one pose-and-lighting configuration. Class centers, averaged across an identity's real images, separate identity from nuisance variation, and identity is what the generator should preserve.
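A toy simulation illustrates the averaging argument. It rests on loud assumptions: each descriptor is modelled as a fixed "identity" direction plus independent Gaussian noise standing in for pose and lighting variation. Real descriptors need not behave this way, and the dimension, image count, and noise level are arbitrary illustrative values.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)            # fixed seed for repeatability
dim, n_images, noise = 16, 50, 0.1  # illustrative values, not from the project

# Assumed "true" identity direction, unknown in practice.
identity = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])

# Each simulated image descriptor = identity + nuisance noise, unit length.
descriptors = [
    normalize([x + rng.gauss(0.0, noise) for x in identity])
    for _ in range(n_images)
]

# Class center: mean of the descriptors, re-normalized.
center = normalize(
    [sum(d[i] for d in descriptors) / n_images for i in range(dim)]
)

single_cos = [dot(d, identity) for d in descriptors]
mean_single_cos = sum(single_cos) / n_images
center_cos = dot(center, identity)

print(f"mean cosine of single descriptors to identity: {mean_single_cos:.4f}")
print(f"cosine of class center to identity:            {center_cos:.4f}")
```

Under this noise model the class center aligns with the identity direction more closely than a typical single descriptor does, because the independent nuisance components partly cancel in the mean.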

What the lift came from

The 30% improvement came not from more data but from filling the demographic and pose tails that real data couldn't reach. Synthetic identities matter most where real coverage is thinnest.