Why "selective" matters
Recalibrating every channel costs FLOPs proportional to channel count. Fine-grained features only live in a small subset - so learning which channels to recalibrate buys you most of the benefit at a fraction of the compute.
MBZUAI
ConvNeXt-V2 is competitive on ImageNet but loses ground on fine-grained classification (CUB Birds, FGVC Aircraft, FoodX) where class-discriminative features live in narrow channel subspaces. Adding generic attention helps a little but tends to inflate FLOPs without proportionate accuracy gain.
Approach
Designed Selective Channel Recalibration Attention - a lightweight attention module that recalibrates only a learned subset of channels rather than all of them - and integrated it into ConvNeXt-Large-V2. SCRA improved fine-grained accuracy on CUB, Aircraft, and FoodX while keeping total FLOPs within 5% of the baseline.
Recalibrating every channel costs FLOPs proportional to channel count. Fine-grained features only live in a small subset - so learning which channels to recalibrate buys you most of the benefit at a fraction of the compute.