[Figure: architecture diagram. Labeled components: Dilated Swin Transformer, Down Sampling, Flatten, Transformer Encoder with Dynamic Head, Detector, Refinement Stage ×K, Detection Head, RoIAlign, Recognition Conversion, Recognizer. Labeled feature maps: Proposal Features, Detection Features, RoI Features, Fused Features.]