We propose the MaskCut approach to generate pseudo-masks for multiple objects in an image. CutLER can learn unsupervised object detectors and instance segmentors solely on ImageNet-1K. CutLER exhibits ...
HOI-DETR is a transformer-based framework for detecting hands, hand-held objects, and their interactions in images and video. Built on the Co-DETR architecture, it adds a lightweight interaction ...