HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection

2026-04-07

Mixture-of-Experts (MoE) has transformed language models, but applying it to computer vision — particularly object detection — requires a fundamentally different approach. HI-MoE introduces hierarchical, instance-conditioned routing that matches the structure of detection tasks.

The Problem with Existing Vision MoE

Current vision MoE methods operate at the image or patch level, treating all regions equally. This is poorly aligned with object detection, whose structure is heterogeneous and instance-centric: objects within a single image differ in scale and category, and computation should specialize per instance rather than per spatial location.
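
To make the contrast concrete, here is a minimal NumPy sketch of the conventional patch-level top-k routing described above. The gating weights, patch grid, and expert count are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def patch_topk_route(patches, gate_w, k=2):
    """Conventional vision-MoE routing: every patch token is gated
    independently, regardless of whether it covers an object."""
    logits = patches @ gate_w                      # (num_patches, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]     # k experts per patch
    probs = softmax(logits, axis=-1)
    return topk, probs

rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 64))   # e.g. a 14x14 grid of patch tokens
gate_w = rng.normal(size=(64, 8))      # hypothetical pool of 8 experts
experts, probs = patch_topk_route(patches, gate_w, k=2)
# Background patches and object patches receive identical treatment.
```

Note that the router sees only individual patches: nothing ties the routing decision to object instances, which is the mismatch HI-MoE targets.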

HI-MoE's Two-Stage Routing

| Stage | Router | Function |
|---|---|---|
| 1 | Scene router (lightweight) | Selects a scene-consistent expert subset |
| 2 | Instance router (per-query) | Assigns each object query to experts within that subset |

This hierarchical design preserves sparse computation while better matching the heterogeneous, instance-centric structure of detection.
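
The two-stage scheme can be sketched as follows. This is a minimal NumPy illustration of the idea, not the paper's implementation; the expert pool size, subset size, query dimensions, and the masking mechanism are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_route(scene_feat, queries, scene_w, inst_w,
                       subset_size=4, k=2):
    # Stage 1: a lightweight scene router picks a scene-consistent
    # subset of experts from the full pool, shared by all queries.
    scene_logits = scene_feat @ scene_w                  # (num_experts,)
    subset = np.argsort(scene_logits)[-subset_size:]

    # Stage 2: a per-query instance router assigns each object query
    # to k experts *within* that subset, keeping compute sparse.
    inst_logits = queries @ inst_w                       # (num_queries, num_experts)
    masked = np.full_like(inst_logits, -np.inf)
    masked[:, subset] = inst_logits[:, subset]           # restrict to subset
    topk = np.argsort(masked, axis=-1)[:, -k:]
    weights = softmax(masked, axis=-1)                   # zero outside subset
    return subset, topk, weights

rng = np.random.default_rng(1)
scene_feat = rng.normal(size=(64,))      # pooled scene descriptor (hypothetical)
queries = rng.normal(size=(100, 64))     # DETR-style object queries (hypothetical)
scene_w = rng.normal(size=(64, 16))      # full pool of 16 experts
inst_w = rng.normal(size=(64, 16))
subset, topk, weights = hierarchical_route(scene_feat, queries, scene_w, inst_w)
# Every query's k experts are guaranteed to come from the scene-selected subset.
```

The key property is that per-instance flexibility (Stage 2) is bounded by the scene-level choice (Stage 1), so the number of active experts per image stays small.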

Key Innovation

Routing is conditioned on individual object queries rather than on whole images or uniform patches, so expert assignment follows instance structure, while the scene-level stage keeps the set of active experts small.

Results

On the COCO benchmark, HI-MoE improves over its baselines, and a preliminary specialization analysis on LVIS examines its behavior for long-tail detection.

Why It Matters

HI-MoE suggests that the sparsity benefits of MoE can carry over to detection when routing granularity is matched to the task's heterogeneous, instance-centric structure.
