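The paper "EventMG: Efficient Multilevel Mamba-Graph Learning for Spatiotemporal Event Representation" by Sheng Wu and Lin Jin, students at the Digital Signal Processing and Transmission Laboratory of Fudan University, has been accepted to the 2025 Conference on Neural Information Processing Systems (NeurIPS 2025).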



Abstract: Event cameras offer unique advantages in scenarios involving high speed, low light, and high dynamic range, yet their asynchronous and sparse nature poses significant challenges to efficient spatiotemporal representation learning. Specifically, despite notable progress in the field, effectively modeling the full spatiotemporal context, selectively attending to salient dynamic regions, and robustly adapting to the variable density and dynamic nature of event data remain key challenges. Motivated by these challenges, this paper proposes EventMG, a lightweight, efficient, multilevel Mamba-Graph architecture designed for learning high-quality spatiotemporal event representations. EventMG employs a multilevel approach, jointly modeling information at the micro (single event) and macro (event cluster) levels to comprehensively capture the multi-scale characteristics of event data. At the micro level, it focuses on spatiotemporal details, employing the State Space Model (SSM)-based Mamba to precisely capture long-range dependencies among numerous event nodes. Concurrently, at the macro level, Component Graphs are introduced to efficiently encode the local semantics and global topology of dense event regions. Furthermore, to better accommodate the dynamic and sparse characteristics of the data, we propose the Spatiotemporal-aware Event Scanning Technology (SEST), integrating the Attention-based Perturbation Network (APN) and Multidirectional Scanning Module (MSM), which substantially enhances the model's ability to perceive and focus on key spatiotemporal patterns. Experimental results demonstrate that EventMG achieves performance comparable or superior to state-of-the-art heavyweight models across multiple benchmarks, while maintaining an extremely low parameter count and linear complexity. This validates the effectiveness of the proposed architecture for efficient, multilevel spatiotemporal representation of event data.
The code will be made publicly available upon acceptance. 
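
For readers unfamiliar with the two ideas the abstract combines at the micro level, the following is a minimal illustrative sketch, not the authors' implementation. It assumes an event is a tuple `(x, y, t, polarity)`, invents four scan directions for illustration, and replaces Mamba's selective state space layer with a fixed scalar linear recurrence. It only shows why scanning a sequence of N events with a state space recurrence costs O(N) per direction, and how outputs from several scan orders can be gathered back per event. All function names and parameters here are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed event layout: (x, y, timestamp, polarity). Not taken from the paper.
Event = Tuple[int, int, float, int]


def scan_orders(events: List[Event]) -> Dict[str, List[int]]:
    """Return index orderings for four hypothetical scan directions."""
    idx = list(range(len(events)))
    by_time = sorted(idx, key=lambda i: events[i][2])
    # Raster order: row (y) first, then column (x), ties broken by time.
    by_space = sorted(idx, key=lambda i: (events[i][1], events[i][0], events[i][2]))
    return {
        "time_forward": by_time,
        "time_backward": by_time[::-1],
        "raster_forward": by_space,
        "raster_backward": by_space[::-1],
    }


def linear_ssm(inputs: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scalar linear recurrence h_k = a*h_{k-1} + b*u_k, y_k = c*h_k.

    A single pass over the sequence, so the cost is linear in its length.
    This is a fixed-parameter stand-in, not Mamba's input-dependent SSM.
    """
    h = 0.0
    out = []
    for u in inputs:
        h = a * h + b * u
        out.append(c * h)
    return out


def multidirectional_features(events: List[Event]) -> List[List[float]]:
    """Run the recurrence along each scan order; return one feature per direction per event."""
    orders = scan_orders(events)
    feats = [[0.0] * len(orders) for _ in events]
    for d, order in enumerate(orders.values()):
        ys = linear_ssm([float(events[i][3]) for i in order])
        for pos, i in enumerate(order):
            feats[i][d] = ys[pos]  # scatter back to the original event index
    return feats


if __name__ == "__main__":
    demo = [(0, 0, 0.0, 1), (1, 0, 1.0, -1), (0, 1, 2.0, 1)]
    for event, feature in zip(demo, multidirectional_features(demo)):
        print(event, feature)
```

Each event ends up with one value per scan direction, so an event that appears late in one ordering still receives context from the events that follow it in that ordering through the reversed scan.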



Authors: Sheng Wu, Lin Jin, Hui Feng, Bo Hu