Document Type: Original Research Paper
Authors
Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran.
Abstract
Background and Objectives: Multi-object tracking in dense, multi-camera environments remains challenging due to occlusions, lighting variations, and fragmented trajectories. Existing methods rely on hierarchical two-step approaches or complex Bayesian filters, and they often fail to fully exploit spatio-temporal correlations or to achieve global consistency across cameras and frames. This study addresses these limitations by proposing a novel graph-based deep learning model for continuous person tracking that optimizes spatial and temporal associations independently.
Methods: The proposed model decomposes multi-camera tracking into two tasks: temporal association (linking objects across frames using velocity and time) and spatial association (aligning objects from multiple viewpoints). A spatio-temporal graph structure is constructed, with nodes representing detected objects and edges encoding relationships. Message Passing Networks (MPNs) iteratively update node and edge features, while a graph consensus fusion module merges spatial and temporal graphs for robust tracking. The model is trained using Focal Loss and evaluated on the Wildtrack and CAMPUS datasets.
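The focal loss named above can be illustrated with a minimal sketch. It assumes that training is posed as binary classification of graph edges (1 if two detections belong to the same person, 0 otherwise), which the abstract does not state explicitly. The function name `focal_loss` and the values `gamma=2.0` and `alpha=0.25` are illustrative assumptions, not the authors' reported settings.

```python
import math


def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean binary focal loss over predicted edge probabilities.

    probs  : predicted probability that each edge is a true association
    labels : 1 for a true association, 0 otherwise
    gamma, alpha : assumed hyperparameters (not taken from the paper)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one edge is required")

    total = 0.0
    for p, y in zip(probs, labels):
        # p_t is the probability assigned to the correct class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # Clamp to avoid log(0) for fully wrong predictions.
        p_t = min(max(p_t, eps), 1.0)
        # The (1 - p_t)^gamma factor down-weights well-classified edges.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # Hypothetical edge predictions: two confident, one hard positive.
    print(focal_loss([0.95, 0.05, 0.30], [1, 0, 1]))
```

The modulating factor matters here because most candidate edges in a tracking graph are negatives, so an unweighted cross-entropy would likely be dominated by easy examples.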
Results: The model achieves state-of-the-art performance, with a MOTA score of 85.5% on Wildtrack and 77.4–87.4% on the CAMPUS subsets. Notable results include a 100% MT (mostly tracked) rate and a 0% ML (mostly lost) rate on CAMPUS, indicating strong robustness in occluded and crowded scenes. An IDF1 score of 87.2% indicates strong identity preservation. The decoupled design also reduces graph size, which improves scalability.
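For readers unfamiliar with the reported metrics, the sketch below computes MOTA and IDF1 from their standard definitions. The function names and all counts in the example are hypothetical and are not taken from the paper's experiments.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denominator = 2 * idtp + idfp + idfn
    if denominator <= 0:
        raise ValueError("at least one identity-level count must be positive")
    return 2 * idtp / denominator


if __name__ == "__main__":
    # Made-up counts, chosen only to show how the formulas behave.
    print(mota(false_negatives=60, false_positives=30, id_switches=10, num_gt=1000))
    print(idf1(idtp=800, idfp=100, idfn=100))
```

MOTA penalizes detection errors and identity switches together, whereas IDF1 measures how consistently each identity is matched over time, which is why the two are usually reported side by side.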
Conclusion: By decoupling spatial and temporal associations and leveraging graph-based optimization, the proposed model significantly enhances tracking accuracy and reliability in multi-camera settings. This work provides a framework for applications such as surveillance and autonomous systems, and future work could incorporate attention mechanisms and adaptive graph integration.
Keywords
- Person tracking
- Multi-camera environment
- Deep learning
- Spatio-temporal features
- Graph neural networks