Electronics, Vol. 15, Pages 101: An Enhanced LSTM with Hippocampal-Inspired Episodic Memory for Urban Crowd Behavior Analysis
Electronics doi: 10.3390/electronics15010101
Authors:
Mingshou An
Hye-Youn Lim
Dae-Seong Kang
The increasing frequency and severity of urban crowd disasters underscore a critical need for intelligent surveillance systems capable of real-time crowd anomaly detection and early warning. While deep learning models such as LSTMs, ConvLSTMs, and Transformers have been applied to video-based crowd anomaly detection, they often face limitations in long-term contextual reasoning, computational efficiency, and interpretability. To address these challenges, this paper proposes HiMeLSTM, a crowd anomaly detection framework built around a hippocampal-inspired memory-enhanced LSTM backbone that integrates Long Short-Term Memory (LSTM) networks with an Episodic Memory Unit (EMU). This hybrid design enables the model to effectively capture both short-term temporal dynamics and long-term contextual patterns essential for understanding complex crowd behavior. We evaluate HiMeLSTM on two publicly available crowd-anomaly benchmark datasets (UCF-Crime and ShanghaiTech Campus) and an in-house CrowdSurge-1K dataset, demonstrating that it consistently outperforms strong baseline architectures, including Vanilla LSTM, ConvLSTM, a lightweight spatial–temporal Transformer, and recent reconstruction-based models such as MemAE and ST-AE. Across these datasets, HiMeLSTM achieves up to 93.5% accuracy, 89.6% anomaly detection rate (ADR), and a 0.89 F1-score, while maintaining computational efficiency suitable for real-time deployment on GPU-equipped edge devices. Unlike many recent approaches that rely on multimodal sensors, optical-flow volumes, or detailed digital twins of the environment, HiMeLSTM operates solely on raw CCTV video streams combined with a simple manually defined zone layout. Furthermore, the hippocampal-inspired EMU provides an interpretable memory retrieval mechanism: by inspecting the retrieved episodes and their attention weights, operators can understand which past crowd patterns contributed to a given decision.
Overall, the proposed framework represents a significant step toward practical and reliable crowd monitoring systems for enhancing public safety in urban environments.
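The abstract does not give the EMU's equations, so the following is only an illustrative sketch of how an attention-based read from an episodic memory could work and why the weights are inspectable. It assumes scaled dot-product attention between the current LSTM hidden state and a set of stored episode keys; the function name `retrieve_episodes`, the key/value layout, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_episodes(query, memory_keys, memory_values):
    """Attention-weighted read from an episodic memory (hypothetical sketch).

    query         : current LSTM hidden state, list of floats
    memory_keys   : one key vector per stored episode
    memory_values : one value vector per stored episode
    Returns the blended context vector and the per-episode attention weights.
    """
    scale = math.sqrt(len(query))
    # Similarity between the current state and each stored episode (assumed form).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in memory_keys]
    weights = softmax(scores)
    dim = len(memory_values[0])
    # Context vector: attention-weighted sum of the stored episode values.
    context = [sum(w * v[i] for w, v in zip(weights, memory_values)) for i in range(dim)]
    return context, weights


# Toy example: three stored episodes, two-dimensional states (made-up values).
hidden_state = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]

context, weights = retrieve_episodes(hidden_state, keys, values)
# Index of the episode that contributed most to the retrieved context.
top_episode = max(range(len(weights)), key=weights.__getitem__)
```

In a design like this, `weights` is the quantity an operator would inspect: the episode with the largest weight (`top_episode`) is the past pattern that most influenced the retrieved context.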
