Recurrent Neural Networks (RNNs) serve as foundational models for sequence data processing. By passing the hidden state from one time step to the next, RNNs can effectively capture temporal dependencies [10]. However, RNNs are prone to vanishing or exploding gradients on long sequences, which limits their ability to model long-range dependencies [11]. Long Short-Term Memory networks (LSTMs), an extension of RNNs, address the vanishing gradient problem by introducing gating mechanisms (input, forget, and output gates), enabling the model to capture long-term dependencies more effectively [12]. Despite significant improvements over RNNs, LSTMs still face challenges in computational complexity and in modeling very long sequences. Gated Recurrent Units (GRUs) are a simplified variant of LSTMs, with fewer parameters and higher computational efficiency; on certain tasks, GRUs have demonstrated performance comparable to or even better than that of LSTMs [13]. This simplification, however, can make GRUs less capable of handling complex temporal dependencies. To further enhance spatiotemporal sequence forecasting, Convolutional LSTM (ConvLSTM) combines Convolutional Neural Networks (CNNs) with LSTMs, capturing spatial features and temporal dependencies simultaneously; this architecture is particularly suited to tasks such as video prediction and weather data modeling [1]. Despite its success in spatiotemporal data modeling, ConvLSTM incurs high computational costs on long time sequences and struggles with non-uniform spatiotemporal dependencies. TrajGRU, designed specifically for trajectory prediction tasks, adopts the GRU structure and further optimizes spatial information modeling; it predicts the trajectories of moving objects effectively but is limited in modeling complex spatiotemporal dynamics [14].
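The LSTM gating mechanism described above can be illustrated with a minimal sketch. The snippet below implements a single fully connected LSTM time step in NumPy; the function name `lstm_step`, the stacked weight layout, and all dimensions are illustrative choices, not taken from any of the cited models.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative sketch).
    W: (4H, D), U: (4H, H), b: (4H,); gate order: input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*H:1*H])   # input gate: how much new content to write
    f = sigmoid(z[1*H:2*H])   # forget gate: how much old cell state to keep
    o = sigmoid(z[2*H:3*H])   # output gate: how much cell state to expose
    g = np.tanh(z[3*H:4*H])   # candidate cell content
    c = f * c_prev + i * g    # additive cell update mitigates vanishing gradients
    h = o * np.tanh(c)
    return h, c

# Toy usage: run a short random sequence through the cell.
D, H = 3, 4
W = rng.standard_normal((4*H, D)) * 0.1
U = rng.standard_normal((4*H, H)) * 0.1
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
```

Because the forget gate multiplies the old cell state rather than repeatedly squashing it through a nonlinearity, gradients along the cell path decay far more slowly than in a plain RNN.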
In contrast, PredRNN builds on ConvLSTM by introducing spatiotemporal LSTM units, which capture more complex spatiotemporal dependencies in video sequences and achieve better performance on tasks such as video prediction [15]. However, PredRNN suffers from high computational costs, especially in long-term forecasting, where resource consumption becomes excessive. Building on PredRNN, PredRNNv2 introduces a dual-stream LSTM architecture that handles local and global spatiotemporal dependencies simultaneously, thereby improving forecasting accuracy. This approach has demonstrated superior performance across various tasks, but its more complex network structure brings higher computational overhead [16]. Consequently, when processing large-scale datasets, optimizing and managing computational resources remains a critical consideration.
Source: Wenbin Yu, www.mdpi.com