Generative Adversarial Network for Synthesizing Multivariate Time-Series Data in Electric Vehicle Driving Scenarios


1. Introduction

The escalating global energy demand and environmental challenges posed by climate change have intensified the search for clean energy solutions and efficient storage technologies. Lithium-ion batteries (LIBs), with their broad operating temperature range, extended cycle life, and high energy density, have emerged as a cornerstone of the green energy transition. While these batteries power diverse applications from consumer electronics to spacecraft, their role in electric vehicles (EVs) is particularly crucial. Industry and academic researchers have made significant strides in enhancing EV safety, performance, and range. The continuous advancement of EV technologies—including battery systems, autonomous capabilities, and other innovations—underscores their fundamental role in achieving a sustainable future.

State of charge (SOC) estimation is crucial for battery management systems (BMSs), particularly in EV applications. Various estimation methods [1,2,3] have emerged, ranging from simple direct measurements to sophisticated algorithmic approaches. Direct measurement monitors voltage or current, offering simplicity but limited accuracy due to non-linear voltage-SOC relationships and environmental influences. Coulomb counting integrates current over time for real-time estimation but requires an accurate initial SOC to prevent drift. The open-circuit voltage (OCV) method correlates resting voltage with SOC, providing good accuracy but requiring extended rest periods and showing sensitivity to temperature and aging effects.
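
As a rough illustration of why Coulomb counting depends on the initial SOC, the following sketch integrates a constant discharge current from two different starting values. The cell capacity, current, sampling interval, and sign convention (positive current means discharge) are illustrative assumptions, not values from this study.

```python
def coulomb_count(soc0, currents_a, dt_s, capacity_ah):
    """Track SOC (as a fraction in [0, 1]) by integrating current over time.

    Assumed sign convention: positive current discharges the cell.
    """
    capacity_as = capacity_ah * 3600.0  # capacity in ampere-seconds
    soc = soc0
    trace = [soc]
    for i_a in currents_a:
        soc -= i_a * dt_s / capacity_as
        soc = min(1.0, max(0.0, soc))  # keep the estimate physically plausible
        trace.append(soc)
    return trace


# Hypothetical 50 Ah cell discharged at 25 A for one hour, sampled at 1 Hz.
currents = [25.0] * 3600
true_trace = coulomb_count(0.90, currents, 1.0, 50.0)
biased_trace = coulomb_count(0.95, currents, 1.0, 50.0)  # wrong initial SOC

# The initial error is carried through unchanged: nothing in the method corrects it.
print(round(true_trace[-1], 3), round(biased_trace[-1], 3))
```

Because the method has no feedback from a measured quantity, any initial offset (and any current-sensor bias) stays in the estimate until an external reference resets it.
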
Advanced estimation techniques have evolved to address these limitations. Kalman filtering approaches [4], including Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF), combine measurement data with battery models for dynamic estimation, offering noise resilience but requiring precise modeling and substantial computation. Observer-based methods, including Luenberger and sliding mode approaches, provide robust performance under dynamic conditions while depending on accurate system models. Electrochemical modeling delivers unmatched accuracy through fundamental equation solving but involves complex implementation and heavy computational load.

Hybrid methods have emerged as a promising solution, combining multiple approaches such as Coulomb counting with Kalman filtering to optimize accuracy, robustness, and computational efficiency. The selection of an appropriate SOC estimation method ultimately depends on application-specific requirements. While basic applications may find simpler methods sufficient, high-performance EVs typically require advanced techniques to ensure reliable operation across diverse conditions.
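
A minimal sketch of such a hybrid scheme is given below: Coulomb counting serves as the prediction step of a scalar Kalman filter, and a terminal-voltage measurement corrects the prediction. The linear voltage model (`ocv_a`, `ocv_b`, `r_int`) and the noise variances `q` and `r` are hypothetical placeholders chosen for illustration. Real OCV curves are non-linear, which is why EKF or UKF variants are used in practice.

```python
def hybrid_soc_step(soc, p, current_a, v_meas, dt_s, capacity_ah,
                    q=1e-7, r=1e-3, ocv_a=3.0, ocv_b=1.2, r_int=0.01):
    """One predict/update cycle of a scalar Kalman filter for SOC.

    Assumed measurement model: v = ocv_a + ocv_b * soc - r_int * current.
    """
    # Predict: Coulomb counting (positive current = discharge, an assumption).
    soc_pred = soc - current_a * dt_s / (capacity_ah * 3600.0)
    p_pred = p + q
    # Update: correct the prediction with the voltage residual.
    v_pred = ocv_a + ocv_b * soc_pred - r_int * current_a
    gain = p_pred * ocv_b / (ocv_b * p_pred * ocv_b + r)
    soc_new = soc_pred + gain * (v_meas - v_pred)
    p_new = (1.0 - gain * ocv_b) * p_pred
    return soc_new, p_new


# Simulated cell: true SOC starts at 0.90, the filter starts from a poor guess of 0.50.
true_soc, est_soc, est_var = 0.90, 0.50, 0.1
for _ in range(600):
    true_soc -= 25.0 * 1.0 / (50.0 * 3600.0)
    v_meas = 3.0 + 1.2 * true_soc - 0.01 * 25.0  # noise-free for clarity
    est_soc, est_var = hybrid_soc_step(est_soc, est_var, 25.0, v_meas, 1.0, 50.0)

print(abs(est_soc - true_soc) < 1e-3)
```

Unlike pure Coulomb counting, the voltage feedback pulls a wrong initial estimate toward the true value, at the cost of requiring a battery model.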

Recent advances in machine learning have demonstrated significant potential for improving SOC estimation in LIBs without explicit modeling. These approaches achieve high accuracy when sufficient training data are available, but they demand substantial computational resources. The complexity and variability of real-world LIB and EV time-series data present significant challenges for both comprehensive dataset collection and model reliability. Sarda et al. [5] highlighted critical challenges in their comprehensive review, particularly emphasizing non-linear battery behavior and evolving BMS requirements, while Obuli et al. [6] identified Gaussian Process Regression as particularly effective despite challenges with variable driving conditions. Khan et al. [7] advanced the field through innovations in deep neural networks, addressing battery behavior in dynamic environments and proposing novel approaches to architectural design and hyperparameter optimization. Selvaraj et al. [8] further enhanced this progress by implementing Bayesian methods for hyperparameter tuning, demonstrating robust SOC predictions across diverse operational conditions, including varying vehicle velocities and environmental factors.
To address real-world data limitations stemming from the high cost and time demands of data collection, researchers have increasingly turned to data augmentation techniques as a cost-effective alternative to extensive physical testing. Iglesias et al. [9] demonstrated the effectiveness of various augmentation methods, including adversarial and automatic approaches, in generating synthetic data that better represent time-series patterns, building upon Wen et al.’s [10] foundational taxonomy of time-domain, frequency-domain, and advanced decomposition-based methods. Lee et al. [11] introduced Temporal Adversarial Data Augmentation (TADA), leveraging time warping to better simulate real-world variations, while Victor et al. [12] provided a comprehensive framework for data augmentation across multiple data types, demonstrating its effectiveness in various machine learning tasks through detailed case studies. Domain-Adaptive Designable Data Augmentation (DADDA) [13] integrates inverse generation with domain adaptation to facilitate rapid design solutions for advanced EV systems. However, accurately modeling dynamic responses across diverse driving conditions remains a significant challenge.
Generative Adversarial Networks (GANs) have revolutionized data synthesis, enabling the creation of realistic images, audio, videos, and more [14]. These models are widely applied in fields such as cybersecurity for intrusion and anomaly detection [15], medical imaging for data augmentation and disease diagnosis [16], and healthcare for synthesizing patient time-series data [17]. GANs excel at generating synthetic data that closely resemble real-world distributions, providing scalable solutions when collecting real data is costly, hazardous, or ethically challenging [18,19]. The framework consists of two neural networks: a generator that produces synthetic data by mapping random noise to the target data distribution and a discriminator that evaluates whether the input data are real or synthetic. These networks compete in a minimax game where the generator aims to fool the discriminator, and the discriminator seeks to distinguish real data from synthetic. This adversarial training allows GANs to model complex data distributions without explicit density estimation. Time-series GANs have demonstrated significant utility in improving energy storage predictions by integrating physical and digital systems [20]. Advanced architectures such as Progressive GANs and StyleGANs have enabled high-resolution image generation and fine-grained control over synthetic outputs [21]. Despite challenges like training instability and mode collapse [22], GANs continue to evolve, incorporating techniques like reinforcement learning and convolutional neural networks to enhance their capabilities [23]. Their ability to generate high-quality synthetic data has made GANs a cornerstone of unsupervised learning, driving innovations across various domains.
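
For reference, the minimax game described above is commonly written as the following value function from the original GAN formulation. The symbols are the standard ones and are not defined elsewhere in this section: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the real-data distribution, and $p_z$ the noise prior.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes $V$ by assigning high probability to real samples and low probability to generated ones, while the generator minimizes $V$ by producing samples that the discriminator scores as real.
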
Researchers have made significant strides in generating battery time-series data to improve SOC estimation for LIBs. Wong et al. [24] combined a generative GAN (gGAN) with an SOC estimator, generating realistic battery features like voltage, current, and temperature essential for accurate SOC calculations. However, ensuring the synthetic data accurately reflects real-world conditions remains challenging, as discrepancies can impact model performance. The TS-DCGAN model [25] generates synthetic battery data to enhance SOC estimator training. While this approach addresses the scarcity of high-quality datasets, it requires significant computational resources and expertise for integration into existing frameworks. Similarly, TimeGAN [26] generates synthetic time-series data while maintaining realistic temporal dependencies and feature correlations, though generating long sequences with complex temporal relationships remains difficult.
Recent developments in GAN architecture have focused on addressing specific challenges in battery system applications. Wasserstein GAN (WGAN) [27] adapts to the sequential nature of SOC data, preserving long-term dependencies and improving the accuracy of SOC estimation models. However, synthetic data may still struggle to replicate rare or edge cases, limiting its effectiveness under variable operating conditions. Soo et al. [28] introduced a modified TimeGAN that generates virtual battery datasets to enhance model training with limited real data, though such synthetic data may not fully represent real-world scenarios. ITF-GAN [29] combines a deep autoencoder with GANs to generate synthetic data, while Zhang et al. [30] developed the TimesNet model, a deep learning framework enhanced with Gaussian noise-based data augmentation to simulate diverse operating conditions.
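
As a point of comparison for the GAN-based approaches, Gaussian noise-based augmentation can be sketched in a few lines. This is a generic illustration, not the procedure of any cited work. The channel layout (voltage, current, temperature) and the noise levels are hypothetical.

```python
import random


def jitter(series, sigma, rng):
    """Return a noisy copy of a multivariate series.

    series: list of time steps, each a list of channel values.
    sigma:  per-channel standard deviation of the zero-mean Gaussian noise.
    """
    return [[x + rng.gauss(0.0, s) for x, s in zip(step, sigma)]
            for step in series]


rng = random.Random(0)  # seeded for reproducibility
# Hypothetical samples: [voltage (V), current (A), temperature (deg C)]
original = [[3.70, 10.0, 25.0], [3.69, 12.0, 25.1], [3.68, 11.0, 25.2]]
augmented = [jitter(original, [0.005, 0.2, 0.1], rng) for _ in range(4)]
print(len(augmented), len(augmented[0]), len(augmented[0][0]))
```

Noise of this kind perturbs each sample independently, so it adds variety without creating new temporal patterns. Capturing such patterns is the gap that generative models aim to fill.
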

Accurate SOC estimation in LIBs is crucial for the safe and efficient operation of EVs. While laboratory settings offer controlled conditions for SOC characterization, real-world driving presents significant challenges. Variable initial SOCs, temperature fluctuations, unpredictable load variations, and diverse driving patterns distort voltage signals, rendering conventional SOC estimation methods unreliable. Existing studies often simplify scenarios with constant loads or predetermined SOC levels, which do not generalize well to the complexities of real-world driving. This discrepancy between controlled experiments and real-world conditions creates a critical need for robust SOC estimation methods capable of handling diverse and noisy data. The primary objective of this study is to develop a novel data augmentation technique using synthetic data to improve the accuracy and robustness of SOC estimation models in real-world EV applications. Generating synthetic data mitigates the limitations associated with relying exclusively on real-world datasets, which are often constrained in terms of size, diversity, and their ability to represent all possible driving conditions. Training models on a combination of real and synthetic data enhances their generalization capabilities for unseen real-world scenarios, thereby improving the accuracy of SOC estimation under challenging conditions.

GANs have shown promise in generating synthetic data across various domains, including image synthesis and time-series generation. Building upon the Pix2Pix architecture [31], which demonstrated successful image-to-image translation, subsequent works like HR-Pix2Pix [32] and Ambient-Pix2PixGAN [33] extended its capabilities to high-resolution images and medical imaging with noisy inputs, respectively. However, challenges related to training stability and generalization to real-world data persisted [34]. These studies highlight the potential of GANs for data augmentation but also emphasize the importance of careful model design and evaluation to ensure the quality and utility of the generated data. To address the specific challenges of time-series data in EV SOC estimation, TS-p2pGAN, a novel GAN-based framework, was developed for augmenting multivariate time-series data.

This model synthesizes dynamic data representing real-world driving scenarios, enabling more effective training of deep learning-based SOC estimation models. TS-p2pGAN integrates environmental, vehicle, battery, and heating system variables, concatenating them with time-series features to generate synthetic SOC and motion data. It ensures robust temporal dependencies among variables and accommodates varying sequence lengths, offering efficient representations of complex time-series data. By training on historical time-series data, TS-p2pGAN captures temporal patterns and generates plausible future trajectories, enhancing dataset diversity and machine learning model performance, especially when real-world data collection is challenging. This capability makes TS-p2pGAN suitable for a wide range of time-series analysis and prediction tasks.

The key contributions of this study are as follows:

  • Data augmentation framework for EV driving scenarios: A GAN-based framework for synthesizing multivariate time-series data specifically for EV driving scenarios is presented. This framework addresses the limited availability of real-world data by enabling the generation of larger and more diverse datasets suitable for training and evaluating SOC estimation models.

  • TS-p2pGAN model architecture: The TS-p2pGAN model, incorporating an integrated transformation network and a multiscale discriminator, is designed to handle high-dimensional, extended time sequences. This architecture aims to capture complex temporal dependencies within the data and generate synthetic time series that preserve these dependencies.

  • Evaluation protocol using quantitative and qualitative metrics: An evaluation protocol employing both quantitative and qualitative metrics is implemented to assess the quality and characteristics of the generated synthetic data. This protocol provides insights into the model’s performance and facilitates further development.

  • Validation with real-world driving data: The model’s performance is evaluated using data from 70 real-world driving trips, demonstrating its ability to generalize to real-world conditions and its potential for practical application in EV SOC estimation.
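
To make the idea of a multiscale discriminator more concrete, the sketch below builds progressively downsampled views of one multivariate series, so that each discriminator could inspect the same trip at a different temporal resolution. It follows the common practice in multiscale GAN discriminators. The pooling factor, the number of scales, and the function names are assumptions rather than details of TS-p2pGAN.

```python
def downsample(series, factor=2):
    """Average non-overlapping windows of `factor` time steps, per channel."""
    pooled = []
    for start in range(0, len(series) - factor + 1, factor):
        window = series[start:start + factor]
        pooled.append([sum(channel) / factor for channel in zip(*window)])
    return pooled


def multiscale_views(series, num_scales=3):
    """Return the original series followed by coarser and coarser copies."""
    views = [series]
    for _ in range(num_scales - 1):
        views.append(downsample(views[-1]))
    return views


# Hypothetical two-channel trip with eight time steps.
trip = [[float(t), float(2 * t)] for t in range(8)]
views = multiscale_views(trip)
print([len(v) for v in views])  # [8, 4, 2]
```

The fine-scale view exposes short transients, while the coarse views emphasize slow trends. Feeding each view to its own discriminator gives the generator feedback at all of these time scales.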

The remainder of this paper is organized as follows: Section 2 introduces the dataset and provides details of the TS-p2pGAN architecture and training methodology. Section 3 covers the experimental setup, evaluation metrics, and both quantitative and qualitative results. Finally, Section 4 concludes with a summary of the findings and potential directions for future research.

4. Conclusions

The challenge of limited access to real-world data significantly impacts machine learning applications in EV and power battery dynamics analysis, as traditional time-series data augmentation methods often struggle to maintain essential signal characteristics while expanding datasets. To address this challenge, TS-p2pGAN, a novel model designed for generating variable-length synthetic time-series data while preserving original signal properties, is introduced. The model’s architecture uniquely incorporates a transformation net generator for point-to-point translation, utilizing gradient flow from multiple discriminators to a single generator across various scales to effectively capture complex EV parameter influences, including SOC and motor output torque.

Validation, conducted using an open dataset of 70 EV driving trips with comprehensive battery condition data, demonstrated TS-p2pGAN’s superior performance compared to TTS-GAN and TimeGAN in generating realistic and accurate time-series data. The model particularly excelled in preserving temporal dynamics, crucial for maintaining inter-variable relationships across time sequences. Quantitative analysis revealed impressive results, with RMSE values consistently below 3% and MAE values under 1.5% across all trips. Qualitative assessments through t-SNE and PCA visualizations further confirmed the high fidelity of generated data, while discriminative and predictive capability tests highlighted TS-p2pGAN’s advantages over TimeGAN in time-series generation.
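
For readers unfamiliar with the two error metrics, the sketch below computes RMSE and MAE between a real and a synthetic sequence. The sample values are hypothetical, and how the errors are normalized to percentages is not specified in this section, so no normalization is applied here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: penalizes large deviations more strongly."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the deviations."""
    n = len(y_true)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n


# Hypothetical SOC fractions for four time steps.
real = [0.80, 0.78, 0.75, 0.71]
synthetic = [0.81, 0.77, 0.76, 0.70]
print(round(rmse(real, synthetic), 4), round(mae(real, synthetic), 4))
```

RMSE is never smaller than MAE, and a large gap between the two indicates that a few time steps contribute disproportionately to the error.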

The practical implications of TS-p2pGAN extend beyond data generation, offering significant potential for enhancing SOC and motor torque estimation, ultimately contributing to EV energy consumption optimization. The model’s ability to effectively leverage both spatial and temporal features surpasses traditional methods in learning complex time-series patterns while maintaining data integrity.

Despite these achievements, TS-p2pGAN’s reliance on paired datasets is a notable limitation in diverse real-world environments, where such data are often scarce. Future research could enhance the model’s versatility by exploring unpaired learning approaches, particularly CycleGAN architectures augmented with physics-informed constraints. Notwithstanding this limitation, the model generates synthetic parameters that remain consistent with both physical constraints and vehicle dynamics across various driving conditions. Overall, TS-p2pGAN advances synthetic time-series generation for electric vehicle applications, delivering a framework that balances high fidelity, practical utility, and real-world applicability.

Future work could explore the application of this data augmentation framework across diverse domains, including digital energy management systems for LIBs, autonomous vehicle comfort systems, and broader EV technologies. Real-time implementation and validation of TS-p2pGAN in diverse on-road scenarios are crucial to evaluate its performance under dynamic and variable conditions. Collaborating with industry partners, such as automotive manufacturers, and integrating the framework into existing control systems could significantly enhance its practical utility. The framework’s ability to generate synthetic parameters that adhere to physical constraints and maintain consistency with underlying system dynamics across different conditions demonstrates its robustness and potential for wide-ranging applications. Furthermore, the framework can support accurate SOC estimation in the BMSs of LIBs, improving their reliability and efficiency. These capabilities are relevant to fields requiring high-fidelity synthetic time-series data, particularly for enhancing the performance of LIB and BMS technologies.




Shyr-Long Jeng www.mdpi.com