A Multi-Output Ensemble Learning Approach for Multi-Day Ahead Index Price Forecasting


1. Introduction

The stock market is a complex system because stock prices behave in a chaotic and non-linear manner. Thus, the precise forecasting of future stock index prices is a stimulating yet demanding task. Many factors, both economic and non-economic, influence the behavior of a stock price [1,2]. The closing price is an important measure in stock market decision-making, and a good approximation of the closing price in advance helps investors make trading decisions. A number of statistical models (AR, ARMA, ARIMA) [3] have been introduced to address this problem. However, machine learning techniques, with their high generalization capabilities, have proven efficient in financial time series forecasting [4]. Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) have been widely used in financial market price forecasting problems.
Drucker et al. [5] developed the Support Vector Machine for regression problems (SVR), which opened the way for later research on time series forecasting with SVMs. Early research shows the efficient application of SVR in financial time series forecasting. Tay et al. [6] first examined the possibility of using SVR in financial forecasting. Their experiments over five futures contracts showed the efficiency of SVR over BPNN (Back Propagation Neural Networks) in financial time series forecasting. Kim [7] used SVR in index price forecasting and showed it to be a viable alternative. Pai et al. [8] developed a hybrid structure that combines the linear prediction power of ARIMA with the non-linear forecasting merits of SVR in stock price forecasting. The hybrid technique improved forecasting performance.
The evolution of Deep Neural Networks (DNNs) stimulated research on their application in financial time series forecasting [9]. Recent studies have produced many developments in stock market price forecasting using neural networks. Tsang et al. [10] implemented a back-propagation neural network version (NN5) for stock market price forecasting. The method was tested on Hong Kong stock and HSBC Holdings data, and the results suggested improvement over earlier works. Yu et al. [11] proposed a nonlinear meta-learning model based on neural networks for financial time series forecasting. This meta-modeling approach first generates different base models with feed-forward neural networks, using different input sets produced from different initial setups, and later integrates these base models to obtain the final meta-model. An approach combining genetic fuzzy systems with a self-organizing ANN was proposed in [12]. The approach uses step-wise regression to select the input features that influence the stock price. It then uses the self-organizing ANN to create clusters from the historical stock data. Finally, the clusters are fed to the genetic fuzzy system to generate the future stock price. A hybrid stock price forecasting model using deep learning algorithms is presented in [13]. The model uses CEEMD, SE, ICA, PSO, and LSTM to simplify data and boost statistical efficiency. The model was tested on four Chinese stock prices and found to be accurate and robust. The work integrates granular computing with decomposition and ensemble techniques to build forecasting models for non-stationary data. By integrating BPNN with ensemble empirical mode decomposition (EEMD), a new hybrid model is constructed in [14], improving prediction accuracy and robustness. The new hybrid model outperforms the existing EEMD-LSTM model in international gold price series forecasting.
A detailed literature review of financial time series forecasting using deep neural networks is found in [9]. The article covers the 15-year period from 2005 to 2019 and investigates the applications of DNNs in financial time series forecasting. Ray et al. [15] present a hybrid algorithm for forecasting multiple correlated time-series data using a multivariate Bayesian structural time series approach and an M-TCN. The algorithm accurately predicts stock price movements, COVID-19 pre-lockdown data from Nifty stock sectoral indices, and newspaper and social media sentiment. The hybrid model predicts pandemic stock market trends better than benchmark models.
Traditional methods cope poorly with the nonlinear, highly fluctuating behavior of financial price data. Conventional forecasting models, including Autoregressive Integrated Moving Average (ARIMA) and Support Vector Regression (SVR), are proficient for short-term predictions but frequently encounter difficulties with long-term dependencies and nonlinear patterns. To address this, Zhipeng et al. [16] use candlestick patterns to implement a noise removal process on financial data and then predict the financial time series with an SVR infused with cooperative co-evolution. Gupta et al. [17] proposed a Twin-SVR based approach for forecasting financial time series to deal with non-stationary noisy data. Lahmir [18] presented a hybrid approach, VMD-PSO-BPNN, for intraday stock price forecasting. The approach first uses Variational Mode Decomposition to decompose the price data into variational modes that serve as input features. It then uses a Back Propagation Neural Network as the predictive learning system, with Particle Swarm Optimization used to optimize the initial weights of the BPNN.
Traditional approaches are based on the assumption that all features contribute equally to the target variable. However, this assumption does not always hold in practice, because individual features differ in their importance toward the output variables. Thus, the performance of a learning technique can be improved by assigning different weights to the input features. For instance, Wang et al. [19] showed that the generalization capability and performance of SVM can be improved by assigning different weights to different input features. Following this study, Zhang et al. [20] used rough set theory based information gain to determine the feature weights of an SVM classifier. Liu et al. [21] investigated the application of a Grey correlation based feature weighted SVR model for stock price forecasting. The hybrid feature weighted SVR method and the baseline SVR were tested on stock data from China, and the experiments indicated improvements in forecasting performance after incorporating weighted features with support vector regression. Yu et al. [22] present an LSSVR ensemble learning framework for predicting crude oil prices that treats user-defined parameters as uncertain variables. The methodology employs a grid technique for estimating a low upper bound, generates stochastic parameters, and integrates the outcomes.
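As an illustration of how one such importance score could be computed, the following minimal sketch implements the standard grey relational grade between each input feature and the target. The function name `grey_relational_grades`, the min-max normalization step, and the distinguishing coefficient `rho = 0.5` are assumptions made for illustration; they are not taken from [21] or from the method proposed in this paper.

```python
def grey_relational_grades(features, target, rho=0.5):
    """Return one grey relational grade per feature column.

    features: list of columns, each a list of floats (hypothetical layout).
    target:   list of floats of the same length as each column.
    rho:      distinguishing coefficient, commonly set to 0.5 (assumption).
    """
    def min_max(column):
        # Rescale a column to [0, 1] so that series are comparable.
        lo, hi = min(column), max(column)
        span = hi - lo
        return [0.0 if span == 0 else (v - lo) / span for v in column]

    reference = min_max(target)
    # Absolute deviation of every feature from the reference at each time step.
    deltas = [
        [abs(r - v) for r, v in zip(reference, min_max(column))]
        for column in features
    ]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every feature coincides with the reference after normalization.
        return [1.0] * len(features)

    grades = []
    for row in deltas:
        # Grey relational coefficient at each time step, then its mean.
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coefficients) / len(coefficients))
    return grades


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0, 4.0]
    features = [[10.0, 20.0, 30.0, 40.0], [4.0, 3.0, 2.0, 1.0]]
    print(grey_relational_grades(features, target))
```

A feature that moves with the target receives a grade close to 1, while a feature that moves against it receives a lower grade, so the grades can serve as relative feature weights.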
A review of the literature suggests that no single standalone model is appropriate for every stock market. Earlier research has also rarely explored multi-day ahead forecasting. An alternative is to use diverse techniques as individual forecasting models. Recent works have focused on hybrid ensemble models in time series forecasting. Ensemble models use the merits of all individual forecasting results to produce one improved ensemble result [23,24,25]. The motivation behind ensemble models is to treat the individual models as independent contributors and to intelligently integrate their responses into an ensemble forecast. Traditional ensemble models, such as bagging and boosting, apply static weights to base learners, optimizing for a single-step forecast [26]. These methods often struggle with multi-day forecasting, where error propagation is significant. Existing ensemble learning frameworks typically address single-output forecasting, where models predict only the next day’s price. For multi-day forecasts they rely on recursive strategies, in which single-step predictions are iteratively fed back as inputs to predict multi-day ahead prices [27]. The resulting errors compound over the horizon, reducing accuracy for multi-day ahead forecasting. Multi-output forecasting, which predicts multiple future points simultaneously, remains under-explored in ensemble architectures. Although metaheuristic algorithms like Particle Swarm Optimization (PSO) and Genetic Algorithms (GA) have been utilized for hyperparameter tuning [28], their application in dynamic weight optimization for ensemble learning is still limited. Likewise, although Ant Colony Optimization (ACO) has demonstrated promise in combinatorial optimization and path-finding tasks, it is underutilized in financial forecasting.
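The difference between the recursive and the multi-output strategies can be made concrete with a minimal sketch. The helper names `make_multi_output_windows` and `recursive_forecast`, the lag length, and the toy one-step model are hypothetical and chosen only for illustration; they do not reproduce the models used in this paper.

```python
def make_multi_output_windows(prices, n_lags, horizon):
    """Build (X, Y) pairs for multi-output learning.

    Each row of X holds `n_lags` past prices; the matching row of Y holds
    the next `horizon` prices, so one model call yields the whole horizon.
    """
    X, Y = [], []
    last_start = len(prices) - n_lags - horizon
    for start in range(last_start + 1):
        X.append(prices[start:start + n_lags])
        Y.append(prices[start + n_lags:start + n_lags + horizon])
    return X, Y


def recursive_forecast(one_step_model, history, n_lags, horizon):
    """Iterate a one-step model, feeding each prediction back as input.

    Any error in an early prediction becomes part of the next input window,
    which is how errors accumulate over the horizon.
    """
    window = list(history[-n_lags:])
    predictions = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]
    return predictions


if __name__ == "__main__":
    prices = [1, 2, 3, 4, 5, 6, 7, 8]
    X, Y = make_multi_output_windows(prices, n_lags=3, horizon=2)
    print(X[0], Y[0])

    # Toy one-step model with a small constant bias (illustrative only).
    biased_model = lambda window: window[-1] + 1.1
    print(recursive_forecast(biased_model, [1, 2, 3], n_lags=3, horizon=3))
```

In the recursive loop, the bias of the toy model is added again at every step, so the deviation from the true series grows with the horizon. In the multi-output setting each target row is learned directly from observed prices, and no prediction is fed back as input.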
Inspired by previous works, this study proposes a new ensemble framework that integrates importance scores with learning algorithms through a multi-colony Ant Colony Optimization technique. This ensemble approach incorporates different importance score generation strategies to determine the importance of each input feature and then uses these scores to build new feature weighted input spaces. Afterward, it uses three multi-output improved variants of Support Vector Regression as baseline learning algorithms. The three baseline algorithms incorporated in this work (i.e., MO-LSSVR, MO-PSVR and MO-ε-TSVR) have computational advantages over SVR, and earlier research shows improved performance on price forecasting problems [29,30,31,32]. In the final stage, a multi-colony variant of ACO-LD (MACO-LD) [33] is used to combine the hybrid models, generated by integrating importance score based weighted features with the baseline methods, into the final multi-output ensemble model. The proposed framework dynamically adjusts the contributions of base learners for each forecast horizon. This ensures that model weights are optimized based on real-time performance, enhancing adaptability to market conditions. The multi-output framework generates multi-day ahead future values in a single pass and thus minimizes error propagation, a critical issue in recursive forecasting. This increases the accuracy of multi-day ahead predictions without relying heavily on iterative processes. The exploration-exploitation mechanism of ACO dynamically fine-tunes ensemble weights, helping to avoid local optima and allowing the ensemble to evolve with changing market dynamics. This metaheuristic-driven optimization differentiates the framework from traditional ensemble learning methods that rely on static weight assignments.
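One simple way to picture a feature weighted input space is to rescale every feature column by its normalized importance score. The sketch below follows that reading under stated assumptions: the helper names `normalize_scores` and `weight_features` are hypothetical, the scores are assumed to be non-negative, and the exact weighting scheme used in the proposed framework may differ.

```python
def normalize_scores(scores):
    """Scale non-negative importance scores so that they sum to one."""
    if any(s < 0 for s in scores):
        raise ValueError("importance scores are assumed to be non-negative")
    total = sum(scores)
    if total == 0:
        raise ValueError("at least one importance score must be positive")
    return [s / total for s in scores]


def weight_features(X, weights):
    """Multiply each feature (column) of X by its weight.

    X is a list of samples, each a list with one value per feature.
    """
    return [[w * value for w, value in zip(weights, row)] for row in X]


if __name__ == "__main__":
    raw_scores = [2.0, 1.0, 1.0]          # illustrative scores only
    weights = normalize_scores(raw_scores)
    X = [[4.0, 8.0, 8.0], [2.0, 4.0, 0.0]]
    print(weight_features(X, weights))
```

Because kernel-based regressors such as SVR variants depend on distances between samples, rescaling a column changes how strongly that feature influences the fitted model.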

The main contributions of this proposed study are summarized as follows:

  • This work proposes a new importance score based feature weighted ensemble model for index futures price forecasting. The merits of importance scores, multi-output baseline algorithms, and the intelligent multi-colony ACO-LD [33] technique are integrated, offering an efficient ensemble alternative for price forecasting problems.

  • Four different types of importance score generation methods (F-test, Relief, Random Forest, and Grey Correlation) are incorporated to generate new feature weighted input spaces.

  • Three baseline algorithms (LSSVR, PSVR and ε-TSVR) are integrated with the new input spaces to generate twelve hybrid feature weighted models. Multi-output versions of the baseline learning algorithms are introduced to address the multi-day ahead index price forecasting problem.

  • To construct the ensemble model, a multi-colony version of the ACO-LD optimization technique is introduced to aggregate the hybrid feature weighted models.

  • To illustrate the performance characteristics, experiments have been performed over eight historical index futures price datasets and the input features for forecasting the future price of an index have been chosen from a vast array of technical indicators.

  • A detailed discussion covers both the improvement of the baseline algorithms after integrating importance scores and the comparison of individual forecasting results with the ensemble forecasting results.
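The contributions above imply twelve hybrid models (four importance scores times three baselines) whose forecasts are merged with horizon-specific weights. The following minimal sketch enumerates those combinations and applies a weighted average per forecast day. The function `combine_forecasts`, the label "MO-eps-TSVR", and all numeric weights are hypothetical placeholders; in the proposed framework the weights would come from MACO-LD, which is not implemented here.

```python
from itertools import product

SCORES = ["F-test", "Relief", "Random Forest", "Grey Correlation"]
BASELINES = ["MO-LSSVR", "MO-PSVR", "MO-eps-TSVR"]

# Every (importance score, baseline) pair gives one hybrid model: 4 x 3 = 12.
HYBRID_MODELS = [f"{score} + {base}" for score, base in product(SCORES, BASELINES)]


def combine_forecasts(forecasts, weights):
    """Weighted average of model forecasts, with separate weights per horizon.

    forecasts[m][h]: prediction of model m for forecast day h.
    weights[h][m]:   non-negative weight of model m on forecast day h.
    """
    n_models = len(forecasts)
    horizon = len(forecasts[0])
    ensemble = []
    for h in range(horizon):
        day_weights = weights[h]
        total = sum(day_weights)
        if total <= 0:
            raise ValueError("weights for each horizon must have a positive sum")
        blended = sum(day_weights[m] * forecasts[m][h] for m in range(n_models))
        ensemble.append(blended / total)
    return ensemble


if __name__ == "__main__":
    print(len(HYBRID_MODELS))
    # Two models and a two-day horizon, with placeholder weights.
    forecasts = [[100.0, 102.0], [104.0, 110.0]]
    weights = [[0.5, 0.5], [0.25, 0.75]]
    print(combine_forecasts(forecasts, weights))
```

Keeping one weight vector per forecast day lets a model that is reliable for the first day contribute less on later days, which is the behavior that static single-vector weighting cannot express.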

The remainder of the paper is organized as follows. Section 2 briefly reviews the related studies. Different feature importance score generation strategies are introduced in Section 3.1. The baseline forecasting algorithms are explained in Section 3.2, and the complete architecture of the proposed ensemble technique is described in Section 4. Section 4.1 gives details about the input feature space, Section 4.2 describes the hybrid models, and Section 4.4 explains the construction of the proposed ensemble model. Section 5 comprises the experiments over the index futures and the detailed discussion of both the hybrid feature weighted models and the ensemble model. Finally, the conclusions from the empirical findings are presented in Section 6.

2. Related Works

Recent works demonstrate the extensive application of models based on support vector machines in financial time series forecasting. Ince et al. [34] used heuristic models of principal component analysis (PCA) and non-parametric hybrid models with SVR for stock price prediction [35]. The presented works use factor analysis and kernel principal components to find the most important input features. Lu et al. [36] propose a two-stage approach for financial forecasting. They first use independent component analysis to deal with high noise in a financial time series and then use support vector regression to forecast future prices. Hsu et al. [37] introduced a two-stage hybrid design that uses a self-organizing map to decompose the input space into regions with similar statistical distributions and then applies SVR to forecast index prices. A multi-kernel learning based hybrid approach is introduced in [38] for price forecasting. It uses semi-definite programming to derive kernel parameters and Lagrange multipliers simultaneously. Kazem et al. [39] proposed a forecasting model that incorporates the firefly algorithm with SVR and chaotic mapping to forecast stock market prices. The three-stage algorithm first reconstructs the phase space dynamics of the input space, then uses the firefly algorithm to tune the SVR hyper-parameters, and finally uses the optimal SVR model to predict the stock market price. A hybrid ν-SVR model is introduced for stock index price forecasting in [40]. The approach uses PCA for input feature selection, and the Brain Storm Optimization technique is used to select the optimal parameters of ν-SVR. Pai et al. [30] incorporate LSSVR for stock market price forecasting with different causality scenarios, where historical trading data and social media influences are taken as input features, and correlation is used for feature pruning to obtain independent variables.
Hybrid learning methods that combine heuristic optimization techniques with Support Vector Machines have been introduced to increase the performance of price forecasting techniques. Rustam et al. [41] investigated the application of SVR and particle swarm optimization together in forecasting stock prices with several technical indicators as input. PSO is used to select appropriate inputs, and SVR is then used as the prediction model. Zhang et al. [28] proposed a modified Firefly algorithm (FA) to increase the convergence speed. The later stage of their work combines this modified FA with SVR as a hybrid structure for stock price forecasting, where the modified FA is used for the optimal selection of SVR parameters. Recent works indicate that hybrid combined models are effective and improve the forecasting performance of single baseline models. Kumar et al. [31] proposed a hybrid system for market trend forecasting. The method uses different importance scores to build hybrid classification models, and the performance comparison implied the superiority of hybrid models over baseline methods. Meng et al. [42] introduced a continuous ACO based ensemble approach with SVR for forecasting the price in cloud manufacturing. The efficacy of the methodology is validated using real-world data. The experimental outcomes demonstrate significant generalization performance and dependable results despite the randomness in ensemble learning. Chen et al. [13] present a hybrid model for stock price forecasting using machine learning algorithms, including decomposition and ensemble, independent component analysis, particle swarm optimization, and long short-term memory. The model’s accuracy and robustness were tested on four stock prices from the China stock market. Wang et al. [40] introduce a hybrid ν-SVR model for forecasting stock price indexes that incorporates principal component analysis and brain storm optimization. The model, which uses correlation analysis and PCA, accurately approximates the actual CSI300 stock price index. Xu et al. [43] explore stock closing price forecasting using clustering and ensemble learning. They propose an ensemble model based on K-means clustering and SVR, and the hybrid prediction model obtains the best prediction accuracy for the stock price.
A brief description of supporting studies that built the motivation for the proposed work is summarized in Table 1. These findings validate that ensemble methods, which integrate multiple baseline methods, can produce more precise predictions in comparison to individual methods. The findings indicate that utilizing a diverse ensemble is an appropriate method for forecasting the movement of stock prices in the banking industry of South Africa.

6. Conclusions

In this work, a new ensemble approach is proposed for the multi-day ahead index futures price forecasting problem. The proposed approach uses different importance score generation methods to obtain feature weighted hybrid learning models. Multi-output support vector regression approaches are developed for multi-day ahead forecasting. Finally, the individual feature weighted hybrid forecast models are combined through a continuous multi-colony ant colony optimization model (MACO-LD) to construct the ensemble model.

Eight index futures price datasets are considered to evaluate the performance. A comprehensive array of technical indicators and oscillators has been utilized as the input features in this study. The models are trained and tested on these datasets through a walk-forward sliding window approach. First, the experimental findings are investigated to compare the importance score based hybrid models with the baseline models, and then the ensemble forecasting model is compared with the parent models. The numerical results, based on different performance metrics, show the superiority of the proposed ensemble model over the baseline algorithms and the hybrid feature weighted models. The empirical findings establish that the proposed ensemble model is a promising tool for the index futures price forecasting problem.
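For readers unfamiliar with walk-forward evaluation, the sliding window scheme can be sketched as follows. The generator name `walk_forward_splits` and the window sizes are hypothetical; the actual window lengths used in the experiments are not restated here.

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) for a sliding window evaluation.

    The training window always precedes its test window, and both slide
    forward by `step` samples (default: the test window length), so no
    future observation is ever used for training.
    """
    step = test_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


if __name__ == "__main__":
    for train, test in walk_forward_splits(n_samples=10, train_size=4, test_size=2):
        print(list(train), list(test))
```

With the default step, consecutive test windows do not overlap, so every out-of-sample observation is predicted exactly once by a model trained only on earlier data.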

The proposed framework signifies a substantial improvement in financial forecasting, overcoming critical shortcomings of conventional ensemble learning via dynamic weight optimization and hybrid model integration. Despite the limitations in computational complexity and hyperparameter sensitivity, future research may refine and augment the model, thereby improving its robustness and applicability across various financial markets. This study establishes a basis for more flexible, interpretable, and precise multi-day forecasting models, enhancing the development of financial time series analysis.



Kartik Sahoo www.mdpi.com