JRFM, Vol. 18, Pages 662: A Two-Stage Machine Learning Approach to Bankruptcy Prediction: Integrating Full-Feature Modeling and Optimized Feature Selection
Journal of Risk and Financial Management doi: 10.3390/jrfm18120662
Authors:
Masanobu Matsumaru
Hideki Katagiri
Corporate bankruptcy prediction has become increasingly critical amid economic uncertainty. This study proposes a novel two-stage machine learning approach to enhance bankruptcy prediction accuracy, applied to companies listed on the Tokyo Stock Exchange. In the first stage, models were trained on all 173 financial indicators. In the second stage, wrapper-based feature selection was applied to reduce dimensionality and eliminate noise, identifying an optimal seven-feature set. Two ensemble learning methods were used: Random Forest and Light Gradient Boosting Machine (LightGBM). LightGBM is a gradient boosting–based ensemble method that grows trees leaf-wise, enabling fast computation and high predictive accuracy, especially on large-scale, high-dimensional datasets. With the reduced feature set, Random Forest correctly predicted 566 bankruptcies (88 more than with all features), compared with 451 for LightGBM (31 more than with all features). The study also addresses the challenges posed by imbalanced data by employing resampling techniques (SMOTE, SMOTE-ENN, and KMeans). In addition, recognizing the need for industry-specific modeling, it constructs separate models for six industry sectors. These findings highlight the importance of feature selection and ensemble learning for improving model generalizability and uncovering industry-specific patterns. This study contributes to the field of bankruptcy prediction by providing a robust framework for accurate and interpretable predictions in both academic research and practical applications. Future work will focus on further improving prediction accuracy to identify more potential bankruptcies.
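The abstract describes the second stage only as "wrapper-based" feature selection; it does not state the search strategy, the scoring metric, or how the seven-feature set was chosen. As a minimal sketch, assuming greedy forward selection, the Python below shows the wrapper idea: each candidate subset is scored by actually training and evaluating a model on it, and the best-scoring feature is added until the target size is reached or no feature helps. The helpers `forward_select` and `loo_centroid_balanced_accuracy` and the toy data are hypothetical; a simple leave-one-out nearest-centroid classifier stands in for the Random Forest or LightGBM models the study wraps, and balanced accuracy is an assumed metric chosen because bankruptcies are the rare class.

```python
from typing import Callable, Dict, List, Sequence

Matrix = Sequence[Sequence[float]]
# A scorer trains/evaluates a model on the given feature indices (higher is better).
Scorer = Callable[[Matrix, Sequence[int], List[int]], float]


def loo_centroid_balanced_accuracy(X: Matrix, y: Sequence[int], features: List[int]) -> float:
    """Hypothetical stand-in for the wrapped model: a leave-one-out
    nearest-centroid classifier scored by balanced accuracy (mean per-class
    recall). Assumes every class has at least two samples."""
    classes = sorted(set(y))
    hits: Dict[int, int] = {c: 0 for c in classes}
    totals: Dict[int, int] = {c: 0 for c in classes}
    for i in range(len(X)):
        # Fit class centroids on every sample except the held-out one.
        centroids = {}
        for c in classes:
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == c]
            centroids[c] = [sum(row[f] for row in rows) / len(rows) for f in features]
        point = [X[i][f] for f in features]
        # Predict the class whose centroid is nearest (squared Euclidean distance).
        pred = min(
            classes,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )
        totals[y[i]] += 1
        hits[y[i]] += int(pred == y[i])
    return sum(hits[c] / totals[c] for c in classes) / len(classes)


def forward_select(X: Matrix, y: Sequence[int], scorer: Scorer, k: int) -> List[int]:
    """Greedy wrapper-style forward selection: at each step add the feature
    whose inclusion gives the highest scorer() value; stop after k features
    or when no remaining candidate strictly improves the score."""
    n_features = len(X[0])
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < k:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        scores = {j: scorer(X, y, selected + [j]) for j in candidates}
        j_best = max(candidates, key=lambda j: scores[j])
        if scores[j_best] <= best_score:
            break  # no candidate improves the wrapped model's score
        selected.append(j_best)
        best_score = scores[j_best]
    return selected


# Made-up toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [
    [0.1, 5.0, 3.0],
    [0.2, 1.0, 2.0],
    [0.3, 4.0, 1.0],
    [0.2, 2.0, 3.0],
    [0.9, 1.5, 2.5],
    [1.0, 4.5, 1.5],
    [0.8, 3.0, 3.0],
]
y = [0, 0, 0, 0, 1, 1, 1]  # 1 = bankrupt (hypothetical labels)

print(forward_select(X, y, loo_centroid_balanced_accuracy, k=7))  # prints [0]
```

Asking for up to seven features here still returns only `[0]`, because the stopping rule halts as soon as adding a feature no longer improves the wrapped score. With 173 indicators, this greedy search evaluates far fewer subsets than an exhaustive one, which is the usual reason for choosing it; whether the study used forward, backward, or another search is not stated in the abstract.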

