Mathematics, Vol. 13, Pages 1529: Multi-Domain Controversial Text Detection Based on a Machine Learning and Deep Learning Stacked Ensemble


Mathematics doi: 10.3390/math13091529

Authors:
Jiadi Liu
Zhuodong Liu
Qiaoqi Li
Weihao Kong
Xiangyu Li

With the rapid growth of social media and online reviews, accurately identifying and classifying controversial texts has become a significant challenge in natural language processing. Traditional text-classification methods often suffer from feature sensitivity and inadequate generalization, which leads to notably suboptimal performance on diverse controversial content. To address these limitations, this paper proposes a novel controversial text-detection framework based on stacked ensemble learning to improve the accuracy and robustness of text classification. First, to handle the multidimensional complexity of textual features, we combine comprehensive feature engineering (word frequency, statistical metrics, sentiment analysis, and comment tree structure features) with advanced feature selection, in particular LassoNet, a neural network with feature sparsity, to address dimensionality challenges while improving model interpretability and computational efficiency. Second, we design a two-tier stacked ensemble architecture that combines multiple machine learning algorithms, e.g., gradient-boosted decision tree (GBDT), random forest (RF), and extreme gradient boosting (XGBoost), with deep learning models, e.g., gated recurrent unit (GRU) and long short-term memory (LSTM), and uses a support vector machine (SVM) as the meta-learner. Third, we systematically compare three hyperparameter optimization algorithms: the sparrow search algorithm (SSA), particle swarm optimization (PSO), and Bayesian optimization (BO). The experimental results demonstrate that SSA exhibits superior performance in exploring high-dimensional parameter spaces. Extensive experiments across diverse topics and domains also confirm that the proposed method significantly outperforms state-of-the-art approaches.
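
As a rough illustration of one of the three optimizers compared in the abstract, the following is a minimal particle swarm optimization (PSO) sketch in Python. It is not the authors' implementation. The abstract does not give the objective, search space, or swarm settings, so the function `pso_minimize`, the toy objective `toy_validation_loss`, the two hypothetical hyperparameters (a log learning rate and a tree depth), and all coefficient values are assumptions chosen only for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over a box given by `bounds` with a basic PSO.

    All default coefficients are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and value so far.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    # The swarm also tracks the best position found by any particle.
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_validation_loss(params):
    """Hypothetical stand-in for a validation loss; minimum at (-2, 6)."""
    log_lr, depth = params
    return (log_lr + 2.0) ** 2 + 0.1 * (depth - 6.0) ** 2


# Hypothetical search space: log10 learning rate in [-5, 0], depth in [1, 12].
best_params, best_loss = pso_minimize(
    toy_validation_loss, bounds=[(-5.0, 0.0), (1.0, 12.0)])
print(best_params, best_loss)
```

In a setting like the one the abstract describes, the toy objective would presumably be replaced by a cross-validated loss of the stacked ensemble, which makes each evaluation far more expensive than in this sketch.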



