Bioengineering, Vol. 12, Pages 1245: A Multi-Task Ensemble Strategy for Gene Selection and Cancer Classification
Bioengineering doi: 10.3390/bioengineering12111245
Authors:
Suli Lin
Zhizhe Lin
Jin Zhang
Man-Fai Leung
Gene expression-based tumor classification aims to distinguish tumor types based on gene expression profiles. This task is difficult due to the high dimensionality of gene expression data and limited sample sizes. Most datasets contain tens of thousands of genes but only a small number of samples. As a result, selecting informative genes is necessary to improve classification performance and model interpretability. Many existing gene selection methods fail to produce stable and consistent results, especially when training data are limited. To address this, we propose a multi-task ensemble strategy that combines repeated sampling with joint feature selection and classification. The method generates multiple training subsets and applies multi-task logistic regression with ℓ2,1 group sparsity regularization to select a subset of genes that appears consistently across tasks. This promotes stability and reduces redundancy. The framework supports integration with standard classifiers such as logistic regression and support vector machines. It performs both gene selection and classification in a single process. We evaluate the method on simulated and real gene expression datasets. The results show that it outperforms several baseline methods in classification accuracy and the consistency of selected genes.
Source link
Suli Lin www.mdpi.com

