Bioengineering, Vol. 12, Pages 1245: A Multi-Task Ensemble Strategy for Gene Selection and Cancer Classification


Bioengineering doi: 10.3390/bioengineering12111245

Authors:
Suli Lin
Zhizhe Lin
Jin Zhang
Man-Fai Leung

Gene expression-based tumor classification aims to distinguish tumor types from their gene expression profiles. This task is difficult because gene expression data are high-dimensional while sample sizes are limited: most datasets contain tens of thousands of genes but only a small number of samples. As a result, selecting informative genes is necessary to improve classification performance and model interpretability. Many existing gene selection methods fail to produce stable and consistent results, especially when training data are limited. To address this, we propose a multi-task ensemble strategy that combines repeated sampling with joint feature selection and classification. The method generates multiple training subsets, treats each subset as a task, and applies multi-task logistic regression with ℓ2,1 group sparsity regularization to select a subset of genes that appears consistently across tasks. This promotes stability and reduces redundancy. The framework supports integration with standard classifiers such as logistic regression and support vector machines, and it performs both gene selection and classification in a single process. We evaluate the method on simulated and real gene expression datasets. The results show that it outperforms several baseline methods in both classification accuracy and the consistency of the selected genes.
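
The idea of combining repeated sampling with an ℓ2,1-regularized multi-task logistic regression can be illustrated with a minimal sketch. The abstract does not state the exact objective, sampling scheme, or optimizer, so everything below is an assumption made for illustration: binary labels in {0, 1}, subsets drawn without replacement, no intercept term, and a plain proximal-gradient solver. The function names (make_tasks, row_soft_threshold, fit_multitask_l21, selected_genes) and the parameters lam, frac, and n_tasks are hypothetical and are not taken from the paper.

```python
import numpy as np

def make_tasks(X, y, n_tasks, frac, rng):
    """Draw n_tasks random sample subsets; each subset is treated as one task."""
    n = X.shape[0]
    m = max(2, int(frac * n))
    tasks = []
    for _ in range(n_tasks):
        idx = rng.choice(n, size=m, replace=False)
        tasks.append((X[idx], y[idx]))
    return tasks

def row_soft_threshold(W, t):
    """Proximal operator of t * ||W||_{2,1}: shrink the l2 norm of each row by t."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale

def fit_multitask_l21(tasks, lam, n_iter=500):
    """Proximal gradient on: sum_t mean logistic loss of task t + lam * ||W||_{2,1}.

    W has one row per gene and one column per task, so a zero row means
    the gene is dropped in every task at once (group sparsity).
    """
    d = tasks[0][0].shape[1]
    W = np.zeros((d, len(tasks)))
    # The smooth part is separable across tasks, so its gradient is
    # Lipschitz with constant max_t ||X_t||_2^2 / (4 * m_t).
    lip = max(np.linalg.norm(Xt, 2) ** 2 / (4.0 * Xt.shape[0]) for Xt, _ in tasks)
    step = 1.0 / lip
    for _ in range(n_iter):
        G = np.zeros_like(W)
        for t, (Xt, yt) in enumerate(tasks):
            z = np.clip(Xt @ W[:, t], -30.0, 30.0)   # avoid overflow in exp
            p = 1.0 / (1.0 + np.exp(-z))
            G[:, t] = Xt.T @ (p - yt) / Xt.shape[0]
        W = row_soft_threshold(W - step * G, step * lam)
    return W

def selected_genes(W, tol=1e-8):
    """Indices of genes whose weight row is non-zero across tasks."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > tol)

# Synthetic demo: 80 samples, 40 "genes", only the first three are informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 40))
y = (X[:, :3].sum(axis=1) + 0.1 * rng.standard_normal(80) > 0).astype(float)
tasks = make_tasks(X, y, n_tasks=5, frac=0.8, rng=rng)
W = fit_multitask_l21(tasks, lam=0.2)
print("selected genes:", selected_genes(W))
```

Because the penalty acts on whole rows of W, a gene is either kept for all tasks or removed from all of them, which is one way to obtain a gene set that is consistent across resampled subsets. In this sketch the selected columns of X could then be passed to any standard classifier; larger values of lam select fewer genes.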


