Symmetry, Vol. 17, Pages 1536: Strategic Sample Selection in Deep Learning: A Case Study on Violence Detection Using Confidence-Based Subsets
Symmetry doi: 10.3390/sym17091536
Authors:
Francisco Primero Primero
Daniel Cervantes Ambriz
Roberto Alejo Eleuterio
Everardo E. Granda Gutiérrez
Jorge Sánchez Jaime
Rosa M. Valdovinos Rosas
Automated violence detection in images presents a technical and scientific challenge that demands specialized methods to enhance classification systems. This study introduces an approach for automatically identifying relevant samples to improve the performance of neural network models, specifically DenseNet121, with a focus on violence classification in images. The proposed methodology begins with an initial training phase using a balanced dataset (DS1, 6000 images). Based on the model’s output scores (outN), three confidence levels are defined: Safe (outN≥0.9+σ or outN≤0.1−σ), Border (0.5−σ≤outN≤0.5+σ), and Average (0.4−σ≤outN≤0.6+σ). These levels correspond to scenarios with low, moderate, and high prediction error probabilities, respectively, where σ is an adjustable threshold. The Border subset exhibits symmetry around the decision boundary (outN=0.5), capturing maximally uncertain samples, while the Safe regions reflect functional asymmetries in high-confidence predictions. Subsequently, these thresholds are applied to a second dataset (DS2, 5600 images) to extract specialized subsets for retraining (DSSafe, DSBorder, and DSAverage). Finally, the model is evaluated using an independent test set (DStest, 4400 images), ensuring complete data isolation. The experimental results demonstrate that the confidence-based subsets offer competitive performance despite using significantly fewer samples. The Average subset achieved an F1-Score of 0.89 and a g-mean of 0.93 using only 20% of the data, making it a promising alternative for efficient training. These findings highlight that strategic sample selection based on confidence thresholds enables effective training with reduced data, offering a practical balance between performance and efficiency when symmetric uncertainty modeling is exploited.
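The three confidence levels defined in the abstract can be illustrated with a minimal sketch. It assumes that outN is a single sigmoid-style score in [0, 1] per image and that σ is a small non-negative value. The function name `confidence_subsets`, the sample scores, and the choice σ = 0.05 are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def confidence_subsets(scores: Sequence[float], sigma: float) -> Dict[str, List[int]]:
    """Return, for each confidence level, the indices of the matching samples.

    The inequalities follow the abstract:
      Safe:    outN >= 0.9 + sigma  or  outN <= 0.1 - sigma
      Border:  0.5 - sigma <= outN <= 0.5 + sigma
      Average: 0.4 - sigma <= outN <= 0.6 + sigma
    The levels are tested independently, so a sample may belong to more than one.
    """
    subsets: Dict[str, List[int]] = {"safe": [], "border": [], "average": []}
    for i, out_n in enumerate(scores):
        if out_n >= 0.9 + sigma or out_n <= 0.1 - sigma:
            subsets["safe"].append(i)
        if 0.5 - sigma <= out_n <= 0.5 + sigma:
            subsets["border"].append(i)
        if 0.4 - sigma <= out_n <= 0.6 + sigma:
            subsets["average"].append(i)
    return subsets


# Hypothetical model outputs for eight DS2 images (illustrative values only).
ds2_scores = [0.99, 0.02, 0.50, 0.48, 0.40, 0.62, 0.80, 0.20]
subsets = confidence_subsets(ds2_scores, sigma=0.05)

for name, indices in subsets.items():
    print(name, indices)
```

With these inequalities the Border interval lies inside the Average interval, so every Border sample is also an Average sample. Scores that fall between the bands (0.80 and 0.20 in this example) are assigned to no subset.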