Computational Methods for Information Processing from Natural Language Complaint Processes—A Systematic Review


Machine Learning methods typically learn from data, from which a wide range of possibilities is derived. Singh and Saha [28] present a method for commerce that seeks to benefit from social networks and shopping websites. They suggest that complaints are usually processed as text, but that code-mixed text can also be exploited. The authors manually annotate classes such as complaint, emotion, and sentiment in the Product Review dataset, a corpus of mixed-language complaints consisting of 3711 annotated instances. They then develop a framework based on a Graph Attention Network (GAT) with added self-attention layers to perform complaint detection (the main task), sentiment classification, and emotion recognition simultaneously. They obtain a precision of 72.82% and a Macro-F1 of 71% in the complaint detection task. Alamsyah et al. [27] present a method to classify one million complaints into the five departments of a bank in Indonesia. The authors preprocess the text, including the use of TF-IDF weighting, and then use a Convolutional Neural Network to perform the classification. The method achieves 85% Accuracy, although the authors acknowledge that the system is yet to be deployed in a real environment. Hsu et al. [30] present a method to analyze the chief complaints of preschool children in order to detect influenza-like illnesses, support physician diagnosis, and allow a quick response to an outbreak. The authors use Deep Learning tools, in particular the BERT model, to classify texts, obtaining an Accuracy of 72.87%. Assaf and Srour [32] present a method to analyze approximately 6000 occupant complaints from 16 buildings and to forecast thermal complaints as a strategy for predictive maintenance of facilities. The authors use a multilayer perceptron model.
Fan et al. [33] present a Deep Cross Domain Network (DCDN), which takes water pollution complaints and classifies whether or not each complaint was made with bad intentions. They first use an LSTM to extract the domain features; a self-attention mechanism then fuses the shared and private domain features, and finally a multilayer perceptron generates the classification result. They use the Python programming language. Singh et al. [36] present a system for identifying complaints and classifying sentiments. They label a corpus with the sentiment (positive, negative, or neutral) expressed in the text written by users. They then use Deep Learning tools, among them AffectiveSpace 2, to identify complaints and classify sentiment, finding that the two variables are correlated. They obtain an Accuracy of 83.63% and a Macro-F1 score of 81.9% for the complaint identification task. Fan et al. [41] propose an annotation-based text classification method for environmental complaint reporting. They first use a small amount of labeled data to establish a corpus of cell vocabulary. The cell vocabulary is then expanded into the corpus of the pre-trained model. Finally, a TextCNN model is trained to perform automatic labeling and classification of the complaint text. Tong et al. [42] present a method for classifying complaint text from the web based on a character-level Convolutional Neural Network (CNN). The authors remove negative elements using lexicons, then perform character embedding to encode the characters, then perform feature extraction to reduce dimensionality, and finally classify by means of a convolutional network. Luo et al. [43] take short texts from the 12345 hotline of Haikou city to perform text classification, given the number of calls.
The authors run experiments comparing FastText, TextCNN, TextRNN, and RCNN, and conclude that TextCNN performed best. Chen et al. [44] present a method that takes complaints from a tourism page, calculates word frequencies, and applies the LDA topic model (a Bayesian model) to classify complaints into their respective categories and support complaint management. Shin et al. [45] present a method for indoor water leakage management. They apply machine learning (ML) to predict the spatial distribution of customer complaints, specifically using the XGBoost and LightGBM models. The authors mention that their tool can contribute to decision-making. Zhong et al. [38] present a method for building quality complaints, which should be classified and resolved quickly. They use Convolutional Neural Networks (CNN) to capture semantic features in texts and then automatically classify the texts into predefined categories.
The authors conclude that CNNs perform better than support vector machine and Bayes-based classifiers. Wang et al. [47] present a method that processes written air pollution complaints filed in Beijing in 2019 and 2020. The authors extract names and addresses of geographical points, as well as times and types of complaints, using Bidirectional Encoder Representations from Transformers (BERT) combined with Conditional Random Fields (CRF). They then perform filtering operations and create heat maps that locate the most polluted areas in Beijing more accurately, so that emergencies can be addressed more quickly. Chen et al. [51] present an intelligent government complaint prediction method to respond to citizen complaints through Machine Learning (ML) technologies. The system collects and integrates complaints, performing label correction to refine the labels and, in some cases, to unify them into one category. With the refined data, the central server processes solutions to the complaints through classification algorithms. The authors mention that their major contribution is to apply text classification together with label correction, so that the classification method is trained better.
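
Several of the studies above report Accuracy, precision, and Macro-F1. As a reading aid, the following minimal sketch shows how these metrics are commonly computed for a two-class complaint classifier. The labels and predictions are invented for illustration and do not come from any of the reviewed papers; the function names are hypothetical as well.

```python
# Toy reference labels and predictions (invented, not taken from any reviewed study).
Y_TRUE = ["complaint", "complaint", "complaint",
          "other", "other", "other", "other", "other"]
Y_PRED = ["complaint", "complaint", "other",
          "other", "other", "other", "other", "complaint"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_scores(y_true, y_pred, lab)[2] for lab in labels) / len(labels)


if __name__ == "__main__":
    print("accuracy:", accuracy(Y_TRUE, Y_PRED))             # 0.75
    print("macro-F1:", round(macro_f1(Y_TRUE, Y_PRED), 3))   # 0.733
```

Because Macro-F1 averages the per-class scores without weighting them by class size, it penalizes a classifier that neglects the rarer class, which is one plausible reason why complaint detection studies report it alongside Accuracy.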

The advantage of Machine Learning methods is that they are capable of learning from large amounts of linguistic data, and thus of recognizing the relationships between words, phrases, and sentences in texts without the need for explicit rules. The disadvantage of this type of method is that its decisions are not easily understandable by humans, which makes it difficult to diagnose the reason for false positives or false negatives in the developed systems; as a result, more effort has to go into constructing the training sets.
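
The trade-off described above can be illustrated with a deliberately small sketch. It trains a multinomial Naive Bayes classifier, used here only as a simple stand-in for the neural models in the reviewed studies, from a handful of labeled sentences without any hand-written rules, and then lists the false positives and false negatives that an analyst would have to inspect. All sentences, labels, and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented toy data; not taken from any dataset mentioned in the review.
TRAIN = [
    ("the water is dirty and smells bad", "complaint"),
    ("my order arrived broken and late", "complaint"),
    ("the heating is broken again", "complaint"),
    ("thank you for the quick delivery", "other"),
    ("the new office looks great", "other"),
    ("what time does the branch open", "other"),
]
TEST = [
    ("the delivery was late and the box was broken", "complaint"),
    ("what time does the new office open", "other"),
    ("thank you for the great delay", "complaint"),  # sarcastic complaint
    ("the heating is great", "other"),
]


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per class and words per class; no rules are written by hand."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training are ignored
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def error_report(model, examples, positive="complaint"):
    """Collect the texts behind false positives and false negatives."""
    false_pos, false_neg = [], []
    for text, label in examples:
        guess = predict(model, text)
        if guess == positive and label != positive:
            false_pos.append(text)
        elif guess != positive and label == positive:
            false_neg.append(text)
    return false_pos, false_neg


if __name__ == "__main__":
    fps, fns = error_report(train_naive_bayes(TRAIN), TEST)
    print("false positives:", fps)
    print("false negatives:", fns)
```

In this toy setting the errors can be traced back to individual word counts, so the usual remedy is to add training sentences that cover the missed pattern. With the deep models covered in this review, such a direct trace is generally not available, which is why the error listing and the quality of the training set carry most of the diagnostic burden.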




J. C. Blandón Andrade www.mdpi.com