1. Introduction
Predictive analytics and real-time decision-making made possible by the integration of ML, DL, and time series analysis (TSA) approaches in smart farming have the potential to completely transform agricultural processes. However, putting these cutting-edge technologies into practice presents several difficulties, including integrating disparate data sources, guaranteeing robustness to environmental unpredictability, and optimizing resource allocation. In the context of smart farming, this manuscript attempts to explore how well ML, DL, and TSA may improve crop output prediction, disease detection, and irrigation management while considering the unique challenges and limitations of agricultural contexts.
Some of the key contributions of this research manuscript are as follows:
An extensive overview of ML and DL algorithms in the context of smart farming, featuring a comprehensive classification of the current data within this field.
An exploration of applications in smart farming and an overview of the ML and DL approaches, including Linear Regression, Support Vector Machines (SVMs), and artificial neural networks (ANNs).
A rigorous overview of time series models using multi-head attention enabled by transformers.
2. Related Work
The integration of DL and ML technology has brought about a radical shift in precision agriculture, also known as smart farming, in recent years. With an emphasis on the use of ML and DL approaches, this literature review examines the state of research, developments, and applications in the field of smart farming.
2.1. Overview of Machine Learning Algorithms in the Context of Precision Farming
ANN and SVM are mostly used to estimate the quantity of berries per cluster in grapes. Several factors are included in growth analysis, and one of the most basic measurements is plant height. Plant breadth and the quantity of leaves per plant are other frequently used measurements. The goal of farming is yield, and a plentiful harvest can only be obtained if the crop is allowed to grow under the right conditions. A Knowledge and Data-Driven Model (KDDM) was presented as a descriptive tool for plants. It includes data on climate, growth media (e.g., fertilizer usage), and plant development metrics (e.g., biomass output and fruit setting).
The SVM regression function has the linear form $f(x) = w^{\top}x + b$, where w is a weight vector, x is the input vector, and b is the bias.
The Lagrangian formulation introduces a “slack variable”, denoted by $\xi$. In the field of yield forecasting, SVM kernel functions can help solve non-linear regression problems with accurate forecasting performance. However, the SVM regression approach sometimes matches the training set too closely, which is referred to as the “over-fitting problem”. Under these conditions, forecasting performance on testing data is lower than on training data. Another issue is that SVM regression is significantly more challenging for farming domain specialists to understand than more interpretable methods such as linear or logistic regression. Because of their improved performance, deep learning techniques are increasingly used in recent research on smart farming. In particular, CNNs have been adopted for image analysis and recognition tasks, for which they are more effective instruments.
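For reference, the role of the slack variables can be seen in the standard ε-insensitive formulation of SVM regression. This is the textbook form and may differ in notation from the formulation originally intended here; the symbols C (regularization constant), ε (tolerance), n (number of training samples), y_i (observed target), and ξ_i^* (slack for deviations below the tube) are assumptions introduced for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{subject to} \quad & y_i - \left( w^{\top} x_i + b \right) \le \varepsilon + \xi_i, \\
& \left( w^{\top} x_i + b \right) - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i \ge 0, \quad \xi_i^{*} \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

A larger C penalizes points outside the ε-tube more heavily, which fits the training set more closely and is one route to the over-fitting behavior described above.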
2.2. Overview of Deep Learning Algorithms in the Context of Precision Farming
2.3. Overview of Time Series Analytical Approaches in the Context of Precision Farming
Time series analytical techniques are essential for offering insights into many facets of agricultural operations in the context of precision farming. Time series models that consider variables like weather, irrigation, and fertilization can be used to analyze historical data on agricultural yields. Future crop yields can be predicted with the use of predictive modeling approaches such as Autoregressive Integrated Moving Average (ARIMA) and machine learning algorithms. To forecast future weather conditions, time series analysis is used to examine historical weather data. With this information, farmers can organize their operations and lessen the effect of unfavorable weather events on crop production.
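As a minimal illustration of the ARIMA idea, the sketch below differences a series once and fits a first-order autoregression to the differences by least squares, which corresponds to an ARIMA(1,1,0) model without an intercept or moving-average term. The function names and the yield values are invented for illustration; a practical study would rely on an established statistics library and on proper model selection.

```python
def fit_ar1_on_differences(series):
    """Difference the series once and fit d_t = phi * d_(t-1) by least squares."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    phi = num / den if den else 0.0
    return phi, diffs[-1]


def forecast(series, steps):
    """Forecast future levels by rolling the fitted difference model forward."""
    phi, last_diff = fit_ar1_on_differences(series)
    level, out = series[-1], []
    for _ in range(steps):
        last_diff = phi * last_diff   # predicted next change
        level += last_diff            # integrate the change back to a level
        out.append(level)
    return out


# Invented yearly yield values (at least two observations are required).
yields = [0.0, 8.0, 12.0, 14.0, 15.0]
print(forecast(yields, 2))
```

Differencing removes the trend before the autoregression is fitted, and summing the predicted changes restores the forecast to the original scale.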
3. Principles and Methods
This study follows the well-known PRISMA guidelines, which outline how to gather and evaluate data from existing research. The PRISMA statement, a systematic framework in the form of a 27-item checklist, is a useful tool for assisting researchers in compiling reviews and meta-analyses. This section covers every stage of the systematic review process in detail, following the four core stages of the PRISMA approach. The framework is explained in the subsections below: Identification Phase, Screening Phase, Eligibility Phase, Inclusion Phase, and PRISMA Overview, which together give a comprehensive picture of the methodology used in this investigation.
3.1. Identification Phase
The search query was modified to fit the syntax of each digital repository, and it included records indexed in the designated repositories (Web of Science (WoS) and Scopus) through 15 December 2024. To retrieve all possible variations of a given keyword, the wildcard symbol (“*”) was added to the end of some of the terms in the search string. Keywords from separate groups in the search string were connected using the boolean “AND” and keywords from the same group were connected using the boolean “OR”.
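The way such a search string is assembled can be sketched as follows. The helper `build_query` and the grouping of keywords are hypothetical and only illustrate the OR-within-group, AND-between-groups rule; the actual grouping and repository-specific field syntax are not reproduced here.

```python
def build_query(groups):
    """Join keywords within a group with OR and join the groups with AND."""
    clauses = []
    for keywords in groups:
        # Quote multi-word phrases so that a repository treats them as a unit.
        terms = [f'"{k}"' if " " in k else k for k in keywords]
        clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


# Hypothetical keyword groups; the trailing * is the wildcard symbol.
groups = [
    ["agricultur*", "farm*", "crop"],
    ["machine learning", "deep learning", "time series"],
]
query = build_query(groups)
print(query)
```

Each digital repository would still need its own field tags wrapped around this string.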
3.2. Screening Phase
3.3. Eligibility Phase
3.4. Inclusion Phase
The studies that satisfy all inclusion requirements and are incorporated into the SLR are represented by the inclusion phase. The same author manually conducted this screening.
3.5. PRISMA Overview
Published articles on the subject were gathered from the Web of Science (WoS) database, and the bibliometric analysis was conducted on the database’s primary collection, which includes more than 68 million records from 1900 to the present day. We ran an advanced search, applying the terms “agricultur*”, “farm*”, “crop”, “machine learning”, “deep learning”, “time series”, “application”, “implementation”, “case study”, and “precision” to the title (TIT) and abstract (ABS) fields. The search was carried out on 15 December 2024 and covered the Emerging Sources Citation Index (ESCI), the Social Science Citation Index (SSCI), and the Science Citation Index Expanded (SCI-E).
4. Impact of Machine Learning Algorithms on Crop Choice and Oversight
ML gives machines the ability to learn from data. The data are divided into training and testing sets: the model is fitted on the training instances so that it produces consistent results on fresh data, and the testing cases are then used to validate the trained model. Supervised and unsupervised learning are the two primary classes of ML techniques. In supervised learning, the model learns from labeled examples, with the known outputs acting as the supervisor. Various supervised learning techniques are used, including decision trees (DTs), SVM, K-nearest neighbors (KNN), hidden Markov models, Bayesian networks, identification distributions, and others.
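The training and testing workflow described above can be sketched with a small KNN classifier. The sensor readings, crop labels, and function names are invented for illustration, and a brute-force neighbor search is used here rather than a tree-based index.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, features, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of testing rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Invented (soil pH, humidity %) readings, each paired with an invented crop label.
data = [((5.5 + 0.1 * i, 80.0 + i), "rice") for i in range(8)]
data += [((7.5 + 0.1 * i, 40.0 + i), "wheat") for i in range(8)]

train, test = train_test_split(data)
print(accuracy(train, test))
```

Only the training rows are consulted when predicting, so the accuracy on the held-out testing rows estimates how the model behaves on fresh data.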
In unsupervised machine learning, a computer is given a large amount of unlabeled data and searches it for patterns, which helps reveal hidden structure within the data. Self-organizing maps, hierarchical clustering, partition-based clustering, and K-Means clustering are a few examples. More broadly, machine learning combines computer science and statistics to enhance prediction abilities. Historical agricultural data include numerous features, such as pH, temperature, humidity, precipitation, phosphorus and potassium levels, wind speed, zinc, and organic carbon. The features may be binary or numerical. In irrigation systems, ML is applied in several areas, including soil management, on-demand crop management, plant disease detection, and crop quality management.
4.1. Generating a Comprehensible Decision Tree
4.2. Contribution Regarding Support Vector Machine
4.3. KNN Algorithm Based on Multi-Dimension Tree
5. Impact of Deep Learning Algorithms on Crop Choice and Oversight
In the research arena, AI, ML, and DL are closely related: deep learning is a subset of ML, which is itself a subset of AI. Deep learning learns representations of the data through hidden layers. ML techniques are appropriate for simple data problems, whereas deep learning approaches are essential when dealing with complex or unstructured data. A typical DL architecture consists of input, hidden, and output layers. Examples of DL architectures include recurrent neural networks (RNNs), convolutional neural networks (CNNs), and others. The following subsections look at several DL algorithms that different researchers have proposed for crop management and selection.
5.1. Single-Layer Feedforward Approaches
The ELM algorithm’s fundamental steps are listed below:
Initialize the input weight matrix and the bias vector with random values.
The weight matrix and bias have sizes (j × k) and (1 × j), where j is the number of hidden nodes and k is the number of input nodes.
Compute the hidden layer’s output matrix H. It is obtained by multiplying the training data X by the transpose of the weight matrix and adding the bias.
Choose an activation function. Any activation function may be used, such as ReLU or SoftMax, and it is applied to the hidden layer’s output.
Find the Moore–Penrose pseudoinverse of H. Several methods are available for computing this generalized inverse, including, but not limited to, orthogonal projection, orthogonalization, iterative methods, and singular value decomposition (SVD).
Compute the output weight matrix β by multiplying the Moore–Penrose pseudoinverse of H by the matrix of training targets.
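The steps above can be sketched as follows, assuming NumPy is available. The function names, the ReLU choice, the synthetic data, and the use of one bias entry per hidden node are assumptions made for illustration, not details confirmed by the reviewed studies.

```python
import numpy as np


def train_elm(X, T, hidden_nodes, seed=0):
    """Fit a single-hidden-layer ELM; X is (n x k) and T is (n x m)."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    W = rng.standard_normal((hidden_nodes, k))   # random input weights, (j x k)
    b = rng.standard_normal((1, hidden_nodes))   # random biases, one per hidden node
    H = np.maximum(0.0, X @ W.T + b)             # hidden output with ReLU, (n x j)
    beta = np.linalg.pinv(H) @ T                 # output weights via the pseudoinverse
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    return np.maximum(0.0, X @ W.T + b) @ beta


# Synthetic data: three inputs per sample, target is their sum (invented task).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(40, 3))
T = X.sum(axis=1, keepdims=True)

W, b, beta = train_elm(X, T, hidden_nodes=10)
mse = float(np.mean((predict_elm(X, W, b, beta) - T) ** 2))
print(beta.shape, mse)
```

Only the output weights are learned, and they are obtained in a single linear-algebra step rather than by iterative backpropagation, which is what makes ELM training fast.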
5.2. Clustering Approach with Radial Basis Function
K-Means clustering is used to locate the centers of the hidden neurons, since vectors that are adjacent to one another in Euclidean space should fall into the same neuron’s receptive field. First, choose K, the number of cluster centers. Select K randomly chosen points from the dataset to act as the initial centroids. Assign each point in the dataset to its closest centroid. For each centroid, compute the average of all the points assigned to it, and move the centroid to that average. Repeat the assignment and update steps until the centroids stop changing. The width of the receptive fields is chosen so that the neurons’ receptive fields together cover the entire domain of the input vectors; the value of sigma is derived from d, the largest distance between two hidden neurons. RBF networks with a single hidden layer converge significantly faster than multilayer perceptrons (MLPs). For low-dimensional data, RBF networks are generally preferred over MLPs when deep feature extraction is not required and the results are directly related to the components of the input vector. Compared with most machine learning models, RBF networks are resilient learners, and they are also universal approximators.
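A minimal sketch of this procedure is given below. The text states only that sigma is found from the largest distance d between hidden neurons; the specific rule sigma = d / sqrt(2K) used here is a common heuristic adopted as an assumption, and the data points and function names are invented for illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-Means; returns k centroids as tuples."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def rbf_width(centroids):
    """Assumed heuristic: sigma = d_max / sqrt(2K), d_max = largest center distance."""
    d_max = max(math.dist(a, b) for a in centroids for b in centroids)
    return d_max / math.sqrt(2 * len(centroids))


def rbf_features(x, centroids, sigma):
    """Gaussian activation of each hidden neuron for the input vector x."""
    return [math.exp(-math.dist(x, c) ** 2 / (2 * sigma ** 2)) for c in centroids]


# Invented two-dimensional samples forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = sorted(kmeans(points, k=2))
sigma = rbf_width(centers)
print(centers, sigma, rbf_features((0.0, 0.0), centers, sigma))
```

The resulting feature vector would then feed a linear output layer, which is the only part of the RBF network that still has to be trained.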
6. Overview of Time Series Analysis on Crop Choice and Oversight
Predicting any issue, event, or variable requires a thorough grasp of the factors influencing it; estimates of agricultural crop yield are no exception. India’s agricultural output is strongly influenced by several factors, such as adequate rainfall, timely use of pesticides and fertilizers, a favorable climate and environment, and farmer subsidies.
Smart farming may employ RNNs to predict output for the upcoming years. Knowing the projected crop production can help reduce food insufficiency in future years. Additionally, producers can use RNNs to forecast agricultural prices and estimate the profit or loss of a particular crop in the years to come.
Difficulties Encountered in the Analysis of Time Series Data
In agriculture, time series analysis plays a critical role in estimating future crop yield based on demand. One of the main issues is overfitting, which is common in time series models and therefore needs to be managed carefully. A second issue is missing data: if the series contains missing values, the results will be incorrect, so a primary goal of data preparation is to produce data free of missing values. The choice of model also matters. The ARIMA model works well for short-term forecasts, but its long-term projections may be inaccurate. RNNs can be applied to long-term prediction and are less sensitive to missing values, although their computational cost for agricultural time series prediction is higher than that of ARIMA.
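One simple way to prepare a series without missing values is linear interpolation, sketched below. The function name and the rainfall readings are invented for illustration, and other imputation strategies may suit agricultural data better.

```python
def fill_missing(values):
    """Linearly interpolate interior None gaps; copy the nearest value at the edges."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i, v in enumerate(filled):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = values[right]      # leading gap: copy first observation
        elif right is None:
            filled[i] = values[left]       # trailing gap: copy last observation
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Invented rainfall readings with gaps marked as None.
rainfall = [None, 10.0, None, 14.0, None]
print(fill_missing(rainfall))
```

The filled series can then be passed to a forecasting model that expects a complete, evenly spaced sequence.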
7. Discussion
The integration of AI into agricultural practices represents a paradigm shift in addressing global challenges such as food security, climate change, and resource optimization. This review highlights the transformative potential of AI-driven technologies, particularly ML, DL, and time series analysis, in advancing sustainable crop production. The findings underscore how these tools enhance decision-making, improve crop yield predictions, and enable precision agriculture. However, their adoption also raises critical questions about scalability, ethical considerations, and the balance between technological innovation and traditional farming practices.
One major obstacle to AI-driven solutions is still their scalability. Smallholder farmers, who make up a sizable share of the world’s agricultural workforce, frequently do not have access to high-speed internet, sophisticated equipment, or technical know-how. For instance, in distant areas with unstable electricity, DL models that require GPU clusters are not feasible. Prioritizing mobile-based platforms and lightweight models (such as edge AI and federated learning) is necessary to democratize AI.
Ethical issues are also quite important. When third-party AI companies aggregate farm-specific data, data privacy concerns surface, and farmers may be subject to abuse. Furthermore, traditional or organic farming methods may be marginalized by algorithmic bias, such as models that were mostly trained on data from industrialized farms. Frameworks must be established by policymakers to provide fair access to AI tools while preserving farmer autonomy.
7.1. Current Challenges
Even though ML and DL have significantly improved smart farming, a number of obstacles still prevent their widespread use, including the following:
High-quality, labeled datasets are essential for ML and DL models. However, because of limited historical records, weather fluctuations, and sensor failures, data collection in smart farming is frequently uneven.
Many ML/DL models require substantial processing power and cloud-based resources, which can be expensive and out of reach for small-scale farmers in isolated places.
Farmers frequently lack the technical know-how to interpret intricate ML/DL models, which raises questions about adoption and confidence. Decision-making is further complicated by the black-box nature of DL models.
Because of differences in climate, soil, and crop kinds, ML/DL models are frequently less effective in diverse agricultural settings than they are in controlled ones.
Cybersecurity risks, such as data breaches and sensor manipulation, are increased when IoT devices and cloud-based ML/DL systems are integrated into agriculture.
Standardized rules governing AI-driven smart farming are lacking, especially when it comes to data ownership, bias in AI models, and moral issues with automated decision-making.
7.2. Future Development Directions
To overcome these challenges and enhance the impact of ML and DL in smart farming, future research and development efforts should focus on the following:
Developing more interpretable ML/DL models, in the spirit of Explainable AI (XAI), that offer farmers clear and useful insights would boost adoption and trust.
Combining edge computing with federated learning to enable on-device processing would lessen reliance on cloud computing while maintaining privacy and enabling real-time decision-making.
In a variety of farming scenarios, combining data from satellites, drones, Internet of Things sensors, and historical records can increase the accuracy and resilience of ML/DL forecasts.
DL architectures that are optimized for low-power devices will allow for wider deployment, particularly in areas with limited resources.
Strong encryption, blockchain technology, and anomaly detection techniques can be used to improve data security and stop illegal access to agricultural systems.
Responsible adoption of AI-driven smart farming depends on the establishment of international regulatory standards that guarantee sustainability, equity, and moral AI use.
8. Conclusions
The agricultural sector has undergone significant transformation through technological advancement, with machine learning (ML) emerging as a critical enabler for optimizing crop management and selection. This systematic review evaluates ML methodologies applied in agricultural research between 2010 and 2023, highlighting prevalent techniques such as SVM, KNN, fuzzy neural networks, Autoregressive Integrated Moving Average (ARIMA), decision trees, ensemble learning, and random forests. Each approach exhibits distinct advantages and limitations, prompting researchers to increasingly adopt hybrid frameworks that integrate multiple ML or deep learning architectures to enhance predictive accuracy and operational efficiency.
To ensure methodological rigor, this review adhered to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework, employing a structured protocol for the identification, screening, and inclusion of relevant studies. Additionally, the analysis explored methodologies in Natural Language Processing (NLP) for agricultural data interoperability and evaluated Python-based web development frameworks (V 3.10) for deploying scalable model interfaces.
By synthesizing these insights, this review serves as a valuable resource for researchers seeking to advance precision agriculture. It provides a foundation for developing integrated ML architectures tailored to crop optimization, refining multilingual data translation systems, and designing user-centric web applications for real-time agricultural decision support. The findings underscore the potential for interdisciplinary innovation, encouraging further exploration of synergistic models to address evolving challenges in sustainable crop management.
Zulfiqar Ali www.mdpi.com