Applications of Machine Learning Algorithms in Agriculture
Abstract
This study identifies which of several widely used machine learning algorithms produces the most accurate predictions on an agricultural data set. The selected model was then used to build prediction software that recommends to the user the most productive crop for a given soil, or alternative crops that could be grown on it. The data set is open-source licensed and covers 22 crops, with features including soil nitrogen, potassium, phosphorus, moisture and pH, together with average temperature and average rainfall. Nine classifiers were modeled in Orange: support vector machines, naive Bayes, logistic regression, decision trees, AdaBoost, stochastic gradient descent, multilayer perceptron, k-nearest neighbors and random forest. For each of the nine models, training and testing time, AUC, accuracy, F1 score, precision, sensitivity, loss and specificity were calculated in Orange. On these results, random forest was the best model, with an accuracy of 99.5%, an F1 score of 0.995, an AUC of 1.000, a precision of 0.995, a sensitivity of 0.995 and a specificity of 1.000. Naive Bayes, by contrast, completed classification in the shortest time, and logistic regression had the lowest loss, at 0.048. Finally, a crop prediction application with a simple interface was written in Python and its libraries around the random forest model, for use by researchers and others engaged in agricultural activities.
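The comparison described in the abstract can be illustrated with a minimal sketch. The study itself used Orange; the example below swaps in scikit-learn, covers only four of the nine algorithms, and runs on synthetic data that mirrors only the shape of the data set (7 numeric features, 22 classes). The `evaluate` helper, the model settings and the 70/30 split are assumptions made for illustration, not the configuration used in the thesis, and the scores it prints say nothing about the reported results.

```python
import time

from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 7 features and 22 classes, matching
# only the shape of the data set described in the abstract, not its values.
X, y = make_blobs(n_samples=2200, n_features=7, centers=22, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# A subset of the algorithms named in the abstract, with illustrative settings.
models = {
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model and collect accuracy, macro F1 and training time."""
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_seconds = time.perf_counter() - start
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "f1_macro": f1_score(y_test, y_pred, average="macro"),
            "train_seconds": train_seconds,
        }
    return results


results = evaluate(models, X_train, X_test, y_train, y_test)

# Print the models from highest to lowest accuracy.
ranked = sorted(results.items(), key=lambda kv: kv[1]["accuracy"], reverse=True)
for name, scores in ranked:
    print(
        f"{name:20s} accuracy={scores['accuracy']:.3f} "
        f"f1={scores['f1_macro']:.3f} train={scores['train_seconds']:.3f}s"
    )
```

Keeping training time alongside the quality metrics reflects the abstract's point that the most accurate model and the fastest model need not be the same one.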
Keywords
Agriculture
End Page
105