Feature selection methods based on meteorological data for prediction of leptospirosis occurrence in Seremban, Malaysia

Predictive models are useful for preventing and controlling disease outbreaks. This can be done by analysing weather behavior in relation to disease occurrence. In Malaysia, leptospirosis has had one of the higher numbers of reported cases over the past 7 years, and the absence of und...

Bibliographic Details
Main Author: Rahmat, Mohamad Fariq
Format: Thesis
Language: English
Published: 2019
Subjects: Imaging systems in meteorology; Meteorological instruments - Malaysia; Leptospirosis
Online Access: http://psasir.upm.edu.my/id/eprint/104249/1/MOHAMAD%20FARIQ%20BIN%20RAHMAT%20-%20IR.pdf
id my-upm-ir.104249
record_format uketd_dc
spelling my-upm-ir.104249 2023-07-25T01:58:50Z 2019-11 Thesis http://psasir.upm.edu.my/id/eprint/104249/ text en public masters
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
advisor Ishak, Asnor Juraiza
topic Imaging systems in meteorology
Meteorological instruments - Malaysia
Leptospirosis
description Predictive models are useful for preventing and controlling disease outbreaks. This can be done by analysing weather behavior in relation to disease occurrence. In Malaysia, leptospirosis has had one of the higher numbers of reported cases over the past 7 years, yet there is a lack of understanding and of modelling studies that would allow development of an early warning system. In this study, a predictive model is developed using machine learning to capture the relation between weather variables such as temperature, sum of rainfall, and relative humidity, and Leptospira occurrence. The aim of this study is to predict the occurrence of leptospirosis in Seremban district using machine learning with meteorological data as input. The first objective is to investigate the best time lags for each weather variable using feature selection methods. The second objective is to develop, train and test a neural network model for disease prediction based on the selected features. Feature selection was conducted using two methods: first, through correlation analysis, and second, through graphical and non-graphical Exploratory Data Analysis (EDA). The neural network model was developed using backpropagation training, optimizing the number of hidden layers and hidden nodes. Model performance was measured using accuracy, sensitivity, and specificity. Correlation analysis showed that, for Seremban district, disease occurrence correlates more strongly with sum of rainfall at lags of 4 to 16 weeks and with temperature at a lag of 1 week, whereas EDA indicated high correlation with leptospirosis occurrence for temperature at a lag of 16 weeks and sum of rainfall at lags of 12 to 20 weeks. The study also showed that the predictive model can achieve accuracy between 80% and 84% when the input variables follow the feature selection made by EDA and the number of hidden neurons is 10. In conclusion, this study shows the trend of the environmental variables in predicting leptospirosis occurrence at different time lags. In addition, the predictive model can help public health authorities not only to predict the occurrence of the disease but also to prevent an outbreak from spreading in the community, by giving early warning based on future weather status.
format Thesis
qualification_level Master's degree
author Rahmat, Mohamad Fariq
author_facet Rahmat, Mohamad Fariq
author_sort Rahmat, Mohamad Fariq
title Feature selection methods based on meteorological data for prediction of leptospirosis occurrence in Seremban, Malaysia
title_short Feature selection methods based on meteorological data for prediction of leptospirosis occurrence in Seremban, Malaysia
title_full Feature selection methods based on meteorological data for prediction of leptospirosis occurrence in Seremban, Malaysia
title_fullStr Feature selection methods based on meteorological data for prediction of leptospirosis occurrence in Seremban, Malaysia
title_full_unstemmed Feature selection methods based on meteorological data for prediction of leptospirosis occurrence in Seremban, Malaysia
title_sort feature selection methods based on meteorological data for prediction of leptospirosis occurrence in seremban, malaysia
granting_institution Universiti Putra Malaysia
publishDate 2019
url http://psasir.upm.edu.my/id/eprint/104249/1/MOHAMAD%20FARIQ%20BIN%20RAHMAT%20-%20IR.pdf
_version_ 1776100425286549504
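
The description above mentions correlation analysis over time lags and model evaluation by accuracy, sensitivity, and specificity, but this record does not include the thesis's own code or data. The following is only a minimal sketch of how those two steps might look, under stated assumptions: Pearson correlation is assumed (the record says only "correlation analysis"), the function names `pearson`, `lagged_correlation`, and `classification_metrics` are hypothetical, and the weekly series in the demo are synthetic, not data from the study.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(weather: Sequence[float],
                       cases: Sequence[float],
                       lag: int) -> float:
    """Correlate the weather value in week t - lag with cases in week t."""
    n = len(weather)
    if len(cases) != n:
        raise ValueError("weather and cases must cover the same weeks")
    if lag < 0 or lag > n - 2:
        raise ValueError("lag must leave at least 2 overlapping weeks")
    # Drop the last `lag` weather weeks and the first `lag` case weeks
    # so that index i pairs weather[i] with cases[i + lag].
    return pearson(weather[:n - lag], cases[lag:])


def classification_metrics(actual: Sequence[int],
                           predicted: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = occurrence)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty, equal-length label sequences")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true occurrences detected
        "specificity": tn / (tn + fp),  # share of non-occurrences detected
    }


if __name__ == "__main__":
    # Synthetic weekly rainfall; cases simply echo rainfall two weeks later.
    rainfall = [0, 5, 1, 8, 2, 9, 3, 7, 4, 6, 0, 5]
    cases = [0, 0] + rainfall[:-2]
    for lag in range(0, 5):
        print(lag, round(lagged_correlation(rainfall, cases, lag), 3))
    print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                 [1, 1, 0, 0, 0, 1, 0, 1]))
```

Scanning a range of lags and keeping those with the strongest correlation is one simple way to pick lagged weather inputs for a classifier. The selection procedure actually used in the thesis may differ.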