An approximate functional dependencies (AFD) based approach to improve skyline queries computation and missing values estimation of skylines on crowdsourced-enabled incomplete database

Bibliographic Details
Main Author: Swidan, Marwa Behjat (Author)
Format: Thesis
Language: English
Published: Kuala Lumpur : Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, 2021
Subjects:
Online Access: http://studentrepo.iium.edu.my/handle/123456789/10725
Description
Summary: Data incompleteness is a frequent phenomenon in contemporary non-trivial database applications such as web autonomous databases, incomplete databases, big data and crowd-sourced mobile databases. Processing queries over these incomplete databases poses several challenges that negatively affect query processing. Most importantly, query results derived from incomplete databases are themselves incomplete, because certain values of the result are missing. Incomplete results may mislead the user in multi-criteria decision-making and decision support systems. Skyline queries are among the most prominent queries applied in these recommendation and decision-making systems. Recently, several studies have suggested exploiting crowd-sourced databases to estimate the missing values by generating plausible substitute values using crowd resources. Crowd-sourced databases have proved to be a powerful solution for performing user-given tasks by bringing human intelligence and experience to bear on the tasks. However, processing tasks on a crowd-sourced platform incurs additional monetary cost and increases time latency. It is also not always possible to produce a result that satisfies the user's preferences. Thus, an efficient and cost-effective approach is needed for estimating the missing values of the skylines on crowd-sourced enabled incomplete databases, one that exploits the available data and the implicit relationships in the database before referring to the crowd. This thesis proposes a new approach for estimating the missing values of the skylines over incomplete databases. The approach first applies data filtration to eliminate unwanted tuples from the initial incomplete database, which simplifies the value estimation process.
Furthermore, the approach uses the remaining data and exploits the implicit relationships between the attributes to impute the missing values of the skylines. It mines attribute correlations to generate a set of approximate functional dependencies (AFDs) that assist in generating the estimated values. The approach also aims to reduce the number of values that must be estimated by the crowd when local estimation is inappropriate. Factors that influence data processing, such as monetary cost, time latency and accuracy, are considered when working on the crowd-sourced platform to estimate the missing values of the skylines. Extensive experiments have been conducted on both synthetic and real datasets. The experimental results show that the proposed approach for estimating the missing values of the skylines over crowd-sourced enabled incomplete databases is scalable and outperforms the other existing approaches. The proposed approach simplifies missing value estimation for the skylines, reducing by up to 80% the number of values in the initial incomplete database that must be considered for estimation. The results also show that the proposed solution achieves the lowest relative error rate between the real missing values and the estimated values in comparison with the other recent approach. Most importantly, the proposed strategy can estimate up to 40% of the total missing values with accuracy of up to 90% by exploiting the available data in the initial incomplete database. Lastly, the experiments demonstrate that the approach significantly decreases the monetary cost and the time latency involved in estimating the missing values of the skylines using crowd-sourced databases.
Physical Description: xv, 163 leaves : colour illustrations ; 30 cm.
Bibliography: Includes bibliographical references (leaves 152-162).