Skyline query approaches in static and dynamic incomplete databases /
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | Kuala Lumpur : Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, 2018 |
Subjects: | |
Online Access: | Click here to view 1st 24 pages of the thesis. Members can view fulltext at the specified PCs in the library. |
Summary: | Skyline queries attempt to return the superior data items in a database, i.e. those that are not dominated by any other data item. In some real-world databases the data may be incomplete, i.e. data items often have missing values in one or more dimensions. Applying skyline algorithms designed for complete data to such databases is inappropriate, because missing values cause the dominance relation to lose its transitivity property, which raises the issue of cyclic dominance. Several research works have focused on processing skyline queries in static incomplete databases, where data rarely changes. However, an efficient approach that reduces the number of pairwise comparisons needed to identify the skylines in a static incomplete database has yet to be proposed. Moreover, the problem worsens in a cloud environment, where database relations are spread over many datacentres and remote access is needed to identify the skylines. Collecting data from these remote datacentres without prior filtering is undesirable and results in transferring an unnecessarily large amount of data. In addition, in some real-life database systems the database may be subject to frequent update operations such as insert. Insert operations make the database dynamic, as its contents are continually updated with new data items. An insert may invalidate the existing skyline results, so the skylines must be re-evaluated on the updated database to identify the new skyline answer. Re-applying the skyline method to the entire updated database is impractical and incurs a prohibitive cost due to the exhaustive pairwise comparisons. This thesis proposes an efficient skyline approach that derives the skylines in a database with incomplete data. 
The proposed approach exploits sorting and clustering of the data and employs the concept of domination power (dp) to eliminate dominated data items, which in turn reduces the number of pairwise comparisons needed to identify the skylines. Two optimization techniques are used to eliminate many dominated data items before the skyline technique is applied. The approach is then extended to process skyline queries in incomplete cloud databases, in which database relations are distributed among many remote datacentres. There, the approach aims to identify the skylines while reducing the number of pairwise comparisons, the processing time and the amount of data transferred. In addition, this thesis proposes an approach for processing skyline queries in an incomplete database subject to the insert update operation. The approach derives the new skylines after data items are inserted into the database, and reduces the search space by avoiding re-evaluation of the skylines on the entire database. It relies on performing a progressive scan on a subset of the database to re-identify the skylines based on the new contents. Many experiments on synthetic and real datasets have been conducted. The results show that our proposed approach for processing skyline queries in an incomplete database reduces the number of pairwise comparisons and the processing time compared to previous approaches. For cloud databases, our approach achieves a significant reduction in processing time and network cost compared to previous approaches. Lastly, the results for processing skyline queries on an incomplete database with the insert operation show that our approach outperforms previous approaches in terms of the number of pairwise comparisons and the processing time. |
---|---|
Physical Description: | xvii, 189 leaves : illustrations ; 30cm. |
Bibliography: | Includes bibliographical references (leaves 178-189). |
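
The cyclic dominance issue mentioned in the summary can be illustrated with a small sketch. The record does not define dominance formally, so the code below assumes the definition commonly used for incomplete data: two items are compared only on the dimensions where both have a value, and smaller values are assumed to be preferred. The names `dominates` and `naive_skyline` are illustrative and are not taken from the thesis; this is not the thesis's proposed approach, only the naive pairwise baseline it aims to improve on.

```python
from typing import List, Optional, Sequence

# An item is a tuple of values; None marks a missing value (assumption).
Item = Sequence[Optional[float]]


def dominates(p: Item, q: Item) -> bool:
    """Return True if p dominates q on the dimensions known in both items.

    Assumes smaller values are better. Items with no common known
    dimension are treated as incomparable.
    """
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not common:
        return False
    no_worse = all(a <= b for a, b in common)
    strictly_better = any(a < b for a, b in common)
    return no_worse and strictly_better


def naive_skyline(items: List[Item]) -> List[Item]:
    """Exhaustive pairwise comparison: keep items dominated by no other item."""
    return [
        p for p in items
        if not any(dominates(q, p) for q in items if q is not p)
    ]


# Three incomplete items that dominate each other in a cycle.
a = (1, None, 10)
b = (3, 2, None)
c = (None, 5, 3)

print(dominates(a, b), dominates(b, c), dominates(c, a))
print(naive_skyline([a, b, c]))
```

In this example `a` dominates `b` on the first dimension, `b` dominates `c` on the second, and `c` dominates `a` on the third, so transitivity fails and the naive test leaves no item undominated. On complete data the same functions behave like a conventional skyline computation.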