Comparative study between regular expression and Google similarity index for instance based schema matching

Bibliographic Details
Main Author: Alzeber, Mogahed
Format: Thesis
Language: English
Published: Gombak, Selangor : Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, 2016
Online Access: http://studentrepo.iium.edu.my/handle/123456789/5650
Description
Summary: Schema matching is one of the essential phases of database integration. Its aim is to identify the correlation between schemas, which helps later in the data integration process. The main concern during schema matching is how to support the merging decision by providing the correspondence between attributes despite the syntactic and semantic heterogeneity of data sources. There have been many attempts in the literature to use database instances to detect the correspondence between attributes during the schema matching process, and many instance-based schema matching approaches have been proposed to improve the accuracy of matching. We observed that no single technique managed to provide accurate matching for different types of data. Some techniques treat numeric values as strings, which negatively affects the discovery of matches and, in turn, the quality of the match results. Similarly, other techniques treat textual instances as numeric, which also degrades the quality of the match results. Thus, a comparative study of syntactic and semantic techniques is needed, one that analyzes these techniques in depth in order to determine the strengths and weaknesses of each. This thesis develops two schema matching techniques, namely (i) regular expression and (ii) Google similarity, to identify matches between attributes for numeric, alphabetic and mixed instances. It then compares these techniques and evaluates their performance empirically. Several analyses were conducted on real and synthetic datasets to evaluate the performance of the schema matching techniques considered in this thesis with respect to Precision (P), Recall (R) and F-Measure.
Physical Description: xi, 121 leaves : ill. ; 30cm.
Bibliography: Includes bibliographical references (leaves 114-121).
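
Illustrative sketch (not taken from the thesis): the summary describes matching attributes from their instances with regular expressions and scoring the result with Precision, Recall and F-Measure. The Python below is one minimal way that idea could look. The pattern classes, the profile-overlap score, the 0.8 threshold, and every function and attribute name are assumptions made for illustration; this record does not state the thesis's actual expressions or matching rule.

```python
import re
from collections import Counter

# Hypothetical pattern classes; the thesis's real expressions are not given here.
PATTERNS = [
    ("numeric", re.compile(r"[+-]?\d+(\.\d+)?")),
    ("alphabetic", re.compile(r"[A-Za-z]+( [A-Za-z]+)*")),
]
CLASSES = ("numeric", "alphabetic", "mixed")


def classify(value):
    """Return the first class whose pattern matches the whole value, else 'mixed'."""
    text = str(value).strip()
    for name, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return name
    return "mixed"


def profile(instances):
    """Fraction of an attribute's instances that fall in each class."""
    counts = Counter(classify(v) for v in instances)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in CLASSES}
    return {name: counts[name] / total for name in CLASSES}


def similarity(a, b):
    """Overlap of two class profiles (sum of per-class minima), in [0, 1]."""
    pa, pb = profile(a), profile(b)
    return sum(min(pa[name], pb[name]) for name in CLASSES)


def match_attributes(source, target, threshold=0.8):
    """Pairs (source_attr, target_attr) whose profile overlap reaches the threshold."""
    return {
        (s, t)
        for s, s_values in source.items()
        for t, t_values in target.items()
        if similarity(s_values, t_values) >= threshold
    }


def precision_recall_f(found, expected):
    """Precision, Recall and F-Measure of found pairs against expected pairs."""
    true_positives = len(found & expected)
    p = true_positives / len(found) if found else 0.0
    r = true_positives / len(expected) if expected else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Made-up instance data for two small schemas.
source = {"phone": ["0123456789", "0198765432"], "name": ["Ali", "Siti Aminah"]}
target = {
    "contact_no": ["0355512345"],
    "full_name": ["John Smith"],
    "plate": ["WXY 1234"],
}
expected = {("phone", "contact_no"), ("name", "full_name")}

found = match_attributes(source, target)
print(found, precision_recall_f(found, expected))
```

A purely syntactic profile like this one would also pair any two all-numeric attributes, for example a phone number with a salary. That is the kind of weakness a semantic measure is meant to offset.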