Comparative study between regular expression and Google similarity index for instance based schema matching

Bibliographic Details
Main Author: Alzeber, Mogahed
Format: Thesis
Language: English
Published: Gombak, Selangor : Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, 2016
Subjects:
Online Access: http://studentrepo.iium.edu.my/handle/123456789/5650
Description
Summary: Schema matching is considered one of the essential phases of database integration. The aim of the schema matching process is to identify the correlations between schemas, which later supports the data integration process. The main concern during schema matching is how to support the merging decision by providing the correspondences between attributes across syntactic and semantic heterogeneity in the data sources. There have been many attempts in the literature to utilize database instances to detect the correspondences between attributes during the schema matching process. Many instance-based schema matching approaches have been proposed with the aim of improving the accuracy of the matching process. We observed that no single technique managed to provide accurate matching for different types of data. For example, some techniques treat numeric values as strings, which negatively influences the process of discovering matches and, in turn, the quality of the match results. Similarly, other techniques treat textual instances as numeric, which also degrades the quality of the match results. Thus, a comparative study between syntactic and semantic techniques is needed. The study should analyze these techniques in depth in order to determine the strengths and weaknesses of each. This thesis aims at developing two schema matching techniques, namely (i) regular expression and (ii) Google similarity, to identify the matches between attributes for numeric, alphabetic and mixed instances. It further compares these techniques and evaluates their performance empirically. Several analyses have been conducted on real and synthetic datasets to evaluate the performance of the schema matching techniques considered in this thesis with respect to Precision (P), Recall (R) and F-Measure.
Physical Description: xi, 121 leaves : ill. ; 30 cm.
Bibliography: Includes bibliographical references (leaves 114-121).
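
Illustrative note: the summary distinguishes numeric, alphabetic and mixed instances and reports Precision (P), Recall (R) and F-Measure. The Python sketch below is not taken from the thesis; it only illustrates, under stated assumptions, how instance values might be classified with regular expressions and how the three measures are conventionally computed from a set of proposed attribute correspondences and a reference set. The patterns and the names `instance_class` and `precision_recall_f` are hypothetical.

```python
import re

# Assumed patterns for two of the instance classes named in the summary;
# the patterns actually used in the thesis are not given in this record.
NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")
ALPHABETIC = re.compile(r"[A-Za-z]+(?:[ '-][A-Za-z]+)*")


def instance_class(value: str) -> str:
    """Label one instance value as numeric, alphabetic or mixed."""
    value = value.strip()
    if NUMERIC.fullmatch(value):
        return "numeric"
    if ALPHABETIC.fullmatch(value):
        return "alphabetic"
    # Anything that is neither purely numeric nor purely alphabetic.
    return "mixed"


def precision_recall_f(found: set, expected: set) -> tuple:
    """Standard P, R and F-Measure over sets of attribute correspondences."""
    true_positives = len(found & expected)
    p = true_positives / len(found) if found else 0.0
    r = true_positives / len(expected) if expected else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f
```

For instance, `instance_class("AB-123")` falls through both patterns and is labelled as mixed, and a matcher that proposes two correspondences, both correct, out of four expected ones has a precision of 1.0 and a recall of 0.5.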