An approach for matching relational database schemas

Bibliographic Details
Main Author: Karasneh, Yaser Mohammad
Format: Thesis
Language: English
Published: 2011
Subjects:
Online Access: http://psasir.upm.edu.my/id/eprint/26989/1/FSKTM%202011%2023R.pdf
Description
Summary: Database schema integration aims to provide a uniform and consistent view, called a global schema, over a set of autonomous and heterogeneous data sources, so that data residing in different sources can be accessed as if they were in a single schema. Schema matching is the most crucial phase of schema integration and needs considerable attention, because its outcomes influence the correctness and completeness of the integrated (global) schemas. Manually specifying schema matches is tedious, time-consuming, error-prone, and therefore expensive, and the problem is growing with the rapidly increasing number of data sources to integrate. Automating this process, to make it faster and less labor-intensive, has therefore been one of the main tasks in schema integration. Although several solutions have been proposed, they remain limited: they do not exploit most of the available information related to the schemas, and this affects the result of integration. This thesis presents an approach for matching heterogeneous relational databases’ schemas that utilizes most of the information related to the schemas. Our solution takes into consideration both structural and semantic heterogeneities and offers data/schema integration without user intervention. Six matchers are introduced, namely (i) Name of the Databases’ Schemas Matcher (NDSM), (ii) Relation Schema Matcher (RSM), (iii) Attribute Name Matcher (ANM), (iv) Data Type Matcher (DTM), (v) Constraint Matcher (CM), and (vi) Instance Data Matcher (IDM). Matching based on the names of the databases’ schemas, the names of the relation schemas, and the names of the attributes is accomplished using two methods, namely n-gram and synonym. In addition, our solution is domain independent: it does not rely on rules specific to a particular domain, and hence no predefined knowledge of the domain is required.
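The summary names n-gram and synonym matching as the two methods used for comparing names but does not give their details. As a rough illustration only, the sketch below compares two names with character trigrams and a Dice coefficient, and falls back on a small synonym table. The Dice coefficient, the trigram size, the 0.5 threshold, the SYNONYMS table, and every function name here are assumptions made for illustration, not the thesis's actual definitions.

```python
# Illustrative sketch only: the thesis's actual n-gram and synonym methods
# are not specified in this record. All names and values are hypothetical.

def ngrams(name, n=3):
    """Return the set of character n-grams of a name, ignoring case and punctuation."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    if len(cleaned) < n:
        # Names shorter than n are treated as a single gram.
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Dice coefficient over the two n-gram sets (an assumed similarity measure)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Hypothetical synonym table; a real matcher might consult a thesaurus instead.
SYNONYMS = {
    frozenset({"doctor", "physician"}),
    frozenset({"patient", "client"}),
}


def names_match(a, b, threshold=0.5):
    """Treat two names as matching if they are listed synonyms or similar by n-grams."""
    if frozenset({a.lower(), b.lower()}) in SYNONYMS:
        return True
    return ngram_similarity(a, b) >= threshold
```

With this sketch, names_match("Doctor", "physician") succeeds through the synonym table even though the two names share no trigrams, while "patient_id" and "PatientID" match through n-grams alone because both normalize to the same string.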
This thesis also shows that the integrated (global) schemas produced maintain the properties of the initial input schemas as well as the characteristics of the relational model. Our approach achieved a precision (P) of 91%, a recall (R) of 84%, and an F-measure (F) of 88% for the biomedical domain, and a P of 82%, an R of 70%, and an F of 76% for the hospital domain. These are the highest values obtained when compared with configurations in which fewer elements are considered during the matching process.
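The record does not state how P, R, and F were computed. A minimal sketch, assuming the standard definitions over sets of correspondences (precision as the share of proposed matches that are correct, recall as the share of reference matches that were found, and F as their harmonic mean), could look like the following. The function name and the example pairs are hypothetical.

```python
# Illustrative sketch assuming the standard precision/recall/F-measure
# definitions; the thesis's exact evaluation procedure is not given here.

def evaluate_matches(found, reference):
    """Return (precision, recall, f_measure) for a set of proposed matches
    against a set of reference (manually verified) matches."""
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical attribute correspondences between two schemas.
found = {("pid", "patient_id"), ("dob", "birth_date"),
         ("name", "ward"), ("sex", "gender")}
reference = {("pid", "patient_id"), ("dob", "birth_date"),
             ("addr", "address")}

precision, recall, f_measure = evaluate_matches(found, reference)
```

In this toy example two of the four proposed matches are correct and two of the three reference matches are found, so precision is lower than recall.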