Dual indexing and mutual summation based keyword search method for XML databases

Bibliographic Details
Main Author: Sethuramalingam, Selvaganesan
Format: Thesis
Published: 2014
Description
Summary: XML has become one of the most common formats for publishing, storing and exchanging data over the Internet. As a result, a large amount of information is stored and represented in XML, and research on keyword search over XML documents is growing, because keyword search lets users find the information they are interested in without knowing the underlying database schema or a complex query language. In XML keyword search, accurately identifying the user's search intention and ranking results in the presence of keyword ambiguities remain challenging problems. In this thesis, we propose an XML keyword search approach using Dual indexing and a Mutual summation based Algorithm (XDMA) to address these problems in XReal and other XML keyword search approaches. Our approach builds two indices: a tag information table for the structural nodes and a data node information table for the data nodes of an XML database. We also propose a keyword search technique that selects all possible T-typed nodes for a given query through two-level matching between the two indices. With this technique, each keyword in a query can be identified and distinguished as a tag or as data during the search. The thesis also identifies and addresses a new keyword ambiguity, Ambiguity 4: a keyword can appear as the name of a tag for node types that have different data (text) values, and vice versa. We then propose a new concept, mutual summation, for a pair of random variables. By combining the dependence between the two indices with mutual summation, we define a Mutual Score (MScore) between the selected tags and the query keywords, which is used to find the desired node of type T.
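The summary does not give the schema of the two tables, the exact two-level matching rule, or the MScore formula, so the following is only a minimal illustrative sketch of the general idea of dual indexing, not XDMA itself. It assumes that a node type is identified by its root-to-node label path, that text is tokenized on whitespace, and that a candidate type T is any label path whose subtree matches every query keyword through either index. All names (`build_dual_index`, `classify_keywords`, `candidate_node_types`, `SAMPLE`) are hypothetical. The `"both"` label flags keywords that occur as a tag name and as data, roughly the kind of tag/data overlap the summary describes as Ambiguity 4. Mutual summation and MScore are not attempted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document: "book" occurs both as a tag and as a data token.
SAMPLE = (
    "<bib>"
    "<book><title>XML Search</title><author>Smith</author></book>"
    "<article><title>Book Reviews</title><author>Lee</author></article>"
    "</bib>"
)


def build_dual_index(xml_text):
    """Build two illustrative indices (assumed layout, not the thesis schema).

    tag_table:  lowercased tag name -> set of label paths of structural nodes
    data_table: lowercased text token -> set of label paths whose text contains it
    """
    root = ET.fromstring(xml_text)
    tag_table = defaultdict(set)
    data_table = defaultdict(set)

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        tag_table[elem.tag.lower()].add(path)
        if elem.text and elem.text.strip():
            for token in elem.text.lower().split():
                data_table[token].add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return tag_table, data_table


def classify_keywords(query, tag_table, data_table):
    """Label each keyword as 'tag', 'data', 'both' (ambiguous) or 'none'."""
    labels = {}
    for kw in query.lower().split():
        in_tag, in_data = kw in tag_table, kw in data_table
        if in_tag and in_data:
            labels[kw] = "both"
        elif in_tag:
            labels[kw] = "tag"
        elif in_data:
            labels[kw] = "data"
        else:
            labels[kw] = "none"
    return labels


def candidate_node_types(query, tag_table, data_table):
    """Return label paths T whose subtree matches every keyword via either index."""
    per_keyword = []
    for kw in query.lower().split():
        prefixes = set()
        # Match the keyword against both indices, then collect every ancestor type.
        for path in tag_table.get(kw, set()) | data_table.get(kw, set()):
            parts = path.strip("/").split("/")
            for i in range(1, len(parts) + 1):
                prefixes.add("/" + "/".join(parts[:i]))
        per_keyword.append(prefixes)
    if not per_keyword:
        return []
    return sorted(set.intersection(*per_keyword))


if __name__ == "__main__":
    tags, data = build_dual_index(SAMPLE)
    print(classify_keywords("book smith", tags, data))     # {'book': 'both', 'smith': 'data'}
    print(candidate_node_types("book smith", tags, data))  # ['/bib', '/bib/book']
```

In this toy example, "book" is ambiguous because it names an element type and also appears in an article title. Intersecting the per-keyword ancestor sets narrows the candidates to `/bib` and `/bib/book`. A scoring step such as the thesis's MScore would then be needed to prefer one of them; that step is not modeled here.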