Comparison and fusion of retrieval schemes based on different structures, similarity measures and weighting schemes
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2006 |
Subjects: | |
Online Access: | http://eprints.utm.my/id/eprint/4067/1/MohammedSalemFaragWahlanMFSKSM2006.pdf |
Summary: | Many retrieval models and techniques can be applied to retrieve the theses most relevant to a given query or concept. Different retrieval methods often retrieve different sets of relevant documents, so a particular method will usually retrieve some relevant theses that other methods miss. This study therefore retrieves theses using methods that differ in thesis structure, similarity measure and weighting scheme. The theses were collected from the FSKSM postgraduate library and processed by digitizing, stop-word removal, stemming and index building, with the results of these operations stored in a database. The study uses 85 theses and 30 queries. Queries were compared with theses using five similarity measures and seven weighting schemes over different thesis structures. The results show that using the bibliography gives poorer results than using the title and abstract alone. For combinations of weighting schemes, the weighting schemes used with the Cosine and Tanimoto similarity measures performed well individually but not in combination, whereas those used with the Forbes and Russell similarity measures performed poorly individually but well in combination. For combinations of similarity measures, the best combination on the title structure was Cosine with the LTU weighting scheme together with Russell with the LOGG weighting scheme. On the abstract structure, the best combination was Cosine with the TFIDF weighting scheme together with Forbes with the ATFA weighting scheme, although it performed worse than the best title-structure combination. Overall, the best thesis structure is the title, and the best similarity measure is Cosine with the LTU weighting scheme. |
---|
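
The summary names four of the similarity measures (Cosine, Tanimoto, Forbes and Russell) and describes combining their results, but this record does not give the formulas or the fusion rule. The sketch below is an illustration only and uses common textbook forms of each measure, which may differ from the thesis's definitions. Russell is taken to be the Russell-Rao coefficient. The fusion step is CombSUM over min-max normalised scores, swapped in as a stand-in for whatever rule the thesis uses. All function names, the toy documents and the vocabulary size `n_terms` are hypothetical.

```python
import math

# Documents and queries are sparse dicts {term: weight}; only non-zero
# weights are stored. All names and data here are illustrative assumptions.

def _dot(x, y):
    return sum(w * y.get(t, 0.0) for t, w in x.items())

def cosine(x, y):
    """Cosine: dot product divided by the product of vector lengths."""
    nx = math.sqrt(_dot(x, x))
    ny = math.sqrt(_dot(y, y))
    return _dot(x, y) / (nx * ny) if nx and ny else 0.0

def tanimoto(x, y):
    """Tanimoto (continuous form): dot / (|x|^2 + |y|^2 - dot)."""
    d = _dot(x, y)
    denom = _dot(x, x) + _dot(y, y) - d
    return d / denom if denom else 0.0

def russell_rao(x, y, n_terms):
    """Russell-Rao (binary form): shared terms over vocabulary size."""
    return len(set(x) & set(y)) / n_terms

def forbes(x, y, n_terms):
    """Forbes (binary form): n * shared / (terms in x * terms in y)."""
    denom = len(x) * len(y)
    return len(set(x) & set(y)) * n_terms / denom if denom else 0.0

def comb_sum(score_lists):
    """Fuse several runs by summing min-max normalised scores (CombSUM)."""
    fused = {}
    for scores in score_lists:
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for doc, s in scores.items():
            norm = (s - lo) / span if span else 0.0
            fused[doc] = fused.get(doc, 0.0) + norm
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Toy data: three "theses" over a five-term vocabulary.
n_terms = 5
query = {"retrieval": 1.0, "fusion": 1.0}
docs = {
    "t1": {"retrieval": 1.0, "fusion": 1.0, "thesis": 1.0},
    "t2": {"retrieval": 1.0, "index": 1.0},
    "t3": {"stemming": 1.0},
}

cosine_run = {d: cosine(query, v) for d, v in docs.items()}
russell_run = {d: russell_rao(query, v, n_terms) for d, v in docs.items()}
ranking = comb_sum([cosine_run, russell_run])
print(ranking)
```

In a fuller experiment the term weights would come from a weighting scheme such as TFIDF or LTU rather than the unit weights used here, and each run would be scored against relevance judgements for every query.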