Evaluation of query reformulation strategies for domain-specific information searches: a case study of the durian fruit domain / Azilawati Azizan

Bibliographic Details
Main Author: Azizan, Azilawati
Format: Thesis
Language: English
Published: 2022
Online Access: https://ir.uitm.edu.my/id/eprint/78545/1/78545.pdf
Institution: Universiti Teknologi MARA
Collection: UiTM Institutional Repository
Advisor: Abu Bakar, Zainab
Subjects: Programming; Rule-based programming; Backtrack programming
Description:

Although search engine technologies have made great strides in helping users find information on the Web, search results are only as good as the keywords and phrases that users enter in the search query. Hence, search queries need to be precisely formulated. However, users often fail to accurately translate their information needs into query words or phrases that a search engine can use. This becomes harder when users search for domain-specific information, as in most cases they are unable to identify keywords appropriate to the domain. As a result, the search engine is unable to locate the relevant documents, and users reformulate the query multiple times in the hope of retrieving a more relevant set of results. To address this issue, many researchers propose query reformulation, query refinement, query expansion, or query disambiguation to deliberately build better queries and retrieve more relevant results. However, most of the strategies employed to tackle this issue, such as query logs, rhetorical structure, thesauri, WordNet, ontologies, and user profiles, require extensive resources, are risky, and are time-consuming. Therefore, simpler and more effective techniques are needed to obtain better search results and to reduce the need for query reformulation (QR). To that end, this study applied a search engine framework that employs standard Information Retrieval (IR) methodology to evaluate several reformulation strategies, and proposes an operative and effective QR strategy for locating domain-specific information. The fruit domain, specifically durian, was chosen as the case study.

An investigation was first conducted to confirm that the issues identified, as well as the selected domain, were still pertinent at the time of the study. Several popular commercial search engines were examined to determine their search performance in locating domain-specific information on the Web. A group of users was then selected to conduct a task-based search to examine how users structured their queries to express their search intent. The results indicated that the most popular search engine (Google) only had an average P@10 score of 0.463 and a mean average precision (MAP) score of 0.649 when searching for durian-related information. The task-based search showed that 84.82% of users reformulated their queries, indicating that users do not obtain relevant search results on their first few attempts. Several QR strategies that may produce better search results were therefore investigated. Nine strategies were examined using features such as query keywords, ontology, the characteristic category of the domain, and the domain name. These features were manipulated using techniques such as ‘generalization’, ‘specification’, and ‘new’. Of the nine strategies examined, three outperformed the baseline. Combining query keywords with ontology significantly surpassed the baseline MAP score by 2.65%. More interestingly, the characteristic category of the domain, which is considerably simpler and easier to use, also outperformed the baseline MAP score by 2.63%. The findings of this study contribute to the field of IR through its analysis of search engine performance, user behaviour, a test collection, and reformulation strategies for searching domain-specific information.
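The abstract reports search performance as an average P@10 of 0.463 and a MAP of 0.649. As a reminder of what these standard IR measures compute, the following is a minimal Python sketch. It assumes binary relevance judgements, string document IDs, and the common convention of dividing P@k by k even when fewer than k documents are returned. The function names (`precision_at_k`, `average_precision`, `mean_average_precision`) and the toy judgements are hypothetical illustrations; this record does not describe the thesis's own evaluation tooling.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """P@k: fraction of the top-k ranked documents that are relevant.

    Assumption: divides by k even if fewer than k documents were retrieved.
    """
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP: sum of precision at each rank holding a relevant document,
    divided by the total number of relevant documents (so relevant
    documents that are never retrieved contribute 0)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: arithmetic mean of per-query AP over all queries."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical toy judgements, not data from the thesis.
    q1 = (["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})
    q2 = (["a", "b"], {"b"})
    print(precision_at_k(*q1))                         # 0.2
    print(round(average_precision(*q1), 4))            # 0.5556
    print(round(mean_average_precision([q1, q2]), 4))  # 0.5278
```

Under these assumptions, an "average P@10" such as the one reported for Google would be the mean of `precision_at_k(..., k=10)` over all test queries, while MAP averages the per-query AP values.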
Qualification: Doctor of Philosophy (PhD)
Qualification level: Doctorate
Granting institution: Universiti Teknologi MARA (UiTM)
Granting department: Faculty of Computer and Mathematical Sciences