A hybrid Method Of Centroid-Based Clustering And Meta-Heuristic For Personalized Web Search
Nowadays, search engines tend to use the latest technologies to enhance the personalization of web searches, which leads to a better understanding of user needs. These technologies, such as ranking and crawling, aim to narrow the search results to meet the user's requirements. Recently, researchers...
Main Author: | Bourair Sadik Mohamad Taqi |
---|---|
Format: | Thesis |
Language: | en_US |
Subjects: | Computer algorithms -- Application software Online algorithms Data Mining -- methods. |
id |
my-usim-ddms-13189 |
---|---|
record_format |
uketd_dc |
spelling |
my-usim-ddms-13189 2024-05-29T05:49:23Z A hybrid Method Of Centroid-Based Clustering And Meta-Heuristic For Personalized Web Search Bourair Sadik Mohamad Taqi Universiti Sains Islam Malaysia 2018-03 Thesis en_US https://oarep.usim.edu.my/handle/123456789/13189 https://oarep.usim.edu.my/bitstreams/7c619760-5b5c-4558-9186-488af602b4fe/download 8a4605be74aa9ea9d79846c1fba20a33 Computer algorithms -- Application software Online algorithms Data Mining -- methods. |
institution |
Universiti Sains Islam Malaysia |
collection |
USIM Institutional Repository |
language |
en_US |
topic |
Computer algorithms -- Application software Online algorithms Data Mining -- methods. |
spellingShingle |
Computer algorithms -- Application software Online algorithms Data Mining -- methods. Bourair Sadik Mohamad Taqi A hybrid Method Of Centroid-Based Clustering And Meta-Heuristic For Personalized Web Search |
description |
Nowadays, search engines tend to use the latest technologies to enhance the personalization of web searches, which leads to a better understanding of user needs. These technologies, such as ranking and crawling, aim to narrow the search results to meet the user's requirements. Recently, researchers have tended to utilize data logs, which record the transactions performed between the user and the search engine. Such data logs contain a huge amount of heterogeneous data, such as URLs visited by the user, queries, clicks, document rankings and other significant information about the user. Another of these technologies is web search results clustering, which returns meaningful labelled clusters from a set of Web snippets retrieved from any Web search engine for a given user's query. Search result clustering aims to improve searching for information within the potentially huge number of search results. These search results consist of URLs, titles, and snippets (descriptions or summaries) of web pages. However, clustering techniques have a serious limitation: the number of clusters is set by a static mechanism. This fits poorly with search results, which are usually dynamic and vary with the typed query. Therefore, this study aims to propose a hybrid method of centroid-based clustering and meta-heuristic for personalized web search. First, the traditional clustering methods, namely K-means, K-medoids and Correlation clustering, are applied with three similarity measures, Cosine, Dice and Jaccard, for mining data logs and clustering web search results. Several pre-processing steps such as transformation, normalization, tokenization, and stemming were performed to turn the data into an appropriate format.
The performance of the traditional clustering algorithms is reduced by their sensitivity to initial values, cluster centers and the specified number of clusters, and by their underutilization of semantic features. Second, to improve the results of the clustering methods, this research proposes enhanced centroid-based clustering methods for a personalized web search engine with a new hybrid semantic similarity measure that exploits the richness of the semantic features. Finally, hybrid clustering methods that combine a novel genetic algorithm with centroid-based clustering methods are applied for clustering data logs and web search results. The proposed methods were evaluated using the common information retrieval metrics of Precision, Recall and F-measure. The AOL standard dataset is used for evaluating web data log clustering. ODP-239 and MORESQUE are used as the main gold standards for the evaluation of search results clustering algorithms. The experimental results show that the proposed methods outperformed all other clustering methods by a large margin for both clustering data logs and web search results over all datasets. In addition, the results show that the proposed methods are promising approaches which can make search results more understandable to users and yield benefits in terms of personalization. Future work might examine the application of meta-heuristics with clustering for real-time personalized web search that can take advantage of the genetic algorithm (GA) to dynamically assign the number of clusters in accordance with the typed query. |
format |
Thesis |
author |
Bourair Sadik Mohamad Taqi |
author_facet |
Bourair Sadik Mohamad Taqi |
author_sort |
Bourair Sadik Mohamad Taqi |
title |
A hybrid Method Of Centroid-Based Clustering And Meta-Heuristic For Personalized Web Search |
title_short |
A hybrid Method Of Centroid-Based Clustering And Meta-Heuristic For Personalized Web Search |
title_full |
A hybrid Method Of Centroid-Based Clustering And Meta-Heuristic For Personalized Web Search |
title_fullStr |
A hybrid Method Of Centroid-Based Clustering And Meta-Heuristic For Personalized Web Search |
title_full_unstemmed |
A hybrid Method Of Centroid-Based Clustering And Meta-Heuristic For Personalized Web Search |
title_sort |
hybrid method of centroid-based clustering and meta-heuristic for personalized web search |
granting_institution |
Universiti Sains Islam Malaysia |
_version_ |
1812444744592130048 |
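
The description above names three similarity measures (Cosine, Dice and Jaccard) used with the clustering methods. As a rough illustration only, the sketch below computes the textbook form of each measure for two short snippets. The function names, the whitespace tokenizer and the sample snippets are assumptions made for this example; they are not taken from the thesis, whose pre-processing (normalization, stemming) and hybrid semantic measure are not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simple stand-in, not the thesis's pre-processing)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dice(a, b):
    """Dice coefficient on token sets: 2|A ∩ B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def jaccard(a, b):
    """Jaccard index on token sets: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Hypothetical snippets for illustration only.
s1 = tokenize("jaguar car price")
s2 = tokenize("jaguar car dealer")

for name, fn in (("cosine", cosine), ("dice", dice), ("jaccard", jaccard)):
    print(f"{name}: {fn(s1, s2):.3f}")
```

A centroid-based method such as K-means or K-medoids would use one of these scores (or one minus the score, as a distance) when assigning each snippet to its nearest cluster center.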