A Hybrid Method of Centroid-Based Clustering and Meta-Heuristic for Personalized Web Search

Search engines now tend to use the latest technologies to enhance the personalization of web searches, which leads to a better understanding of user needs. Technologies such as ranking and crawling aim to narrow the search results to meet the user's requirements. Recently, researchers...

Bibliographic Details
Main Author: Bourair Sadik Mohamad Taqi
Format: Thesis
Language: en_US
Subjects: Computer algorithms -- Application software; Online algorithms; Data Mining -- methods.
id my-usim-ddms-13189
record_format uketd_dc
spelling my-usim-ddms-13189 2024-05-29T05:49:23Z Universiti Sains Islam Malaysia 2018-03 Thesis en_US https://oarep.usim.edu.my/handle/123456789/13189 https://oarep.usim.edu.my/bitstreams/7c619760-5b5c-4558-9186-488af602b4fe/download 8a4605be74aa9ea9d79846c1fba20a33
institution Universiti Sains Islam Malaysia
collection USIM Institutional Repository
language en_US
topic Computer algorithms -- Application software
Online algorithms
Data Mining -- methods.
description Search engines increasingly use the latest technologies to enhance the personalization of web search, which leads to a better understanding of user needs. Technologies such as ranking and crawling aim to narrow the search results to meet the user's requirements. Recently, researchers have begun to exploit data logs, which record the transactions performed between the user and the search engine. Such logs contain a huge amount of heterogeneous data, such as the URLs visited by the user, queries, clicks, document rankings, and other significant details about the user. Another such technology is web search results clustering, which returns meaningful labelled clusters from a set of web snippets retrieved from any web search engine for a given user query. Search results clustering aims to make it easier to find information in a potentially huge set of search results, which consist of URLs, titles, and snippets (descriptions or summaries) of web pages. However, clustering techniques have a serious limitation: the number of clusters is set statically. A fixed number fits poorly with search results, which change with the typed query. This study therefore proposes a hybrid method of centroid-based clustering and meta-heuristics for personalized web search. First, the traditional clustering methods K-means, K-medoids, and correlation clustering are applied with three similarity measures (Cosine, Dice, and Jaccard) to mine data logs and cluster web search results. Several pre-processing steps, such as transformation, normalization, tokenization, and stemming, were performed to turn the data into an appropriate format.
The performance of the traditional clustering algorithms is reduced by their sensitivity to initial values, cluster centers, and the specified number of clusters, and by their underutilization of semantic features. Second, to improve the clustering results, this research proposes enhanced centroid-based clustering methods for a personalized web search engine, with a new hybrid semantic similarity measure that exploits the richness of the semantic features. Finally, hybrid clustering methods that combine a novel genetic algorithm (GA) with centroid-based clustering are applied to cluster data logs and web search results. The proposed methods were evaluated using the common information retrieval metrics of Precision, Recall, and F-measure. The AOL standard dataset is used to evaluate the clustering of web data logs, and ODP-239 and MORESQUE are used as the main gold standards for evaluating search results clustering algorithms. The experimental results show that the proposed methods outperformed all other clustering methods by a large margin, for both data logs and web search results, over all datasets. The results also indicate that the proposed methods can make search results more understandable to users and yield benefits in terms of personalization. Future work might examine the application of meta-heuristics with clustering to real-time personalized web search, taking advantage of the GA to assign the number of clusters dynamically according to the typed query.
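The three similarity measures named in the description (Cosine, Dice, and Jaccard) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes each snippet is reduced to a set of lowercase whitespace-separated tokens (binary term weights), and the function names and sample snippets are hypothetical.

```python
import math


def tokenize(text):
    """Assumed pre-processing: lowercase and split on whitespace."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity for binary term vectors: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical snippets, for illustration only.
s1 = tokenize("web search clustering")
s2 = tokenize("web search engine results")
scores = {
    "jaccard": jaccard(s1, s2),
    "dice": dice(s1, s2),
    "cosine": cosine(s1, s2),
}
```

A centroid-based method would use one of these scores to assign each snippet to its most similar cluster center; weighted term vectors (for example TF-IDF) would need the general dot-product form of the cosine measure instead of the set form shown here.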
format Thesis
author Bourair Sadik Mohamad Taqi
title A Hybrid Method of Centroid-Based Clustering and Meta-Heuristic for Personalized Web Search
granting_institution Universiti Sains Islam Malaysia