Document clustering based on inverse document frequency measure

Automatic classification techniques are capable of providing the necessary information organization by arranging the retrieved data into groups of documents with common subjects. Recently, document clustering has been put forth as an alternative method of organizing the results of retrieval. It has been...

Bibliographic Details
Main Author:	Wan Faridah Hanum, Wan Yaacob
Format:	Thesis
Language:	eng eng
Published:	2005
Subjects:	HF5001-6182 Business
Online Access:	https://etd.uum.edu.my/1367/1/WAN_FARIDAH_HANUM_BT._WAN_YAACOB.pdf https://etd.uum.edu.my/1367/2/1.WAN_FARIDAH_HANUM_BT._WAN_YAACOB.pdf

id	my-uum-etd.1367
record_format	uketd_dc
spelling	my-uum-etd.1367 2019-11-12T02:13:09Z Document clustering based on inverse document frequency measure 2005-04-07 Wan Faridah Hanum, Wan Yaacob Yusoff, Nooraini Faculty of Information Technology Faculty of Information Technology HF5001-6182 Business Automatic classification techniques are capable of providing the necessary information organization by arranging the retrieved data into groups of documents with common subjects. Recently, document clustering has been put forth as an alternative method of organizing the results of retrieval. It has been proposed for use in navigating and browsing document collections and for discovering hidden similarities and key concepts. It also summarizes a large number of documents using key or common attributes of each cluster and can be used to categorize document databases. This paper describes several narrative clustering techniques, such as the Porter algorithm, the Gusfield algorithm, similarity based on document hierarchy, and Inverse Document Frequency (IDF), which intersect the documents in a cluster to determine the set of words (or phrases) shared by all the documents in the cluster. This study proposes document clustering based on IDF, which assumes that the importance of a keyword in calculating similarity measures is inversely proportional to the total number of documents that contain it. IDF is easy to understand, has a geometric interpretation, uses term weighting that has been shown to help clustering, allows partial matching, and returns ranked documents. An important finding of this study comes from testing 30 document cases with the IDF algorithm; the results are divided into three categories: correct cluster, incorrect cluster, and unknown cluster.
2005-04 Thesis https://etd.uum.edu.my/1367/ https://etd.uum.edu.my/1367/1/WAN_FARIDAH_HANUM_BT._WAN_YAACOB.pdf application/pdf eng validuser https://etd.uum.edu.my/1367/2/1.WAN_FARIDAH_HANUM_BT._WAN_YAACOB.pdf application/pdf eng public http://sierra.uum.edu.my/record=b1170635~S1 masters masters Universiti Utara Malaysia
institution	Universiti Utara Malaysia
collection	UUM ETD
language	eng eng
advisor	Yusoff, Nooraini
topic	HF5001-6182 Business
spellingShingle	HF5001-6182 Business Wan Faridah Hanum, Wan Yaacob Document clustering based on inverse document frequency measure
description	Automatic classification techniques are capable of providing the necessary information organization by arranging the retrieved data into groups of documents with common subjects. Recently, document clustering has been put forth as an alternative method of organizing the results of retrieval. It has been proposed for use in navigating and browsing document collections and for discovering hidden similarities and key concepts. It also summarizes a large number of documents using key or common attributes of each cluster and can be used to categorize document databases. This paper describes several narrative clustering techniques, such as the Porter algorithm, the Gusfield algorithm, similarity based on document hierarchy, and Inverse Document Frequency (IDF), which intersect the documents in a cluster to determine the set of words (or phrases) shared by all the documents in the cluster. This study proposes document clustering based on IDF, which assumes that the importance of a keyword in calculating similarity measures is inversely proportional to the total number of documents that contain it. IDF is easy to understand, has a geometric interpretation, uses term weighting that has been shown to help clustering, allows partial matching, and returns ranked documents. An important finding of this study comes from testing 30 document cases with the IDF algorithm; the results are divided into three categories: correct cluster, incorrect cluster, and unknown cluster.
format	Thesis
qualification_name	masters
qualification_level	Master's degree
author	Wan Faridah Hanum, Wan Yaacob
author_facet	Wan Faridah Hanum, Wan Yaacob
author_sort	Wan Faridah Hanum, Wan Yaacob
title	Document clustering based on inverse document frequency measure
title_short	Document clustering based on inverse document frequency measure
title_full	Document clustering based on inverse document frequency measure
title_fullStr	Document clustering based on inverse document frequency measure
title_full_unstemmed	Document clustering based on inverse document frequency measure
title_sort	document clustering based on inverse document frequency measure
granting_institution	Universiti Utara Malaysia
granting_department	Faculty of Information Technology
publishDate	2005
url	https://etd.uum.edu.my/1367/1/WAN_FARIDAH_HANUM_BT._WAN_YAACOB.pdf https://etd.uum.edu.my/1367/2/1.WAN_FARIDAH_HANUM_BT._WAN_YAACOB.pdf
_version_	1747827131143946240
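
The IDF assumption stated in the abstract (a keyword matters less the more documents contain it) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the toy documents, the logarithmic scaling `log(N / df)`, and the use of cosine similarity as the "geometric interpretation" are all assumptions made for illustration. The last helper shows the cluster intersection idea, returning the words shared by every document in a cluster.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Weight each term by log(N / df); df = number of documents containing it.

    The log scaling is a common convention and an assumption here; the
    abstract only says importance is inversely related to df.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def weighted_vector(doc, idf):
    """Represent a document as term frequency times IDF weight."""
    return {term: freq * idf[term] for term, freq in Counter(doc).items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def shared_terms(cluster):
    """Intersect the documents of a cluster to find the words they all share."""
    return set.intersection(*(set(doc) for doc in cluster))


# Hypothetical toy collection, tokenized by whitespace with no stemming.
docs = [
    "document clustering groups similar documents".split(),
    "clustering documents by shared keywords".split(),
    "stock market prices fell sharply".split(),
]

idf = idf_weights(docs)
vectors = [weighted_vector(doc, idf) for doc in docs]
sim_01 = cosine(vectors[0], vectors[1])  # two documents about clustering
sim_02 = cosine(vectors[0], vectors[2])  # no terms in common
common = shared_terms([docs[0], docs[1]])
```

In this toy collection the first two documents overlap on some terms and so score a positive similarity, while the third shares no terms with the first and scores zero. Similarity values like these allow partial matches and can be sorted to produce a ranked list.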
