Using wordnet to enhance feature selection in automated text categorization
In the field of automated text categorization, the large dimensionality of the feature space is a major problem because it involves extensive computation. Feature selection is one of the approaches to reducing the dimensionality of the feature space. This research explores the use of WordNet (Miller et al...
Main Author: | Chua, Stephanie Hui Li |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2004 |
Subjects: | T Technology (General) |
Online Access: | http://ir.unimas.my/id/eprint/12604/1/Stephanie%20Chua%20Hui%20Li%20ft.pdf |
id |
my-unimas-ir.12604 |
---|---|
record_format |
uketd_dc |
institution |
Universiti Malaysia Sarawak |
collection |
UNIMAS Institutional Repository |
language |
English |
topic |
T Technology (General) |
description |
In the field of automated text categorization, the large dimensionality of the feature
space is a major problem because it involves extensive computation. Feature selection is
one of the approaches to reducing the dimensionality of the feature space. This research
explores the use of WordNet (Miller et al., 1990), a lexical database, to perform
feature selection for an automated text categorization system. The WordNet-based
approach employs lexical and semantic information for feature selection. WordNet
allows the selection of terms that are lexically and semantically representative of a
category of documents, in contrast to the statistical approaches traditionally used for
feature selection.
We proposed three WordNet-based approaches to feature selection. The first, the
WordNet nouns approach, selects as features all nouns in WordNet that occur in
each category. The second, the lexical semantics approach, selects synonymous
terms that co-occur in a category. The third combines the lexical semantics
approach with statistical feature selection methods.
In experiments on the Reuters-21578 dataset, the lexical semantics approach
performed better than the WordNet nouns approach while reducing the feature
space by more than 40%. The lexical semantics approach also outperformed popular
statistical feature selection methods, namely Chi-Square (Chi2) and Information
Gain (IG). The combined approach improved the performance of the statistical
methods. WordNet has thus been used successfully to enhance feature selection,
which highlights the possibility of determining semantic features automatically. The
limitations of the lexical semantics approach are also discussed, and an
improved framework and an extension are proposed to overcome them. |
format |
Thesis |
qualification_level |
Master's degree |
author |
Chua, Stephanie Hui Li |
title |
Using wordnet to enhance feature selection in automated text categorization |
granting_institution |
Universiti Malaysia Sarawak (UNIMAS) |
granting_department |
Faculty of Computer Science and Information Technology |
publishDate |
2004 |
url |
http://ir.unimas.my/id/eprint/12604/1/Stephanie%20Chua%20Hui%20Li%20ft.pdf |
_version_ |
1783728112282894336 |
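
Illustrative note: the description above names a lexical semantics approach (selecting synonymous terms that co-occur in a category) and the Chi-Square statistic as a statistical baseline. The Python sketch below is one possible reading of those two ideas, not the thesis's implementation. `TOY_SYNSETS` is a hypothetical stand-in for WordNet synsets, the function names are invented for illustration, and co-occurrence is assumed to mean that two synonyms appear anywhere in the same category's documents.

```python
# Hypothetical stand-in for WordNet: each toy synset ID maps to its member words.
# A real system would look these up in WordNet instead of hard-coding them.
TOY_SYNSETS = {
    "grain.n.01": {"grain", "cereal"},
    "corn.n.01": {"corn", "maize"},
    "trade.n.01": {"trade", "commerce"},
    "ship.n.01": {"ship", "vessel"},
}


def select_synonym_features(category_docs, synsets=TOY_SYNSETS):
    """Return terms of a category that co-occur with at least one synonym.

    category_docs: list of tokenized documents (lists of lowercase strings)
    that all belong to the same category.
    """
    # Category-level vocabulary: every distinct token seen in the category.
    vocabulary = {token for doc in category_docs for token in doc}
    selected = set()
    for members in synsets.values():
        present = members & vocabulary
        # Keep a synset's words only if two or more of them occur in the category.
        if len(present) >= 2:
            selected |= present
    return selected


def chi_square(a, b, c, d):
    """Chi-square statistic for a term/category 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # Degenerate table (an empty row or column): treat as no association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator
```

For example, calling `select_synonym_features([["grain", "cereal", "price"], ["corn", "export"]])` keeps `grain` and `cereal` but drops `corn`, because no synonym of `corn` occurs in the category. A combined approach in the spirit of the third method could then rank the surviving terms with `chi_square`, though how the thesis combines the two is not stated in this record.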