Building a French Stemmer Using a Dictionary of French Root Words

This thesis develops a strong French stemming algorithm based on a dictionary of French root words. The work is organized into four modules. The first module builds a list of French root words and a list of affixes, that is, prefixes, suffixes, and prefix-suffix...

Bibliographic Details
Main Author: Idi, Fulayi
Format: Thesis
Language: English
Published: 1999
Subjects: French language
Online Access: http://psasir.upm.edu.my/id/eprint/9628/1/FSKTM_1999_2_IR.pdf
id my-upm-ir.9628
record_format uketd_dc
spelling my-upm-ir.9628 2023-11-28T03:12:36Z Building a French Stemmer Using a Dictionary of French Root Words 1999-07 Idi, Fulayi French language 1999-07 Thesis http://psasir.upm.edu.my/id/eprint/9628/ http://psasir.upm.edu.my/id/eprint/9628/1/FSKTM_1999_2_IR.pdf text en public masters Universiti Putra Malaysia French language Faculty of Computer Science and Information Technology Ahmad, Fatimah English
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
advisor Ahmad, Fatimah
topic French language


spellingShingle French language Idi, Fulayi Building a French Stemmer Using a Dictionary of French Root Words
description This thesis develops a strong French stemming algorithm based on a dictionary of French root words. The work is organized into four modules. The first module builds a list of French root words and a list of affixes, that is, prefixes, suffixes, and prefix-suffix pairs. The second module removes punctuation from the words to be stemmed and removes stop words from the corpus. After this second module the words are noise-free and pass to the third module, the stemming proper. The stemming order adopted is prefix, then suffix, and finally prefix-suffix pairs. Each word to be stemmed is first compared against the dictionary of French root words to check whether it is a root word; the actual stemming is then performed. The resulting algorithm is tested on selected criteria, including inflection removal, prefix stripping, and suffix stripping. In all of these tests, the new French stemming algorithm performs better than the existing French stemmer, Savoy's stemmer. Further tests measure the new stemmer's performance in terms of understemming, overstemming, ambiguous stemming, and dictionary errors. The new French stemmer produces fewer understemming, overstemming, and ambiguous stemming errors than Savoy's stemmer, but more dictionary-born errors.
format Thesis
qualification_level Master's degree
author Idi, Fulayi
author_facet Idi, Fulayi
author_sort Idi, Fulayi
title Building a French Stemmer Using a Dictionary of French Root Words
title_short Building a French Stemmer Using a Dictionary of French Root Words
title_full Building a French Stemmer Using a Dictionary of French Root Words
title_fullStr Building a French Stemmer Using a Dictionary of French Root Words
title_full_unstemmed Building a French Stemmer Using a Dictionary of French Root Words
title_sort building a french stemmer using a dictionary of french root words
granting_institution Universiti Putra Malaysia
granting_department Faculty of Computer Science and Information Technology
publishDate 1999
url http://psasir.upm.edu.my/id/eprint/9628/1/FSKTM_1999_2_IR.pdf
_version_ 1794018854991560704
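
The pipeline described in the abstract (punctuation and stop-word removal, a root-dictionary check, then stripping in the order prefix, suffix, prefix-suffix pair) might be sketched as follows. This is only an illustrative sketch and not the thesis's implementation: the word lists `ROOTS`, `PREFIXES`, `SUFFIXES`, `PAIRS`, and `STOP_WORDS`, the function names, and the rule that a stripped form is accepted only if it appears in the root dictionary are all assumptions made for the example.

```python
import string

# Hypothetical toy data; the thesis's actual word lists are not reproduced here.
ROOTS = {"faire", "port", "national"}
PREFIXES = ["re", "dé", "im", "in"]
SUFFIXES = ["ement", "able", "eur", "er", "s"]
PAIRS = [("im", "er"), ("re", "s")]  # prefix-suffix pairs
STOP_WORDS = {"le", "la", "les", "de", "des", "un", "une", "et"}


def clean(text):
    """Module 2 (sketch): replace punctuation with spaces, drop stop words."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    return [w for w in words if w not in STOP_WORDS]


def stem(word):
    """Module 3 (sketch): dictionary check, then prefix, suffix, pair."""
    # A word that is already a root is returned unchanged.
    if word in ROOTS:
        return word
    # 1. Prefix stripping (assumed rule: accept only if the rest is a root).
    for p in PREFIXES:
        if word.startswith(p) and word[len(p):] in ROOTS:
            return word[len(p):]
    # 2. Suffix stripping.
    for s in SUFFIXES:
        if word.endswith(s) and word[:-len(s)] in ROOTS:
            return word[:-len(s)]
    # 3. Prefix-suffix pair stripping.
    for p, s in PAIRS:
        if word.startswith(p) and word.endswith(s):
            core = word[len(p):-len(s)]
            if core in ROOTS:
                return core
    # No rule applied: leave the word as it is.
    return word


def stem_text(text):
    """Clean a text, then stem every remaining word."""
    return [stem(w) for w in clean(text)]
```

With this toy data, `stem_text("Refaire les ports, importer!")` drops the stop word `les` and reduces the remaining words to `faire`, `port`, and `port`. Accepting a stripped form only when it is found in the root dictionary is one plausible way to limit overstemming, but it also means that any gap in the dictionary leaves a word unstemmed.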