A hybrid method of feature extraction and Naive Bayes classification for splitting identifiers

Integrating natural language processing techniques into software systems has attracted considerable research attention. One form of this integration is analyzing the morphology of source code to extract meaningful information. Feature location is the process of identifying specifi...

Bibliographic Details
Main Author:	Alanee, Nahla
Format:	Thesis
Language:	English
Published:	2016
Subjects:	Bayesian statistical decision theory
Online Access:	http://psasir.upm.edu.my/id/eprint/91752/1/FSKTM%202016%2032%20IR.pdf

id	my-upm-ir.91752
record_format	uketd_dc
spelling	my-upm-ir.91752 2022-01-17T07:49:05Z A hybrid method of feature extraction and Naive Bayes classification for splitting identifiers 2016-12 Alanee, Nahla Bayesian statistical decision theory 2016-12 Thesis http://psasir.upm.edu.my/id/eprint/91752/ http://psasir.upm.edu.my/id/eprint/91752/1/FSKTM%202016%2032%20IR.pdf text en public masters Universiti Putra Malaysia Bayesian statistical decision theory Azmi Murad, Masrah Azrifah
institution	Universiti Putra Malaysia
collection	PSAS Institutional Repository
language	English
advisor	Azmi Murad, Masrah Azrifah
topic	Bayesian statistical decision theory
spellingShingle	Bayesian statistical decision theory Alanee, Nahla A hybrid method of feature extraction and Naive Bayes classification for splitting identifiers
description	Integrating natural language processing techniques into software systems has attracted considerable research attention. One form of this integration is analyzing the morphology of source code to extract meaningful information. Feature location is the process of identifying specific portions of the source code. Among the most important pieces of information in source code are identifiers (e.g. Student). Unlike words in ordinary text, identifiers in source code are often formed from multiple words, such as ‘Employee-Name’. These words are not separated by white space; instead, they are joined using special characters (e.g. Employee_ID), CamelCase (e.g. EmployeeName), or abbreviations (e.g. EmpNm). This makes extracting the individual words of such identifiers more challenging. Several approaches have been proposed for splitting multi-word identifiers. However, there is still room for improvement in terms of accuracy, which can be pursued by using more robust features that are able to analyze the morphology of identifiers. Therefore, this study proposes a hybrid method of feature extraction and a Naïve Bayes classifier to split multi-word identifiers within source code. The dataset used in this study is an annotated benchmark containing a large amount of Java code. Multiple experiments were conducted to evaluate the proposed features both independently and in combination. The results show that the combination of all features obtained the best performance, achieving an F-measure of 64.7%. This finding indicates the usefulness of the proposed features for discriminating multi-word identifiers.
format	Thesis
qualification_level	Master's degree
author	Alanee, Nahla
author_facet	Alanee, Nahla
author_sort	Alanee, Nahla
title	A hybrid method of feature extraction and Naive Bayes classification for splitting identifiers
title_short	A hybrid method of feature extraction and Naive Bayes classification for splitting identifiers
title_full	A hybrid method of feature extraction and Naive Bayes classification for splitting identifiers
title_fullStr	A hybrid method of feature extraction and Naive Bayes classification for splitting identifiers
title_full_unstemmed	A hybrid method of feature extraction and Naive Bayes classification for splitting identifiers
title_sort	hybrid method of feature extraction and naive bayes classification for splitting identifiers
granting_institution	Universiti Putra Malaysia
publishDate	2016
url	http://psasir.upm.edu.my/id/eprint/91752/1/FSKTM%202016%2032%20IR.pdf
_version_	1747813688587321344
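
Illustrative note: the description above names three ways multi-word identifiers are formed (special characters, CamelCase, abbreviations). The sketch below is a rule-based baseline only, written for illustration; it is not the thesis's hybrid feature extraction and Naive Bayes method, and the function name `split_identifier` is a hypothetical helper that does not come from the record. It assumes Python 3.7 or later, where `re.split` accepts patterns that match the empty string.

```python
import re
from typing import List

# Hypothetical baseline, NOT the method proposed in the thesis.
# Zero-width boundaries inside a CamelCase run:
#   1) a lowercase letter or digit followed by an uppercase letter (employeeName)
#   2) the end of an acronym followed by a capitalised word (XMLParser)
_CAMEL_BOUNDARY = re.compile(
    r"(?<=[a-z0-9])(?=[A-Z])"
    r"|(?<=[A-Z])(?=[A-Z][a-z])"
)


def split_identifier(identifier: str) -> List[str]:
    """Split an identifier on special characters, then on CamelCase boundaries."""
    tokens: List[str] = []
    # Step 1: explicit separators (underscore, hyphen, white space).
    for part in re.split(r"[_\-\s]+", identifier):
        if part:
            # Step 2: implicit CamelCase boundaries within each part.
            tokens.extend(_CAMEL_BOUNDARY.split(part))
    return tokens


if __name__ == "__main__":
    # Example identifiers taken from the description field.
    for name in ["Student", "Employee-Name", "Employee_ID", "EmployeeName", "EmpNm"]:
        print(name, split_identifier(name))
```

Rules like these only cut at visible boundaries. They return the pieces `Emp` and `Nm` for `EmpNm` without recognising them as abbreviations of "Employee" and "Name", and they cannot split identifiers that carry no case or separator cue at all, which is the kind of case a feature-based classifier is meant to address.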
