Computational stylometric model for oath and oath-like expressions in Quranic text
Main Author: | Alqurneh, Ahmad |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2014 |
Subjects: | Computational linguistics - Research; Qurʼan - Language, style |
Online Access: | http://psasir.upm.edu.my/id/eprint/60534/1/FSKTM%202014%2032IR.pdf |
id |
my-upm-ir.60534 |
---|---|
record_format |
uketd_dc |
institution |
Universiti Putra Malaysia |
collection |
PSAS Institutional Repository |
language |
English |
topic |
Computational linguistics - Research; Qurʼan - Language, style |
description |
Computational stylometry analyzes written text on the basis of measurable style markers. The main open challenges in stylometric analysis are scalability, which in authorship attribution mainly concerns how to identify the author of short texts; generalization, which investigates how style properties transfer between different texts; and explanation, which aims at improving understanding rather than maximizing performance. To address these issues in the domain of the Quran, the definitions of scalability, generalization, and explanation must be reshaped to fit that domain. We therefore frame the problem within oaths and oath-like expressions in the Quran, because its literary style often declares oaths indirectly, as implicit oaths. In this research, scalability concerns how far stylometric features can detect implicit oaths in a way that leads to identifying the oath taker. Generalization concerns measuring the transfer of style properties between implicit and explicit oaths, which reveals differences in the style characteristics of oath takers. Finally, explanation concerns whether stylometry can achieve a better understanding of the stylistic knowledge of oaths than linguistic studies have. A Quranic oath is a swearing statement in a Quranic verse used to emphasize the importance or truthfulness of the concept that follows it in the verse. Oaths are multifaceted, rich expressions that appear in the Quranic text in explicit and implicit forms: the explicit form contains a swearing verb, whereas the implicit form lacks one. This leads to several interpretations arising from a variety of aspects. Although the Quranic oath has been widely studied theoretically in Islamic Studies, it has not been treated from a computational perspective.
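The explicit/implicit distinction above rests on one observable cue: whether a swearing verb is present. The sketch below illustrates only that cue, for a verse already known to contain an oath. It assumes unvocalized or vocalized Arabic input, whitespace tokenization, and a small hypothetical list of swearing-verb forms built on the root q-s-m; the thesis's actual lexicon and its OLEDA algorithm are not reproduced here.

```python
import re

# Arabic diacritics: fathatan (U+064B) through sukun (U+0652), plus superscript alef (U+0670).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

# Hypothetical, hand-picked unvocalized forms of the swearing verb (root q-s-m).
# Illustrative only; not the thesis's lexicon.
SWEARING_VERBS = {"أقسم", "اقسم", "نقسم", "أقسموا", "اقسموا", "يقسمون"}

# Single-letter conjunctions that may be attached to the verb (e.g. "وأقسموا").
PROCLITICS = ("و", "ف")


def normalize(text: str) -> str:
    """Strip diacritics so vocalized and unvocalized spellings compare equal."""
    return DIACRITICS.sub("", text)


def oath_form(verse: str) -> str:
    """Label a verse that is already known to contain an oath as 'explicit' or 'implicit'.

    Explicit: some token is a listed swearing verb, optionally with one proclitic.
    Implicit: no such token, e.g. an oath introduced only by a particle.
    """
    for token in normalize(verse).split():
        candidates = {token}
        if token.startswith(PROCLITICS):
            candidates.add(token[1:])  # also try the token without its proclitic
        if candidates & SWEARING_VERBS:
            return "explicit"
    return "implicit"


if __name__ == "__main__":
    print(oath_form("لَا أُقْسِمُ بِيَوْمِ الْقِيَامَةِ"))  # Q 75:1 -> explicit
    print(oath_form("وَالشَّمْسِ وَضُحَاهَا"))  # Q 91:1 -> implicit
```

Deciding whether a verse without a swearing verb is an oath at all is the harder problem the abstract describes; this rule only separates the two forms once that decision has been made.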
This research proposes a new computational stylometric model for detecting implicit and explicit oaths and oath-like expressions. The model detects explicit as well as implicit oaths, inspects the stylistic properties the two forms share, and explains the aspects of the detected oaths. To build the model, this work proposes an oath-like expression detection algorithm (OLEDA) based on stylometric application-specific features, namely structural and content-specific features. These features were chosen because they can be defined for specific text domains or languages, and their aim is to detect both forms of oaths, i.e., implicit as well as explicit. The application-specific features are then evaluated for their contribution to oath detection through a series of machine learning experiments with various classifiers, such as a Bayesian network. To improve OLEDA's oath detection with scalable stylometric features, character features are added, specifically character n-grams (bigrams and trigrams). These features capture the special characters that begin an oath statement, which lets the model handle implicit oaths and so achieve scalability. To differentiate the oath takers of implicit and explicit oaths, we examine which properties do or do not transfer between the two forms. For this, we performed an additional stylometric analysis using syntactic and lexical features. The syntactic features allow us to extract a new feature based on rewrite rule frequencies. This feature is used to split oath statements into chunks, each of which is assigned its corresponding morphological meaning with reference to a standard syntactic Treebank.
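Character bigrams and trigrams, and in particular the characters that open a statement, are the features the abstract adds for implicit oaths. The sketch below counts overlapping character n-grams after stripping diacritics and adds hypothetical "opening" features for the first bigram and trigram (for "والشمس" these are "وا" and "وال", the particle و followed by the article). The feature naming scheme and the choice to keep spaces as characters are assumptions, not OLEDA's definitions.

```python
import re
from collections import Counter

# Arabic diacritics: fathatan..sukun plus superscript alef.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams; spaces are kept as characters."""
    text = DIACRITICS.sub("", text)
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def opening_ngrams(statement: str, sizes=(2, 3)) -> dict:
    """Hypothetical 'start' features: the first bigram and trigram of a statement."""
    text = DIACRITICS.sub("", statement).strip()
    return {f"start_{n}": text[:n] for n in sizes if len(text) >= n}


def ngram_features(statement: str) -> Counter:
    """Merge bigram, trigram, and opening features into one sparse feature vector."""
    feats = Counter()
    for n in (2, 3):
        feats.update({f"c{n}:{g}": c for g, c in char_ngrams(statement, n).items()})
    feats.update({f"{k}:{v}": 1 for k, v in opening_ngrams(statement).items()})
    return feats
```

The resulting sparse vectors could then be fed to any classifier that accepts count features; the abstract mentions a Bayesian network among the classifiers tried.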
Next, the lexical feature bag, which includes token-based features, short-word features, word n-gram features, vocabulary richness functions, hapax legomena, hapax dislegomena, and frequent function words, is used to discriminate oath takers and so achieve generalization. Finally, to achieve a better explanation of oaths, this research performed a two-fold validation of the proposed stylometric model of oaths and oath-like expressions. First, because oaths have not been studied from a computational perspective, the stylometric model is compared against an existing linguistic model of oaths. Second, various experimental results from the proposed model are compared against expert evaluation. The results showed that the proposed stylometric model of oaths scales to detecting implicit oaths, achieves differing levels of generalization in discriminating oath takers, and contributes to a better explanation of oath expressions. |
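Several of the lexical measures listed above have standard definitions that can be computed from a token list. The sketch below computes hapax legomena and dislegomena counts, the type-token ratio and Yule's K as vocabulary-richness functions, a short-word ratio, and a function-word ratio. Whitespace tokenization, the three-letter cutoff for "short" words, and the caller-supplied function-word set are assumptions; the thesis's exact feature definitions are not given in this record.

```python
from collections import Counter


def lexical_features(tokens: list[str], function_words: set[str]) -> dict:
    """Compute a few standard lexical stylometric measures for one token list."""
    n = len(tokens)
    freq = Counter(tokens)
    v = len(freq)                          # vocabulary size (number of types)
    spectrum = Counter(freq.values())      # V_i: number of types occurring exactly i times
    # Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2
    s2 = sum(i * i * vi for i, vi in spectrum.items())
    return {
        "tokens": n,
        "types": v,
        "type_token_ratio": v / n if n else 0.0,
        "hapax_legomena": spectrum.get(1, 0),
        "hapax_dislegomena": spectrum.get(2, 0),
        "yule_k": 1e4 * (s2 - n) / (n * n) if n else 0.0,
        # Hypothetical cutoff: tokens of at most 3 characters count as short words.
        "short_word_ratio": sum(1 for t in tokens if len(t) <= 3) / n if n else 0.0,
        "function_word_ratio": sum(freq[w] for w in function_words) / n if n else 0.0,
    }
```

For example, `lexical_features("a b a c".split(), {"a"})` yields two hapax legomena, one hapax dislegomenon, a type-token ratio of 0.75, and Yule's K of 1250.0.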
format |
Thesis |
qualification_level |
Doctorate |
author |
Alqurneh, Ahmad |
title |
Computational stylometric model for oath and oath-like expressions in Quranic text |
granting_institution |
Universiti Putra Malaysia |
publishDate |
2014 |
url |
http://psasir.upm.edu.my/id/eprint/60534/1/FSKTM%202014%2032IR.pdf |
spelling |
my-upm-ir.60534 2018-05-08T03:10:47Z Computational stylometric model for oath and oath-like expressions in Quranic text 2014-10 Alqurneh, Ahmad Computational linguistics - Research Qurʼan - Language, style Thesis http://psasir.upm.edu.my/id/eprint/60534/ http://psasir.upm.edu.my/id/eprint/60534/1/FSKTM%202014%2032IR.pdf text en public doctoral Universiti Putra Malaysia |