Computational stylometric model for oath and oath-like expressions in Quranic text

Bibliographic Details
Main Author: Alqurneh, Ahmad
Format: Thesis
Language: English
Published: 2014
Subjects: Computational linguistics - Research; Qurʼan - Language, style
Online Access: http://psasir.upm.edu.my/id/eprint/60534/1/FSKTM%202014%2032IR.pdf
id my-upm-ir.60534
record_format uketd_dc
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
topic Computational linguistics - Research
description Computational stylometry analyzes written text based on measurable style markers. The major current challenges in stylometric analysis are scalability, which in authorship attribution mainly concerns how to identify the author of short texts; generalization, which investigates the transfer of properties among different texts; and explanation, which focuses on increasing understanding rather than maximizing performance. To address these issues in the domain of the Quran, the definitions of scalability, generalization, and explanation must be reshaped to suit Quranic text. We therefore illustrate the problem within the domain of oaths and oath-like expressions in the Quran, where oaths are often declared indirectly, as implicit oaths, as part of the text's literary style. In this research, the scalability issue concerns how far stylometric features scale to detect implicit oaths in a way that can lead to identification of the oath taker. The generalization issue concerns measuring any transfer of style properties between implicit and explicit oaths that reveals differences in the style characteristics of oath takers. Finally, the explanation issue concerns whether stylometry can achieve a better understanding of the stylistic knowledge of oaths compared with linguistic studies. A Quranic oath is a sworn statement in a Quranic verse used to emphasize the importance or truthfulness of the concept that follows the oath in the verse. Oaths are multifaceted and rich expressions. Oath expressions in Quranic text take implicit and explicit forms: the explicit form is based on the presence of a swearing verb, while the implicit form lacks one. This leads to several interpretations that come from a variety of aspects. While the Quranic oath has been widely studied from a theoretical basis in Islamic studies, it has not been treated from a computational perspective.
This research proposes a new computational stylometric model for detecting oaths and oath-like expressions in both implicit and explicit forms. The model is intended to detect explicit as well as implicit oaths, inspect any stylistic properties common to the two forms, and explain the detected oath aspects well. This work proposes an oath-like expression detection algorithm (OLEDA) to develop a computational stylometric model for oaths based on stylometric application-specific features, namely structural and content-specific features. Such features were selected because they can be defined for particular text domains or languages. The aim of these features is to detect both forms of oaths, i.e., implicit as well as explicit. Subsequently, the application-specific features are evaluated in terms of their contribution to oath detection through a series of machine learning experiments using various classifiers, such as the Bayesian network. To improve oath detection by OLEDA with scalable stylometric features, character features are added, particularly character n-gram features (bigrams and trigrams). These features are used to select any special characters that begin an oath statement, so that implicit oaths can be handled and scalability achieved. To differentiate the oath takers of implicit and explicit oaths, we examine whether any properties, common or uncommon, transfer between them. For this, we performed additional stylometric analysis using syntactic and lexical features. Syntactic features enable us to extract a new feature based on rewrite-rule frequencies. This feature is used to split oath statements into chunks, each of which is assigned its corresponding morphological meaning with reference to a standard syntactic treebank.
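As a rough illustration of the character n-gram features mentioned above, the following Python sketch extracts overlapping character bigrams and trigrams and counts the n-gram that opens each statement. It is not the OLEDA implementation: the function names and the transliterated sample strings are assumptions made for this example only, and the thesis may operate on Arabic script with different preprocessing.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_profile(text, sizes=(2, 3)):
    """Count character bigrams and trigrams (by default) in one statement."""
    counts = Counter()
    for n in sizes:
        counts.update(char_ngrams(text, n))
    return counts


def leading_ngram_counts(statements, n):
    """Count the n-gram that opens each statement.

    Statements shorter than n characters are skipped.
    """
    return Counter(s[:n] for s in statements if len(s) >= n)


# Hypothetical, simplified transliterations used only for illustration.
samples = ["wa al-fajr", "wa al-duha", "la uqsimu"]
openers = leading_ngram_counts(samples, 2)
```

Counting only the opening n-gram reflects the idea that special characters commencing a statement may signal an oath; a full feature set would more likely feed the complete profile to a classifier.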
Next is the set of lexical features, which includes token-based features, short-word features, word n-gram features, vocabulary richness functions, hapax legomena, hapax dislegomena, and frequent function words that discriminate between oath takers, in order to achieve generalization. Finally, to achieve a better explanation of oaths, this research performed a two-fold validation of the proposed stylometric model of oaths and oath-like expressions. First, because oaths have not been studied from a computational perspective, the stylometric model is compared against an existing linguistic model of oaths. Second, various experimental results from the proposed model are compared against expert evaluation. The results showed that the proposed stylometric model of oaths is scalable in detecting implicit oaths, obtains dissimilar generalization levels in discriminating oath takers, and adds to the explanation of oath expressions.
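Several of the lexical features listed above have standard definitions that can be sketched in a few lines of Python. The example below computes token and type counts, a type-token ratio (one simple vocabulary richness measure), hapax legomena (words occurring exactly once), hapax dislegomena (words occurring exactly twice), and a short-word count. The function name, the `short_len` threshold, and the choice of type-token ratio are assumptions for illustration; the record does not state which richness functions or thresholds the thesis uses.

```python
from collections import Counter


def lexical_features(tokens, short_len=3):
    """Compute a few simple lexical style markers for a token list.

    `short_len` is an assumed cutoff: tokens with at most this many
    characters are counted as short words.
    """
    freq = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(freq)
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Type-token ratio; defined as 0.0 for empty input.
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        # Word types that occur exactly once.
        "hapax_legomena": sum(1 for c in freq.values() if c == 1),
        # Word types that occur exactly twice.
        "hapax_dislegomena": sum(1 for c in freq.values() if c == 2),
        "short_words": sum(1 for t in tokens if len(t) <= short_len),
    }


features = lexical_features("a b b c c c d".split())
```

Feature dictionaries of this kind, computed per oath taker, could then be compared or passed to a classifier to see which markers discriminate between groups.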
format Thesis
qualification_level Doctorate
author Alqurneh, Ahmad
title Computational stylometric model for oath and oath-like expressions in Quranic text
granting_institution Universiti Putra Malaysia
publishDate 2014
url http://psasir.upm.edu.my/id/eprint/60534/1/FSKTM%202014%2032IR.pdf
_version_ 1747812278286155776
spelling my-upm-ir.60534 2018-05-08T03:10:47Z Computational stylometric model for oath and oath-like expressions in Quranic text 2014-10 Alqurneh, Ahmad Computational linguistics - Research Qurʼan - Language, style Thesis http://psasir.upm.edu.my/id/eprint/60534/ http://psasir.upm.edu.my/id/eprint/60534/1/FSKTM%202014%2032IR.pdf text en public doctoral Universiti Putra Malaysia