Speech synthesis module with adaptive emotional expression /

Bibliographic Details
Main Author: Mahmood, Ahmed Mustafa (Author)
Format: Thesis
Language: English
Published: Kuala Lumpur : Kulliyyah of Engineering, International Islamic University Malaysia, 2010
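Subjects: Speech synthesis; Speech synthesis -- Computer programme; Speech processing systems; Human-computer interaction; Emotions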
Online Access: http://studentrepo.iium.edu.my/handle/123456789/5230
LEADER 045260000a22004810004500
001 489903
005 20100426160000.0
008 181108s2010 my a f m 000 eng d
035 |a (Sirsi) 489903 
040 |a UIAM  |b eng  |e rda 
041 |a eng 
043 |a a-my--- 
050 |a TK7882.S65 
100 1 |a Mahmood, Ahmed Mustafa,  |e author 
245 1 0 |a Speech synthesis module with adaptive emotional expression /  |c by Ahmed Mustafa Mahmood 
264 1 |a Kuala Lumpur :  |b Kulliyyah of Engineering, International Islamic University Malaysia,  |c 2010 
300 |a xv, 117 leaves :   |b illustrations ;   |c 30 cm. 
336 |2 rdacontent  |a text 
337 |2 rdamedia  |a unmediated 
337 |2 rdamedia  |a computer 
338 |2 rdacarrier  |a volume 
338 |2 rdacarrier  |a computer disc 
338 |2 rdacarrier  |a online resource 
347 |2 rdaft  |a text file  |b PDF 
500 |a Abstracts in English and Arabic. 
500 |a "A dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science (Computer and Information Engineering)." --On title page. 
502 |a Thesis (MSCIE) -- International Islamic University Malaysia, 2010. 
504 |a Includes bibliographical references (leaves 81-84). 
520 |a Computer-generated speech replaces conventional text-based interaction methods. Initially, speech synthesis generated a human voice that lacked emotional expression. This kind of speech does not encourage users to interact with computers. Emotional speech synthesis is one of the challenges of speech synthesis research. The quality of emotional speech synthesis is judged by its intelligibility and similarity to natural speech. High-quality speech is achievable using unit selection technology, which has a high computational cost. This technology relies on huge sets of recorded speech segments to achieve optimum quality. On the other hand, diphone synthesis technology uses fewer computational resources and less storage space. Its quality is lower than that of unit selection; however, with the introduction of digital signal processing algorithms such as the PSOLA algorithm, more natural results have become achievable. Emotional speech synthesis research has two significant trends. The first is unit selection based synthesis, which aims to fulfil market needs regardless of resource utilization; the second is diphone based synthesis, which is often non-commercial and oriented towards developing intelligent algorithms that use minimal resources to achieve natural output. In this thesis, the possibility of achieving high-quality speech using low computational cost systems is investigated. Diphone synthesis is chosen as the speech synthesis technology. The existing approaches to emotional emulation are analysed to determine aspects that could be further enhanced. Two aspects are highlighted: the relation of formants to emotions and the deterministic nature of the relation between pitch patterns and emotion. These aspects have not received much attention in the existing approaches. Two algorithms are proposed to address them: formant manipulation and deterministic pitch pattern generation. These algorithms are incorporated into one TTS system. 
The quality of speech synthesis of the proposed system is evaluated using recently developed objective evaluation methods. The results show small values of simulation error: the mean square error values for the happy, sad, fear and anger emotions are 0.03225, 0.12928, 0.02513 and 0.02429, respectively. This margin of error provides evidence of the accuracy of the proposed system. 
596 |a 1 
650 |a Speech synthesis 
650 |a Speech synthesis   |x Computer programme 
650 |a Speech processing systems 
650 |a Human-computer interaction 
650 |a Emotions  
655 7 |a Theses, IIUM local 
690 |a Dissertations, Academic  |x Department of Electrical and Computer Engineering  |z IIUM 
710 2 |a International Islamic University Malaysia.  |b Department of Electrical and Computer Engineering 
856 |u http://studentrepo.iium.edu.my/handle/123456789/5230 
900 |a hj-fs, sbh-aaz 
999 |c 432558  |d 466418 
952 |0 0  |6 T TK 007882 S65 M215S 2010  |7 0  |8 THESES  |9 755740  |a IIUM  |b IIUM  |c MULTIMEDIA  |g 0.00  |o t TK 7882 S65 M215S 2010  |p 00011169497  |r 2019-08-07  |t 1  |v 0.00  |y THESIS 
952 |0 0  |6 TS CDF TK 7882 S65 M215S 2010  |7 0  |8 THESES  |9 838719  |a IIUM  |b IIUM  |c MULTIMEDIA  |g 0.00  |o ts cdf TK 7882 S65 M215S 2010  |p 00011169498  |r 2019-08-07  |t 1  |v 0.00  |y THESISDIG