Speech synthesis module with adaptive emotional expression
Saved in:
Main Author: | Mahmood, Ahmed Mustafa |
---|---|
Format: | Thesis |
Language: | English |
Published: | Kuala Lumpur : Kulliyyah of Engineering, International Islamic University Malaysia, 2010 |
Subjects: | Speech synthesis; Speech processing systems; Human-computer interaction; Emotions |
Online Access: | http://studentrepo.iium.edu.my/handle/123456789/5230 |
LEADER | 045260000a22004810004500 | ||
---|---|---|---|
001 | 489903 | ||
005 | 20100426160000.0 | ||
008 | 181108s2010 my a f m 000 eng d | ||
035 | |a (Sirsi) 489903 | ||
040 | |a UIAM |b eng |e rda | ||
041 | |a eng | ||
043 | |a a-my--- | ||
050 | |a TK7882.S65 | ||
100 | 1 | |a Mahmood, Ahmed Mustafa, |e author | |
245 | 1 | 0 | |a Speech synthesis module with adaptive emotional expression / |c by Ahmed Mustafa Mahmood |
264 | 1 | |a Kuala Lumpur : |b Kulliyyah of Engineering, International Islamic University Malaysia, |c 2010 | |
300 | |a xv, 117 leaves : |b illustrations ; |c 30 cm. | ||
336 | |2 rdacontent |a text | ||
337 | |2 rdamedia |a unmediated | ||
337 | |2 rdamedia |a computer | ||
338 | |2 rdacarrier |a volume | ||
338 | |2 rdacarrier |a computer disc | ||
338 | |2 rdacarrier |a online resource | ||
347 | |2 rdaft |a text file |b PDF | ||
500 | |a Abstracts in English and Arabic. | ||
500 | |a "A dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science (Computer and Information Engineering)." --On title page. | ||
502 | |a Thesis (MSCIE) -- International Islamic University Malaysia, 2010. | ||
504 | |a Includes bibliographical references (leaves 81-84). | ||
520 | |a Computer-generated speech replaces conventional text-based interaction methods. Initially, speech synthesis produced a human voice that lacked emotional expression, and such speech does not encourage users to interact with computers. Emotional speech synthesis is one of the challenges of speech synthesis research. The quality of emotional speech synthesis is judged by its intelligibility and its similarity to natural speech. High-quality speech is achievable using the computationally expensive unit selection technology, which relies on huge sets of recorded speech segments to achieve optimum quality. Diphone synthesis technology, on the other hand, makes economical use of computational resources and storage space. Its quality is lower than that of unit selection; however, with the introduction of digital signal processing algorithms such as the PSOLA algorithm, more natural results have become achievable. Emotional speech synthesis research has two significant trends. The first is unit selection based synthesis, which aims to fulfil market needs regardless of resource utilization; the second is diphone based synthesis, which is often non-commercial and oriented towards developing intelligent algorithms that use minimum resources to achieve natural output. In this thesis, the possibility of achieving high-quality speech using low computational cost systems is investigated. Diphone synthesis is chosen as the speech synthesis technology. The existing approaches to emotion emulation are analysed to determine aspects that could be further enhanced. Two aspects are highlighted: the relation of formants to emotion, and the deterministic nature of the relation between pitch pattern and emotion. These aspects have not received much attention in existing approaches. Two algorithms are proposed to address them: a formant manipulation algorithm and a deterministic pitch pattern generation algorithm. These algorithms are incorporated into one TTS system. 
The quality of speech synthesis of the proposed system is evaluated using recently developed objective evaluation methods. The results show significantly small simulation errors: the mean square error values for the happy, sad, fear and anger emotions are 0.03225, 0.12928, 0.02513 and 0.02429 respectively. This small margin of error provides evidence of the accuracy of the proposed system. | ||
596 | |a 1 | ||
650 | |a Speech synthesis | ||
650 | |a Speech synthesis |x Computer programme | ||
650 | |a Speech processing systems | ||
650 | |a Human-computer interaction | ||
650 | |a Emotions | ||
655 | 7 | |a Theses, IIUM local | |
690 | |a Dissertations, Academic |x Department of Electrical and Computer Engineering |z IIUM | ||
710 | 2 | |a International Islamic University Malaysia. |b Department of Electrical and Computer Engineering | |
856 | |u http://studentrepo.iium.edu.my/handle/123456789/5230 | ||
900 | |a hj-fs, sbh-aaz | ||
999 | |c 432558 |d 466418 | ||
952 | |0 0 |6 T TK 007882 S65 M215S 2010 |7 0 |8 THESES |9 755740 |a IIUM |b IIUM |c MULTIMEDIA |g 0.00 |o t TK 7882 S65 M215S 2010 |p 00011169497 |r 2019-08-07 |t 1 |v 0.00 |y THESIS | ||
952 | |0 0 |6 TS CDF TK 7882 S65 M215S 2010 |7 0 |8 THESES |9 838719 |a IIUM |b IIUM |c MULTIMEDIA |g 0.00 |o ts cdf TK 7882 S65 M215S 2010 |p 00011169498 |r 2019-08-07 |t 1 |v 0.00 |y THESISDIG |
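The abstract above describes a deterministic pitch pattern generation algorithm, i.e. one in which a given emotion always maps to the same F0 contour rather than a stochastically varied one. The thesis itself is not available here, so the following is only a minimal illustrative sketch of the *idea* of deterministic emotion-to-pitch mapping; the `EMOTION_RULES` values (baseline shift, slope, excursion) are entirely hypothetical and are not taken from the thesis.

```python
import numpy as np

# Hypothetical per-emotion rules: (baseline shift in semitones,
# contour slope in semitones over the utterance, periodic excursion depth).
# These numbers are invented for illustration only.
EMOTION_RULES = {
    "happy": (3.0, 0.5, 1.0),
    "sad":   (-2.0, -0.3, 0.2),
    "fear":  (2.0, 0.0, 2.0),
    "anger": (1.0, 0.8, 0.5),
}

def pitch_contour(emotion, n_frames=50, base_f0=120.0):
    """Deterministic pitch pattern: the same emotion and frame count
    always yield exactly the same F0 contour (no random component)."""
    shift, slope, excursion = EMOTION_RULES[emotion]
    t = np.linspace(0.0, 1.0, n_frames)          # normalized utterance time
    semitones = shift + slope * t + excursion * np.sin(2 * np.pi * 2 * t)
    return base_f0 * 2.0 ** (semitones / 12.0)   # semitone offsets -> Hz

# Example: a "happy" contour starts above the 120 Hz baseline
contour = pitch_contour("happy")
```

Determinism is what makes such a generator cheap and reproducible on low-resource diphone systems, in contrast to corpus-driven unit selection.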
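The abstract reports per-emotion mean square error values as its objective quality measure. As a hedged sketch of how such a figure is computed in general (the actual features compared in the thesis are not specified here; the contour data below is hypothetical):

```python
import numpy as np

def mean_square_error(reference, synthesized):
    """MSE between two equal-length feature contours
    (e.g. normalized pitch tracks of natural vs. synthesized speech)."""
    reference = np.asarray(reference, dtype=float)
    synthesized = np.asarray(synthesized, dtype=float)
    return float(np.mean((reference - synthesized) ** 2))

# Hypothetical normalized contours for one utterance
natural = [0.2, 0.5, 0.9, 0.6, 0.3]
emotional = [0.2, 0.48, 0.85, 0.62, 0.3]
error = mean_square_error(natural, emotional)  # small value = close match
```

An MSE near zero, as in the thesis results (0.02–0.13), indicates the synthesized contour closely tracks the natural one.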