Speech synthesis module with adaptive emotional expression
Main Author:
Format: Thesis
Language: English
Published: Kuala Lumpur : Kulliyyah of Engineering, International Islamic University Malaysia, 2010
Subjects:
Online Access: http://studentrepo.iium.edu.my/handle/123456789/5230
Summary: Computer-generated speech replaces conventional text-based interaction methods. Initially, speech synthesis produced a human voice that lacked emotional expression, and such speech does not encourage users to interact with computers. Emotional speech synthesis is one of the challenges of speech synthesis research. The quality of emotional speech synthesis is judged by its intelligibility and its similarity to natural speech. High-quality speech is achievable using the computationally expensive unit selection technology, which relies on huge sets of recorded speech segments to achieve optimum quality. Diphone synthesis technology, on the other hand, requires fewer computational resources and less storage space. Its quality is lower than that of unit selection; however, with the introduction of many digital signal processing algorithms, such as the PSOLA algorithm, more natural results became achievable. Emotional speech synthesis research has two significant trends. The first is unit selection based synthesis, which aims to fulfil market needs regardless of resource utilization; the second is diphone based synthesis, which is often non-commercial and oriented towards developing intelligent algorithms that utilize minimum resources to achieve natural output. In this thesis, the possibility of achieving high-quality speech using low computational cost systems is investigated. Diphone synthesis is chosen as the speech synthesis technology. The existing approaches to emotion emulation are analysed to determine aspects that could be further enhanced. Two aspects are highlighted: the relation of formants to emotions and the deterministic nature of the relation of pitch patterns to emotion. These aspects have not received much attention in existing approaches. Two algorithms are proposed to address them: a formant manipulation algorithm and a deterministic pitch pattern generation algorithm. These algorithms are incorporated into one TTS system. The quality of the proposed system's synthesized speech is evaluated using recently developed objective evaluation methods. The results show significantly small simulation errors; the mean square error values for the happy, sad, fear and anger emotions are 0.03225, 0.12928, 0.02513 and 0.02429 respectively. This small margin of error provides evidence of the accuracy of the proposed system.
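The abstract reports an objective evaluation based on mean square error but does not specify which acoustic features of the synthesized and natural utterances are compared. The following is a minimal illustrative sketch only, assuming the comparison is made between time-aligned, normalized pitch (F0) contours; the function name `contour_mse` and the sample values are hypothetical, not taken from the thesis.

```python
import numpy as np

def contour_mse(synth_f0, target_f0):
    """Mean square error between two pitch (F0) contours of equal length.

    Both contours are assumed to be time-aligned and normalized (e.g. to
    the speaker's mean F0), so the error reflects the shape of the
    emotional pitch pattern rather than absolute pitch level.
    """
    synth_f0 = np.asarray(synth_f0, dtype=float)
    target_f0 = np.asarray(target_f0, dtype=float)
    if synth_f0.shape != target_f0.shape:
        raise ValueError("contours must be time-aligned to the same length")
    return float(np.mean((synth_f0 - target_f0) ** 2))

# Illustrative values only: a synthesized contour compared against a
# reference contour sampled at the same frame rate.
target = np.array([1.00, 1.05, 1.12, 1.20, 1.15, 1.08])
synth = np.array([1.00, 1.07, 1.10, 1.22, 1.13, 1.06])
print(f"MSE = {contour_mse(synth, target):.5f}")
```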
Item Description: Abstracts in English and Arabic. "A dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science (Computer and Information Engineering)." --On title page.
Physical Description: xv, 117 leaves : illustrations ; 30 cm.
Bibliography: Includes bibliographical references (leaves 81-84).