Text to image synthesis using generative adversarial network
Text-to-image synthesis generates images from a given text description, such that the content of the synthesised images matches the description. Existing text-to-image synthesis approaches are mainly developed based on Generative Adversarial Networks (GANs) for optimal performance. Howe...
Saved in:
Main Author: | |
---|---|
Format: | Thesis |
Published: |
2022
|
Subjects: | |
Tags: |
|
id |
my-mmu-ep.11543 |
---|---|
record_format |
uketd_dc |
spelling |
my-mmu-ep.115432023-07-18T05:23:06Z Text to image synthesis using generative adversarial network 2022-12 Tan, Yong Xuan Q300-390 Cybernetics Text-to-image synthesis generates images from a given text description, such that the content of the synthesised images matches the description. Existing text-to-image synthesis approaches are mainly developed based on Generative Adversarial Networks (GANs) for optimal performance. However, text-to-image synthesis remains challenging in producing realistic and semantically consistent images. In this thesis, text-to-image synthesis frameworks are proposed to synthesise highly realistic images conditioned on the text description. This research proposes GAN-based text-to-image synthesis frameworks that utilise advanced model architectures to synthesise large-scale realistic images. Moreover, this is the first work to investigate self-supervised learning in text-to-image synthesis, exploiting high-level structural information to synthesise complex objects. In this work, three novel text-to-image synthesis frameworks are designed, referred to as: (1) Self-supervised Residual Generative Adversarial Network (SResGAN), (2) Multi-scale Self-supervised Residual Generative Adversarial Network (MSResGAN), and (3) Multi-scale Refined Self-supervised Residual Generative Adversarial Network (MRSResGAN). SResGAN investigates an advanced model architecture with a residual network to produce large-scale, realistic images. For better visual realism, MSResGAN and MRSResGAN employ a multi-scale GAN architecture to synthesise images from a lower scale to a larger scale. All designed frameworks manage to synthesise highly realistic images based on the received text description. Besides that, the proposed frameworks are integrated with self-supervised learning via a rotation task to mitigate the low-data regime and diversify the model's learned representations.
In doing so, the models are able to maximise high-level structural information throughout the network and synthesise more diverse image content. Furthermore, the proposed frameworks are integrated with feature matching, an L1 distance loss, and one-sided label smoothing to stabilise model training. All three frameworks are evaluated on two benchmark text-to-image synthesis datasets, namely Oxford-102 and CUB-200-2011. The performance of the frameworks is measured by three evaluation metrics: Inception Score, Fréchet Inception Distance, and Structural Similarity Index. Based on the experimental results, all three frameworks manage to outperform several existing text-to-image synthesis approaches on both benchmark datasets. 2022-12 Thesis http://shdl.mmu.edu.my/11543/ http://erep.mmu.edu.my/ masters Multimedia University Faculty of Information Science and Technology (FIST) EREP ID: 10860 |
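The abstract describes integrating a self-supervised rotation task into the GAN so the discriminator also learns high-level structural information. A minimal sketch of the standard rotation pretext batch construction is shown below; the function name and shapes are illustrative assumptions, not taken from the thesis itself.

```python
import numpy as np

def make_rotation_batch(images):
    """Build a self-supervised rotation pretext batch (illustrative sketch).

    Each input image is rotated by 0, 90, 180, and 270 degrees; an
    auxiliary classifier head (here assumed on the discriminator, as in
    rotation-based self-supervision) is trained to predict which of the
    four rotations was applied, with labels 0..3.

    images: array of shape (N, H, W, C) with H == W.
    Returns (rotated, labels) of shapes (4N, H, W, C) and (4N,).
    """
    # Rotate the whole batch in the (H, W) plane for each k * 90 degrees.
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    # Label block i (all N images rotated by i * 90 degrees) with class i.
    labels = np.repeat(np.arange(4), len(images))
    return np.concatenate(rotated, axis=0), labels
```

The auxiliary rotation-classification loss would be added to the usual adversarial losses, encouraging representations that capture object structure rather than texture alone.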
institution |
Multimedia University |
collection |
MMU Institutional Repository |
topic |
Q300-390 Cybernetics |
spellingShingle |
Q300-390 Cybernetics Tan, Yong Xuan Text to image synthesis using generative adversarial network |
description |
Text-to-image synthesis generates images from a given text description, such that the content of the synthesised images matches the description. Existing text-to-image synthesis approaches are mainly developed based on Generative Adversarial Networks (GANs) for optimal performance. However, text-to-image synthesis remains challenging in producing realistic and semantically consistent images. In this thesis, text-to-image synthesis frameworks are proposed to synthesise highly realistic images conditioned on the text description. This research proposes GAN-based text-to-image synthesis frameworks that utilise advanced model architectures to synthesise large-scale realistic images. Moreover, this is the first work to investigate self-supervised learning in text-to-image synthesis, exploiting high-level structural information to synthesise complex objects. In this work, three novel text-to-image synthesis frameworks are designed, referred to as: (1) Self-supervised Residual Generative Adversarial Network (SResGAN), (2) Multi-scale Self-supervised Residual Generative Adversarial Network (MSResGAN), and (3) Multi-scale Refined Self-supervised Residual Generative Adversarial Network (MRSResGAN). SResGAN investigates an advanced model architecture with a residual network to produce large-scale, realistic images. For better visual realism, MSResGAN and MRSResGAN employ a multi-scale GAN architecture to synthesise images from a lower scale to a larger scale. All designed frameworks manage to synthesise highly realistic images based on the received text description. Besides that, the proposed frameworks are integrated with self-supervised learning via a rotation task to mitigate the low-data regime and diversify the model's learned representations. In doing so, the models are able to maximise high-level structural information throughout the network and synthesise more diverse image content.
Furthermore, the proposed frameworks are integrated with feature matching, an L1 distance loss, and one-sided label smoothing to stabilise model training. All three frameworks are evaluated on two benchmark text-to-image synthesis datasets, namely Oxford-102 and CUB-200-2011. The performance of the frameworks is measured by three evaluation metrics: Inception Score, Fréchet Inception Distance, and Structural Similarity Index. Based on the experimental results, all three frameworks manage to outperform several existing text-to-image synthesis approaches on both benchmark datasets. |
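The description mentions one-sided label smoothing as one of the training-stabilisation techniques. Below is a minimal NumPy sketch of the standard form of that technique (real targets softened to 0.9, fake targets kept at 0); the function name and the 0.9 target are common conventions assumed here, not values confirmed by the thesis.

```python
import numpy as np

def smoothed_discriminator_loss(d_real, d_fake, real_target=0.9):
    """Binary cross-entropy discriminator loss with one-sided label smoothing.

    Real targets are softened to `real_target` (conventionally 0.9) while
    fake targets remain at 0. Smoothing only the real side discourages the
    discriminator from becoming overconfident, which is one common way to
    stabilise GAN training.

    d_real, d_fake: discriminator sigmoid outputs in (0, 1).
    """
    eps = 1e-7  # avoid log(0)
    d_real = np.clip(d_real, eps, 1 - eps)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    # BCE against the softened real target (0.9 instead of 1.0).
    loss_real = -(real_target * np.log(d_real)
                  + (1.0 - real_target) * np.log(1.0 - d_real))
    # Fake targets are left at 0 (one-sided: only the real side is smoothed).
    loss_fake = -np.log(1.0 - d_fake)
    return float(np.mean(loss_real) + np.mean(loss_fake))
```

Smoothing only the real labels (and not the fake ones) matters: smoothing fake targets would reward the generator for producing samples the discriminator is merely unsure about.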
format |
Thesis |
qualification_level |
Master's degree |
author |
Tan, Yong Xuan |
author_facet |
Tan, Yong Xuan |
author_sort |
Tan, Yong Xuan |
title |
Text to image synthesis using generative adversarial network |
title_short |
Text to image synthesis using generative adversarial network |
title_full |
Text to image synthesis using generative adversarial network |
title_fullStr |
Text to image synthesis using generative adversarial network |
title_full_unstemmed |
Text to image synthesis using generative adversarial network |
title_sort |
text to image synthesis using generative adversarial network |
granting_institution |
Multimedia University |
granting_department |
Faculty of Information Science and Technology (FIST) |
publishDate |
2022 |
_version_ |
1776101417046507520 |