Text to image synthesis using generative adversarial network
Text-to-image synthesis generates images from a given text description, such that the content of the synthesised images matches the description. Existing text-to-image synthesis approaches are mainly developed based on Generative Adversarial Networks (GANs) for optimal performance. Howe...
Saved in:
Main Author: | |
---|---|
Format: | Thesis |
Published: |
2022
|
Subjects: | |
Tags: |
|
id |
my-mmu-ep.11543 |
---|---|
record_format |
uketd_dc |
spelling |
my-mmu-ep.115432023-07-18T05:23:06Z Text to image synthesis using generative adversarial network 2022-12 Tan, Yong Xuan Q300-390 Cybernetics Text-to-image synthesis generates images from a given text description, such that the content of the synthesised images matches the description. Existing text-to-image synthesis approaches are mainly developed based on Generative Adversarial Networks (GANs) for optimal performance. However, text-to-image synthesis remains challenging in producing realistic and semantically consistent images. In this thesis, text-to-image synthesis frameworks are proposed to synthesise highly realistic images conditioned on the text description. This research proposes GAN-based text-to-image synthesis frameworks that utilise advanced model architectures to synthesise large-scale realistic images. Moreover, this is the first work to investigate self-supervised learning in text-to-image synthesis, exploiting high-level structural information to synthesise complex objects. In this work, three novel text-to-image synthesis frameworks are designed, referred to as: (1) Self-supervised Residual Generative Adversarial Network (SResGAN), (2) Multi-scale Self-supervised Residual Generative Adversarial Network (MSResGAN), and (3) Multi-scale Refined Self-supervised Residual Generative Adversarial Network (MRSResGAN). SResGAN investigates an advanced model architecture with a residual network to produce large-scale, realistic images. For better visual realism, MSResGAN and MRSResGAN employ a multi-scale GAN architecture to synthesise images from a lower scale to a larger scale. All designed frameworks manage to synthesise highly realistic images based on the received text description. Besides that, the proposed frameworks are integrated with self-supervised learning via a rotation task to mitigate the low-data regime and diversify the model's learned representations.
In doing so, the models are able to maximise high-level structural information throughout the network and synthesise more diverse image content. Furthermore, the proposed frameworks are integrated with feature matching, an L1 distance loss, and one-sided label smoothing to stabilise model training. All three frameworks are evaluated on two benchmark text-to-image synthesis datasets, namely Oxford-102 and CUB-200-2011. The performance of the frameworks is measured by three evaluation metrics: Inception Score, Fréchet Inception Distance, and Structural Similarity Index. Based on the experimental results, all three frameworks manage to outperform several existing text-to-image synthesis approaches on both benchmark datasets. 2022-12 Thesis http://shdl.mmu.edu.my/11543/ http://erep.mmu.edu.my/ masters Multimedia University Faculty of Information Science and Technology (FIST) EREP ID: 10860 |
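The abstract describes integrating a self-supervised rotation task into the GAN so the discriminator also learns high-level structural information. A minimal sketch of the standard rotation pretext batch construction is shown below; the function name and shapes are illustrative assumptions, not taken from the thesis itself.

```python
import numpy as np

def make_rotation_batch(images):
    """Build a self-supervised rotation pretext batch (illustrative sketch).

    Each input image is rotated by 0, 90, 180, and 270 degrees; an
    auxiliary classifier head (here assumed on the discriminator, as in
    rotation-based self-supervision) is trained to predict which of the
    four rotations was applied, with labels 0..3.

    images: array of shape (N, H, W, C) with H == W.
    Returns (rotated, labels) of shapes (4N, H, W, C) and (4N,).
    """
    # Rotate the whole batch in the (H, W) plane for each k * 90 degrees.
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    # Label block i (all N images rotated by i * 90 degrees) with class i.
    labels = np.repeat(np.arange(4), len(images))
    return np.concatenate(rotated, axis=0), labels
```

The auxiliary rotation-classification loss would be added to the usual adversarial losses, encouraging representations that capture object structure rather than texture alone.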
institution |
Multimedia University |
collection |
MMU Institutional Repository |
topic |
Q300-390 Cybernetics |
spellingShingle |
Q300-390 Cybernetics Tan, Yong Xuan Text to image synthesis using generative adversarial network |
description |
Text-to-image synthesis generates images from a given text description, such that the content of the synthesised images matches the description. Existing text-to-image synthesis approaches are mainly developed based on Generative Adversarial Networks (GANs) for optimal performance. However, text-to-image synthesis remains challenging in producing realistic and semantically consistent images. In this thesis, text-to-image synthesis frameworks are proposed to synthesise highly realistic images conditioned on the text description. This research proposes GAN-based text-to-image synthesis frameworks that utilise advanced model architectures to synthesise large-scale realistic images. Moreover, this is the first work to investigate self-supervised learning in text-to-image synthesis, exploiting high-level structural information to synthesise complex objects. In this work, three novel text-to-image synthesis frameworks are designed, referred to as: (1) Self-supervised Residual Generative Adversarial Network (SResGAN), (2) Multi-scale Self-supervised Residual Generative Adversarial Network (MSResGAN), and (3) Multi-scale Refined Self-supervised Residual Generative Adversarial Network (MRSResGAN). SResGAN investigates an advanced model architecture with a residual network to produce large-scale, realistic images. For better visual realism, MSResGAN and MRSResGAN employ a multi-scale GAN architecture to synthesise images from a lower scale to a larger scale. All designed frameworks manage to synthesise highly realistic images based on the received text description. Besides that, the proposed frameworks are integrated with self-supervised learning via a rotation task to mitigate the low-data regime and diversify the model's learned representations. In doing so, the models are able to maximise high-level structural information throughout the network and synthesise more diverse image content.
Furthermore, the proposed frameworks are integrated with feature matching, an L1 distance loss, and one-sided label smoothing to stabilise model training. All three frameworks are evaluated on two benchmark text-to-image synthesis datasets, namely Oxford-102 and CUB-200-2011. The performance of the frameworks is measured by three evaluation metrics: Inception Score, Fréchet Inception Distance, and Structural Similarity Index. Based on the experimental results, all three frameworks manage to outperform several existing text-to-image synthesis approaches on both benchmark datasets. |
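The description mentions one-sided label smoothing as one of the training-stabilisation techniques. Below is a minimal NumPy sketch of the standard form of that technique (real targets softened to 0.9, fake targets kept at 0); the function name and the 0.9 target are common conventions assumed here, not values confirmed by the thesis.

```python
import numpy as np

def smoothed_discriminator_loss(d_real, d_fake, real_target=0.9):
    """Binary cross-entropy discriminator loss with one-sided label smoothing.

    Real targets are softened to `real_target` (conventionally 0.9) while
    fake targets remain at 0. Smoothing only the real side discourages the
    discriminator from becoming overconfident, which is one common way to
    stabilise GAN training.

    d_real, d_fake: discriminator sigmoid outputs in (0, 1).
    """
    eps = 1e-7  # avoid log(0)
    d_real = np.clip(d_real, eps, 1 - eps)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    # BCE against the softened real target (0.9 instead of 1.0).
    loss_real = -(real_target * np.log(d_real)
                  + (1.0 - real_target) * np.log(1.0 - d_real))
    # Fake targets are left at 0 (one-sided: only the real side is smoothed).
    loss_fake = -np.log(1.0 - d_fake)
    return float(np.mean(loss_real) + np.mean(loss_fake))
```

Smoothing only the real labels (and not the fake ones) matters: smoothing fake targets would reward the generator for producing samples the discriminator is merely unsure about.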
format |
Thesis |
qualification_level |
Master's degree |
author |
Tan, Yong Xuan |
author_facet |
Tan, Yong Xuan |
author_sort |
Tan, Yong Xuan |
title |
Text to image synthesis using generative adversarial network |
title_short |
Text to image synthesis using generative adversarial network |
title_full |
Text to image synthesis using generative adversarial network |
title_fullStr |
Text to image synthesis using generative adversarial network |
title_full_unstemmed |
Text to image synthesis using generative adversarial network |
title_sort |
text to image synthesis using generative adversarial network |
granting_institution |
Multimedia University |
granting_department |
Faculty of Information Science and Technology (FIST) |
publishDate |
2022 |
_version_ |
1776101417046507520 |