Document summarization using transfer learning

Document summarization refers to an automated method for shortening a document into a short, meaningful text. In computing, automatic summarization can basically be split into two approaches. It can be done with a classical method, in which the rank of the text is calculated and words are extracted wi...
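
The classical, rank-based approach can be illustrated with a minimal sketch. It uses one simple frequency-based variant, assumed here for illustration because the thesis record does not specify its ranking function. Each non-stopword is ranked by its frequency normalized by the most frequent word, each sentence is scored by the mean rank of its words, and the top-scoring sentences are returned in their original order. The stopword list and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list (illustration only).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in",
             "on", "it", "by", "for", "with", "be", "can"}


def split_sentences(text: str) -> list[str]:
    # Naive split after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercase words, with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize_by_rank(text: str, num_sentences: int = 2) -> list[str]:
    """Assumed frequency-based extractive summarizer, not the thesis's method."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    if not freq:
        return sentences[:num_sentences]
    top = max(freq.values())
    # Word rank: frequency normalized to the range (0, 1].
    rank = {w: c / top for w, c in freq.items()}

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(rank[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]), reverse=True)[:num_sentences]
    # Keep the selected sentences in their original document order.
    return [sentences[i] for i in sorted(best)]


text = "The cat sat on the mat. Dogs bark loudly. The cat likes the mat and the cat sleeps."
print(summarize_by_rank(text, num_sentences=2))
```

In this example the two sentences about the cat share the most frequent words, so the unrelated sentence "Dogs bark loudly." is dropped from the summary.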



Bibliographic Details
Main Author:	Chong, Jing Wen
Format:	Thesis
Language:	English
Published:	2018
Subjects:	TK Electrical engineering. Electronics. Nuclear engineering
Online Access:	http://eprints.utm.my/id/eprint/79268/1/ChongJingWenMFKE2018.pdf

id	my-utm-ep.79268
record_format	uketd_dc
institution	Universiti Teknologi Malaysia
collection	UTM Institutional Repository
description	Document summarization refers to an automated method for shortening a document into a short, meaningful text. In computing, automatic summarization can basically be split into two approaches. It can be done with a classical method, in which the rank of the text is calculated and words are extracted according to their rank. On the other hand, the task can be completed with a modern method, deep learning. Deep learning differs slightly in that the programming mainly prepares a model that learns to summarize the document. However, some pre-processing of the data is needed before it is fed into the deep learning model. All the details are discussed in this project. For this project, a sequence-to-sequence model is used as the main computing unit. On top of that, a word embedding layer helps the summarization by providing knowledge of the relationships between words. By combining these two designs, the deep learning model is able to differentiate words with respect to their relationships. This project also includes pre-processing, in which the data is pre-filtered and converted into a form the model recognizes. Similarly, the output is converted back into words that humans can understand. Finally, the summarized output is evaluated with BLEU (Bilingual Evaluation Understudy), a metric of sentence similarity. As a result, the model achieves a loss as low as 0.8% and an accuracy of 32%. Overall, the accuracy is capped by the size of the model, because the model does not support a large vocabulary. The design can be further improved by increasing the vocabulary size; however, the training process would then need to be completed on better hardware. In addition, the converted text is scored with BLEU against the expected output summary. Overall, a trainable model is designed. The model can be further improved by increasing the vocabulary size as well as the size of the training set.
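
The pre- and post-processing described in the abstract (converting words into a form the model recognizes, then converting model output back into words), and the reason a small vocabulary caps accuracy, can be illustrated with a minimal sketch. It assumes whitespace tokenization, a vocabulary capped at `max_size` entries, and four hypothetical special tokens (`<pad>`, `<unk>`, `<start>`, `<end>`); none of these details come from the thesis. Any word outside the capped vocabulary is mapped to `<unk>` and cannot be recovered when decoding.

```python
from collections import Counter

# Hypothetical special tokens; the thesis record does not name the ones it used.
PAD, UNK, START, END = "<pad>", "<unk>", "<start>", "<end>"


def build_vocab(texts: list[str], max_size: int) -> dict[str, int]:
    """Map the most frequent words to integer ids, capped at max_size entries."""
    counts = Counter(w for t in texts for w in t.lower().split())
    vocab = {tok: i for i, tok in enumerate([PAD, UNK, START, END])}
    for word, _ in counts.most_common(max(0, max_size - len(vocab))):
        vocab[word] = len(vocab)
    return vocab


def encode(text: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Words -> ids; out-of-vocabulary words become <unk>, then pad/truncate."""
    ids = [vocab[START]] + [vocab.get(w, vocab[UNK]) for w in text.lower().split()]
    ids = (ids + [vocab[END]])[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def decode(ids: list[int], vocab: dict[str, int]) -> str:
    """Ids -> words, stopping at <end> and skipping <pad>/<start>."""
    inverse = {i: w for w, i in vocab.items()}
    words = []
    for i in ids:
        token = inverse.get(i, UNK)
        if token == END:
            break
        if token in (PAD, START):
            continue
        words.append(token)
    return " ".join(words)


vocab = build_vocab(["the cat sat on the mat", "the dog sat"], max_size=6)
ids = encode("the cat sat", vocab, max_len=8)
print(ids, decode(ids, vocab))
```

Here "cat" falls outside the six-entry vocabulary, so it is decoded as `<unk>`. Raising `max_size` avoids such losses, but it also enlarges the embedding matrix and the output layer, which is consistent with the hardware limitation noted in the abstract.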
qualification_level	Master's degree
granting_institution	Universiti Teknologi Malaysia, Faculty of Electrical Engineering
granting_department	Faculty of Electrical Engineering

Similar Items