Document summarization using transfer learning

Document summarization is the automated shortening of a document into a brief, meaningful article. In computing, automatic summarization falls into two broad approaches: a classical method, in which a rank is computed for the text and words are extracted wi...

Bibliographic Details
Main Author: Chong, Jing Wen
Format: Thesis
Language: English
Published: 2018
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access: http://eprints.utm.my/id/eprint/79268/1/ChongJingWenMFKE2018.pdf
id my-utm-ep.79268
record_format uketd_dc
spelling my-utm-ep.79268 2018-10-14T08:41:49Z http://eprints.utm.my/id/eprint/79268/ application/pdf en public masters
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic TK Electrical engineering. Electronics Nuclear engineering
description Document summarization is the automated shortening of a document into a brief, meaningful article. In computing, automatic summarization falls into two broad approaches. The classical approach computes a rank for the text and extracts words according to their rank. The modern approach uses deep learning, where the programming effort goes mainly into preparing a model that learns to summarize the document. Some pre-processing of the data is needed before it is fed into the deep learning model; the details are discussed in this project. This project uses a sequence-to-sequence model as the main computing unit. On top of that, a word embedding layer helps the summarization by supplying knowledge of the relationships between words. Combining these two designs lets the deep learning model differentiate words according to their relationships. The pre-processing filters the data and converts it into a form the model recognizes; similarly, the model output is converted back into words that humans can understand. Finally, the summarized output is evaluated with BLEU, a benchmark for sentence similarity, by scoring the converted text against the expected output summary. The model achieves a loss as low as 0.8% and an accuracy of 32%. The accuracy is capped by the size of the model, because the model does not support a large vocabulary. The design can be improved by increasing the vocabulary size, although training would then require better hardware. Overall, a trainable model is designed, and it can be further improved by enlarging the vocabulary and the training set.
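The abstract describes three mechanical steps around the model: converting words into data the model recognizes, converting model output back into words, and scoring the result with BLEU. The following is a minimal sketch of those steps, not the thesis code. All names (`build_vocab`, `encode`, `decode`, `bleu`, the `<pad>` and `<unk>` tokens) are hypothetical, and the BLEU function is a simplified single-reference variant without smoothing, written here only to illustrate the idea.

```python
import math
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens, not from the thesis


def build_vocab(texts, max_size):
    """Map the max_size most frequent words to integer ids.

    Words outside the vocabulary are later encoded as UNK, which is one
    way a small vocabulary can limit the quality of the output.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    index_of = {PAD: 0, UNK: 1}
    for word, _ in counts.most_common(max_size):
        index_of[word] = len(index_of)
    return index_of


def encode(text, index_of):
    """Convert a sentence into a list of word ids."""
    return [index_of.get(word, index_of[UNK]) for word in text.lower().split()]


def decode(ids, index_of):
    """Convert word ids back into a human-readable sentence."""
    word_of = {i: word for word, i in index_of.items()}
    return " ".join(word_of[i] for i in ids if i != index_of[PAD])


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against a single reference.

    Geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by the brevity penalty. No smoothing is applied, so any
    n-gram order with zero overlap gives a score of 0.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        total = sum(cand_ngrams.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return penalty * math.exp(sum(log_precisions) / max_n)
```

With a vocabulary of only two words built from `["the document is long", "the document"]`, `encode("the document is short", vocab)` gives `[2, 3, 1, 1]`, and decoding those ids gives `the document <unk> <unk>`: the words outside the vocabulary cannot be recovered, which is consistent with the abstract's remark that vocabulary size caps the achievable accuracy.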
format Thesis
qualification_level Master's degree
author Chong, Jing Wen
title Document summarization using transfer learning
granting_institution Universiti Teknologi Malaysia, Faculty of Electrical Engineering
granting_department Faculty of Electrical Engineering
publishDate 2018
url http://eprints.utm.my/id/eprint/79268/1/ChongJingWenMFKE2018.pdf
_version_ 1747818187194368000