Gated recurrent unit for low power wake-word detection

Bibliographic Details
Main Author: Chin, Jian Qee
Format: Thesis
Language: English
Published: 2021
Online Access: http://eprints.utm.my/id/eprint/96438/1/JianQeeChinMFABU2021.pdf.pdf
Description
Summary: Neural networks have made possible some of the latest state-of-the-art technologies, such as speech recognition, language translation and stock prediction. Among them, speech recognition is a very popular and rapidly growing application. It is widely used in products such as mobile phones and Amazon smart speakers to enhance the user experience. However, neural networks used for speech recognition require a large amount of computation, especially when they run in an always-on state. This makes them infeasible to implement in battery-powered edge devices such as wearables, sensors, and internet-of-things devices, as the battery will not last long enough to provide a good user experience. To address this issue, this work enhances the recurrent neural network (RNN), specifically the Gated Recurrent Unit (GRU), for the task of wake-word detection. A wake-word detector is always powered on, listening for a specific phrase, the wake-word. Therefore, its power consumption must be low enough to enable long battery life, a feature sought by many end consumers. This work proposes four modifications to the existing GRU architecture. First, the reset gate is removed, as prior research implies that it is not needed in applications such as speech recognition. Second, the activation function is changed from the conventional sigmoid/hyperbolic tangent functions to the softsign function. Third, weight quantization is carried out to reduce the memory footprint and speed up calculations. Fourth, fixed-point arithmetic is used instead of floating-point format. With these enhancements to the architecture, memory and power consumption are reduced while the impact on accuracy is kept minimal. Furthermore, it becomes possible to embed this new neural network model in battery-powered edge devices such as wearables.
In summary, this work explores the possibility of implementing an improved GRU architecture in battery-powered edge devices to enable low-power speech recognition.
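
The four modifications listed in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the update-gate convention, the rescaling of softsign into a (0, 1) gate, and the 8-bit format with 7 fractional bits are all assumptions chosen for illustration, since the record does not give those details.

```python
def softsign(x):
    """Softsign x / (1 + |x|), range (-1, 1); needs no exponential."""
    return x / (1.0 + abs(x))


def gate(x):
    """Assumed stand-in for sigmoid: softsign rescaled into (0, 1)."""
    return 0.5 * (softsign(x) + 1.0)


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step_no_reset(x, h, Wz, Uz, bz, Wh, Uh, bh):
    """One step of a GRU variant without a reset gate (hypothetical helper).

    The candidate state uses h directly, where a standard GRU would use
    the reset gate multiplied elementwise with h.
    """
    z_pre = [a + b + c for a, b, c in zip(matvec(Wz, x), matvec(Uz, h), bz)]
    c_pre = [a + b + c for a, b, c in zip(matvec(Wh, x), matvec(Uh, h), bh)]
    z = [gate(v) for v in z_pre]         # update gate
    cand = [softsign(v) for v in c_pre]  # candidate state, replaces tanh
    # Assumed convention: z weights the candidate, (1 - z) the old state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def to_fixed(value, frac_bits=7, total_bits=8):
    """Quantize a float to a saturating signed fixed-point integer."""
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def from_fixed(q, frac_bits=7):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)
```

In this sketch a weight of 0.5 is stored as the integer 64, and values outside the representable range saturate at -128 or 127. A real low-power implementation would also carry out the multiply-accumulate steps in integer arithmetic; the float matrix products here only keep the example short.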