Automated image based CAPTCHA solver

CAPTCHA stands for “Completely Automated Public Turing Test to tell Computers and Humans Apart”. Text-based CAPTCHA is the most common technique used across the internet to prevent bots from attacking online systems. An image of a distorted word is generated because a computer program will have difficulty t...

Bibliographic Details
Main Author: Choong, Kai Bin
Format: Thesis
Language: English
Published: 2018
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access: http://eprints.utm.my/id/eprint/78552/1/ChoongKaiBinMFKE2018.pdf
id my-utm-ep.78552
record_format uketd_dc
spelling my-utm-ep.78552 2018-08-27T03:22:27Z Automated image based CAPTCHA solver 2018-01 Choong, Kai Bin TK Electrical engineering. Electronics Nuclear engineering 2018-01 Thesis http://eprints.utm.my/id/eprint/78552/ http://eprints.utm.my/id/eprint/78552/1/ChoongKaiBinMFKE2018.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:108397 masters Universiti Teknologi Malaysia, Faculty of Electrical Engineering Faculty of Electrical Engineering
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic TK Electrical engineering
Electronics Nuclear engineering
spellingShingle TK Electrical engineering
Electronics Nuclear engineering
Choong, Kai Bin
Automated image based CAPTCHA solver
description CAPTCHA stands for “Completely Automated Public Turing Test to tell Computers and Humans Apart”. Text-based CAPTCHA is the most common technique used across the internet to prevent bots from attacking online systems. An image of a distorted word is generated because a computer program has difficulty reading it, whereas a human can read the text in the image easily. This helps to prevent websites from being attacked by automated scripts. Hence, CAPTCHA can be considered a win-win strategy that protects websites from bot attacks without disturbing the user. However, owing to advances in pattern recognition technology, current text-based CAPTCHAs may not be robust enough to withstand intelligent bots. Thus, in this project, a CAPTCHA-solving algorithm is developed to investigate the strength of CAPTCHA in defeating bots. The project also aims to identify the weaknesses of text-based CAPTCHA, which in turn helps in developing a more robust CAPTCHA. The project methodology is divided into pre-processing, segmentation and character recognition. In the pre-processing stage, the CAPTCHA image is converted to a greyscale image. Lines and dots are then removed to recover the original word in the image. Segmentation crops out the individual characters in the CAPTCHA image for character recognition. The extracted characters are then recognized by matching them against a database. If all the characters are recognized, the text-based CAPTCHA is broken. The CAPTCHA-solving algorithm was developed in MATLAB and can be trained against a custom dataset. It breaks ASP.NET text-based CAPTCHA with an accuracy of 96% at word level and 98.86% at character level.
format Thesis
qualification_level Master's degree
author Choong, Kai Bin
author_facet Choong, Kai Bin
author_sort Choong, Kai Bin
title Automated image based CAPTCHA solver
title_short Automated image based CAPTCHA solver
title_full Automated image based CAPTCHA solver
title_fullStr Automated image based CAPTCHA solver
title_full_unstemmed Automated image based CAPTCHA solver
title_sort automated image based captcha solver
granting_institution Universiti Teknologi Malaysia, Faculty of Electrical Engineering
granting_department Faculty of Electrical Engineering
publishDate 2018
url http://eprints.utm.my/id/eprint/78552/1/ChoongKaiBinMFKE2018.pdf
_version_ 1747818013264969728
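
The three stages named in the description (pre-processing, segmentation, character recognition by matching against a database) can be illustrated with a minimal sketch. This is not the thesis code, which was written in MATLAB and is not reproduced in this record. It is a pure-Python illustration under several assumptions: images are plain lists of pixel rows, ink is darker than the background, characters do not touch, and the "database" is a small dictionary of binary templates. All function names (`to_binary`, `remove_dots`, `segment`, `distance`, `recognise`, `solve`) and the threshold value are hypothetical. Line removal, which the thesis also performs, is omitted here.

```python
from typing import Dict, List, Optional

Image = List[List[int]]  # rows of pixels; after thresholding, 1 = ink, 0 = background


def to_binary(gray: Image, threshold: int = 128) -> Image:
    """Pre-processing: threshold a greyscale image (0-255) so dark pixels become ink."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_dots(img: Image, min_neighbours: int = 1) -> Image:
    """Pre-processing: clear ink pixels that have too few ink pixels among their 8 neighbours."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            # Count ink in the 3x3 window, then subtract the pixel itself.
            neighbours = sum(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ) - 1
            if neighbours < min_neighbours:
                out[y][x] = 0
    return out


def segment(img: Image) -> List[Image]:
    """Segmentation: split the image at columns that contain no ink."""
    width = len(img[0])
    col_has_ink = [any(row[x] for row in img) for x in range(width)]
    pieces: List[Image] = []
    start: Optional[int] = None
    # A trailing False closes a character that touches the right edge.
    for x, ink in enumerate(col_has_ink + [False]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            pieces.append([row[start:x] for row in img])
            start = None
    return pieces


def distance(a: Image, b: Image) -> int:
    """Count differing pixels, treating anything outside an image as background."""
    h = max(len(a), len(b))
    w = max(len(a[0]), len(b[0]))

    def px(img: Image, y: int, x: int) -> int:
        return img[y][x] if y < len(img) and x < len(img[0]) else 0

    return sum(px(a, y, x) != px(b, y, x) for y in range(h) for x in range(w))


def recognise(glyph: Image, templates: Dict[str, Image]) -> str:
    """Recognition: return the label of the template closest to the glyph."""
    return min(templates, key=lambda ch: distance(glyph, templates[ch]))


def solve(gray: Image, templates: Dict[str, Image]) -> str:
    """Run pre-processing, segmentation and recognition in sequence."""
    clean = remove_dots(to_binary(gray))
    return "".join(recognise(glyph, templates) for glyph in segment(clean))
```

Calling `solve(gray, templates)` with a greyscale image and a dictionary mapping each character to its binary template returns the recognised string. Column-based splitting only works when characters are separated by blank columns; overlapping or connected characters would need a different segmentation strategy.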