Analyzing DNA Sequences Using Clustering Algorithm

Data mining offers promising prospects for DNA sequence analysis through its concepts and techniques. This study applies exploratory data analysis to cluster DNA sequences. Feature vectors are developed that map each DNA sequence to a point in a twelve-dimensional space. Lysozyme, M...

Bibliographic Details
Main Author:	Alhersh, Taha Talib Ragheb
Format:	Thesis
Language:	eng
Published:	2009
Subjects:	QA76 Computer software
Online Access:	https://etd.uum.edu.my/1913/1/Taha_Taleb_Ragheb_Alhersh.pdf https://etd.uum.edu.my/1913/2/1.Taha_Taleb_Ragheb_Alhersh.pdf

id	my-uum-etd.1913
record_format	uketd_dc
institution	Universiti Utara Malaysia
collection	UUM ETD
language	eng
topic	QA76 Computer software
description	Data mining offers promising prospects for DNA sequence analysis through its concepts and techniques. This study applies exploratory data analysis to cluster DNA sequences. Feature vectors are developed that map each DNA sequence to a point in a twelve-dimensional space. The Lysozyme, Myoglobin and Rhodopsin protein families are tested in this space. Homologous sequences yield characterization vectors that lie close together and are easily distinguished from those of non-homologous sequences, in experiments that use a fixed DNA sequence size not exceeding the length of the shortest DNA sequence. The main advantage of this work is global comparison: multiple DNA sequences are represented simultaneously in the genomic space and compared directly through the distances between their characteristic vectors. The novelty of this work is that a new DNA sequence does not need to be compared against the full length of every other sequence; the comparison uses a fixed number of positions from each sequence, chosen so that it does not exceed the length of the new DNA sequence. In other words, part of a DNA sequence can identify the function of that sequence and cluster it with its family members.
format	Thesis
qualification_name	masters
qualification_level	Master's degree
author	Alhersh, Taha Talib Ragheb
title	Analyzing DNA Sequences Using Clustering Algorithm
granting_institution	Universiti Utara Malaysia
granting_department	College of Arts and Sciences (CAS)
publishDate	2009
url	https://etd.uum.edu.my/1913/1/Taha_Taleb_Ragheb_Alhersh.pdf https://etd.uum.edu.my/1913/2/1.Taha_Taleb_Ragheb_Alhersh.pdf
_version_	1747827231075336192
spelling	my-uum-etd.1913 2022-04-21T03:28:29Z https://etd.uum.edu.my/1913/
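
The description in this record says that each DNA sequence is mapped to a twelve-dimensional vector and that sequences are compared through the distances between those vectors, using a fixed sequence size no longer than the shortest sequence. The record does not state which twelve features the thesis uses, so the following is only a minimal sketch under an assumption: for each of the four nucleotides it takes the count, the mean position, and the variance of positions (3 x 4 = 12). The function names and the toy sequences in the usage note are hypothetical and are not taken from the thesis.

```python
import math

NUCLEOTIDES = "ACGT"


def feature_vector(seq, size):
    """Map the first `size` bases of `seq` to a 12-dimensional vector.

    Assumed features (not confirmed by the record): for each of A, C, G, T,
    the count, the mean 1-based position, and the variance of positions.
    """
    prefix = seq[:size].upper()
    vec = []
    for base in NUCLEOTIDES:
        positions = [i + 1 for i, b in enumerate(prefix) if b == base]
        n = len(positions)
        if n == 0:
            # A base that never occurs contributes three zeros.
            vec.extend([0.0, 0.0, 0.0])
            continue
        mean = sum(positions) / n
        var = sum((p - mean) ** 2 for p in positions) / n
        vec.extend([float(n), mean, var])
    return vec


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(seqs):
    """Compare all sequences at a fixed size equal to the shortest length."""
    size = min(len(s) for s in seqs.values())
    vectors = {name: feature_vector(s, size) for name, s in seqs.items()}
    names = list(vectors)
    return {
        (a, b): distance(vectors[a], vectors[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

As a usage illustration with made-up toy strings, `pairwise_distances({"x1": "ACGTACGTACGT", "x2": "ACGTACGTACGA", "y1": "TTTTTTGGGGGG"})` returns one distance per pair; the two near-identical strings end up closer to each other than either is to the third. A clustering step (for example hierarchical clustering) could then be run on these distances, though the record does not say which algorithm the thesis applies.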
