Siddha, Shravanee Shekhar (2020) Protein Sequence Classification using Machine Learning and Deep Learning. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science)
PDF (Configuration manual)
Abstract
Many protein sequences are discovered and added to databases, but their functional properties remain unknown. Laboratory experiments to predict the functions of a protein take a considerable amount of time. This creates a need for computational methods that classify protein sequences into their respective families. Protein family classification can contribute significantly to the prediction of protein function based on sequence motifs. These factors make proteomics a very important area of modern computational biology. This project presents an approach to protein sequence classification using Natural Language Processing techniques (TF-IDF and Word Embedding). Machine learning models (Decision Tree, Random Forest) and deep learning models (Convolutional Neural Network, Long Short-Term Memory) were developed and compared in order to build an efficient protein classification system. The results showed that Decision Tree achieved the highest accuracy, 78.71%, followed by Random Forest, and that both were much faster.
Keywords: Protein Sequence Classification, Proteomics, NLP, TF-IDF, Word Embedding, Decision Tree, Random Forest, Convolutional Neural Network, Long Short-Term Memory
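As an illustration of how TF-IDF can be applied to protein sequences, the following is a minimal sketch, not the thesis's actual pipeline. The abstract does not say how sequences are tokenized, so treating overlapping 3-residue k-mers as "words" is an assumption here, as are the helper names `kmers` and `tfidf` and the made-up toy sequences. The weighting is the plain textbook form, term frequency times log(N / document frequency).

```python
import math
from collections import Counter


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers (assumed tokenization)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def tfidf(sequences, k=3):
    """Return one {kmer: weight} dict per sequence, using tf * log(N / df)."""
    docs = [Counter(kmers(s, k)) for s in sequences]
    n_docs = len(docs)

    # Document frequency: in how many sequences each k-mer appears.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in doc.items()}
        )
    return vectors


# Toy, made-up sequences for illustration only (not data from the thesis).
toy_sequences = ["MKVLAAGIV", "MKVLSAGIV", "GSHMTTPLQ"]
vectors = tfidf(toy_sequences)
```

K-mers shared by several sequences (such as `MKV` above) receive lower weights than k-mers unique to one sequence (such as `VLA`). Sparse vectors of this kind could then be passed to a classifier such as a decision tree. In practice a library vectorizer, for example scikit-learn's `TfidfVectorizer` with `analyzer="char"`, would likely be used, and its exact weighting formula differs from this sketch.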
Item Type: | Thesis (Masters) |
---|---|
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QA Mathematics > Computer software; T Technology > T Technology (General) > Information Technology > Computer software |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Dan English |
Date Deposited: | 25 Jan 2021 15:20 |
Last Modified: | 25 Jan 2021 15:20 |
URI: | https://norma.ncirl.ie/id/eprint/4472 |