NORMA eResearch @NCI Library

Protein Sequence Classification using Machine Learning and Deep Learning

Siddha, Shravanee Shekhar (2020) Protein Sequence Classification using Machine Learning and Deep Learning. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 506kB
PDF (Configuration manual), 492kB

Abstract

Many protein sequences are discovered and added to databases, but their functional properties remain unknown. Laboratory experiments to predict the function of a protein consume a considerable amount of time. This creates a need for computational methods that classify protein sequences into their respective families. Protein family classification can contribute significantly to the prediction of protein function based on sequence motifs. These factors make proteomics a very important area of modern computational biology. This project presents an approach to protein sequence classification using Natural Language Processing techniques (TF-IDF and Word Embedding). Machine learning models such as Decision Tree and Random Forest, and deep learning models such as Convolutional Neural Network and Long Short-Term Memory, were developed and compared in order to build an efficient protein classification system. The results showed that Decision Tree achieved the highest accuracy, 78.71%, followed by Random Forest, and that both were much faster than the deep learning models.
Keywords: Protein Sequence Classification, Proteomics, NLP, TF-IDF, Word Embedding, Decision Tree, Random Forest, Convolutional Neural Network, Long Short-Term Memory
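
As a rough illustration of how TF-IDF can be applied to protein sequences, the sketch below treats overlapping k-mers as "words" and computes a plain TF-IDF weight for each one. The abstract does not say how the thesis tokenizes sequences, so the k-mer scheme, the value k=3, the unsmoothed IDF formula, the function names, and the toy sequences are all assumptions made for this example rather than the author's method.

```python
import math
from collections import Counter


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words")."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def tfidf(sequences, k=3):
    """Return one {k-mer: weight} dict per sequence using plain TF-IDF."""
    docs = [Counter(kmers(s, k)) for s in sequences]
    n_docs = len(docs)
    # Document frequency: number of sequences that contain each k-mer.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            # Term frequency times unsmoothed inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in doc.items()
        })
    return vectors


# Toy sequences, invented for illustration only (not from the thesis data).
toy = ["MKVLAAGIV", "MKVLSAGIV", "GSHMTTPLK"]
vectors = tfidf(toy)
```

Each resulting dictionary is a sparse feature vector. In a pipeline like the one the abstract describes, such vectors would be passed to a classifier such as a Decision Tree; k-mers shared by many sequences receive low weights, while k-mers specific to a few sequences receive high ones.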

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Dan English
Date Deposited: 25 Jan 2021 15:20
Last Modified: 25 Jan 2021 15:20
URI: http://norma.ncirl.ie/id/eprint/4472
