Augestin, Anjali (2024) Eight-state Protein Secondary Structure Prediction Using NLP and Deep Learning. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science)
PDF (Configuration Manual)
Abstract
Protein secondary structure prediction (PSSP) is a significant task in bioinformatics, as it determines the local structural arrangement of amino acids, such as α-helices, β-sheets, and random coils. These structures are used to identify the 3D structure of a protein, which in turn determines its function. This research investigates the effect of integrating Natural Language Processing (NLP) techniques with deep learning models for eight-state protein secondary structure prediction. NLP methods such as Word2Vec, GloVe, and ESM are used to derive embeddings from amino acid sequences, and the study compares their effectiveness in capturing contextual protein features. LSTM and BiLSTM models with attention mechanisms, used for model training, improve prediction accuracy, although challenges such as class imbalance and the inability to identify all DSSP8 states remain. The findings highlight the potential of language models but emphasize the need to incorporate additional features such as PSSM, along with resampling strategies, to enhance class prediction. This study lays a foundation for future work on integrating contextual information to improve PSSP accuracy.
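As a rough illustration of two ideas mentioned in the abstract, the sketch below splits an amino acid sequence into overlapping k-mer "words" (a common way to prepare protein sequences for Word2Vec-style embeddings) and computes per-state weights that expose class imbalance across the eight DSSP states. The k-mer size, the function names, the toy data, and the use of inverse-frequency weighting are all assumptions made for illustration; the abstract names resampling rather than class weighting, and this is not the thesis's implementation.

```python
from collections import Counter

# The eight DSSP states; "C" stands for coil here (some datasets use "L" or "-").
DSSP8_STATES = "HGIEBTSC"


def kmer_tokens(sequence, k=3):
    """Split an amino acid sequence into overlapping k-mers ("words")."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def class_weights(label_strings):
    """Inverse-frequency weight per DSSP8 state; states never seen get 0.0."""
    counts = Counter("".join(label_strings))
    total = sum(counts[s] for s in DSSP8_STATES)
    seen = [s for s in DSSP8_STATES if counts[s] > 0]
    return {
        s: (total / (len(seen) * counts[s]) if counts[s] else 0.0)
        for s in DSSP8_STATES
    }


# Toy data (hypothetical, not taken from the thesis).
sequence = "MKTAYIAK"
labels = ["CCHHHHEC"]

tokens = kmer_tokens(sequence)   # input "sentence" for an embedding model
weights = class_weights(labels)  # rarer states receive larger weights
```

Weights of this kind could be passed to a weighted loss function so that rare states contribute more to training. States that never occur in the labels receive a weight of 0.0, which also makes it easy to spot DSSP8 states that are missing from a dataset.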
Item Type: | Thesis (Masters) |
---|---|
Supervisors: | Niculescu, Hamilton (email unspecified) |
Uncontrolled Keywords: | Bioinformatics; protein secondary structure prediction; Natural Language Processing; Word2Vec; GloVe; ESM; LSTM; BiLSTM |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Ciara O'Brien |
Date Deposited: | 01 Sep 2025 14:26 |
Last Modified: | 01 Sep 2025 14:26 |
URI: | https://norma.ncirl.ie/id/eprint/8675 |