NORMA eResearch @NCI Library

Sub-Optimal Hyperparameter Selection for Multi-Label Classifier Chains Predicting Cardiotoxicity from Gene-Expression Data

Signorelli, Christopher (2022) Sub-Optimal Hyperparameter Selection for Multi-Label Classifier Chains Predicting Cardiotoxicity from Gene-Expression Data. Masters thesis, Dublin, National College of Ireland.

Abstract

Robust multi-label classifier (MLC) chains are difficult to optimise because of the large search space of base model types and hyperparameters. This project demonstrates a pragmatic approach in which robust MLC models are trained within a subspace of that search space. A solution is proposed for training a robust MLC chain that predicts forms of cardiotoxicity from known gene-expression data and drug properties. Empirical gene-expression data from the LINCS L1000 project were joined with drug compound properties curated by other researchers to create the model training set. The training set includes features for gene-expression responses to drug perturbation of cancerous cells, molecular descriptors, 79-bit E-state fingerprints, and perturbation times. The multiple binary labels represent various cardiotoxicity outcomes for each drug. An automated hyperparameter search over a relatively small search space produced a robust MLC chain, called Best Means Chain, whose performance ranked in the top three when compared with 100 other chains. The main finding is that, for this cardiotoxicity application, a robust MLC chain can be trained in a relatively straightforward manner and in reasonable time, with the trade-off that the result is sub-optimal compared with searching the full hyperparameter search space.
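
The idea of searching a deliberately small hyperparameter subspace for a classifier chain can be illustrated with a minimal sketch. It is not the thesis's implementation: the synthetic data, the logistic-regression base model, the candidate values in `C_VALUES`, the number of candidates, and the selection rule (highest mean of two validation metrics) are all assumptions made for illustration, using scikit-learn's `ClassifierChain`.

```python
import random

from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

# Synthetic stand-in for the real feature/label tables (assumption).
X, Y = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=4, random_state=0
)
X_train, X_val, Y_train, Y_val = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

rng = random.Random(0)
C_VALUES = [0.01, 0.1, 1.0, 10.0]  # hypothetical small search subspace
N_CANDIDATES = 10                  # hypothetical search budget


def sample_candidate():
    """Draw one chain configuration: a label order and a regularisation strength."""
    order = list(range(Y.shape[1]))
    rng.shuffle(order)
    return {"C": rng.choice(C_VALUES), "order": order}


def evaluate(params):
    """Fit one chain and return the mean of two validation metrics."""
    chain = ClassifierChain(
        LogisticRegression(C=params["C"], max_iter=1000),
        order=params["order"],
    )
    chain.fit(X_train, Y_train)
    Y_pred = chain.predict(X_val)
    scores = [
        f1_score(Y_val, Y_pred, average="micro", zero_division=0),
        jaccard_score(Y_val, Y_pred, average="samples", zero_division=0),
    ]
    return sum(scores) / len(scores)


# Score every sampled candidate and keep the one with the best mean score.
results = [(evaluate(p), p) for p in (sample_candidate() for _ in range(N_CANDIDATES))]
best_score, best_params = max(results, key=lambda r: r[0])
print(f"best mean score: {best_score:.3f} with {best_params}")
```

Sampling only a handful of configurations keeps the run time short, at the cost that the selected chain is the best of the sampled candidates rather than the best in the full space, which mirrors the trade-off the abstract describes.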

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
R Medicine > RC Internal medicine > RC0254 Neoplasms. Tumors. Oncology (including Cancer)
T Technology > Biomedical engineering
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 11 Mar 2023 13:25
Last Modified: 11 Mar 2023 13:25
URI: https://norma.ncirl.ie/id/eprint/6309
