NORMA eResearch @NCI Library

Detection of Polycystic Ovary Syndrome using Machine Learning Algorithms

Bhat, Shakoor Ahmad (2021) Detection of Polycystic Ovary Syndrome using Machine Learning Algorithms. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 1MB
PDF (Configuration manual), 4MB


This research focuses on data-driven detection of Polycystic Ovary Syndrome (PCOS), a medical disorder that affects female fertility in women of childbearing age and can persist even far beyond the reproductive years. PCOS carries a risk of complex long-term complications. Given the strong identification performance of boosting and bagging algorithms, especially in the medical domain, we combined Extreme Gradient Boosting with Random Forest (XGBRF) and propose XGBRF and CatBoost models for the early identification of PCOS. To support effective classification, the data were re-sampled using the Synthetic Minority Over-sampling Technique (SMOTE) to address outliers and class imbalance. Using a univariate feature selection method, we identified the top 10 clinical and metabolic parameters for classifying PCOS. FSH (follicle-stimulating hormone) emerged as one of the most significant parameters, followed by LH (luteinizing hormone). The models were evaluated using accuracy, precision, recall, F1-score, ROC curves, AUC score and k-fold cross-validation on a PCOS dataset obtained from the Kaggle repository, in order to validate the proposed approach. Gradient Boosting, Random Forest, Logistic Regression, HRFLR, SVM, Decision Tree and MLP classifiers were applied as baselines for comparison. CatBoost and XGBRF outperformed all other models, with accuracy scores of 0.95 and 0.89 respectively on the top 10 parameters. Hence, CatBoost is suitable for detecting PCOS.
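The abstract names the preprocessing and evaluation steps without showing them. Below is a minimal pure-Python sketch of those steps, assuming numeric features and binary labels (1 = PCOS). It is not the author's code: the function names and toy data are hypothetical, the SMOTE routine is a simplified version (interpolate between a minority row and one of its k nearest minority neighbours), and the one-way ANOVA F-score is only one possible univariate scoring function, since the abstract does not say which one the thesis used. The XGBRF and CatBoost models themselves are not reproduced.

```python
import random
from statistics import mean


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: create n_new synthetic minority rows by moving a
    random fraction of the way from a minority row towards one of its
    k nearest minority-class neighbours (squared Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class rows")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority rows, nearest first.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature against the class labels
    (between-class variance divided by within-class variance)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    grand = mean(values)
    means = {y: mean(vs) for y, vs in groups.items()}
    between = sum(len(vs) * (means[y] - grand) ** 2 for y, vs in groups.items()) / (g - 1)
    within = sum((v - means[y]) ** 2 for y, vs in groups.items() for v in vs) / (n - g)
    return between / within if within > 0 else float("inf")


def select_top_k(rows, labels, names, k=10):
    """Univariate feature selection: score each column independently and
    keep the names of the k highest-scoring features."""
    columns = list(zip(*rows))
    scores = {name: anova_f(list(col), labels) for name, col in zip(names, columns)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up toy data (not the Kaggle PCOS dataset): feature_a separates
    # the classes, feature_b is mostly noise; class 1 is the minority.
    rows = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [1.1, 6.0], [3.0, 4.5], [3.2, 5.5]]
    labels = [0, 0, 0, 0, 1, 1]
    minority = [r for r, y in zip(rows, labels) if y == 1]
    new_rows = smote(minority, n_new=2, k=1)
    rows += new_rows
    labels += [1] * len(new_rows)
    print(select_top_k(rows, labels, ["feature_a", "feature_b"], k=1))  # ['feature_a']
    print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In practice, tested implementations exist in imbalanced-learn (`SMOTE`) and scikit-learn (`SelectKBest` with `f_classif`, plus cross-validation utilities). When combining SMOTE with k-fold cross-validation, oversampling should be applied only to the training folds so that synthetic rows do not leak into the evaluation data.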

Item Type: Thesis (Masters)
Uncontrolled Keywords: PCOS; Machine learning; Medical domain; Feature selection; Tuning; Boosting
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
R Medicine > R Medicine (General)
H Social Sciences > HM Sociology > Information Science > Communication > Medical Informatics
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Clara Chan
Date Deposited: 15 Nov 2021 12:56
Last Modified: 06 Dec 2021 10:33
