Sreekumar, Anakha (2024) Advanced Breast Cancer Diagnosis using Machine Learning and Deep Learning. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science), 898kB
PDF (Configuration Manual), 2MB
Abstract
Breast cancer is one of the most critical health issues worldwide, creating a pressing need for effective and timely diagnostic tools that support treatment and prognosis. This work explores machine learning and deep learning methods for breast cancer classification using two benchmark datasets: the Wisconsin Breast Cancer Diagnostic Dataset and the BreakHis Histopathological Images Dataset. The Wisconsin dataset provides structured data for machine learning models, whereas the BreakHis work focuses on classifying histopathological images with Convolutional Neural Networks.
To prepare the Wisconsin dataset for training, the preprocessing included feature scaling and feature selection using Recursive Feature Elimination (RFE) and Random Forest feature importance. Seven machine learning models were explored in this study, including Logistic Regression, Random Forest, and XGBoost. Random Forest proved to be the best model, with an accuracy of 97.37% and very high F1-score, precision, and recall.
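The preprocessing and modelling steps described above can be sketched with scikit-learn, whose `load_breast_cancer` loader ships the Wisconsin Diagnostic data. This is a minimal illustration and not the thesis code. The 80/20 stratified split, the choice of 10 retained features, the use of a Random Forest inside RFE, and all hyperparameters are assumptions made here for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Wisconsin Breast Cancer Diagnostic data: 569 samples, 30 numeric features.
X, y = load_breast_cancer(return_X_y=True)

# Assumed hold-out scheme: 80/20 stratified split (not stated in the abstract).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipeline = Pipeline([
    # Feature scaling, fitted on the training data only.
    ("scale", StandardScaler()),
    # RFE ranks features by Random Forest importance and drops the weakest
    # one per round; keeping 10 features is a hypothetical choice.
    ("select", RFE(
        RandomForestClassifier(n_estimators=100, random_state=42),
        n_features_to_select=10,
    )),
    # Final classifier trained on the selected features.
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
n_selected = int(pipeline.named_steps["select"].support_.sum())

print(f"features kept: {n_selected}")
print(f"accuracy: {accuracy:.4f}  F1: {f1:.4f}")
```

Wrapping the scaler and selector in a `Pipeline` keeps both steps fitted on the training split only, so the test set does not leak into feature selection. The scores printed by this sketch depend on the assumed split and settings and need not match the figures reported in the thesis.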
The BreakHis dataset consists of images of eight tumor types, four benign and four malignant, and therefore calls for both binary and multiclass classification. The images were resized as a pre-processing step and to introduce more variation. The Convolutional Neural Network (CNN) achieved a test accuracy of 99.94% in binary classification and 90.98% in multiclass classification. Model performance was assessed using confusion matrices, precision-recall curves, and classification reports.
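To make the evaluation measures concrete, the following sketch derives accuracy and per-class precision, recall, and F1 from a confusion matrix using only the Python standard library. The function name `classification_report` and the counts in `toy_confusion` are hypothetical and chosen for illustration. They are not results from the thesis.

```python
def classification_report(confusion):
    """Compute overall accuracy and per-class precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    report = {"accuracy": correct / total if total else 0.0, "per_class": []}

    for k in range(n_classes):
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n_classes))  # column sum
        actual_k = sum(confusion[k])                                  # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report["per_class"].append(
            {"precision": precision, "recall": recall, "f1": f1}
        )
    return report


# Hypothetical binary confusion matrix (rows: true class, columns: predicted).
toy_confusion = [
    [50, 2],   # true benign: 50 predicted benign, 2 predicted malignant
    [3, 45],   # true malignant: 3 predicted benign, 45 predicted malignant
]

report = classification_report(toy_confusion)
print(f"accuracy: {report['accuracy']:.4f}")
for label, scores in zip(["benign", "malignant"], report["per_class"]):
    print(label, {name: round(value, 4) for name, value in scores.items()})
```

The same function applies unchanged to an eight-class matrix for the multiclass setting, since it only relies on row sums, column sums, and the diagonal.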
The present work underlines the effectiveness of combining machine learning on structured data with CNNs on image data for the diagnosis of breast cancer. The results confirm the potential of computational models to improve diagnostic performance through early diagnosis, thereby assisting clinical decision-making.
Item Type: | Thesis (Masters) |
---|---|
Supervisors: | Kumar, Teerath (email: UNSPECIFIED) |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > Life sciences > Medical sciences > Pathology > Tumors > Cancer; R Medicine > Healthcare Industry; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Ciara O'Brien |
Date Deposited: | 05 Sep 2025 11:04 |
Last Modified: | 05 Sep 2025 11:04 |
URI: | https://norma.ncirl.ie/id/eprint/8818 |