NORMA eResearch @NCI Library

Advanced Breast Cancer Diagnosis using Machine Learning and Deep Learning

Sreekumar, Anakha (2024) Advanced Breast Cancer Diagnosis using Machine Learning and Deep Learning. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science)
PDF (Configuration Manual)

Abstract

Breast cancer is one of the most critical health issues worldwide, creating a pressing need for effective and timely diagnostic tools that support treatment and prognosis. This work explores machine learning and deep learning methods for breast cancer classification using two benchmark datasets: the Wisconsin Breast Cancer Diagnostic Dataset and the BreakHis Histopathological Images Dataset. The Wisconsin dataset provides structured data for machine learning models, whereas BreakHis is used to classify histopathological images with Convolutional Neural Networks.

Preprocessing of the Wisconsin dataset included feature scaling and feature selection using Recursive Feature Elimination (RFE) and Random Forest feature importance, preparing the dataset for training. Seven machine learning models were explored, including Logistic Regression, Random Forest, and XGBoost. Random Forest was the best model, with an accuracy of 97.37% and very high F1-score, precision, and recall.
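As a rough illustration of the feature scaling step mentioned above, the following minimal sketch standardizes each feature column to zero mean and unit variance. The abstract does not state which scaling method the thesis used, so z-score standardization is an assumption here, and the function name `standardize_columns` and the toy rows are hypothetical.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Scale each feature column to zero mean and unit variance (z-score).

    Assumption: z-score scaling is used for illustration only; the thesis
    abstract does not specify the scaling method.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Population standard deviation; use 1.0 for constant columns to avoid
    # dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical toy rows with two features on very different scales
# (not data from the thesis).
toy_rows = [[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]]
scaled = standardize_columns(toy_rows)
```

After scaling, both columns contribute on a comparable scale, which matters for models such as Logistic Regression that are sensitive to feature magnitude.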

The BreakHis dataset consists of images of eight tumor types, four benign and four malignant, and therefore calls for both binary and multiclass classification. The images were resized as a pre-processing step and to introduce more variation. The Convolutional Neural Network (CNN) achieved a test accuracy of 99.94% in binary classification and an accuracy of 90.98% in multiclass classification. The models were evaluated using confusion matrices, precision-recall curves, and classification reports.
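The metrics named in the abstract (accuracy, precision, recall, F1-score) can all be derived from the four cells of a binary confusion matrix. The sketch below shows that arithmetic. The function name `binary_metrics` and the counts are hypothetical and are not results from the thesis.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with "malignant" treated as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not taken from the thesis).
metrics = binary_metrics(tp=90, fp=5, fn=10, tn=95)
```

In a diagnostic setting, recall on the malignant class is often watched most closely, since a false negative corresponds to a missed cancer.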

This work underlines the effectiveness of combining machine learning on structured data with CNNs on image data for the diagnosis of breast cancer. The results confirm the potential of computational models to improve diagnostic performance through early diagnosis, thus assisting clinical decision-making.

Item Type: Thesis (Masters)
Supervisors: Kumar, Teerath (email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > Life sciences > Medical sciences > Pathology > Tumors > Cancer
R Medicine > Healthcare Industry
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Ciara O'Brien
Date Deposited: 05 Sep 2025 11:04
Last Modified: 05 Sep 2025 11:04
URI: https://norma.ncirl.ie/id/eprint/8818
