NORMA eResearch @NCI Library

ICD-10 Code Prediction using Machine Learning

Samynathan, Sarath Kumar (2022) ICD-10 Code Prediction using Machine Learning. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science)
PDF (Configuration manual)


Unstructured data, such as the free text in Electronic Health Records used by medical organizations, is complex and hard to handle manually. With the help of natural language processing, machine learning can convert free text into vectors of tokens and predict an outcome from that representation. The aim of this project was to predict ICD codes from the synonyms of diseases, using several machine learning algorithms and neural networks. We used Random Forest, Support Vector, Logistic Regression, Naive Bayes, KNN and MLP classifiers to predict the ICD codes from synonyms. We also used the multilingual embedding LASER so that the work can be applied in multiple languages. The algorithm with the highest accuracy was the Random Forest classifier, which achieved 98% accuracy. Our results show that traditional machine learning techniques make a reliable search for medical synonyms possible. We therefore think that any user of such an application can successfully predict ICD codes based on synonyms.
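
The idea of turning a synonym into a token vector and predicting a code from it can be illustrated with a minimal sketch. This is not the thesis pipeline: it assumes a plain bag-of-words representation and a nearest-neighbour vote (KNN is one of the classifiers listed above) written with the Python standard library only, in place of the LASER embeddings and the Random Forest model the abstract reports. The training pairs, the function names and the `k` parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Toy synonym -> ICD-10 code pairs (illustrative only, not data from the thesis).
TRAINING = [
    ("type 2 diabetes mellitus", "E11"),
    ("adult onset diabetes", "E11"),
    ("essential hypertension", "I10"),
    ("high blood pressure", "I10"),
    ("bronchial asthma", "J45"),
    ("asthma", "J45"),
]


def vectorize(text):
    """Convert free text into a bag-of-words vector of token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_icd(query, k=1):
    """Return the majority ICD-10 code among the k most similar synonyms."""
    q = vectorize(query)
    ranked = sorted(
        TRAINING,
        key=lambda pair: cosine(q, vectorize(pair[0])),
        reverse=True,
    )
    votes = Counter(code for _, code in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for phrase in ("high blood pressure", "diabetes type 2", "severe asthma"):
        print(phrase, "->", predict_icd(phrase))
```

A bag-of-words vector only matches phrases that share tokens with a known synonym, which is one reason a learned multilingual sentence embedding such as LASER is attractive for this task.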

Item Type: Thesis (Masters)
Uncontrolled Keywords: electronic health records; Random forest; Support vector classifier; MLP classifier; Streamlit
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
H Social Sciences > HM Sociology > Information Science > Communication > Medical Informatics
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 10 Mar 2023 16:26
Last Modified: 10 Mar 2023 16:26
