Bose, Sayok Kumar (2022) Generating Python Code from Docstrings using OpenNMT. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science), 6MB
PDF (Configuration manual), 2MB
Abstract
In the past two years, the emergence of large language models has brought significant advances in code generation. The number of developers is expected to grow beyond the current figure of 26 million, and statistics show an exponential increase in daily code commits to GitHub. This motivates the idea that writing code, or a piece of software, can be scaffolded with the help of AI-assisted systems. This work applies two techniques, neural machine translation and a decoder-only language model, both built from scratch with the OpenNMT toolkit, to generate Python code from docstrings scraped from public GitHub repositories. The data source for the research is CodeSearchNet (CSN), a cleaned dataset of code and docstring pairs. Model performance is evaluated through human review, BLEU scores, language linting tools, and IDEs.
Item Type: | Thesis (Masters) |
---|---|
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Tamara Malone |
Date Deposited: | 19 Jan 2023 13:33 |
Last Modified: | 06 Mar 2023 15:45 |
URI: | https://norma.ncirl.ie/id/eprint/6094 |