NORMA eResearch @NCI Library

A Robust Text-to-SQL Parser With Optimized Pretraining Approach

Balaraman, Kirubakaran (2021) A Robust Text-to-SQL Parser With Optimized Pretraining Approach. Masters thesis, Dublin, National College of Ireland.


Abstract

Semantic parsing of natural language to Structured Query Language (SQL) has become a popular research topic since the release of the manually annotated WikiSQL dataset. Most recent research uses an encoder-decoder architecture with Bidirectional Encoder Representations from Transformers (BERT) to generate input embeddings. These models are made content-aware by passing table schema information, together with the database contents, as additional knowledge. Although BERT-based models outperform non-BERT variants, BERT itself is undertrained. Newer models such as RoBERTa use further optimized hyperparameters and are trained on a larger corpus. This research takes a novel approach by using RoBERTa to generate input embeddings for decoder models such as Bi-LSTM. A sketch-based slot-filling approach is adopted for the Bi-LSTM decoders. The research also increases the size of the data and the robustness of the model to diverse linguistic patterns through synonym-based paraphrasing. To protect data privacy, this research does not use the database contents as an additional input, which makes the model schema-aware only. The model is evaluated on logical form accuracy and execution accuracy, and it outperforms its non-schema-aware counterparts. However, its performance is lower than that of models using table contents. This decrease in performance is considered acceptable given the privacy vs. performance tradeoff. On the test set, the model achieves a logical form accuracy of 68.8% and an execution accuracy of 76.6%.
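
The gap between the two reported metrics is easier to see with a small example. The following is a minimal sketch, not the thesis implementation: it assumes a WikiSQL-style query sketch (one selected column, an optional aggregate, and a list of WHERE conditions), and the names `fill_sketch`, `logical_form_match`, `execution_match`, the slot dictionaries, and the toy table are all hypothetical. Logical form accuracy asks whether the predicted slots equal the gold slots, while execution accuracy asks only whether both queries return the same result.

```python
import sqlite3

# Assumed WikiSQL-style sketch:
#   SELECT $AGG($COLUMN) FROM t WHERE $COLUMN $OP $VALUE (AND ...)*
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<"]


def fill_sketch(slots, columns, table="t"):
    """Render a parameterized SQL query from (hypothetical) predicted slots."""
    col = '"{}"'.format(columns[slots["sel"]])
    agg = AGG_OPS[slots["agg"]]
    select = "{}({})".format(agg, col) if agg else col
    sql = "SELECT {} FROM {}".format(select, table)
    conds, params = [], []
    for col_idx, op_idx, value in slots["conds"]:
        conds.append('"{}" {} ?'.format(columns[col_idx], COND_OPS[op_idx]))
        params.append(value)
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql, params


def logical_form_match(pred, gold):
    """True only if every slot agrees; the order of conditions is ignored."""
    return (pred["sel"] == gold["sel"]
            and pred["agg"] == gold["agg"]
            and set(pred["conds"]) == set(gold["conds"]))


def execution_match(conn, pred, gold, columns):
    """True if both queries return the same rows on the given database."""
    def run(slots):
        sql, params = fill_sketch(slots, columns)
        return conn.execute(sql, params).fetchall()
    return run(pred) == run(gold)


# Toy table standing in for a WikiSQL table (made-up data).
columns = ["player", "team", "points"]
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("player" TEXT, "team" TEXT, "points" INTEGER)')
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("Ann", "Red", 30), ("Bob", "Blue", 12)])

gold = {"sel": 2, "agg": 0, "conds": [(0, 0, "Ann")]}  # plain column
pred = {"sel": 2, "agg": 1, "conds": [(0, 0, "Ann")]}  # wrongly adds MAX

print(fill_sketch(gold, columns))
print(logical_form_match(pred, gold))
print(execution_match(conn, pred, gold, columns))
```

In this toy case the prediction picks the wrong aggregate, so the logical forms differ, yet both queries return the same single value. Cases like this count as correct under execution accuracy but not under logical form accuracy, which is consistent with the execution figure being the higher of the two.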

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4450 Databases
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Clara Chan
Date Deposited: 11 Nov 2021 10:56
Last Modified: 11 Nov 2021 10:56
URI: https://norma.ncirl.ie/id/eprint/5133
