NORMA eResearch @NCI Library

Advanced Google Scholar Scraper: A Content-Based Filtering Approach for Literature Recommendation Using BERT

Bandi, Praneeth (2024) Advanced Google Scholar Scraper: A Content-Based Filtering Approach for Literature Recommendation Using BERT. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 1MB
PDF (Configuration Manual), 721kB

Abstract

Advanced Google Scholar Scraper is a recommendation system for research literature that combines web scraping, natural language processing, and content-based filtering. It uses Selenium, Beautiful Soup, and the Hugging Face Transformers library (with a focus on BERT) to make literature recommendations more accurate and relevant. The scraper was developed to give researchers, students, and practitioners a simple but flexible tool for finding relevant articles across all fields. Domains covered include Natural Language Processing (NLP), Machine Learning (ML), and BERT models. To provide contextually and semantically accurate recommendations, the system relies on BERT embeddings and cosine similarity. An evaluation of the scraper confirms its ability to collect articles from specific domains and gives examples of the natural language processing methods and web scraping functions applied successfully. The results are timely and show high similarity scores across different fields of research. They also show that the Advanced Google Scholar Scraper overcomes the challenges of dynamic web scraping, error handling, and user interface design. It offers a single solution applicable across literature recommendation applications. The scraper's adaptability, real-time progress monitoring, and error tolerance make it a useful tool in many research environments.
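The ranking step described in the abstract (embeddings compared by cosine similarity) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `cosine_similarity` and `rank_articles`, the article labels, and the three-dimensional toy vectors are all hypothetical, chosen only to show the idea. In a real pipeline the vectors would come from a BERT model (for example, 768 dimensions for bert-base models) rather than being written by hand.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(
    query_vec: Sequence[float],
    article_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (label, score) pairs, highest similarity first."""
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in article_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for BERT embeddings (assumption: hand-made values).
query = [1.0, 0.0, 1.0]
articles = {
    "paper_a": [0.9, 0.1, 1.1],  # points almost the same way as the query
    "paper_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "paper_c": [1.0, 1.0, 1.0],  # partially aligned with the query
}
ranking = rank_articles(query, articles, top_k=2)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

Cosine similarity depends only on the direction of the vectors, not their length, which is one reason it is a common choice for comparing text embeddings.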

Item Type: Thesis (Masters)
Supervisors: Haque, Rejwanul (email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Artificial Intelligence
Depositing User: Ciara O'Brien
Date Deposited: 30 May 2025 14:09
Last Modified: 30 May 2025 14:09
URI: https://norma.ncirl.ie/id/eprint/7714
