Chess LLM Arena: A Framework for Evaluating Strategic Decision-Making in Large Language Models

Jannala, Sai Dhanush

Jannala, Sai Dhanush (2024) Chess LLM Arena: A Framework for Evaluating Strategic Decision-Making in Large Language Models. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 533 kB
PDF (Configuration Manual), 932 kB

Abstract

The rise of Large Language Models (LLMs) marks a paradigm shift in how AI approaches strategic decision-making. This study examines the strategic reasoning of eight heterogeneous LLM architectures as they play chess. Across 680 analyzed games, the experimental framework combines standardized API wrappers with detailed evaluation metrics, revealing distinct patterns in how different LLM architectures handle strategic decision-making. The results show that performance clearly depends on model size: among the GPT models, GPT-4o performed best, with an average centipawn loss (CPL) of 16.75 and a win rate of 64.2%. The analysis also reveals a systematic performance gap between the White and Black sides, with every model showing a higher CPL when playing Black. Notably, all models share a characteristically aggressive playing style: 62.5% of games ended decisively, in contrast with the behavior of traditional chess engines. These findings extend beyond chess, offering insight into how different AI architectures approach complex strategic reasoning tasks. The research contributes to understanding LLM capabilities in strategic decision-making and provides a methodological framework for evaluating and analyzing AI performance in strategically complex real-world environments.
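The abstract reports results in terms of average centipawn loss (CPL), win rate and the share of decisive games. The thesis's own computation is not reproduced in this record; the sketch below illustrates one common way such metrics can be derived, assuming per-move engine evaluations (in centipawns, from the perspective of the side that moved) are already available. The evaluation cap EVAL_CAP_CP, the win-rate definition (wins divided by games, draws not counted as half) and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical cap (an assumption): evaluations beyond +/-1000 cp are clipped
# so that a single blunder in an already lost position does not dominate the average.
EVAL_CAP_CP = 1000


def clip(cp: int) -> int:
    """Clip an engine evaluation (in centipawns) to [-EVAL_CAP_CP, EVAL_CAP_CP]."""
    return max(-EVAL_CAP_CP, min(EVAL_CAP_CP, cp))


def move_loss(best_cp: int, played_cp: int) -> int:
    """Centipawn loss for one move, never negative.

    Both evaluations are from the point of view of the side that moved:
    best_cp after the engine's preferred move, played_cp after the model's move.
    """
    return max(0, clip(best_cp) - clip(played_cp))


def average_cpl(moves: list[tuple[int, int]]) -> float:
    """Mean centipawn loss over a sequence of (best_cp, played_cp) pairs."""
    return mean(move_loss(best, played) for best, played in moves)


def outcome_rates(results: list[str]) -> dict[str, float]:
    """Win rate and decisive-game rate from one model's results.

    Each result is 'win', 'loss' or 'draw' from that model's perspective.
    """
    n = len(results)
    wins = results.count("win")
    draws = results.count("draw")
    return {
        "win_rate": wins / n,
        "decisive_rate": (n - draws) / n,
    }


if __name__ == "__main__":
    # Three illustrative moves: one best move, two inaccuracies.
    print(average_cpl([(35, 35), (20, -10), (50, 20)]))
    print(outcome_rates(["win", "loss", "draw", "win"]))
```

Computing CPL separately over a model's White and Black games, with the same functions, would be one way to express the colour-dependent gap described above.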

Item Type:	Thesis (Masters)
Supervisors:	Chikkankod, Arjun
Uncontrolled Keywords:	Large Language Models; Strategic Decision-Making; Chess AI; Performance Analysis; Artificial Intelligence
Subjects:	Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence; G Geography. Anthropology. Recreation > GV Recreation Leisure > Games and Amusements
Divisions:	School of Computing > Master of Science in Data Analytics
Depositing User:	Ciara O'Brien
Date Deposited:	02 Sep 2025 14:37
Last Modified:	02 Sep 2025 14:37
URI:	https://norma.ncirl.ie/id/eprint/8714