A systematic evaluation of vision transformers for galaxy classification

Pani, Pinaki

Pani, Pinaki (2024) A systematic evaluation of vision transformers for galaxy classification. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 3MB
PDF (Configuration Manual), 1MB

Abstract

This study explores the effectiveness of Vision Transformers (ViTs) for the morphological classification of galaxies, using the Galaxy10 DECaLS dataset. The research focuses on three transformer-based models (ViT Base, Swin Transformer, and DeiT Transformer) alongside the conventional ResNet50 model. The Galaxy10 dataset, which comprises 10 galaxy classes, serves as the benchmark for evaluating model performance. The ViT Base model is fine-tuned on the Galaxy10 dataset, starting from weights pre-trained on ImageNet. It demonstrated robust performance, attributed to its ability to capture complex relationships through multiple layers of multi-head self-attention. The Swin Transformer is known for its hierarchical design and shifted windows, while the DeiT Transformer is enhanced with data-efficiency techniques and knowledge distillation; both models showed strong accuracy and precision in galaxy classification. Models were evaluated using accuracy, precision, recall, and F1 score to ensure a comprehensive assessment of performance. The results indicate that the ViT Base model achieved the highest accuracy; however, a baseline CNN model was faster. This research highlights the trade-off between accuracy and speed when applying Vision Transformers to astronomical image classification, and offers insights into their capability for detailed morphological analysis of images. The findings suggest that ViTs could be used as a general-purpose image classification technique, showing slightly better accuracy than ResNet50. Overall, vision transformers show a superior ability to model contextual information and are promising tools for image classification.
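
The abstract names accuracy, precision, recall, and F1 score as the evaluation metrics but does not say how they were aggregated across the 10 classes. As an illustration only, the following minimal sketch computes them from predicted and true labels, assuming integer class labels 0 to 9 and macro-averaging (an unweighted mean over classes). The function name `macro_metrics` is hypothetical and is not taken from the thesis.

```python
from collections import Counter


def macro_metrics(y_true, y_pred, num_classes=10):
    """Return accuracy and macro-averaged precision, recall and F1.

    Assumes labels are integers in range(num_classes). Macro-averaging is
    an assumption of this sketch; the thesis may aggregate differently.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # class t was missed

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        # A class with no predictions (or no samples) scores 0 by convention.
        prec = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }
```

Under macro-averaging every galaxy class counts equally regardless of how many images it contains, which matters if the class sizes are uneven; a support-weighted average would be the alternative design choice.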

Item Type:	Thesis (Masters)
Supervisors:	Estrada, Giovani
Uncontrolled Keywords:	Vision transformers; convolutional neural network; galaxy; morphology; CNN
Subjects:	Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QB Astronomy; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions:	School of Computing > Master of Science in Data Analytics
Depositing User:	Ciara O'Brien
Date Deposited:	25 Aug 2025 09:12
Last Modified:	25 Aug 2025 09:12
URI:	https://norma.ncirl.ie/id/eprint/8606
