Chaudhry, Muhammad Qamar (2025) Serverless AI: Leveraging Cloud Functions For GPU-Optimized Machine Learning Deployment and Comparative Analysis with Traditional Methods. Masters thesis, Dublin, National College of Ireland.
Abstract
The fusion of serverless computing and GPU-accelerated machine learning (ML) marks a new approach to deploying AI in the cloud. Existing deployment strategies built on virtual machines or containers tend to struggle with scaling, cost efficiency, and infrastructure management overhead. This work analyzes a hybrid serverless architecture that uses AWS Lambda and API Gateway to manage inference workloads, backed by GPU-enabled EC2 instances. The research evaluates two machine learning models, a cyberbullying detection model built with ML algorithms and a YOLOv8 object detection model, and measures latency, resource consumption, processing time, and cost. The findings show that lightweight models perform consistently under serverless configurations, whereas heavy workloads benefit from dynamic offloading to GPU-backed EC2 instances. Delay-sensitive GPU models, however, face latency spikes and inconsistent resource utilization and have very high resource requirements. Nevertheless, the hybrid model strikes a balance between performance and cost, albeit with diminishing returns. The study shows that such serverless architectures can serve modern AI workloads, at the cost of somewhat lower responsiveness, and are a viable option where the requirements suit them.
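The hybrid routing idea described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the model identifiers, the `make_handler` and `run_lightweight` helpers, and the `fake_gpu_backend` stand-in are all hypothetical names introduced here, and the request shape simply assumes an API Gateway proxy-style event with a JSON `body`. No AWS calls are made; the GPU offload is injected as a plain callable so the sketch runs locally.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical identifiers for the two workloads named in the abstract.
LIGHTWEIGHT_MODELS = {"cyberbullying"}  # assumed to run inside the function
GPU_MODELS = {"yolov8"}                 # assumed to be offloaded to a GPU host


def run_lightweight(model: str, payload: Any) -> Dict[str, Any]:
    """Placeholder for in-function inference; returns no real prediction."""
    return {"model": model, "backend": "lambda", "result": None}


def make_handler(offload_to_gpu: Callable[[str, Any], Dict[str, Any]]):
    """Build a Lambda-style handler that routes requests by model type.

    `offload_to_gpu` is an assumed hook; in a real deployment it might
    forward the request to a service on a GPU-backed EC2 instance.
    """
    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        body = json.loads(event.get("body") or "{}")
        model = body.get("model")
        payload = body.get("input")

        if model in LIGHTWEIGHT_MODELS:
            result = run_lightweight(model, payload)
        elif model in GPU_MODELS:
            result = offload_to_gpu(model, payload)
        else:
            error = {"error": f"unknown model: {model}"}
            return {"statusCode": 400, "body": json.dumps(error)}

        return {"statusCode": 200, "body": json.dumps(result)}

    return handler


def fake_gpu_backend(model: str, payload: Any) -> Dict[str, Any]:
    """Local stand-in for the GPU host, used only to exercise the routing."""
    return {"model": model, "backend": "ec2-gpu", "result": None}


handler = make_handler(fake_gpu_backend)
```

Injecting the offload function keeps the routing decision separate from the transport, so the same handler can be exercised locally with a stub and wired to a real GPU endpoint later.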