Kumaresan, Mageshwaran (2025) A Reinforcement Learning Approach to Real-Time, Cost-Aware Kubernetes Auto-Scaling. Masters thesis, Dublin, National College of Ireland.
Abstract
Traditional Kubernetes autoscalers (HPA, VPA, KEDA) rely on fixed thresholds, which leads to over-provisioning or failure to meet latency SLA requirements in dynamic, latency-sensitive applications. Existing RL-based autoscaling approaches are usually tailored to a single metric (e.g., CPU) and adapt poorly to varying workloads. This research proposes a new RL-based autoscaling mechanism, integrated with Prometheus and evaluated on Minikube, that targets cost-effective scaling of Kubernetes pods under a 200 ms latency SLA, a necessity for cost-effective cloud services. To model how Kubernetes behaves under autoscaling, a custom Gym environment, KubeScalingEnv, was created from Prometheus metrics and the Google Cluster Data 2011 trace (converted to a synthetic load profile, 60 chunks.csv). A Deep Q-Network (DQN) model, developed and trained on Minikube, selects the optimal number of pod replicas (1 to 5) using a state space of CPU, memory, latency, replicas, and spot price. The RL approach was compared with HPA, VPA, and KEDA on SLA violations, cost, and pod efficiency. The RL autoscaler achieved a 0.00% SLA violation rate with an average of 2.90 pods ($0.0580/step), outperforming HPA (4.13 pods, $0.0825/step) and KEDA (3.55 pods, $0.0711/step), at a 24.7% higher cost than VPA (2.18 pods, $0.0437/step). RL reduced pod usage by 29.8% compared to HPA and 18.3% compared to KEDA, while t-tests show that its latency behavior is statistically equivalent to HPA's (p = 1.0). The proposed RL framework is the first to combine Prometheus with a multi-metric DQN and benchmark it against both traditional autoscalers and earlier RL work (e.g., Zhang et al., 2021, 0.5% SLA violations). In contrast to the simpler scaling VPA offers, RL is resilient to dynamic workloads, achieving zero SLA violations with fewer resources than HPA and KEDA.
It provides a scalable, cost-efficient solution for cloud applications with production-grade potential, although its marginal cost is slightly higher than VPA's. Future work should integrate the model with a real-world production application to collect workload data for validating the framework, refine the RL policy to approach VPA's cost efficiency, and test more sophisticated RL algorithms such as PPO to support autoscaling across multiple deployments.
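The abstract describes KubeScalingEnv as a Gym environment whose state space is (CPU, memory, latency, replicas, spot price) and whose actions pick a replica count between 1 and 5, with rewards balancing per-step cost against the 200 ms SLA. The following is a minimal illustrative sketch of such an environment; the load model, reward constants, and latency formula are assumptions for illustration, not the thesis's actual implementation:

```python
import random

SLA_MS = 200.0           # latency SLA stated in the abstract
MIN_PODS, MAX_PODS = 1, 5
COST_PER_POD = 0.02      # assumed $/pod/step, for illustration only

class KubeScalingEnv:
    """Gym-style sketch: state (cpu, memory, latency, replicas, spot_price),
    action = target replica count in 1..5."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.replicas = MIN_PODS
        self.load = 1.0  # abstract load units; real env would read Prometheus metrics
        return self._obs()

    def _obs(self):
        cpu = min(1.0, self.load / self.replicas)      # per-pod CPU utilisation
        mem = 0.5 * cpu                                # toy memory model
        latency = 50.0 + 300.0 * max(0.0, cpu - 0.7)   # latency grows as pods saturate
        spot_price = COST_PER_POD                      # fixed here; could be sampled
        return (cpu, mem, latency, self.replicas, spot_price)

    def step(self, action):
        # Action directly chooses the target replica count (clamped to 1..5).
        self.replicas = max(MIN_PODS, min(MAX_PODS, action))
        # Random-walk the load to mimic a dynamic workload trace.
        self.load = max(0.5, self.load + self.rng.uniform(-0.3, 0.3))
        cpu, mem, latency, replicas, price = self._obs()
        sla_violated = latency > SLA_MS
        # Reward: per-step pod cost penalty, plus a large penalty for SLA breaches.
        reward = -(replicas * price) - (10.0 if sla_violated else 0.0)
        return self._obs(), reward, False, {"sla_violated": sla_violated}
```

A DQN agent would observe the 5-dimensional state, choose among the 5 discrete replica counts, and learn to minimise cost while keeping latency under the SLA.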
| Item Type: | Thesis (Masters) |
|---|---|
| Supervisors: | Samarawickrama, Yasantha |
| Subjects: | T Technology > T Technology (General) > Information Technology > Cloud computing; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
| Divisions: | School of Computing > Master of Science in Cloud Computing |
| Depositing User: | Ciara O'Brien |
| Date Deposited: | 26 Mar 2026 15:23 |
| Last Modified: | 26 Mar 2026 15:23 |
| URI: | https://norma.ncirl.ie/id/eprint/9233 |