Satheeshkumar, Harikrishnan (2025) ML-Powered Cloud Task Failure Prediction and Scalable Deployment on AWS. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science), 681kB
PDF (Configuration Manual), 336kB
Abstract
Predicting task failures in large-scale cloud environments is critical for improving system reliability and managing Service Level Agreements. This project presents an end-to-end machine learning system for predicting cloud task failures using historical workload data from Google Borg traces. A Random Forest Classifier was trained to distinguish between failing and succeeding tasks based on features such as resource requests, memory usage, and CPU cycles. The resulting model was operationalized by building a Flask-based REST API that dynamically loads the model from Amazon S3. For deployment, the application was containerized using Docker and orchestrated on AWS using ECS Fargate, ensuring a serverless and scalable execution environment for the prediction service. The system's endpoint is exposed via an Application Load Balancer, with an attached Auto Scaling policy based on CPU utilization to handle variable prediction request loads, ensuring the API itself remains responsive. This work demonstrates a complete, production-ready pipeline for deploying a real-time, scalable, ML-powered classification service in a cloud-native fashion.
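As an illustration of the CPU-based Auto Scaling policy mentioned in the abstract, the following is a minimal sketch of proportional scaling arithmetic. It is not taken from the thesis: the function name `desired_task_count`, the 60% target, and the task bounds are hypothetical values chosen for the example, and the real AWS service involves more than this single calculation.

```python
import math


def desired_task_count(current_tasks, avg_cpu_percent, target_cpu_percent,
                       min_tasks=1, max_tasks=10):
    """Estimate how many tasks keep average CPU near the target.

    Assumption: load spreads evenly across tasks, so the required
    capacity scales linearly with observed CPU utilization.
    """
    if current_tasks < 1:
        raise ValueError("current_tasks must be at least 1")
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")

    # Scale the task count by the ratio of observed to target utilization,
    # rounding up so the estimate never under-provisions.
    raw = math.ceil(current_tasks * avg_cpu_percent / target_cpu_percent)

    # Clamp to the configured bounds (hypothetical defaults).
    return max(min_tasks, min(max_tasks, raw))


if __name__ == "__main__":
    # Two tasks running at 90% CPU against a 60% target.
    print(desired_task_count(2, 90.0, 60.0))
```

Under these assumptions, two tasks averaging 90% CPU against a 60% target would scale out to three tasks, while four tasks averaging 15% would scale in to the minimum of one.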
| Item Type: | Thesis (Masters) |
|---|---|
| Supervisors: | Kazmi, Aqeel (email: UNSPECIFIED) |
| Uncontrolled Keywords: | Machine Learning; Failure Prediction; Cloud Computing; AWS; ECS Fargate; Docker; Random Forest; Auto Scaling; REST API; Google Borg Dataset |
| Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Cloud computing; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
| Divisions: | School of Computing > Master of Science in Cloud Computing |
| Depositing User: | Ciara O'Brien |
| Date Deposited: | 30 Mar 2026 14:11 |
| Last Modified: | 30 Mar 2026 14:11 |
| URI: | https://norma.ncirl.ie/id/eprint/9258 |