Bhaskaran, Ranjith (2025) AI-Driven Cloud Optimization: Enhancing Cost Prediction, Resource Scheduling and Fault Resilience in Cloud Environments. Masters thesis, Dublin, National College of Ireland.
Abstract
Cloud computing offers scalability and flexibility, yet it poses persistent problems of cost estimation, efficient resource scheduling and fault tolerance. This study proposes a unified AI-based system that addresses these problems by combining three main modules (cost prediction, dynamic task scheduling and fault detection) behind a user-friendly visualization dashboard. The cost prediction module uses supervised machine learning algorithms, namely Linear Regression, Random Forest and XGBoost, to predict task costs from synthetic workloads generated with iFogSim. Prediction accuracy is further improved through hyperparameter optimization with Optuna. For intelligent scheduling, the system employs Deep Reinforcement Learning (DRL) with a Deep Q-Network (DQN) architecture that optimizes job placement on heterogeneous virtual machines (VMs), benchmarked against First-Come-First-Serve (FCFS) and Round-Robin scheduling. The scheduling logic is trained and tested on the Kaggle Cloud Task Scheduling dataset. For fault detection, the Isolation Forest algorithm is applied to identify anomalous system behavior such as unusual CPU usage or long execution times. All results, including evaluation metrics, reward curves, anomaly plots and interpretability graphs, are displayed in a Streamlit-based dashboard deployed on Render. The framework is built as a modular automation pipeline in which each module can be run on demand, which makes it flexible, reproducible and resilient in deployment. Experiments largely show that this approach makes cost estimation more accurate, reduces scheduling delays and increases fault tolerance.
By combining predictive analytics with reinforcement learning and anomaly detection to optimise operations in multi-cloud environments, the proposed system is holistic and practical, and is of value both for research and for real-world cloud management.
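The two baseline schedulers named in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the `Task` and `VM` types, the `simulate` helper, the workload numbers, and the assumption that all tasks arrive at time zero are hypothetical and chosen only for illustration. FCFS is interpreted here as "the next task in arrival order goes to the VM that becomes free earliest", which is one common reading; the thesis may define it differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    length: float  # amount of work (hypothetical unit, e.g. million instructions)


@dataclass(frozen=True)
class VM:
    speed: float  # work units processed per second (hypothetical)


def simulate(tasks, vms, choose_vm):
    """Place tasks in arrival order and return (makespan, mean waiting time)."""
    free_at = [0.0] * len(vms)  # time at which each VM becomes idle
    total_wait = 0.0
    for i, task in enumerate(tasks):
        v = choose_vm(i, free_at)
        total_wait += free_at[v]  # assumption: every task arrives at t = 0
        free_at[v] += task.length / vms[v].speed
    return max(free_at), total_wait / len(tasks)


def fcfs(i, free_at):
    # Assumed FCFS rule: pick the VM that frees up earliest (ties go to the lowest index).
    return min(range(len(free_at)), key=free_at.__getitem__)


def round_robin(i, free_at):
    # Cycle through the VMs regardless of their current load or speed.
    return i % len(free_at)


# Hypothetical workload: six tasks on two heterogeneous VMs.
tasks = [Task(length) for length in (40, 10, 30, 20, 60, 10)]
vms = [VM(speed=10), VM(speed=5)]

fcfs_result = simulate(tasks, vms, fcfs)
rr_result = simulate(tasks, vms, round_robin)

print("FCFS        makespan, mean wait:", fcfs_result)
print("Round-Robin makespan, mean wait:", rr_result)
```

A learned scheduler such as a DQN would replace `choose_vm` with a policy that also observes task length and VM speed. Makespan and mean waiting time are the kind of metrics against which such a policy could be compared with these baselines.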
| Item Type: | Thesis (Masters) |
|---|---|
| Supervisors: | Gupta, Shaguna (email: unspecified) |
| Uncontrolled Keywords: | Cloud Optimization; Cost Prediction; AI Scheduling; Fault Tolerance; Cloud Simulation |
| Subjects: | Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence; T Technology > T Technology (General) > Information Technology > Cloud computing |
| Divisions: | School of Computing > Master of Science in Cloud Computing |
| Depositing User: | Ciara O'Brien |
| Date Deposited: | 20 Mar 2026 11:28 |
| Last Modified: | 20 Mar 2026 11:28 |
| URI: | https://norma.ncirl.ie/id/eprint/9200 |