Datacenters currently account for 2-3% of the world’s electrical energy consumption. As more and more computing and data storage is carried out “in the cloud”, this figure is expected to increase.
The incoming electrical energy eventually becomes waste heat. In some locations, the heated air is used to heat buildings. PREPD, a metric invented by Petter Terenius and Damian Borowiec, is used to calculate how much of this heat energy can be reclaimed.
We claim that a large portion of future datacenters will be placed in Africa and Southeast Asia. These two regions have little need for heating buildings, which means that outgoing heat needs to be used in ways not currently associated with datacenter waste heat. Thus, a vital part of this research project is to identify uses for outgoing datacenter heat energy, and their degree of usefulness, in warm, and in particular low-income, countries.
Below is a preliminary tool for calculating PREPD.
Inputs:
Electrical power going to datacenter (kW)
Computing power (kW)
Reclaimed heat from the datacenter (kW)
Loss during transportation (%)
Loss during conversion to other usage (%)
Cheapest alternative heat utility price (CAHUP)
Electricity heat utility price (EHUP)
Outputs:
Power Usage Effectiveness (PUE)
Power Reclamation Efficiency (PREPD)
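The sketch below illustrates how some of the tool's inputs might be combined. It computes PUE using the standard definition (total facility power divided by computing power) and the heat remaining after the two loss stages. The PREPD formula itself is not stated on this page, so the sketch does not attempt to compute it, and the two price inputs are left out. The names `DatacenterInputs`, `pue` and `delivered_heat_kw` are hypothetical, and applying the two losses one after the other is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DatacenterInputs:
    total_power_kw: float       # electrical power going to the datacenter
    computing_power_kw: float   # power drawn by the computing equipment
    reclaimed_heat_kw: float    # heat reclaimed from the datacenter
    transport_loss_pct: float   # loss during transportation, in percent
    conversion_loss_pct: float  # loss during conversion to other usage, in percent


def pue(inputs: DatacenterInputs) -> float:
    """Standard PUE: total facility power divided by computing power."""
    if inputs.computing_power_kw <= 0:
        raise ValueError("computing power must be positive")
    return inputs.total_power_kw / inputs.computing_power_kw


def delivered_heat_kw(inputs: DatacenterInputs) -> float:
    """Heat left after both losses.

    ASSUMPTION: the transportation loss is applied first and the
    conversion loss is applied to what remains.
    """
    keep_transport = 1.0 - inputs.transport_loss_pct / 100.0
    keep_conversion = 1.0 - inputs.conversion_loss_pct / 100.0
    return inputs.reclaimed_heat_kw * keep_transport * keep_conversion


# Illustrative numbers only, not measurements.
example = DatacenterInputs(
    total_power_kw=1500.0,
    computing_power_kw=1000.0,
    reclaimed_heat_kw=600.0,
    transport_loss_pct=10.0,
    conversion_loss_pct=20.0,
)
print(f"PUE: {pue(example):.2f}")
print(f"Delivered heat: {delivered_heat_kw(example):.1f} kW")
```

A PUE close to 1 means that almost all incoming power reaches the computing equipment; the delivered-heat figure is the quantity a downstream heat user could actually draw on under the stated assumption.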
Distributed Cloud Gaming
Lead: James Bulman
J. Bulman, P. Garraghan, Towards GPU Utilization Prediction for Cloud Deep Learning, USENIX HotCloud, 2020.
Gaming is now the largest entertainment industry in the world, at $143.5b as of 2020. Cloud gaming has become increasingly prominent in recent years with the likes of Google Stadia and Microsoft Project XCloud, providing lower loading times, portability, and higher graphical quality by leveraging cloud resources.
However, cloud gaming faces various challenges, including complex monolithic game engine architectures, platform dependency, and susceptibility to performance degradation and failure between, and outside of, the cloud.
This project aims to create an open-source cloud gaming framework, allowing, for the first time, dynamic distributed game engines designed natively for the cloud.
Due to the complexity of distributing (and coordinating) the game engine across the cloud, our framework currently focuses on completing specific game subsystems.
Phase 1: Graphics
Dynamic graphical rendering across cloud-client systems.
Dynamic Graphics API hotswapping (Vulkan, OpenGL)
Cloud-client frame interlacing
Hardware and Operating System agnostic
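As a rough illustration of what graphics API hot-swapping could look like, the sketch below keeps the active backend behind a common interface and applies swap requests only at frame boundaries. It is not the framework's implementation: `RenderBackend`, `VulkanBackend`, `OpenGLBackend` and `Renderer` are hypothetical names, and the backends return strings where a real one would issue Vulkan or OpenGL calls.

```python
from abc import ABC, abstractmethod
from typing import Optional


class RenderBackend(ABC):
    """Hypothetical interface; a real backend would wrap a graphics API."""

    @abstractmethod
    def draw(self, scene: str) -> str:
        ...


class VulkanBackend(RenderBackend):
    def draw(self, scene: str) -> str:
        return f"[vulkan] {scene}"


class OpenGLBackend(RenderBackend):
    def draw(self, scene: str) -> str:
        return f"[opengl] {scene}"


class Renderer:
    """Holds the active backend and applies swap requests between frames."""

    def __init__(self, backend: RenderBackend) -> None:
        self._backend = backend
        self._pending: Optional[RenderBackend] = None

    def request_swap(self, backend: RenderBackend) -> None:
        # Defer the swap so that no frame is drawn by two backends.
        self._pending = backend

    def render_frame(self, scene: str) -> str:
        # Apply a pending swap at the frame boundary, then draw.
        if self._pending is not None:
            self._backend, self._pending = self._pending, None
        return self._backend.draw(scene)


renderer = Renderer(VulkanBackend())
print(renderer.render_frame("frame 1"))
renderer.request_swap(OpenGLBackend())
print(renderer.render_frame("frame 2"))
```

Deferring the swap to the start of the next frame is one simple way to keep each frame consistent; a real engine would also need to migrate GPU resources between APIs, which this sketch ignores.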
Secure ML Systems
Lead: William Hackett
Federated Cloud-Fog Schedulers
Lead: Dominic Lindsay
Multi-Tenant Deep Learning Systems
Lead: Gingfung Yeung
Deep Learning systems are generally categorized into two types: training-focused and inference-focused.
In training-focused systems, existing big data cluster frameworks disallow GPU sharing, which leads to slow turnaround times, e.g. in hyper-parameter tuning or architecture search.
Hence, instead of the scheduling extender pattern, we have created our own scheduler package, in which we can observe the cluster’s GPU utilization and make scheduling decisions.
Further, we have observed an interesting phenomenon: a model’s computation graph might contain all the information we need to predict its resource consumption. This opens up the possibility of optimizing our system infrastructure by scheduling workloads with respect to resource utilization and cost/energy efficiency.
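To make the idea of utilization-aware GPU sharing concrete, here is a minimal sketch of one possible placement rule. It is an illustration under stated assumptions, not the project's scheduler: the `Gpu` class, the `place_job` function and the 100% cap are hypothetical, and the predicted utilization of a job is taken as a given input rather than derived from a computation graph.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gpu:
    name: str
    utilization: float = 0.0  # observed utilization, 0-100 percent
    jobs: List[str] = field(default_factory=list)


def place_job(gpus: List[Gpu], job: str, predicted_util: float,
              cap: float = 100.0) -> Optional[Gpu]:
    """Place the job on the least-utilized GPU that still has headroom.

    ASSUMPTION: utilizations of co-located jobs simply add up.
    """
    candidates = [g for g in gpus if g.utilization + predicted_util <= cap]
    if not candidates:
        return None  # no GPU can co-locate the job; the caller may queue it
    target = min(candidates, key=lambda g: g.utilization)
    target.jobs.append(job)
    target.utilization += predicted_util
    return target


cluster = [Gpu("gpu0", utilization=70.0), Gpu("gpu1", utilization=30.0)]
chosen = place_job(cluster, "tune-1", predicted_util=40.0)
print(chosen.name if chosen else "queued")
```

Additive utilization is a simplification; interference between co-located jobs is usually not linear, which is one reason a prediction step is useful in the first place.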
In inference-focused systems, we want to minimize QoS violations. There are currently many approaches to doing so, such as early-exit prediction based on the model’s confidence, query submission pattern prediction, model sharing, and cloud instance scaling.
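Confidence-based early exit, one of the approaches mentioned above, can be sketched as follows. This is a generic illustration rather than our system: the function names and the 0.9 threshold are hypothetical, and each exit head is modelled as an independent callable returning logits, whereas in a real network later exits reuse the computation of earlier layers.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_early_exit(
    x: object,
    exits: Sequence[Callable[[object], Sequence[float]]],
    threshold: float = 0.9,
) -> Tuple[int, int, float]:
    """Return (exit index, predicted class, confidence).

    Exits are tried in order; the first one whose top probability reaches
    the threshold answers the query. The last exit always answers.
    """
    if not exits:
        raise ValueError("at least one exit is required")
    for index, head in enumerate(exits):
        probs = softmax(head(x))
        confidence = max(probs)
        label = probs.index(confidence)
        if confidence >= threshold:
            break
    return index, label, confidence


# Toy exit heads: the first is unsure, the second is confident.
toy_exits = [lambda x: [0.1, 0.2], lambda x: [5.0, 0.0]]
print(predict_with_early_exit(None, toy_exits))
```

Lowering the threshold lets more queries leave at early exits, trading accuracy for latency and energy.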
Currently, we are working on inference systems. We focus on minimizing energy and latency on hardware-constrained platforms such as Jetson devices or mobile phones.
Deep Learning Energy Efficiency
Lead: Damian Borowiec
Deep Learning, being a subset of Machine Learning, relies heavily on accelerators such as GPUs or TPUs (Tensor Processing Units) for its computational power.
Such accelerators provide unmatched computational abilities at the price of vast energy consumption, thermal dissipation and cost.
The key aim of this project is to investigate the effects of Deep Learning on the energy consumption of high-power clusters.