Date of Award

Spring 5-23-2021

Degree Type


Degree Name

Doctor of Philosophy (PhD)


Department

Electrical Engineering and Computer Science


Advisor(s)

Tang, Jian


Keywords

Communication Networks, Deep Reinforcement Learning, Experience-Driven Control, Mobile and Edge Computing, Resource Allocation, Scheduling

Subject Categories

Artificial Intelligence and Robotics | Computer Sciences | Engineering | Physical Sciences and Mathematics


Modern networking and computing systems have become highly complex and dynamic, which makes them hard to model, predict, and control. In this thesis, we study system control problems from a new perspective: we leverage the emerging technique of Deep Reinforcement Learning (DRL) to develop experience-driven, model-free approaches that enable a network or a device to learn the best way to control itself from its own experience (e.g., runtime statistics) rather than from accurate mathematical models, much as a human learns a new skill (e.g., driving or swimming). To demonstrate the feasibility and superiority of this experience-driven control philosophy, we present the design, implementation, and evaluation of multiple DRL-based control frameworks for two fundamental networking problems, Traffic Engineering (TE) and Multi-Path TCP (MPTCP) congestion control, and for one cutting-edge application: resource co-scheduling for Deep Neural Network (DNN) models on mobile and edge devices with heterogeneous hardware.

We first propose DRL-TE, a DRL-based framework that enables experience-driven networking for TE. DRL-TE maximizes a widely used utility function by jointly learning the network environment and its dynamics and by making decisions under the guidance of powerful DNNs. We propose two new techniques, TE-aware exploration and actor-critic-based prioritized experience replay, to tailor the general DRL framework to TE. Furthermore, we propose ACT-TE, an Actor-Critic-based Transfer learning framework for TE, which solves a practical problem in experience-driven networking: when network configurations change, how can a new DRL agent be trained to adapt to the new environment quickly and effectively? In the new network environment, ACT-TE leverages policy distillation to rapidly learn a new control policy from both old knowledge (i.e., knowledge distilled from the existing agent) and new experience (i.e., newly collected samples).
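Prioritized experience replay, which DRL-TE adapts to the actor-critic setting, samples stored transitions in proportion to how informative they are instead of uniformly. The sketch below shows only the generic proportional scheme, under stated assumptions: the priority of a transition is taken to be its absolute critic TD error, p = (|δ| + ε)^α, and the resulting sampling bias is corrected with importance-sampling weights w = (N·P(i))^(−β), normalized by the batch maximum. The thesis's actor-critic-based priority is not reproduced here and may combine additional terms; the class name `PrioritizedReplay` and the default `alpha`, `beta`, and `eps` values are illustrative assumptions.

```python
import random


class PrioritizedReplay:
    """Proportional prioritized experience replay (list-based, O(N) sampling).

    Priority of a transition: p = (|td_error| + eps) ** alpha, and
    P(i) = p_i / sum_k p_k is its sampling probability.
    (Generic scheme; not the thesis's actor-critic-based priority.)
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling; larger values favor high-error transitions
        self.eps = eps      # keeps every priority strictly positive
        self.data, self.priorities = [], []
        self.pos = 0        # next slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def _priority(self, td_error):
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, transition, td_error):
        p = self._priority(td_error)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:  # overwrite the oldest transition
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idx = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights undo the bias of non-uniform sampling;
        # dividing by the batch maximum keeps them in (0, 1].
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        w_max = max(weights)
        return [self.data[i] for i in idx], idx, [w / w_max for w in weights]

    def update(self, idx, td_errors):
        """Refresh priorities after the critic recomputes TD errors for a batch."""
        for i, err in zip(idx, td_errors):
            self.priorities[i] = self._priority(err)


# Usage: transitions are (state, action, reward, next_state) tuples.
buf = PrioritizedReplay(capacity=100)
buf.add(("s0", "a0", 0.0, "s1"), td_error=0.01)
buf.add(("s1", "a1", 1.0, "s2"), td_error=2.0)
batch, idx, weights = buf.sample(batch_size=4)
```

The list-based buffer keeps the sketch short; a larger buffer would typically use a sum tree so that sampling and priority updates cost O(log N) instead of O(N).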

In addition, we propose DRL-CC to enable experience-driven congestion control for MPTCP. DRL-CC uses a single DRL agent (instead of multiple independent ones) to dynamically and jointly perform congestion control for all active MPTCP flows on an end host, with the objective of maximizing the overall utility. The novelty of our design lies in using a flexible recurrent neural network, Long Short-Term Memory (LSTM), within a DRL framework to learn a representation of all active flows and to handle their dynamics. Moreover, we integrate this LSTM-based representation network into an actor-critic framework for continuous congestion control, which applies the deterministic policy gradient method to train the actor, critic, and LSTM networks end to end.
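A single agent must cope with a number of active MPTCP flows that changes over time, whereas actor and critic networks expect a fixed-size input. One way an LSTM can bridge this gap is to read the per-flow state vectors one at a time and use its final hidden state as the joint representation. The pure-Python sketch below illustrates only that folding step with an untrained, randomly initialized cell; the class `TinyLSTM`, the three per-flow features, and the dimensions are hypothetical, and the actor and critic heads, as well as the deterministic-policy-gradient training that DRL-CC applies end to end, are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias vector per gate: input (i), forget (f),
        # candidate cell (g) and output (o).
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                         for _ in range(hidden_size)]
                  for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate current input with previous hidden state

        def gate(name, act):
            return [act(sum(w * v for w, v in zip(row, z)) + bias)
                    for row, bias in zip(self.W[name], self.b[name])]

        i, f = gate("i", sigmoid), gate("f", sigmoid)
        g, o = gate("g", math.tanh), gate("o", sigmoid)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def encode(self, flows):
        """Fold a variable-length list of per-flow feature vectors into one
        fixed-size vector (the final hidden state)."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in flows:
            h, c = self.step(x, h, c)
        return h


# Hypothetical per-flow features, e.g. normalized throughput, RTT and loss rate.
lstm = TinyLSTM(input_size=3, hidden_size=4)
two_flows = [[0.5, 0.1, 0.02], [0.3, 0.2, 0.05]]
five_flows = two_flows + [[0.1, 0.0, 0.01]] * 3
print(len(lstm.encode(two_flows)), len(lstm.encode(five_flows)))  # 4 4
```

Because an LSTM is order-sensitive, the representation in this sketch depends on the order in which flows are fed, so it assumes some fixed ordering convention (for example, by flow ID).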

With the emergence of increasingly powerful chipsets and hardware and the rise of the Artificial Intelligence of Things (AIoT), there is a growing trend toward bringing DNN models to mobile and edge devices to empower them with intelligence, so that they can support attractive AI applications at the edge in real time or near real time. In the last part of this thesis, to leverage heterogeneous computational resources (e.g., CPU, GPU, and DSP) for effective and efficient concurrent inference of multiple DNN models on a mobile or edge device, we propose COSREL, a novel experience-driven control framework for resource co-scheduling. COSREL has the following desirable features: 1) it achieves significant speedup over commonly used methods by efficiently utilizing all the computational resources on heterogeneous hardware; 2) it leverages DRL to make dynamic and well-informed online scheduling decisions based on the system's runtime state; 3) it makes a good tradeoff among inference latency, throughput, and energy efficiency; and 4) it makes no changes to the given DNN models and thus preserves their accuracy.
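To make the co-scheduling decision and the latency, throughput, and energy tradeoff concrete, the sketch below assigns concurrent inference requests to heterogeneous processors and scores the resulting schedule with a weighted scalar reward of the kind a DRL agent could be trained to maximize. It is not COSREL: in place of COSREL's learned DRL policy it uses a simple greedy earliest-finish-time rule, and the processor names, per-model latencies, power figures, and reward weights are all made-up illustrative values, not measurements from the thesis. Latencies are in ms and power in W, so energy comes out in mJ.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    power_watts: float                              # hypothetical average active power
    latency_ms: dict = field(default_factory=dict)  # hypothetical per-model latency


def schedule_greedy(requests, processors):
    """Greedy stand-in for a learned policy: give each request to the
    processor that would finish it earliest (all requests arrive at t = 0)."""
    free_at = {p.name: 0.0 for p in processors}  # ms at which each processor becomes idle
    plan, energy_mj = [], 0.0
    for model in requests:
        # Only processors that can run this model are candidates.
        best = min((p for p in processors if model in p.latency_ms),
                   key=lambda p: free_at[p.name] + p.latency_ms[model])
        free_at[best.name] += best.latency_ms[model]
        energy_mj += best.power_watts * best.latency_ms[model]  # W * ms = mJ
        plan.append((model, best.name, free_at[best.name]))   # (model, processor, finish time)
    makespan_ms = max(free_at.values())
    return plan, makespan_ms, energy_mj


def reward(makespan_ms, n_requests, energy_mj, w_lat=1.0, w_thr=1.0, w_energy=0.01):
    """Hypothetical weighted reward: lower latency, higher throughput and
    lower energy are all better; the weights set the tradeoff."""
    throughput = n_requests / (makespan_ms / 1000.0)  # inferences per second
    return -w_lat * makespan_ms + w_thr * throughput - w_energy * energy_mj


procs = [
    Processor("CPU", 2.0, {"mobilenet": 40, "resnet": 120}),
    Processor("GPU", 3.0, {"mobilenet": 15, "resnet": 35}),
    Processor("DSP", 1.0, {"mobilenet": 18}),  # assume the DSP supports only one model
]
requests = ["resnet", "mobilenet", "mobilenet", "resnet"]
plan, makespan, energy = schedule_greedy(requests, procs)
print(plan)  # [('resnet', 'GPU', 35.0), ('mobilenet', 'DSP', 18.0), ('mobilenet', 'DSP', 36.0), ('resnet', 'GPU', 70.0)]
print(makespan, energy)  # 70.0 246.0
print(reward(makespan, len(requests), energy))
```

A DRL scheduler would replace `schedule_greedy` with a learned policy that observes runtime state (for example, per-processor queue lengths) and, depending on the reward weights, could accept a slightly longer makespan in exchange for lower energy.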

To validate and evaluate the proposed frameworks, we conduct extensive experiments using packet-level simulation (for TE), a testbed with a modified Linux kernel (for MPTCP), and off-the-shelf Android devices (for resource co-scheduling). The results demonstrate the effectiveness of these frameworks and their superiority over several baseline methods.


Open Access