Date of Award
December 2017
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Electrical Engineering and Computer Science
Advisor(s)
Jae C. Oh
Keywords
Multi-Armed Bandit, Reinforcement Learning, Simulated Annealing, Stackelberg Game
Subject Categories
Engineering
Abstract
With rapid growth in velocity and volume, streaming data compels decision support systems to select, in due time, a small number of unique data points that can represent a massive amount of correlated data without much loss of precision. In this work, we formulate this problem as the online set coverage problem and propose solutions for recommendation systems and the patrol assignment problem.
We propose a novel online reinforcement learning algorithm inspired by the Multi-Armed Bandit problem to solve the online recommendation problem. We introduce a graph-based mechanism that improves user coverage by the recommended items and show that it facilitates coordination between bandits, thereby reducing the overall complexity. Our graph-based bandit algorithm can select a much smaller set of items that still covers a wide variety of users’ choices. We present experimental results from a partially observable real-world environment.
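The abstract only summarizes the mechanism, but a minimal sketch may help fix the idea: one UCB-style bandit per candidate item, coordinated through an item-to-user coverage graph, with each greedy pick scored by its marginal user coverage. All names here (GraphBandit, select, update) and the UCB-times-marginal-gain scoring are illustrative assumptions, not the dissertation's actual algorithm.

```python
import math
import random

class GraphBandit:
    """One UCB-style bandit per candidate item; the item-to-user graph
    lets the bandits coordinate by scoring each pick by its marginal
    user coverage, so a small item set can cover many users."""

    def __init__(self, n_items, coverage_graph):
        self.n = n_items
        self.graph = coverage_graph        # item index -> set of reachable users
        self.pulls = [0] * n_items
        self.rewards = [0.0] * n_items

    def ucb(self, i, t):
        if self.pulls[i] == 0:
            return float("inf")            # force each arm to be tried once
        mean = self.rewards[i] / self.pulls[i]
        return mean + math.sqrt(2.0 * math.log(t) / self.pulls[i])

    def select(self, t, k):
        """Greedily choose k items, discounting users already covered
        by earlier picks (the coordination step)."""
        covered, chosen = set(), []
        for _ in range(k):
            best, best_score = None, -1.0
            for i in range(self.n):
                if i in chosen:
                    continue
                gain = len(self.graph[i] - covered)   # marginal coverage
                score = 0.0 if gain == 0 else self.ucb(i, t) * gain
                if score > best_score:
                    best, best_score = i, score
            chosen.append(best)
            covered |= self.graph[best]
        return chosen

    def update(self, i, reward):
        self.pulls[i] += 1
        self.rewards[i] += reward

# Toy usage: four items, each reaching a small set of users.
graph = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
bandit = GraphBandit(4, graph)
for t in range(1, 101):
    for i in bandit.select(t, k=2):
        bandit.update(i, random.random())  # stand-in for observed feedback
```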
We also study patrol assignment as an online set coverage problem, which presents an additional level of difficulty: besides covering susceptible routes by learning the diversity of attacks, our technique, unlike in recommendation systems, must make choices against actively engaged adversarial opponents. We assume that attacks over those routes are posed by intelligent entities capable of reacting with their best responses; we therefore model such attacks with the Stackelberg Security Game. We augment our graph-based bandit defenders with adaptive adjustment of the reward coming from this game to perplex the attackers and gradually prevail over them by maximizing the confrontation.
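As an illustrative assumption rather than the dissertation's actual construction, the leader-follower loop can be sketched as follows: the defender commits to a mixed strategy over routes, the attacker best-responds to the committed coverage, and the defender adaptively reshapes its strategy from the game's feedback. The multiplicative-weights update and the names (BanditDefender, play_round) are hypothetical.

```python
import math
import random

class BanditDefender:
    """Defender (leader) keeps one arm per patrol route and commits to a
    mixed strategy over routes before the attacker moves."""

    def __init__(self, n_routes):
        self.weights = [1.0 / n_routes] * n_routes

    def mixed_strategy(self):
        return list(self.weights)          # already normalized

    def update(self, route, value, eta=0.05):
        # Adaptive reward adjustment (illustrative): shift coverage toward
        # routes the attacker actually targets, then renormalize.
        self.weights[route] *= math.exp(eta * value)
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]

def attacker_best_response(coverage, values):
    """Attacker (follower) observes the committed coverage and hits the
    route with the highest expected payoff, value * (1 - coverage)."""
    payoffs = [v * (1.0 - c) for v, c in zip(values, coverage)]
    return max(range(len(payoffs)), key=payoffs.__getitem__)

def play_round(defender, values):
    coverage = defender.mixed_strategy()   # leader commits first
    target = attacker_best_response(coverage, values)
    caught = random.random() < coverage[target]
    defender.update(target, values[target])
    return target, caught

# Toy run: values are the attacker's gains from hitting an unguarded route.
values = [5.0, 3.0, 8.0, 2.0]
defender = BanditDefender(len(values))
for _ in range(500):
    play_round(defender, values)
```

As coverage of the most valuable route rises, its expected payoff to the attacker falls and the best response shifts elsewhere, which is the sense in which the adaptive adjustment "perplexes" the attacker.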
We find that our graph bandits can outperform other Multi-Armed Bandit algorithms when simulated annealing-based scheduling is incorporated to adjust the balance between exploration and exploitation.
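The abstract does not specify the schedule; one common way to realize annealing-controlled exploration in bandits is Boltzmann (softmax) selection with a geometrically cooled temperature. The sketch below assumes that reading, with hypothetical parameter values.

```python
import math
import random

def temperature(t, t0=1.0, alpha=0.99, floor=0.05):
    """Geometric cooling schedule in the spirit of simulated annealing:
    hot early (near-uniform exploration), cool later (near-greedy)."""
    return max(floor, t0 * alpha ** t)

def boltzmann_select(means, temp):
    """Softmax/Boltzmann arm selection at the given temperature."""
    prefs = [math.exp(m / temp) for m in means]
    total = sum(prefs)
    r, acc = random.random() * total, 0.0
    for i, p in enumerate(prefs):
        acc += p
        if acc >= r:
            return i
    return len(prefs) - 1                  # guard against rounding

# Toy run: selection mass shifts toward the best empirical mean over time.
means = [0.2, 0.5, 0.4]
for t in range(300):
    arm = boltzmann_select(means, temperature(t))
```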
Access
Open Access
Recommended Citation
Rahman, Mahmuda, "ONLINE LEARNING WITH BANDITS FOR COVERAGE" (2017). Dissertations - ALL. 805.
https://surface.syr.edu/etd/805