Analyzing Varying Complexity on Q-Learning

BAILEY NELSON, MATTHEW CHAPMAN, and MICHAEL E. COTTERELL, University of Georgia, USA
CSCI 4960 & CSCI 6950 Spring 2020


Abstract

Q-Learning is a Reinforcement Learning algorithm used to teach a computer agent how to act optimally in a complex environment. The agent chases a reward signal given from the environment in order to form its optimal policy. Q-Learning is known to be effective at learning complex environments, and much work has been done with the algorithm: expanding it, applying it, proving it, analyzing it; however, very little work has been done analyzing how varying complexity can impact the effectiveness of the algorithm. That is, it is not fully understood how effective Q-Learning can be as the difficulty of problems increases; there is only proof that it will converge to an optimal strategy. To investigate this, Snake games are constructed at varying complexities, and multiple replications of the algorithm are executed for each complexity. The complexity of Snake can be increased through new objectives and mechanics. The algorithm’s effectiveness at each complexity is determined by comparing the number of replications needed to reach a particular objective with prior complexity levels. The results are expected to show that the effectiveness measure of each optimal policy will be comparable, regardless of complexity (i.e., the differences between means will not be statistically significant from zero). This study will lead to a better understanding of how the algorithm learns, how quickly it can develop an optimal policy as problem complexity increases, and its suitability in environments of varying complexities.

CCS Concepts: Computing Methodologies → Reinforcement Learning; Q-Learning