Rethinking the Objective for Policy Optimization in Reinforcement Learning - Martha White, Associate Professor, University of Alberta


DATE: Mon, June 15, 2020 - 3:30 pm

LOCATION: Please register to receive the Zoom link


Please register for this event here



The goal in reinforcement learning is to obtain a policy that maximizes long-term reward. Policy optimization in reinforcement learning involves directly estimating a parameterized policy that maps states to probabilities over actions. Typically, these algorithms are built on the policy gradient theorem, which provides a simple form for the gradient of the policy optimization objective. In practice, however, a key weighting in the gradient is dropped for convenience; despite this omission, these widely used algorithms seem to perform quite well. In this talk, I will first discuss some failure cases where this omission does in fact result in poor solutions. I will then discuss why this might not be observed in practice: the actual algorithms used are better thought of as approximate policy iteration algorithms, which have different optimization behavior. Viewed this way, this perspective suggests a range of possible policy optimization algorithms, with different objectives, that might help explain some of the discrepancy between theory and practice, as well as suggest next steps to improve the algorithms used in practice.
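The "key weighting" discussed in the abstract can be illustrated with a small sketch (this example is not from the talk itself, and the helper names are hypothetical): in a REINFORCE-style estimator, the policy gradient theorem for the discounted objective weights the gradient at time step t by gamma**t, reflecting the discounted state distribution, while common implementations weight every step equally.

```python
import numpy as np

def softmax(prefs):
    # Softmax policy over action preferences.
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

def grad_log_softmax(prefs, a):
    # Gradient of log pi(a) w.r.t. preferences: one_hot(a) - pi.
    g = -softmax(prefs)
    g[a] += 1.0
    return g

def reinforce_grad(trajectory, prefs, gamma, discount_states=True):
    """Sum over t of w_t * G_t * grad log pi(a_t), where
    w_t = gamma**t (the weighting from the policy gradient theorem)
    or w_t = 1 (the weighting commonly used in practice)."""
    grad = np.zeros_like(prefs)
    for t, (a, _) in enumerate(trajectory):
        # Discounted return from time t onward.
        G = sum(gamma ** (k - t) * r
                for k, (_, r) in enumerate(trajectory) if k >= t)
        w = gamma ** t if discount_states else 1.0
        grad += w * G * grad_log_softmax(prefs, a)
    return grad

# Toy trajectory of (action, reward) pairs under a uniform 2-action policy.
traj = [(0, 1.0), (1, 0.0), (0, 1.0)]
prefs = np.zeros(2)
g_weighted = reinforce_grad(traj, prefs, gamma=0.9, discount_states=True)
g_dropped = reinforce_grad(traj, prefs, gamma=0.9, discount_states=False)
```

The two estimates generally differ, and the talk's premise is that the unweighted version, despite being the standard choice, is not following the gradient of the stated discounted objective.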



Martha White is an Associate Professor of Computing Science at the University of Alberta. She is a PI of the Alberta Machine Intelligence Institute (Amii) and a director of RLAI---the Reinforcement Learning and Artificial Intelligence Lab at the University of Alberta. Her research focus is on developing algorithms for agents continually learning on streams of data, with an emphasis on representation learning and reinforcement learning.



