AI & Fundamentals
The Provable Effectiveness of Policy Gradient Methods in Reinforcement Learning and Controls - Sham Kakade, Professor, University of Washington; Microsoft Research


DATE: Mon, September 27, 2021 - 1:00 pm

LOCATION: Please register to receive the Zoom link

DETAILS

Please register for this event here.

 

Abstract:

Reinforcement learning is the dominant paradigm for how an agent learns to interact with the world in order to achieve long-term objectives. Policy gradient methods are among the most effective approaches to challenging reinforcement learning problems because they: are applicable to any differentiable policy parameterization; admit easy extensions to function approximation; easily incorporate structured state and action spaces; and are easy to implement in a simulation-based, model-free manner.
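As a concrete illustration of that last point, here is a minimal sketch of a vanilla policy gradient (REINFORCE-style) update for a tabular softmax policy. The tabular parameterization, learning rate, and trajectory format are illustrative assumptions for this page, not details from the talk.

```python
import numpy as np

def softmax_policy(theta, state):
    """Action distribution pi_theta(. | state) for a tabular softmax policy."""
    logits = theta[state]
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()

def reinforce_update(theta, trajectory, gamma=0.99, lr=0.1):
    """One REINFORCE-style policy gradient step from a single sampled trajectory.

    trajectory: list of (state, action, reward) tuples obtained by rolling
    out the current policy in a simulator -- model-free, no dynamics model
    needed. (This common simplified variant drops the gamma**t weighting
    on the gradient term.)
    """
    G = 0.0
    # Walk the trajectory backwards, accumulating the discounted return.
    for state, action, reward in reversed(trajectory):
        G = reward + gamma * G
        probs = softmax_policy(theta, state)
        # Gradient of log pi_theta(action | state) w.r.t. the logits
        # theta[state] is one_hot(action) - pi_theta(. | state).
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        theta[state] += lr * G * grad_log_pi  # gradient ascent step
    return theta

# Illustrative usage: 5 states, 2 actions, one hand-made trajectory.
theta = np.zeros((5, 2))
trajectory = [(0, 1, 0.0), (2, 0, 1.0)]
theta = reinforce_update(theta, trajectory)
```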

However, little is known about even their most basic theoretical convergence properties, including:
 - do they converge to a globally optimal solution, say with a sufficiently rich policy class?
 - how well do they cope with approximation error, say due to using a class of neural policies?
 - what is their finite sample complexity?
This talk will cover a number of recent results on these basic questions and also provide the first approximation results that do not have worst-case dependencies on the size of the state space. We will highlight the interplay of theory, algorithm design, and practice.
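For reference, these questions concern the standard discounted objective and the gradient that policy gradient methods follow (standard textbook notation, not the speaker's); here d^{pi} is the normalized discounted state-visitation distribution and Q^{pi} the action-value function:

```latex
% Discounted objective for a policy pi_theta, and the policy gradient
% theorem it satisfies; d^{pi_theta}(s) = (1-gamma) * sum_t gamma^t Pr(s_t = s).
J(\theta) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],
\qquad
\nabla_{\theta} J(\theta)
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
    \left[\nabla_{\theta} \log \pi_{\theta}(a \mid s)\, Q^{\pi_\theta}(s, a)\right]
```

A "globally optimal solution" then means a maximizer of J(\theta) over the policy class, which is nontrivial because J is in general non-concave in \theta.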

Joint work with: Alekh Agarwal, Jason Lee, Gaurav Mahajan

 

Bio:

Sham Kakade is a professor in the Department of Computer Science and the Department of Statistics at the University of Washington and is also a senior principal researcher at Microsoft Research. He works on the mathematical foundations of machine learning and AI. Sham's thesis helped lay the statistical foundations of reinforcement learning. With his collaborators, his additional contributions include: one of the first provably efficient policy search methods in reinforcement learning; the mathematical foundations for the widely used linear bandit and Gaussian process bandit models; tensor and spectral methodologies for provable estimation of latent variable models; and the first sharp analysis of the perturbed gradient descent algorithm, along with the design and analysis of numerous other convex and non-convex algorithms. He is the recipient of the ICML Test of Time Award, the IBM Pat Goldberg Best Paper Award, and the INFORMS Revenue Management and Pricing Prize. He was program chair for COLT 2011.

Sham was an undergraduate at Caltech, where he studied physics and worked under the guidance of John Preskill in quantum computing. He completed his Ph.D. with Peter Dayan in computational neuroscience at the Gatsby Computational Neuroscience Unit. He was a postdoc with Michael Kearns at the University of Pennsylvania. 

 
