Speaker: Giorgio Manganini
United Technologies Research Centre
When: 24th January, 11.00 am
Where: GSSI Library
Abstract: Policy Search is a Reinforcement Learning approach that searches for the optimal policy of a Markov Decision Process within a restricted policy space. It has gained popularity for complex, real-world applications since it can deal with high-dimensional state and action spaces while keeping the search limited to a task-appropriate, predefined parametrized policy class. During this talk, we introduce the concepts and formulations of Policy Gradient methods, focusing on an exploration strategy in the policy parameter space (Policy Gradients with Parameter-based Exploration, PGPE).
In the first part, we endow PGPE with a novel policy parameterization that uses particles to describe entire areas of the state space associated with the same action, hence scaling favorably with the size of the state space. In the second part, the gradient direction of PGPE is extended to second-order Newton methods: we provide the formulation of the Hessian of the expected return, a technique for variance reduction in its sample-based estimation, and a finite-sample analysis for the case of a Normal distribution.
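For reference, Hessian estimators of this kind typically rest on the standard likelihood-ratio identity (the notation below is a generic sketch, not necessarily the speaker's): writing $J(\rho) = \mathbb{E}_{\theta \sim p(\cdot\mid\rho)}[R(\theta)]$ for the expected return under a hyper-distribution $p(\theta\mid\rho)$ over policy parameters,

```latex
\nabla^2_{\rho} J(\rho)
  = \mathbb{E}_{\theta \sim p(\cdot\mid\rho)}\!\left[
      R(\theta)\,\bigl(
        \nabla_{\rho}\log p(\theta\mid\rho)\,
        \nabla_{\rho}\log p(\theta\mid\rho)^{\top}
        + \nabla^2_{\rho}\log p(\theta\mid\rho)
      \bigr)
    \right]
```

which follows from $\nabla^2_{\rho} p = p\,(\nabla_{\rho}\log p\,\nabla_{\rho}\log p^{\top} + \nabla^2_{\rho}\log p)$ and can be estimated from the same samples used for the gradient.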
Besides discussing the theoretical properties, we empirically evaluate the proposed methods on both instructional and real-world case studies.