Continuous-Time Linear-Quadratic Reinforcement Learning
Speaker: Mohamad Kazem Shirani Faradonbeh – University of Georgia, United States
Abstract: We focus on learning to control linear dynamical systems that evolve as stochastic differential equations, using data from a single trajectory. Reinforcement learning policies will be presented for stabilizing unknown systems and for minimizing quadratic cost functions. First, fast and reliable stabilization algorithms that utilize Bayesian learning methods will be discussed. Then, we propose effective policies that balance exploration and exploitation, in a manner similar to Epsilon-Greedy or Thompson Sampling. Theoretical analyses establishing regret bounds that grow with the square root of time and with the number of parameters will be provided, together with experiments on different real-world systems. Fundamental limitations will be discussed as well.
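For a concrete picture of the kind of policy the abstract describes, below is a minimal, self-contained sketch (not the speaker's algorithm) of a Thompson-sampling-style adaptive linear-quadratic controller for a linear SDE dx = (Ax + Bu)dt + dW, simulated with an Euler-Maruyama discretization. The system matrices, prior, episode schedule, and the safety-fallback gain are all illustrative assumptions; SciPy's solve_continuous_are supplies the Riccati-based feedback gain.

```python
# Minimal sketch, assuming a 2-state/1-input system and a Gaussian
# (least-squares) posterior over the unknown drift parameters Theta = [A B].
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)
n, m = 2, 1                                    # state / input dims (assumed)
A_true = np.array([[0.2, 1.0], [0.0, 0.1]])    # unknown, mildly unstable drift
B_true = np.array([[0.0], [1.0]])
Q, R = np.eye(n), np.eye(m)                    # quadratic running-cost weights
dt, T_ep, n_ep = 0.01, 2.0, 25                 # step size, episode length, episodes

def care_gain(A, B):
    """Feedback gain K (u = -K x) from the continuous-time Riccati equation."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Stand-in for the output of the stabilization phase: a gain known to
# stabilize the system (computed from the truth here purely for the demo).
K_safe = care_gain(A_true, B_true)

# Posterior sufficient statistics for Theta = [A B]^T:
# V = int z z^T dt and S = int z dx^T, with regressor z = (x, u).
V = np.eye(n + m)                              # ridge term, acts as a weak prior
S = np.zeros((n + m, n))

def sampled_gain(prev_K):
    """Sample Theta from the posterior; keep prev_K if the CARE solve fails."""
    mean = np.linalg.solve(V, S)
    L = np.linalg.cholesky(np.linalg.inv(V))   # per-column covariance factor
    Theta = mean + L @ rng.standard_normal((n + m, n))
    A_hat, B_hat = Theta[:n].T, Theta[n:].T
    try:
        return care_gain(A_hat, B_hat)
    except (np.linalg.LinAlgError, ValueError):
        return prev_K

x, K, cost = np.zeros(n), K_safe, 0.0
for ep in range(n_ep):
    for _ in range(int(T_ep / dt)):
        # Safety switch: revert to the stabilizing gain if the state grows.
        u = -(K_safe if np.linalg.norm(x) > 10.0 else K) @ x
        dx = (A_true @ x + B_true @ u) * dt + np.sqrt(dt) * rng.standard_normal(n)
        z = np.concatenate([x, u])
        V += np.outer(z, z) * dt               # accumulate regression statistics
        S += np.outer(z, dx)
        cost += (x @ Q @ x + u @ R @ u) * dt   # quadratic running cost
        x = x + dx
    K = sampled_gain(K)                        # resample the gain once per episode
    print(f"episode {ep:2d}: avg cost {cost / ((ep + 1) * T_ep):.3f}")
```

Resampling the gain once per episode rather than every step is the usual way such schemes limit policy switching; the regret guarantees quoted in the abstract concern algorithms of this general flavor, not this particular sketch.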
Bio: Mohamad Kazem Shirani Faradonbeh received his PhD in statistics from the University of Michigan in 2017, and his BSc in electrical engineering from Sharif University of Technology in 2012. He was a postdoctoral associate with the Informatics Institute and the Department of Statistics at the University of Florida, and a research fellow at the Simons Institute for the Theory of Computing at the University of California, Berkeley. Since 2020, he has been an assistant professor of Data Science at the University of Georgia, with the Department of Statistics and the Institute for Artificial Intelligence.