
Event

Informed Posterior Sampling Based Reinforcement Learning Algorithms

Friday, April 26, 2024, 10:30 to 11:30
McConnell Engineering Building Zames Seminar Room, MC 437, 3480 rue University, Montreal, QC, H3A 0E9, CA

Informal Systems Seminar (ISS), Centre for Intelligent Machines (CIM) and Groupe d'Etudes et de Recherche en Analyse des Decisions (GERAD)

Speaker: Dengwang Tang


**Note that this is a hybrid event.**
**This seminar will be projected at McConnell 437 at McGill University.**


Meeting ID: 845 1388 1004
Passcode: VISS

Abstract: In many traditional reinforcement learning (RL) settings, an agent learns to
control the system without incorporating any prior knowledge. However, such a
paradigm can be impractical since learning can be slow. In many engineering
applications, offline datasets are often available. To combine the information provided
by offline datasets with the power of online fine-tuning, we proposed informed
posterior sampling-based reinforcement learning (iPSRL) for both episodic and
continuing MDP learning problems. In this algorithm, the learning agent forms an
informed prior from the offline data together with knowledge of the offline policy that
generated the data. This informed prior is then used to initialize the posterior sampling
procedure. Through a novel prior-dependent regret analysis of the posterior sampling
procedure, we showed that when the offline data is informative enough, the iPSRL
algorithm can significantly reduce the learning regret compared to baselines that do
not use the offline data in the same way. Building on iPSRL, we then proposed the more
practical iRLSVI algorithm. Empirical results showed that iRLSVI can significantly
reduce regret compared to the baselines.
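To make the idea concrete, the sketch below shows generic tabular posterior sampling RL (PSRL) whose Dirichlet prior over transition probabilities is seeded with counts from offline data, rather than starting uninformed. This is only an illustrative toy, not the speaker's actual iPSRL algorithm: the toy MDP, the count-based prior, and all names here are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP (hypothetical; chosen only for illustration).
n_states, n_actions = 2, 2
true_P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.7, 0.3], [0.05, 0.95]]])  # true_P[s, a, s']
reward = np.array([[0.0, 0.0], [1.0, 1.0]])      # known rewards r[s, a]

# "Informed prior": Dirichlet pseudo-counts seeded from offline data
# (here simulated as counts consistent with the true model), instead of
# the uniform counts an uninformed PSRL agent would start from.
offline_counts = 50 * true_P
counts = 1.0 + offline_counts   # Dirichlet(alpha) over next states
prior_mass = counts.sum()

def greedy_policy(P, r, gamma=0.9, iters=200):
    """Value iteration on a sampled model; return the greedy policy."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r + gamma * P @ V   # Q[s, a]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Posterior sampling loop: sample a model from the posterior, act
# greedily under it for one episode, then update the counts.
s = 0
for episode in range(20):
    sampled_P = np.array([[rng.dirichlet(counts[si, ai])
                           for ai in range(n_actions)]
                          for si in range(n_states)])
    policy = greedy_policy(sampled_P, reward)
    for _ in range(10):
        a = policy[s]
        s_next = rng.choice(n_states, p=true_P[s, a])
        counts[s, a, s_next] += 1.0   # Bayesian posterior update
        s = s_next
```

With an informative prior, the early sampled models already resemble the true dynamics, so the agent wastes fewer episodes exploring, which is the intuition behind the prior-dependent regret bounds discussed in the abstract.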

Bio: Dengwang Tang is currently a postdoctoral researcher at the University of Southern
California. He obtained his B.S.E. in Computer Engineering from the University of Michigan,
Ann Arbor in 2016. He earned his Ph.D. in Electrical and Computer Engineering (2021),
M.S. in Mathematics (2021), and M.S. in Electrical and Computer Engineering (2018), all
from the University of Michigan, Ann Arbor. Prior to joining USC he was a postdoctoral
researcher at the University of California, Berkeley. His research interests include control
and learning algorithms in stochastic dynamic systems, multi-armed bandits, multi-agent
systems, queueing theory, and game theory.
