Seminar Event Detail

Financial/Actuarial Mathematics

Date:  Wednesday, March 16, 2022
Location:  Zoom Virtual (3:00 PM to 4:00 PM)

Title:  Exploration-exploitation trade-off for continuous-time episodic reinforcement learning with linear-convex models

Abstract:   We develop a probabilistic framework for analysing model-based reinforcement learning in the episodic setting. We then apply it to study finite-time-horizon stochastic control problems with linear dynamics but unknown coefficients and a convex, but possibly irregular, objective function. Using probabilistic representations, we study the regularity of the associated cost functions and establish precise estimates for the performance gap between applying the optimal feedback control derived from estimated model parameters and that derived from the true parameters. Next, we propose a phase-based learning algorithm and show how to optimise its exploration-exploitation trade-off. Our algorithm achieves sublinear (or even logarithmic) regret with high probability and in expectation, which matches the best possible results from the literature.

Files: 7660_lc_rl.pdf

Speaker:  Yufei Zhang
Institution:  LSE


