Research

Academic Seminars

Deep Reinforcement Learning via Noncrossing Quantile Regression

Posted: 2020-11-18
Speaker: Xingdong Feng
Speaker Bio:

Xingdong Feng is Dean of the School of Statistics and Management at Shanghai University of Finance and Economics, Professor of Statistics, and a doctoral supervisor. His research areas include dimension reduction, robust methods, quantile regression, and statistical computing for big data. He has published a series of high-quality papers in top international statistics journals such as JASA, AoS, JRSSB, and Biometrika. He is a member of the 8th Discipline Appraisal Group of the State Council and was named an Elected Member of the International Statistical Institute in 2018.

Host: Wei Zhong
Abstract:

Distributional reinforcement learning (DRL) estimates the distribution over future returns, rather than just the mean, to more efficiently capture the intrinsic uncertainty of MDPs. However, batch-based DRL algorithms cannot guarantee the non-decreasing property of learned quantile curves, especially in the early stages of training, leading to abnormal distribution estimates and reduced model interpretability. To address these issues, we introduce a general DRL framework that uses non-crossing quantile regression to enforce the monotonicity constraint within each sampled batch, and which can be incorporated into well-known DRL algorithms. We demonstrate the validity of our method from both theoretical and implementation perspectives. Experiments on Atari 2600 games show that state-of-the-art DRL algorithms with the non-crossing modification can significantly outperform their baselines, with faster convergence and better test performance. In particular, our method can effectively recover the distribution information and thus dramatically increase exploration efficiency when the reward space is extremely sparse.
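The announcement does not include an implementation, but the core idea of the talk, enforcing a monotonicity constraint on learned quantile curves, can be illustrated with a short sketch. The PyTorch snippet below shows one standard way to make quantile estimates non-crossing by construction: predict the lowest quantile freely and add strictly positive (softplus) increments for the remaining levels. All class names, layer sizes, and the pinball loss shown here are illustrative assumptions, not the speaker's actual architecture.

```python
# A minimal sketch (assumed architecture, not the paper's implementation):
# quantile estimates are forced to be non-decreasing in the quantile level
# by predicting a base quantile plus strictly positive increments.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonCrossingQuantileHead(nn.Module):
    """Outputs num_quantiles return estimates per action that are
    monotone along the quantile axis for every sample in the batch."""

    def __init__(self, state_dim: int, num_actions: int, num_quantiles: int = 32):
        super().__init__()
        self.num_actions = num_actions
        self.num_quantiles = num_quantiles
        self.body = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # One unconstrained base value plus (num_quantiles - 1) increment
        # logits per action.
        self.head = nn.Linear(128, num_actions * num_quantiles)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        raw = self.head(self.body(state))
        raw = raw.view(-1, self.num_actions, self.num_quantiles)
        base = raw[..., :1]               # lowest quantile, unconstrained
        steps = F.softplus(raw[..., 1:])  # strictly positive increments
        # Cumulative sum guarantees q_1 <= q_2 <= ... <= q_N, so the
        # quantile curves cannot cross within any sampled batch.
        return torch.cat([base, base + steps.cumsum(dim=-1)], dim=-1)

def pinball_loss(pred: torch.Tensor, target: torch.Tensor, tau: torch.Tensor) -> torch.Tensor:
    """Standard quantile-regression (pinball) loss at levels tau."""
    diff = target - pred
    return (torch.where(diff >= 0, tau, tau - 1.0) * diff).mean()

if __name__ == "__main__":
    net = NonCrossingQuantileHead(state_dim=4, num_actions=2)
    q = net(torch.randn(8, 4))                   # (batch, actions, quantiles)
    assert torch.all(q[..., 1:] >= q[..., :-1])  # non-crossing by construction
```

By contrast, a head that predicts each quantile independently, as in unconstrained batch-based quantile DRL, can produce crossing estimates early in training, which is exactly the failure mode the talk's method is designed to rule out.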

The speaker will deliver the talk online. Students with confirmed reservations should watch in Economics Building N302; other faculty and students are welcome to join online. Connection details will be sent to faculty and students by email.

Time: 2020-11-18 (Wednesday), 16:40-18:00
Venue: Economics Building N302
Language: Chinese
Series: Advanced Econometrics and Statistics Seminar Series, Lecture 127