Learning from human feedback
Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
DayDreamer: World Models for Physical Robot Learning
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
The Primacy Bias in Deep Reinforcement Learning
Decision Transformer and its variants