This note provides brief biographies of notable people referenced in the text, focusing on their contributions to the field of Reinforcement Learning.
- **Sutton, Richard S.** (and Barto, Andrew G.): Pioneers of reinforcement learning and authors of the seminal textbook *Reinforcement Learning: An Introduction*. Their work is fundamental to the field.
- **Scherrer, Bruno:** Worked on batch reinforcement learning.
- **Hafner, Danijar:** Contributed to research on model-based RL using learned latent dynamics for planning.
- **Chua, Kurtland:** Contributed to work on sample-efficient model-based RL.
- **Covington, Paul:** Worked on recommender systems, an example of a problem with a very large action space.
- **Zahavy, Tom:** Contributed to research into handling large action spaces with action elimination methods.
- **He, Ji:** Part of the team that developed the Deep Reinforcement Relevance Network (DRRN) for text-based games, and among the first to investigate natural language as an action space.
- **Iyengar, Garud:** Developed the Robust MDP framework.
- **Peng, Xue Bin:** Worked on domain randomization for robust sim-to-real learning.
- **Finn, Chelsea:** Contributed to research into meta-learning and system identification for adapting to new environments.
- **Nagabandi, Anusha:** Contributed to research on system identification for dealing with non-stationary environments.
- **Van Seijen, Harm:** Developed the Hybrid Reward Architecture, an approach to learning with multi-objective reward signals.
- **Montavon, Grégoire:** Worked in the field of Explainable AI, particularly on methods for explaining the predictions of neural networks.
- **Arjona-Medina, Jose A.:** Worked on return decomposition for delayed rewards (RUDDER).
- **Bellemare, Marc G.:** Worked on distributional reinforcement learning.
- **Bohez, Simon:** Worked on value-constrained model-free continuous control.
- **Dabney, Will:** Contributed to work on distributional RL and value function approximation.
- **Dalal, Gal:** Researched safe exploration in continuous action spaces and the use of constraint-violation metrics for safety.
- **Derman, Esther:** Worked on soft-robust actor-critic policy-gradient algorithms.
- **Di Castro, Daniel:** Studied policy gradients with variance-related risk criteria.
- **Henderson, Peter:** Worked on the reproducibility of Deep RL research.
- **Jiang, Nan:** Contributed to research into doubly robust off-policy evaluation.
- **Lillicrap, Timothy P.:** Known for his work on continuous control with deep reinforcement learning (the DDPG algorithm), foundational for deep RL in robotics.
- **Mankowitz, Daniel J.:** Worked on situational awareness via risk-conscious skills in RL and on action-elimination techniques for deep RL.
- **Mnih, Volodymyr:** A key figure in deep reinforcement learning, developed the DQN algorithm for playing Atari games.
- **Precup, Doina:** Worked on the reproducibility of DRL research.
- **Roijers, Daan M.:** Conducted a survey of multi-objective sequential decision-making.
- **Ross, Stéphane:** Studied imitation learning.
- **Tamar, Aviv:** Studied policy gradient methods for risk measures.
- **Tassa, Yuval:** Worked on deep reinforcement learning and the DeepMind Control Suite of environments.
- **Hausknecht, Matthew J.:** Contributed to deep recurrent Q-learning for POMDPs and to the field of text-based games.
- **Anderson, Timothy A.:** Co-creator of Zork, a landmark text-based game.
- **Côté, Marc-Alexandre:** Lead developer of the TextWorld framework and contributor to the Treasure Hunter text-based benchmark.
- **Trischler, Adam:** Worked on the Cooking World, First TextWorld Problems (FTWP), and QAit text-based game benchmarks, and on the ALFWorld framework.
- **Ammanabrolu, Prithviraj:** Contributed to research on the TextWorld and ClubFloyd text-based benchmarks.
- **Murugesan, Keerthiram:** Worked on the TextWorld-Commonsense environment.
- **Adhikari, Ashutosh:** Worked on the TextWorld-Cook benchmark.
- **Yuan, Xingdi:** Worked on the Coin Collector, QAit, and ALFWorld text-based game benchmarks.
- **Cho, Kyunghyun:** One of the original developers of the Gated Recurrent Unit (GRU).
- **Hochreiter, Sepp:** Known for co-developing the LSTM network, a key innovation for recurrent neural networks.
- **Schmidhuber, Jürgen:** Co-developed the LSTM network with Sepp Hochreiter.
- **Vaswani, Ashish:** Co-authored *Attention Is All You Need*, the seminal paper on the Transformer architecture.
- **Schlichtkrull, Michael:** Developed the Relational Graph Convolutional Network (R-GCN).
- **Van Hasselt, Hado:** Worked on Double Deep Q-Networks (DDQN).
- **Yin, Xusen:** Worked on the Deep Siamese Q-Network and on generalisation in sequential decision-making.
- **May, Jonathan:** Worked on the Deep Siamese Q-Network and on generalisation in sequential decision-making.
- **Golovin, Danny:** Created the Treasure Hunter text-based game environments and challenges.
- **Silver, David:** A key figure in deep reinforcement learning; led the team that created the AlphaGo AI.
- **Shridhar, Mohit:** Lead developer of the ALFWorld benchmark.
- **Kipf, Thomas N.:** Worked on graph convolutional networks.
- **Blundell, Charles:** Developed Model-Free Episodic Control.
- **Lewis, Patrick:** Developed Retrieval Augmented Generation (RAG).
- **Goyal, Anirudh:** Investigated episodic reinforcement learning.
- **Wei, Jason:** Worked on chain-of-thought prompting in large language models.
- **Huang, Wenlong:** Researched Inner Monologue agents that combine planning and large language models.
- **Wang, Guanzhi:** Created the Voyager AI agent.
- **Yao, Shunyu:** Co-authored research into integrating reasoning and acting in language models (ReAct).
- **Ji, Ying:** Worked on spatio-temporal understanding in neural networks.
- **Jiang, Dongfu:** Developed TIGERScore, a metric for the evaluation of natural language generation.
- **Shinn, Noah:** Worked on the Reflexion agent, which uses self-reflection.
- **Sreedharan, Sarath:** Worked on providing explanations for decision-making problems.
- **Xie, Yuqi:** Worked on the PandaLM benchmark for LLM evaluation.
- **Xu, Derong:** Contributed to research on large language models and information extraction.
- **Brockman, Greg:** Co-creator of the OpenAI Gym toolkit, which includes the FrozenLake environment.
- **Sheng, Xiaoyu:** Developed the language interpretation of the FrozenLake environment used for language-based RL research.