This note provides brief bios of notable people referenced in the text, focusing on their contributions to the field of Reinforcement Learning.

- **Sutton, Richard S.** (and **Barto, Andrew G.**): Pioneers of reinforcement learning and authors of the seminal textbook on the subject; their work is fundamental to the field.
- **Scherrer, Bruno:** Worked on batch reinforcement learning.
- **Hafner, Danijar:** Contributed to research on model-based RL using learned latent dynamics for planning.
- **Chua, Kurtland:** Contributed to work on sample-efficient model-based RL.
- **Covington, Paul:** Worked on recommender systems, an example of problems with large action spaces.
- **Zahavy, Tom:** Contributed to research on handling large action spaces with action-elimination methods.
- **He, Ji:** Part of the team that developed Deep Reinforcement Relevance Networks (DRRN) for text-based games, and among the first to investigate natural language as an action space.
- **Iyengar, Garud:** Developed the Robust MDP framework.
- **Peng, Xue Bin:** Worked on domain randomization for robust learning.
- **Finn, Chelsea:** Contributed to research on meta-learning and system identification for adapting to new environments.
- **Nagabandi, Anusha:** Contributed to research on system identification for dealing with non-stationary environments.
- **Van Seijen, Harm:** Developed an approach to learning with multi-objective reward signals.
- **Montavon, Grégoire:** Worked in the field of Explainable AI, particularly on methods for interpreting the decisions of neural networks.
- **Arjona-Medina, Jose A.:** Worked on return decomposition for delayed rewards.
- **Bellemare, Marc G.:** Worked on distributional reinforcement learning.
- **Bohez, Simon:** Worked on value-constrained model-free continuous control.
- **Dabney, Will:** Contributed to work on distributional RL and value-function approximation.
- **Dalal, Gal:** Researched safe exploration in continuous action spaces and the use of constraint-violation metrics for safety.
- **Derman, Esther:** Worked on soft-robust actor-critic policy-gradient algorithms.
- **Di Castro, Dotan:** Studied policy gradients with variance-related risk criteria.
- **Henderson, Peter:** Worked on the reproducibility of deep RL research.
- **Jiang, Nan:** Contributed to research on doubly robust off-policy evaluation.
- **Lillicrap, Timothy P.:** Known for his work on continuous control with deep reinforcement learning, a foundational work for DRL in robotics.
- **Mankowitz, Daniel J.:** Worked on situational awareness via risk-conscious skills in RL and on action-elimination techniques for DRL.
- **Mnih, Volodymyr:** A key figure in deep reinforcement learning; developed the DQN algorithm for playing Atari games.
- **Precup, Doina:** Worked on the reproducibility of DRL research.
- **Roijers, Daan M.:** Conducted a survey of multi-objective sequential decision-making.
- **Ross, Stéphane:** Studied imitation learning.
- **Tamar, Aviv:** Studied policy-gradient methods for risk measures.
- **Tassa, Yuval:** Worked on deep reinforcement learning and the DeepMind Control Suite of environments.
- **Hausknecht, Matthew J.:** Contributed to deep recurrent Q-learning for POMDPs and to the field of text-based games.
- **Anderson, Tim:** Co-created Zork, a landmark text-based game.
- **Côté, Marc-Alexandre:** Contributed to the development of the Treasure Hunter text-based benchmark.
- **Trischler, Adam:** Worked on the Cooking World, First TextWorld Problems (FTWP), and QAit text-based game benchmarks, and on the ALFWorld framework.
- **Ammanabrolu, Prithviraj:** Contributed to research on the TextWorld and ClubFloyd text-based benchmarks.
- **Murugesan, Keerthiram:** Worked on the TextWorld-Commonsense environment.
- **Adhikari, Ashutosh:** Worked on the TextWorld-Cook benchmark.
- **Yuan, Xingdi:** Worked on the Coin Collector, QAit, and ALFWorld text-based game benchmarks.
- **Cho, Kyunghyun:** One of the original developers of the Gated Recurrent Unit (GRU).
- **Hochreiter, Sepp:** Known for developing the LSTM network, a key innovation for recurrent neural networks.
- **Schmidhuber, Jürgen:** Co-developed the LSTM network with Hochreiter.
- **Vaswani, Ashish:** Co-authored the seminal paper on the Transformer architecture.
- **Schlichtkrull, Michael:** Developed the Relational Graph Convolutional Network (R-GCN).
- **Van Hasselt, Hado:** Worked on Double Deep Q-Networks (DDQN).
- **Yin, Xusen:** Worked on the Deep Siamese Q-Network and on generalisation in sequential decision making.
- **May, Jonathan:** Worked on the Deep Siamese Q-Network and on generalisation in sequential decision making.
- **Golovin, Danny:** Created the Treasure Hunter text-based game environments and challenges.
- **Silver, David:** A key figure in deep reinforcement learning; led the team that created the AlphaGo AI.
- **Shridhar, Mohit:** Lead developer of the ALFWorld benchmark.
- **Kipf, Thomas N.:** Worked on graph convolutional networks.
- **Blundell, Charles:** Developed Episodic Control.
- **Lewis, Patrick:** Developed Retrieval-Augmented Generation (RAG).
- **Goyal, Anirudh:** Investigated episodic RL.
- **Wei, Jason:** Worked on chain-of-thought prompting in large language models.
- **Huang, Wenlong:** Researched Inner Monologue agents that combine planning and large language models.
- **Wang, Guanzhi:** Created the Voyager AI agent.
- **Yao, Shunyu:** Co-authored research on integrating reasoning and acting in language models.
- **Ji, Ying:** Worked on spatial-temporal understanding in neural networks.
- **Jiang, Dongfu:** Developed TIGERScore, a metric for evaluating natural language generation.
- **Shinn, Noah:** Worked on the Reflexion agent, which uses self-reflection.
- **Sreedharan, Sarath:** Worked on providing explanations for decision-making problems.
- **Xie, Yuqi:** Worked on the PandaLM benchmark for LLM evaluation.
- **Xu, Derong:** Contributed to research on large language models and information extraction.
- **Brockman, Greg:** Co-creator of the OpenAI Gym toolkit, which includes the FrozenLake domain (a minimal usage sketch follows this list).
- **Sheng, Xiaoyu:** Developed the natural-language interpretation of the FrozenLake environment used for language-based RL research.
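Since the last two entries reference the FrozenLake domain from OpenAI Gym, here is a minimal sketch of loading it and running tabular Q-learning on it. It assumes the maintained Gymnasium fork (`gymnasium`) and the standard `"FrozenLake-v1"` registration; the hyperparameters are illustrative, not tuned.

```python
import numpy as np
import gymnasium as gym

# Build the FrozenLake environment (16 discrete states, 4 discrete actions).
env = gym.make("FrozenLake-v1", is_slippery=True)
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1  # illustrative, untuned hyperparameters

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection over the tabular Q-values.
        if np.random.rand() < eps:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # One-step Q-learning update toward the bootstrapped target.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print("Greedy policy (4x4 grid):", np.argmax(Q, axis=1).reshape(4, 4))
```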