\section{Definition}

The work in \cite{Osborne:RL4RL-book} argues that what constitutes a ``real-life'' task is a spectrum rather than a binary classification.
\begin{itemize}
\item A \emph{simulated problem} can encompass a wide range of tasks. It could be a simple simulator of a pole-balancing system that a researcher constructed themselves from basic physical equations (a minimal sketch is given below), or a sophisticated simulation that mimics a real-world task, such as a data-driven model of an oil well. When judging how ``real-world'' such a task is, one may consider whether the simulation is based on data, how faithfully it mimics the true system, and whether the simulation was designed to highlight a Reinforcement Learning algorithm's success (rather than the other way around).
\item A \emph{virtual problem} is one in which a virtual task is itself the true task. For example, when competing in video-game tournaments or trading stocks, there is no physical implementation: the problem is fully virtual, yet actions still have real-life consequences.
\item A \emph{physical problem} is one in which the agent takes actions that have physical consequences. At the simpler end, this could be manoeuvring a robot in a controlled laboratory setting; at the more complex end, it could involve coordinating multiple self-driving cars through a busy intersection.
\end{itemize}

A high-fidelity simulation, a virtual problem with real-life impacts, or a method to optimise a chemical plant over time all lie towards the ``real-world'' end of the spectrum, whereas a toy problem or a simple robot arm stacking blocks in a lab are less ``real world''. We would also argue that the task could be either control (the focus of this work) or evaluation (e.g., Reinforcement Learning can be used to judge the quality of a laser weld, which is also a sequential decision task). To make things concrete, however, we adopt the following definition from the literature for the remainder of this work, with the caveat that we believe this is a continuum rather than a binary distinction.

In this work, we define a \textbf{real-world task} as one in which observations of sequentially dependent events are sourced from control systems grounded in the physical world, consistent with previous work:
\begin{quote}\textit{``We consider control systems grounded in the physical world, optimization of software systems, and systems that interact with users such as recommender systems and smartphones. These systems can range in size from a small drone to a data center, in complexity from a one-dimensional thermostat to a self-driving car, and in cost from a calculator to a spaceship. In all these scenarios there are recurring themes: there is rarely a good simulator, the systems are stochastic \& non-stationary, have strong safety constraints, and running them is expensive and/or slow.''} \cite{dulac:rl-challenges}\end{quote}

This stands in contrast to events generated from virtual or simulated sources. For example, a game of chess is grounded in the real world, yet computers can simulate millions of games to explore its possible outcomes; the corresponding real-world task would be to use only the events observed when playing against human players, where interaction is limited. To apply an offline method, data containing features and decisions is collected over a period of time, with the ordering of events logged.
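To make the first category above concrete, a researcher-built pole-balancing simulator can be little more than the classic cart-pole equations of motion integrated with Euler steps. The following Python sketch uses the standard frictionless cart-pole dynamics; the constants and names are illustrative choices of ours rather than taken from any cited work.

\begin{verbatim}
import math

# Standard frictionless cart-pole constants (illustrative values):
# gravity, cart mass, pole mass, pole half-length, integration step.
GRAVITY, CART_M, POLE_M, POLE_LEN, DT = 9.8, 1.0, 0.1, 0.5, 0.02

def step(x, x_dot, theta, theta_dot, force):
    """Advance the cart-pole state by one Euler step under `force`."""
    total_m = CART_M + POLE_M
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_M * POLE_LEN * theta_dot**2 * sin_t) / total_m
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_LEN * (4.0 / 3.0 - POLE_M * cos_t**2 / total_m))
    x_acc = temp - POLE_M * POLE_LEN * theta_acc * cos_t / total_m
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)
\end{verbatim}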
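For the offline setting just described, the collected data might be represented as time-ordered records of features, decisions, and outcomes that can later be paired into transitions. The sketch below is a hypothetical schema; the field and function names are our own assumptions for illustration.

\begin{verbatim}
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LoggedEvent:
    # One record from a hypothetical production log.
    timestamp: float       # when the event occurred; preserves ordering
    features: List[float]  # observed state features at that time
    decision: int          # the action the deployed system took
    outcome: float         # a measured signal later used as a reward

def to_transitions(log: List[LoggedEvent]) -> List[Tuple]:
    # Pair consecutive events into (s, a, r, s') tuples for offline RL.
    log = sorted(log, key=lambda e: e.timestamp)  # order must be logged
    return [(e.features, e.decision, e.outcome, nxt.features)
            for e, nxt in zip(log, log[1:])]
\end{verbatim}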
Typically, this data is collected by a data analyst or data engineer: someone with a background in mathematics and the ability to code, but a limited understanding of Reinforcement Learning. In some cases, Reinforcement Learning may be only a small component of a larger system; for example, it could be used to tune the parameters of a control-engineering method \cite{Johannes:2020} within a prediction problem, or to control a single thermostat \cite{urieli:2013,brandi:2020}. Alternatively, it can serve as an end-to-end solution for a problem that either naturally suits Reinforcement Learning or can be adapted so that applying it brings advantages (e.g., automation). In both cases, it is important to understand the limitations as well as the advantages of introducing Reinforcement Learning to known problems, which will be determined by both the data available and real-world politics (e.g., ensuring ethical automation practices).

Increasingly, Reinforcement Learning is being discussed within specific domains, so much so that healthcare professionals have recently published guidelines for its use in diagnosis, highlighting the risks associated with limited observed knowledge about the patient. For example:
\begin{quote}\textit{``Severely sick septic patients may receive fluids earlier than healthier patients yet have worse outcomes, which is clearly a result of them being sicker in the first place. This difference in outcome may lead to an analysis that associates earlier fluid administration with worse outcomes if not properly adjusted for clinical context.''} \cite{Gottesman:2019}\end{quote}
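The confounding risk described in this quote can be illustrated with a few lines of synthetic data: if sicker patients are both more likely to receive fluids early and more likely to die, a naive comparison makes early fluids look harmful even when their true effect is mildly beneficial. The sketch below uses entirely made-up numbers, not clinical data, purely to demonstrate the mechanism.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Synthetic cohort: `severity` is the hidden clinical context.
severity = rng.uniform(0.0, 1.0, n)
# Sicker patients are more likely to receive fluids early.
early_fluids = rng.random(n) < 0.2 + 0.6 * severity
# Mortality rises with severity; early fluids truly reduce it slightly.
mortality = rng.random(n) < 0.05 + 0.5 * severity - 0.05 * early_fluids

# Naive comparison: early fluids appear harmful (~0.30 vs ~0.25).
print(mortality[early_fluids].mean(), mortality[~early_fluids].mean())

# Adjusting for severity (stratified comparison) reverses the picture.
for lo, hi in [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]:
    stratum = (severity >= lo) & (severity < hi)
    treated = mortality[stratum & early_fluids].mean()
    control = mortality[stratum & ~early_fluids].mean()
    print(f"severity [{lo:.2f}, {hi:.2f}): {treated:.3f} vs {control:.3f}")
\end{verbatim}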