\section{Recommender Systems}

Reinforcement Learning has become a highly relevant research direction for recommender systems, as it accounts for sequential effects on long-term outcomes rather than simply finding the best match at the current time. In early research, the states in recommender systems were formally defined as follows: \textit{"The set of states contains all possible sequences of user selections"} \cite{rojanavasu:new-recommendation}.
%(ref)[https://pdfs.semanticscholar.org/f041/ac53fba83674a23e0a4a3454f73b6112fe3c.pdf]

The authors of \cite{Li:RecommenderRL} notably extended this line of work with \textit{contextual bandits} to further improve the quality of recommendations for a news website. In this problem, an agent has to choose between arms in each iteration. Before making the choice, the agent observes a $d$-dimensional feature vector (the context vector) associated with the current iteration. The learner uses these context vectors, along with the rewards of the arms played in the past, to choose the arm to play in the current iteration. Over time, the learner aims to collect enough information about how the context vectors and rewards relate to each other that it can predict the next best arm directly from the feature vector.

An important aspect of RL is that it improves sequential recommendations rather than only the current best match. The authors of \cite{liebman:dj-mc} capitalised on this feature of RL to score not just the reward of individual items but their quality given the prior selections. This is formalised as the reward for individual items (songs, in this case) plus the reward for the transition between the currently chosen items and the new addition in the action. Furthermore, the authors defined each item (a song) by its composition of descriptive features (tempo, loudness, etc.).
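This additive reward structure can be sketched in a few lines. The sketch below is illustrative, not the paper's exact formulation: the linear listener model, the feature weights, and the transition penalty are all assumptions chosen to mirror the item-reward-plus-transition-reward decomposition described above.

```python
import numpy as np

def playlist_reward(songs, song_reward, transition_reward):
    """Total playlist reward: each song's individual reward plus the
    reward for every transition between consecutive songs."""
    total = sum(song_reward(s) for s in songs)
    total += sum(transition_reward(prev, cur)
                 for prev, cur in zip(songs, songs[1:]))
    return total

# Hypothetical listener model: songs are vectors of descriptive features
# (e.g. tempo, loudness), the listener weights features linearly, and
# abrupt jumps between consecutive songs are penalised.
weights = np.array([0.5, 1.0])
song_r = lambda s: float(weights @ s)
trans_r = lambda prev, cur: -float(np.abs(cur - prev).sum())
```

Separating the per-item term from the transition term is what lets a planner trade off strong individual songs against a smoother sequence.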
They note: \textit{"We find that these descriptors enable a great deal of flexibility (for instance, in capturing similarities between songs from vastly different backgrounds, or the ability to model songs in unknown languages)."} More recent work on deep Reinforcement Learning for recommender systems is surveyed in \cite{Zhang:RecommenderRL}, alongside other state-of-the-art methods for recommender systems. The authors noted that: \textit{"Most recommendation models consider the recommendation process as a static process, which makes it difficult to capture user’s temporal intentions and to respond in a timely manner."}