Hello world! It’s Siraj, and in this episode we’re going to build our own game bot capable of beating any Atari game that we give it.

In 2014, Google acquired a small, London-based startup called DeepMind for about 500 million dollars. That is a lot of money for what seems like pretty simple software: it was a bot for Atari games. But the reason they paid so much for it was that it was one of the first steps toward general artificial intelligence. That means an AI that can excel at not one but a variety of tasks. Its capabilities are generalised, just like ours are. Their paper was later featured on the cover of Nature, showing that their algorithm could be applied to 49 different Atari games and achieve ‘superhuman’ performance on many of them. They called their bot the Deep Q-Network, or DQN.

But before we talk about that, let’s talk about the concept of reinforcement learning. Supervised and unsupervised learning techniques are well known in the applied AI community. You give a model a dataset with labels and have it learn the mapping between the two, or you give it a dataset without labels and try to learn what the labels are by clustering or by detecting anomalies in the dataset. We can use these datasets to create data classifiers or data generators.

But consider this scenario! You’re playing the game Super Mario Bros. (awesome game) and rather than play it yourself, you’d like to train an AI to play it for you. How should we think about this problem? If we screen-captured game sessions from expert players, we could use the video frames from the game as input to a model, and the output could be the directions that Mario could move. This would be a supervised classification problem, since we have labels: the directions to move. Assuming we have lots of data, and access to some sick GPUs, it makes sense to try out a neural network here. Given video frames in a new game, it would know how best to navigate to beat the level. Right?
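The supervised framing just described (frames in, directions out) can be sketched with a toy example. Everything here is hypothetical and heavily simplified: a real system would train a convolutional neural network on raw pixels, but even a nearest-neighbor lookup over tiny flattened “frames” shows the core idea of learning a mapping from labeled expert data.

```python
import math

# Toy, invented sketch of the supervised framing: each "frame" is a
# flattened list of pixel values, each label is a direction Mario can move.
# A real system would train a convolutional neural network instead.

def nearest_neighbor(frame, dataset):
    """Predict an action by copying the label of the most similar training frame."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, best_action = min(dataset, key=lambda pair: dist(frame, pair[0]))
    return best_action

# Invented "expert gameplay" data: (frame, action) pairs.
expert_data = [
    ([0.0, 0.0, 1.0], "right"),  # path clear ahead: keep moving right
    ([0.0, 1.0, 0.0], "jump"),   # obstacle directly ahead: jump
    ([1.0, 0.0, 0.0], "left"),   # dead end: back up
]

print(nearest_neighbor([0.1, 0.9, 0.0], expert_data))  # → jump
```

The point is only the shape of the problem: a fixed dataset of (input, label) pairs, and a model that generalises from it to new inputs.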
Yeah, but then we would need hundreds of hours of gameplay videos to train on, and it doesn’t seem like an elegant solution to this specific problem. First of all, we aren’t training our model on a static dataset but a dynamic one. The training data is continuous: new frames are constantly emerging in this game world, this environment, and we want to learn how to act in this world. Humans learn best by interacting with an environment, not by watching others interact with it. Environments are stochastic; any number of events can occur. It seems best to learn by trying out different possibilities. So rather than framing this problem as solvable by pattern recognition, let’s frame it as solvable through a process of trial and error.
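That trial-and-error process can be sketched in miniature with tabular Q-learning, a much simpler relative of DeepMind’s deep version. The environment here (a five-cell corridor with a reward at the far end) and all the constants are made up for illustration; the point is that the agent learns good actions purely by acting and observing rewards, with no labeled examples at all.

```python
import random

random.seed(0)
N_STATES = 5                              # cells 0..4 in a toy corridor
ACTIONS = [-1, +1]                        # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1    # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(s):
    """Pick the action with the highest estimated value (random tie-break)."""
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for _ in range(300):                      # episodes of pure trial and error
    s = 0
    while s != N_STATES - 1:
        # explore occasionally, otherwise exploit current estimates
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0   # reward only at the goal cell
        # Q-learning update: nudge estimate toward reward + discounted future value
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

print([greedy(s) for s in range(N_STATES - 1)])  # learned policy: heads right
```

No one ever told the agent which moves were correct; the labels that supervised learning needed are replaced by rewards discovered through interaction. DQN follows the same recipe, but swaps the table for a neural network over raw pixels.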