Deep Q Learning for Video Games – The Math of Intelligence #9

September 7, 2019


Hello world! It's Siraj, and in this episode we are going to build our own game bot capable of beating any Atari game that we give it.

In 2014, Google acquired a small, London-based startup called DeepMind for roughly 500 million dollars. That is a lot of money for what seems like pretty simple software: it was a bot for Atari games. But the reason they paid so much for it was that it was one of the first steps towards general artificial intelligence, meaning an AI that can excel at not one but a variety of tasks. Its capabilities are generalised, just like ours are. Their paper was later featured on the cover of Nature, showing that their algorithm could be applied to 49 different Atari games and achieve 'superhuman' performance on many of them. They called their bot the 'Deep Q-Learner'.

But before we talk about that, let's talk about the concept of reinforcement learning. Supervised and unsupervised learning techniques are well known in the applied AI community: you give a model a dataset with labels and have it learn the mapping between the two, or you give it a dataset without labels and try to discover structure by clustering or detecting anomalies in the dataset. We can use these datasets to create data classifiers or data generators.

But consider this scenario. You're playing the game Super Mario Bros. (awesome game) and, rather than play it yourself, you'd like to train an AI to play it for you. How should we think about this problem? If we screen-captured game sessions from expert players, we could use the video frames from the game as input to a model, and the output could be the direction that Mario should move. This would be a supervised classification problem, since we have labels: the directions to move. Assuming we have lots of data and access to some sick GPUs, it makes sense to try out a neural network here. Given video frames in a new game, it would know how best to navigate to beat the level. Right?

Yeah, but then we would need hundreds of hours of gameplay videos to train on, and it doesn't seem like an elegant solution to this specific problem. First of all, we aren't training our model on a static dataset but a dynamic one. The training data is continuous: new frames are constantly emerging in this game world, this environment, and we want to learn how to act in this world. Humans learn best by interacting with an environment, not by watching others interact with it. Environments are stochastic; any number of events can occur. It seems best to learn by trying out different possibilities. So rather than framing this problem as one solvable by pattern recognition, let's frame it as one solvable through a process of trial and error.
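To make that trial-and-error framing concrete, here is a minimal sketch of an agent interacting with an environment by pure random play. It is an illustration of the interaction pattern only, assuming the classic OpenAI Gym API (reset returns a state; step returns state, reward, done, info) and using CartPole rather than Mario just to stay self-contained:

    import gym

    # Minimal trial-and-error loop (classic Gym API, pre-0.26): act, observe
    # the reward, repeat until the episode ends. No learning yet.
    env = gym.make('CartPole-v0')

    for episode in range(5):
        state = env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = env.action_space.sample()             # try a random action
            state, reward, done, info = env.step(action)   # observe the outcome
            total_reward += reward
        print('episode', episode, 'total reward', total_reward)

    env.close()

A Q-learner replaces the random choice with a policy that improves as rewards come in; that is the part the rest of the video builds up.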


  1. Siraj, please check out that Dota 2 AI. It's phenomenal. It can easily beat the best Dota players in the world 1v1.

  2. Hi Siraj, could you make a video mentioning the OpenAI bot that beat a pro gamer at Dota 2 a few days ago? It's great that you released this video so close to this current event.

  3. Hey Siraj, fantastic work. I am a Unity developer, so how can I integrate this functionality into games I have already coded? Best wishes for future videos.

  4. Is it possible to do what you do on Windows? I can't get the environment started even though the emulator is running. Can anyone help?

  5. Thanks Siraj. Can't wait for the Super Mario Bros Bot. I enjoyed your videos in the deep learning ND. Cheers your effort is appreciated.

  6. Could you guys give me any hint on how I can approach the game Pong to build a model where I can apply Q-learning? (I have all the information necessary, like ball x and y position, player x and y position, ball speed, etc.) I'm struggling with this :_:

  7. Hey Siraj, great work … as always 🙂 Could you upload the code you show in the video for the SuperMarioBros environment? The linked MountainCar-v0 is great, but it would be nice to compare it with the one you talked about in the video.
    Keep up the good work 😉

  8. Bill Nye of Computer Science
    Kanye of Code
    Beyonce of Neural Networks
    Usain Bolt of Learning
    Chuck Norris of Python
    Jesus Christ of Machine Learning

  9. Hi Siraj, is there any way we can train a machine learning model with a raw text file and properly arranged data from that text file in a .csv file? So that when we input a new text file, it automatically converts that text file into the .csv format with the columns and rows we used as training data. Is this even possible?

  10. Hi Siraj, I love your teaching style and I am a member of Udacity's Deep Learning Foundation program, in which you are an instructor. My doubt is: can we use deep Q-learning in other situations where image or pixel input is not available? If yes, can you tell how? I have read that instead of building a Q-table (state * action) we can use a neural network. Can you explain this, or if possible do a video about it?
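On the question above, one common approach (my own sketch, not code from the video; it assumes Keras/TensorFlow and an illustrative 4-feature state with 2 actions) is to replace the Q-table with a small dense network that maps a state vector directly to one Q-value per action:

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    n_state_features = 4    # e.g. CartPole: position, velocity, angle, angular velocity
    n_actions = 2           # e.g. push left / push right

    # Small dense network standing in for the Q-table: state in, Q-values out.
    model = Sequential([
        Dense(32, activation='relu', input_shape=(n_state_features,)),
        Dense(32, activation='relu'),
        Dense(n_actions, activation='linear'),   # one Q-value per action
    ])
    model.compile(optimizer='adam', loss='mse')

    state = np.random.rand(1, n_state_features)
    q_values = model.predict(state)              # shape (1, n_actions)
    best_action = int(np.argmax(q_values))

The table lookup Q[state, action] then becomes "run the network on the state and read off the entry for that action".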

  11. At 5:15 you say that the further in the future a reward is, the more uncertain we are of it? I didn't get it. Can you explain with an example?

  12. Hey Siraj, here's my shot at this week's challenge. Definitely interested to know if anyone has any feedback on improving my convolutional neural network's performance.
    https://github.com/NoahLidell/math-of-intelligence/blob/master/q_learning/cartpole_cnn_qlearning.ipynb

  13. Here is my code challenge. Thank you.
    https://github.com/jhGitHub009/Game_bot_DQN
    This code is working but not efficient… sorry.

  14. The videos of David Silver from DeepMind are worth watching; that might be the best reinforcement learning course on the web.

  15. I feel like Siraj keeps his videos as short as possible so he can feign concentrated learning, but they're really him talking about the topic and then skipping over specifics… a bit of a sham. I do like the channel though.

  16. Just a piece of advice, I hope you see this : never speak while showing text ! (I remember Vsauce saying this in a video too)
    But really, either show text and read it, or show images / yourself while talking; but displaying a text while saying something different is really hard to follow.
    If you want to talk about a part of the text, try to darken everything but the line you're talking about; otherwise we won't know where to stop and whether to listen to you or read. (At least that's what most "educational" YouTubers I follow do, and it works quite well.)

    Especially when you're talking about such complicated subjects (and with such pace), I think that's important !

    Hope it'll be useful somehow;
    Thanks for the vid' !

  17. Cool video. Thanks.
    But how do you adjust this for a certain purpose (like collecting all the coins / getting the lowest score / speedrunning)?

  18. Question: Why do pooling layers make the network spatially invariant? Don't they just compress information? I thought the convolutional layers do that, which the model does have.

  19. Hey Siraj, can you help me understand this? In SethBling's video, the bot learned to play a Mario level, but he didn't use the learned behaviour on new data or a new level. Isn't this overfitting? I mean, the bot just learned that level by trial and error.

  20. Very nice! Do you have a video with more detail on Q learning? Would be interesting to see how the Q matrix evolves over play of a simple game.

  21. So I am working on an AI for a hidden-information game (for the sake of simplicity, you can think of poker). Optimal play would actually be a Nash equilibrium problem, where each action is taken some percentage of the time. Would the proper way to make an AI for this be to use a random number generator and scale the frequency of each action to its Q-value?
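One common way to do exactly that kind of frequency scaling is softmax (Boltzmann) action selection; the sketch below is my own illustration, not from the video. Note, hedging further, that a true Nash-equilibrium strategy for imperfect-information games generally requires game-theoretic methods (e.g. counterfactual regret minimization) rather than plain Q-learning:

    import numpy as np

    def sample_action(q_values, temperature=1.0):
        # Turn Q-values into a probability distribution: higher Q-values are
        # chosen more often, but every action keeps a nonzero probability.
        prefs = np.asarray(q_values, dtype=float) / temperature
        prefs -= prefs.max()                          # numerical stability
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return np.random.choice(len(q_values), p=probs)

    q = [0.2, 1.5, 0.7]          # hypothetical Q-values for fold / call / raise
    action = sample_action(q)

Lowering the temperature makes the choice greedier; raising it makes the play more mixed.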

  22. Hi Siraj, could you include pseudocode of the algorithms you talk about? I think it is crucial to be able to implement the algorithms you learn about (i.e. "What I cannot code myself, I do not understand"). Explaining pseudocode is a great way to communicate algorithms in a clear, complete, and unambiguous way.

  23. Modified Q Learning model achieves superhuman level on OpenAI Lunar Lander test.
    https://www.youtube.com/watch?v=z9R5hDT6vUQ

  24. So with a Markov decision process, there will always be some reward function R, because getting the reward depends only on the states and actions we take. Thus, our AI can learn Q simply by playing?
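In the tabular case, that learning-by-playing loop is just the standard Q-learning update applied after every transition. A minimal sketch (the textbook algorithm with illustrative sizes, not code from the video):

    import numpy as np

    n_states, n_actions = 16, 4
    Q = np.zeros((n_states, n_actions))
    alpha, gamma = 0.1, 0.95     # learning rate and discount factor

    def q_update(state, action, reward, next_state, done):
        # Nudge Q(s, a) toward the observed reward plus the discounted value
        # of the best action available in the next state.
        target = reward if done else reward + gamma * Q[next_state].max()
        Q[state, action] += alpha * (target - Q[state, action])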

  25. I understand that a convolutional neural network can be used to simplify the state from an array of pixels to a smaller collection of values, but how does the algorithm use a deep network to approximate the Q-function? 8:19

    Thank you!
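For the question just above, one standard arrangement (a sketch under my own assumptions: Keras/TensorFlow, 84x84 grayscale frames stacked 4 deep, 6 legal actions) is to give the convolutional network one output unit per action, so the output vector approximates Q(s, ·) and Q(s, a) is simply the a-th entry:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, Flatten, Dense

    n_actions = 6
    model = Sequential([
        # Convolutions compress the raw pixels into spatial features
        Conv2D(32, 8, strides=4, activation='relu', input_shape=(84, 84, 4)),
        Conv2D(64, 4, strides=2, activation='relu'),
        Conv2D(64, 3, strides=1, activation='relu'),
        Flatten(),
        Dense(512, activation='relu'),
        Dense(n_actions, activation='linear'),   # Q-values, one per action
    ])
    model.compile(optimizer='adam', loss='mse')

Training then regresses these outputs toward reward + gamma * max Q(next state), which is what makes the network an approximation of the Q-function rather than a classifier.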

  26. Can you show the Mario game actually running? It throws an error in my notebook. I'm using Python 3.6, so maybe it's a compatibility issue?

  27. Hi Siraj, I am interested in stock price prediction and would like to have a glance at the second runner-up's code. Can you kindly share the GitHub link? Thanks in advance.

  28. Hi Siraj, I am a bit stuck with implementing reinforcement learning. I was hoping you could help me understand what exactly is going on. A link with a detailed description of the problem is here:

    https://www.reddit.com/r/MachineLearning/comments/7qrsyd/dhelp_in_understanding_reinforcement_learning/

    Thanks!
    Samid

  29. It is a stupid mass of confusion! Nothing new, just horrible piles of content crammed into 10 minutes to make you feel utterly bad and confused, with nothing learned from the video!

  30. Hey Siraj! Great stuff! It could be really cool if you would combine a recurrent neural network with a deep Q-network (DRQN) in a video! Thanks!

  31. Thanks for sharing, I was given this paper in a psychology class to make a presentation about it 😀 problem solved

  32. Great video Siraj, thanks!
    But I don't get something: how do you input 4 game screens?
    Do you combine them into one input?
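The usual DQN answer (my own sketch of the common frame-stacking trick, not the video's code) is to stack the last four preprocessed frames along the channel axis, so the network receives one 84x84x4 tensor and can infer motion from the differences between frames:

    import numpy as np

    # Keep the four most recent preprocessed (84x84 grayscale) frames around.
    frames = [np.zeros((84, 84), dtype=np.float32) for _ in range(4)]

    state = np.stack(frames, axis=-1)        # shape (84, 84, 4): one combined input
    batch = state[np.newaxis, ...]           # shape (1, 84, 84, 4) for model.predict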

  33. Thanks a lot Siraj! This video provided great insight into applications of Q-learning and RL. Are there any programming assignments (that include a dataset) for this?

  34. Hi Siraj, I am going to do a path-planning project to navigate a robot with Q-learning. What is the minimum hardware required to do this? Do we need a GPU? Will a Core i5 PC with only a CPU be enough?

  35. How would reinforcement learning work on a game with a town hub? One that requires mouse clicks to go into a dungeon, e.g. Diablo or MMOs.

  36. Can Q-learning be used for solving classification problems? If so, how? Could you explain or make a video regarding this topic? If you do, it will be very helpful.

  37. "We can't be sure that we'll get the same rewards in another episode" to justify discounted rewards… There's a gap between the two that I can't seem to grasp. Could anybody help?
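A small worked example of the discounting side (my own illustration): with a discount factor gamma < 1, a reward expected further in the future contributes less to the return, which is one conventional way of encoding that distant rewards are less certain than immediate ones:

    # Discounted return over four expected rewards of 1.0 each.
    gamma = 0.9
    rewards = [1.0, 1.0, 1.0, 1.0]                 # reward expected at t, t+1, t+2, t+3
    discounted = sum(gamma**t * r for t, r in enumerate(rewards))
    print(discounted)                              # 1 + 0.9 + 0.81 + 0.729 = 3.439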

  38. 7:46 Well, I don't think the pooling layer is used to become insensitive to the locations of objects in an image. The convolutional layer can already do that, since the convolution operation is a pixel window sliding from location to location until all locations are covered under the chosen stride. The pooling layer is used to semantically merge similar features into one. In the max-pooling example used in this video, you can see the image is partitioned into 4 parts, and in each part the maximum value is preserved; that maximum can semantically represent a feature in that region. It's more like image compression, but we have preserved the key features of the object in the image. Feeding this pooled image into the neural net can be more efficient.
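To make the comment above concrete, here is a tiny 2x2 max-pooling illustration with NumPy (mine, not the video's code): each 2x2 block keeps only its largest value, so the map shrinks by a factor of 4 while the strongest responses survive:

    import numpy as np

    image = np.array([[1, 3, 2, 0],
                      [4, 2, 1, 1],
                      [0, 1, 5, 2],
                      [2, 0, 1, 3]], dtype=float)

    # Reshape into 2x2 blocks and take the maximum of each block.
    pooled = image.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(pooled)     # [[4. 2.]
                      #  [2. 5.]]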

  39. Thanks Siraj – was brought here from my video "Reinforcement Learning – A Simple Python Example and A Step Closer to AI with Assisted Q-Learning" https://youtu.be/nSxaG_Kjw_w

  40. That part where he says hello world it's siraj… I'm replaying it again and again coz it's soo funny xD

  41. Hi, can someone please explain to me how the model is predicting in this sequence of code when it hasn't been trained yet? I'd really appreciate it. Thanks!!

    if np.random.rand() <= epsilon:
        action = np.random.randint(0, env.action_space.n, size=1)[0]
    else:
        Q = model.predict(state)   # Q-value predictions <-- this is the line I don't understand
        action = np.argmax(Q)      # move with the highest Q-value is the chosen one
