Deep Q Learning for Video Games – The Math of Intelligence #9

We’re going to replicate DeepMind’s Deep Q Learning algorithm for Super Mario Bros! This bot will be able to play a bunch of different video games by using reinforcement learning. This is the first video in this series that uses libraries (Keras & Gym) because if it didn’t, the code would be way too long for a short video. I’ll make a longer, in-depth version without libraries soon.
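As a rough illustration of the core update behind Deep Q Learning, here is a minimal sketch of the Bellman target and an epsilon-greedy action choice. It uses plain NumPy instead of Keras and Gym so it runs on its own. The function names, the `gamma` and `epsilon` values, and the toy numbers are assumptions made for illustration, not the code from the video.

```python
import numpy as np

def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bellman targets: r + gamma * max_a' Q(s', a'), with no bootstrapping on terminal steps."""
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    dones = np.asarray(dones, dtype=bool)
    best_next = next_q_values.max(axis=1)  # value of the greedy action in each next state
    return rewards + gamma * best_next * (~dones)

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Toy batch of two transitions (hypothetical numbers); the second one ends the episode.
targets = q_learning_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.9,
)
print(targets)
```

In a full agent, a neural network would produce `next_q_values` from game frames and would be trained to move its predictions toward these targets.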


32 thoughts on “Deep Q Learning for Video Games – The Math of Intelligence #9”

  1. I understand that a convolutional neural network can be used to simplify the state from an array of pixels to a smaller collection of values, but how does the algorithm use a deep network to approximate the Q-function? (8:19)

    Thank you!

  2. So with a Markov decision process, there will always be some reward function R because getting the reward depends only on the states and actions we take. Thus, our AI can learn Q simply by going?

  3. Hi Siraj, could you include pseudocode for the algorithms you talk about? I think it is crucial to be able to implement the algorithms you learn about (i.e., "What I cannot code myself, I do not understand"). Explaining pseudocode is a great way to communicate algorithms in a clear, complete, and unambiguous way.

  4. So I am working on an AI for a hidden-information game (for the sake of simplicity, you can think of poker). Optimal play would actually be a Nash equilibrium problem, where each action is taken some percentage of the time. Would the proper way to make an AI for this be to use a random number generator and scale the frequency of each action to its Q value?

  5. Hey Siraj, can you help explain this? In SethBling's video, the bot learned to play a Mario level, but he didn't test what it learned on new data or a new level. Isn't this overfitting? I mean, the bot just learned that one level by trial and error.
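
The idea raised in comment 4, picking each action with a frequency tied to its Q value, can be sketched as follows. Because Q values can be negative, this sketch swaps literal proportional scaling for softmax (Boltzmann) weighting, which is one common way to turn values into probabilities. The function names and the `temperature` parameter are assumptions for illustration, and nothing here implies the resulting policy is a Nash equilibrium.

```python
import numpy as np

def boltzmann_probabilities(q_values, temperature=1.0):
    """Convert Q values to action probabilities with a softmax."""
    q = np.asarray(q_values, dtype=np.float64)
    scaled = (q - q.max()) / temperature  # subtract the max for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()

def sample_action(q_values, temperature, rng):
    """Draw one action index at random according to the softmax probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)  # seeded random number generator
probs = boltzmann_probabilities([1.0, 2.0, 0.5], temperature=0.5)
action = sample_action([1.0, 2.0, 0.5], 0.5, rng)
print(probs, action)
```

A lower `temperature` concentrates probability on the highest-valued action, while a higher one makes the choice closer to uniform.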
