Deep Q Learning for Video Games – The Math of Intelligence #9



We’re going to replicate DeepMind’s Deep Q Learning algorithm for Super Mario Bros! This bot will be able to play a bunch of different video games by using reinforcement learning. This is the first video in this series that uses libraries (Keras & Gym) because if it didn’t, the code would be way too long for a short video. I’ll make a longer, in-depth version without libraries soon.
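
As a rough illustration of the update rule that Q learning is built on, here is a minimal tabular sketch in plain Python. It is not the code from the repository linked below and it does not use Keras or Gym. The toy corridor environment, the function names, and the hyperparameter values are all assumptions made for illustration. Deep Q Learning replaces the table with a neural network that is trained toward the same kind of target (reward plus the discounted best value of the next state).

```python
import random


def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-learning step to the table q[state][action].

    alpha (learning rate) and gamma (discount) are illustrative values.
    """
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[next_state])
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return target


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a best-valued one."""
    values = q[state]
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    best = max(values)
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def train_corridor(n_states=5, episodes=500, epsilon=0.2, seed=0):
    """Hypothetical toy task: action 1 moves right, action 0 moves left.

    Reaching the right-most state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            action = epsilon_greedy(q, state, epsilon, rng)
            if action == 1:
                next_state = min(state + 1, n_states - 1)
            else:
                next_state = max(state - 1, 0)
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0
            q_learning_update(q, state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    for state, values in enumerate(train_corridor()):
        print(state, [round(v, 3) for v in values])
```

After training on this toy task, the "move right" value should be the larger of the two in every non-terminal state, which is the behaviour you would expect the learned table to encode.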

Code for this video:
https://github.com/llSourcell/deep_q_learning

Please Subscribe! And like. And comment. That’s what keeps me going.

More learning resources:
https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0
http://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
http://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/
http://karpathy.github.io/2016/05/31/rl/
https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html
https://keon.io/deep-q-learning/
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Resources_files/deep_rl.pdf
http://mnemstudio.org/path-finding-q-learning-tutorial.htm

Join us in the Wizards Slack channel:
http://wizards.herokuapp.com/

And please support me on Patreon:
https://www.patreon.com/user?u=3191693
Follow me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval/

32 thoughts on “Deep Q Learning for Video Games – The Math of Intelligence #9”

  1. I understand that a convolutional neural network can be used to simplify the state from an array of pixels to a smaller collection of values, but how does the algorithm use a deep network to approximate the Q-function? (8:19)

    Thank you!

  2. So with a Markov decision process, there will always be some reward function R, because getting the reward depends only on the states and the actions we take. Thus, our AI can learn Q simply by going?

  3. Hi Siraj, could you include pseudocode for the algorithms you talk about? I think it is crucial to be able to implement the algorithms you learn about (i.e., "What I cannot code myself, I do not understand"). Explaining pseudocode is a great way to communicate algorithms in a clear, complete, and unambiguous way.

  4. So I am working on an AI for a hidden-information game (for the sake of simplicity, you can think of poker). Optimal play would actually be a Nash equilibrium problem, where each action is taken some percentage of the time. Would the proper way to make an AI for this be to use a random number generator and scale the frequency of each action to its Q value?

  5. Hey Siraj, can you help explain this? In SethBling's video, the bot learned to play a Mario level, but he didn't test what it learned on new data or a new level. Isn't this overfitting? I mean, the bot just learned that level by trial and error.
