We’re going to replicate DeepMind’s Deep Q Learning algorithm for Super Mario Bros! This bot will be able to play a bunch of different video games by using reinforcement learning. This is the first video in this series that uses libraries (Keras & Gym) because if it didn’t, the code would be way too long for a short video. I’ll make a longer, in-depth version without libraries soon.

Code for this video:

https://github.com/llSourcell/deep_q_learning

More learning resources:

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0

http://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

http://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/

http://karpathy.github.io/2016/05/31/rl/

https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html

https://keon.io/deep-q-learning/

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Resources_files/deep_rl.pdf

http://mnemstudio.org/path-finding-q-learning-tutorial.htm

Is it possible to do that with games like Overwatch?

Hi Siraj, I am a bit stuck with implementing reinforcement learning. I was hoping you could help me understand what exactly is going on. A detailed description of the problem is here:

https://www.reddit.com/r/MachineLearning/comments/7qrsyd/dhelp_in_understanding_reinforcement_learning/

Can you make a video on Temporal Difference (TD) learning?

Hi Siraj, I am interested in stock price prediction and would like to have a look at the second runner-up's code. Could you kindly share the GitHub link? Thanks in advance.

Can you show the Mario game actually running? It throws an error in my notebook. I'm using Python 3.6, so maybe it's a translation issue?

I understand that a convolutional neural network can be used to simplify the state from an array of pixels to a smaller collection of values, but how does the algorithm use a deep network to approximate the Q-function? 8:19
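One way to picture the answer: the network replaces the Q-table with a parametric function that takes the state features and outputs one Q-value per action, and it is trained by nudging its prediction toward the target r + gamma * max Q(s', a'). The sketch below is only an illustration under assumptions: a linear model stands in for the deep network so it runs with the standard library, and the names `q_values` and `td_update` are hypothetical, not taken from the video's code.

```python
def q_values(weights, features):
    """Return one estimated Q-value per action (one weight vector per action)."""
    return [sum(w * x for w, x in zip(w_a, features)) for w_a in weights]


def td_update(weights, features, action, reward, next_features, done,
              gamma=0.99, lr=0.1):
    """One semi-gradient Q-learning step on a linear approximator.

    A deep network would do the same thing, with backpropagation
    replacing the hand-written weight update below.
    """
    if done:
        target = reward
    else:
        target = reward + gamma * max(q_values(weights, next_features))
    prediction = q_values(weights, features)[action]
    error = target - prediction
    # Move only the chosen action's weights toward the target.
    weights[action] = [w + lr * error * x
                       for w, x in zip(weights[action], features)]
    return error


# Two actions, two state features, weights start at zero.
weights = [[0.0, 0.0], [0.0, 0.0]]
first_error = td_update(weights, [1.0, 0.0], action=1, reward=1.0,
                        next_features=[0.0, 0.0], done=True)
second_error = td_update(weights, [1.0, 0.0], action=1, reward=1.0,
                         next_features=[0.0, 0.0], done=True)
```

The error shrinks from one update to the next, which is the sense in which the approximator "learns" Q. In the setup the question describes, the features would come from the convolutional layers rather than being hand-written.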

So with a Markov decision process, there will always be some reward function R, because the reward depends only on the states and the actions we take. Thus, our AI can learn Q simply by going?
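To make the idea of learning Q from experience concrete, here is a minimal sketch of tabular Q-learning. Everything in it is an assumption made for illustration: a hypothetical 5-state corridor where only reaching the last state gives a reward, a purely random behavior policy, and arbitrary values for `ALPHA` and `GAMMA`. It is not the Mario setup from the video.

```python
import random

N_STATES = 5      # states 0..4; state 4 is terminal (toy assumption)
ACTIONS = (0, 1)  # 0 = left, 1 = right
GAMMA = 0.9       # discount factor
ALPHA = 0.5       # learning rate


def step(state, action):
    """Deterministic toy environment: reward 1 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a random behavior policy is enough here.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal state after training.
greedy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

In this toy problem the agent never sees R written down. It only observes rewards as it moves around, and the table still settles on "go right" in every state.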

Modified Q Learning model achieves superhuman level on OpenAI Lunar Lander test.

https://www.youtube.com/watch?v=z9R5hDT6vUQ

Sounds like Q-learning for investments.

Hi Siraj, could you include pseudocode for the algorithms you talk about? I think it is crucial to be able to implement the algorithms you learn about (i.e., "What I cannot code myself, I do not understand"). Explaining pseudocode is a great way to communicate algorithms in a clear, complete, and unambiguous way.

I have a 4-node Raspberry Pi cluster. Can I use it to train this Mario game?

Do I have to learn calculus to learn deep learning?

So I am working on an AI for a hidden-information game (for the sake of simplicity, you can think of poker). Optimal play would actually be a Nash equilibrium problem, where each action is taken some percentage of the time. Would the proper way to make an AI for this be to use a random number generator and scale the frequency of each action to its Q-value?
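The sampling scheme described here, picking actions at random with frequencies tied to their Q-values, is commonly done with a softmax (Boltzmann) distribution. Below is a minimal sketch of that sampling step only. The `temperature` parameter and the function names are assumptions for illustration, and nothing here claims that the resulting policy is a Nash equilibrium.

```python
import math
import random


def boltzmann_probs(q_values, temperature=1.0):
    """Turn Q-values into action probabilities with a softmax."""
    highest = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature=1.0, rng=random):
    """Draw an action index with probability given by boltzmann_probs."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


example_q = [1.0, 2.0, 0.5]
probs = boltzmann_probs(example_q)
action = sample_action(example_q, rng=random.Random(0))
```

A lower temperature makes the choice closer to always taking the highest-valued action. A higher one makes it closer to uniform.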

Very nice! Do you have a video with more detail on Q learning? Would be interesting to see how the Q matrix evolves over play of a simple game.

Hey Siraj, can you help explain this? In SethBling's video, the bot learned to play a Mario level, but he didn't test what it learned on new data or a new level. Isn't this overfitting? I mean, the bot just learned that one level by trial and error.

Question: Why do pooling layers make the network spatially invariant? Don't they just compress information? I thought convolutional layers do that, and the model does have them.
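A small experiment can help here. Max pooling keeps only the largest value in each window, so a feature that moves slightly within a window produces the same output. The invariance is local, not global. The sketch below uses a hypothetical 1D signal and non-overlapping windows purely for illustration.

```python
def max_pool_1d(signal, size):
    """Non-overlapping max pooling; assumes len(signal) is a multiple of size."""
    return [max(signal[i:i + size]) for i in range(0, len(signal), size)]


feature_at_1 = [0, 9, 0, 0, 0, 0, 0, 0]
feature_at_0 = [9, 0, 0, 0, 0, 0, 0, 0]  # shifted by one, same pooling window
feature_at_2 = [0, 0, 9, 0, 0, 0, 0, 0]  # shifted into the next window

pooled_1 = max_pool_1d(feature_at_1, 2)
pooled_0 = max_pool_1d(feature_at_0, 2)
pooled_2 = max_pool_1d(feature_at_2, 2)
```

The first two inputs pool to the same output even though the feature moved, while the third does not. So pooling does compress information, and what it throws away is the exact position inside each window. A convolution on its own shifts its output whenever the input shifts.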

But how do you adjust this for a certain purpose (like collecting all coins, getting the lowest score, or speedrunning)?
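In reinforcement learning, the usual lever for this is the reward function: the agent optimizes whatever it is rewarded for. Here is a rough sketch of the idea. The dictionary keys (`x_pos`, `coins`) and the weight presets are hypothetical and do not come from any particular Mario environment's API.

```python
def shaped_reward(prev, curr, weights):
    """Combine per-step signals into one scalar reward.

    prev and curr are hypothetical per-step info dicts. weights decides
    which behavior pays off.
    """
    progress = curr["x_pos"] - prev["x_pos"]  # distance moved to the right
    coins = curr["coins"] - prev["coins"]     # coins picked up this step
    return (weights["progress"] * progress
            + weights["coin"] * coins
            - weights["time"])                # flat cost per step


# Two illustrative presets: one favors speed, the other favors coins.
SPEEDRUN = {"progress": 1.0, "coin": 0.0, "time": 0.1}
COLLECTOR = {"progress": 0.1, "coin": 10.0, "time": 0.0}

prev_info = {"x_pos": 10, "coins": 0}
curr_info = {"x_pos": 15, "coins": 1}
speedrun_reward = shaped_reward(prev_info, curr_info, SPEEDRUN)
collector_reward = shaped_reward(prev_info, curr_info, COLLECTOR)
```

The same step scores differently under each preset, so an agent trained with `COLLECTOR` would be pushed toward detours for coins, while one trained with `SPEEDRUN` would be pushed to finish quickly.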