We’re going to replicate DeepMind’s Deep Q Learning algorithm for Super Mario Bros! This bot will be able to play a bunch of different video games by using reinforcement learning. This is the first video in this series that uses libraries (Keras & Gym) because if it didn’t, the code would be way too long for a short video. I’ll make a longer, in-depth version without libraries soon.

Code for this video:

https://github.com/llSourcell/deep_q_learning

Please Subscribe! And like. And comment. That’s what keeps me going.

More learning resources:

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0

http://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

http://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/

http://karpathy.github.io/2016/05/31/rl/

https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html

https://keon.io/deep-q-learning/

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Resources_files/deep_rl.pdf

http://mnemstudio.org/path-finding-q-learning-tutorial.htm

Join us in the Wizards Slack channel:

http://wizards.herokuapp.com/

And please support me on Patreon:

https://www.patreon.com/user?u=3191693

Follow me:

Twitter: https://twitter.com/sirajraval

Facebook: https://www.facebook.com/sirajology

Instagram: https://www.instagram.com/sirajraval/

Is it possible to do that with games like Overwatch?

Very nice information and rhythm, subscribed!

Hi Siraj, I am a bit stuck with implementing reinforcement learning. I was hoping you could help me understand what exactly is going on. A link with a detailed description of the problem can be found here:

https://www.reddit.com/r/MachineLearning/comments/7qrsyd/dhelp_in_understanding_reinforcement_learning/

Thanks!

Samid

Your videos are amazing, thanks.

Why do you talk like a girl giving so much expressions?

Nice explanation, but try to dance less in your next video.

Can you make a video on Temporal Difference (TD) Learning?

Hi Siraj, I am interested in stock price prediction and would like to have a glance at the second runner-up's code. Can you kindly share the GitHub link? Thanks in advance.

The memes were distracting; I was too busy laughing to learn anything.

Can you show the Mario game actually running? It throws an error in my notebook. I'm using Python 3.6, so maybe it's a translation issue?

I understand that a convolutional neural network can be used to simplify the state from an array of pixels to a smaller collection of values, but how does the algorithm use a deep network to approximate the Q-function? 8:19
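
One way to picture the question above: the network is just a function that takes a state and returns one Q-value per action, and it is trained by regression toward the target r + γ · max Q(s', a'). The sketch below is a minimal illustration under stated assumptions, not the code from the video: it swaps the deep network for a linear model in plain Python so it runs without Keras, the names `LinearQ` and `td_step` are hypothetical, and the experience replay and target network used in DeepMind's DQN are left out.

```python
import random


class LinearQ:
    """Hypothetical stand-in for the deep network: features in, one Q-value per action out."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        # One small random weight vector per action.
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                  for _ in range(n_actions)]

    def predict(self, features):
        # Q(s, a) for every action a, as a dot product of weights and features.
        return [sum(wi * xi for wi, xi in zip(row, features)) for row in self.w]

    def td_step(self, features, action, reward, next_features, done,
                alpha=0.1, gamma=0.9):
        # Regression target: reward plus the discounted best next Q-value.
        if done:
            target = reward
        else:
            target = reward + gamma * max(self.predict(next_features))
        error = target - self.predict(features)[action]
        # Gradient step on the squared error, only for the action that was taken.
        self.w[action] = [wi + alpha * error * xi
                          for wi, xi in zip(self.w[action], features)]
        return error


if __name__ == "__main__":
    model = LinearQ(n_features=2, n_actions=2)
    # Replay one terminal transition with reward 1 many times.
    for _ in range(200):
        model.td_step([1.0, 0.0], 0, 1.0, [0.0, 0.0], done=True)
    print(model.predict([1.0, 0.0]))
```

In a deep Q-network the linear model is replaced by convolutional and dense layers, but the training signal is the same kind of target.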

Thank you!

So with a Markov decision process, there will always be some reward function R because getting the reward depends only on the states and actions we take. Thus, our AI can learn Q simply by going?

I just want to be your friend.

Modified Q Learning model achieves superhuman level on OpenAI Lunar Lander test.

https://www.youtube.com/watch?v=z9R5hDT6vUQ

Nice

Sounds like q learning for investments

Thank you very much 🙂

Hi Siraj, could you include pseudocode for the algorithms you talk about? I think it is crucial to be able to implement the algorithms you learn about (i.e. "What I cannot code myself, I do not understand"). Explaining pseudocode is a great way to communicate algorithms in a clear, complete, and unambiguous way.

hey siraj

I have a 4-node Raspberry Pi cluster computer. Can I use it to train this Mario game?

Do I have to learn calculus to learn deep learning?

Video uploaded Aug 2017 and it's only 9:46 long? Autolike from me 🙂

So I am working on an AI for a hidden information game (for the sake of simplicity, you can think of poker). Optimal play would actually be a Nash equilibrium problem, where each action is taken some percentage of the time. Would the proper way to make an AI for this be to use a random number generator and scale the frequency of each action to its Q value?
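
The idea in the comment above, picking each action with a frequency tied to its Q-value, is usually written as softmax (Boltzmann) sampling. The sketch below shows only that sampling step; the function names and the `temperature` parameter are illustrative assumptions, and sampling this way is an exploration heuristic that is not guaranteed to converge to a Nash equilibrium.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Turn Q-values into action probabilities (higher Q means higher probability)."""
    highest = max(q_values)
    # Subtracting the maximum keeps exp() from overflowing.
    exps = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature=1.0, rng=random):
    """Draw an action index with probability given by softmax_probs."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    print(softmax_probs(q))
    print(sample_action(q, rng=random.Random(0)))
```

A lower `temperature` makes the choice closer to always taking the highest-valued action; a higher one makes it closer to uniform.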

too much drama

Sorry bro, you lost me.

And what is this bot even needed for?

Very nice! Do you have a video with more detail on Q learning? Would be interesting to see how the Q matrix evolves over play of a simple game.
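
For anyone who wants to watch a Q matrix evolve, here is a minimal sketch on a made-up game: a corridor of five cells where the agent starts at the left end and gets a reward of 1 for reaching the right end. The game, the function name `train_corridor`, and all hyperparameters are assumptions chosen for illustration; this is tabular Q-learning, not the deep version from the video.

```python
import copy
import random


def train_corridor(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Tabular Q-learning on a 1-D corridor; returns the final table and snapshots."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # columns: 0 = left, 1 = right
    snapshots = {}
    for episode in range(1, episodes + 1):
        state = 0
        while state != goal:
            left, right = q[state]
            # Explore with probability epsilon, and break ties randomly.
            if rng.random() < epsilon or left == right:
                action = rng.randrange(2)
            else:
                action = 0 if left > right else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            best_next = 0.0 if next_state == goal else max(q[next_state])
            # Move Q(s, a) toward reward + gamma * max Q(s', a').
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
        if episode in (1, 10, episodes):
            snapshots[episode] = copy.deepcopy(q)
    return q, snapshots


if __name__ == "__main__":
    _, snapshots = train_corridor()
    for episode, table in snapshots.items():
        print("after episode", episode)
        for state, (left, right) in enumerate(table):
            print(f"  state {state}: left={left:.3f} right={right:.3f}")
```

After the first episode only the cell next to the goal has a nonzero value; in later episodes the values spread backward toward the start, shrinking by a factor of `gamma` per step.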

Hi, here's a free guide on learning how to code: https://learntocode.lpages.co/learn-to-code-fast/

Siraj Raval is the neurotransmitter of Generation Z

Hey Siraj, please join our TEDx event!!! You would make a stellar speaker. Contact sahilraj16@gmail.com for TEDxWhitefield.

Hey Siraj, can you help me understand this? In SethBling's video, the bot learned to play a Mario level, but he didn't test what it learned on new data or a new level. Isn't this overfitting? I mean, the bot just learned that one level through trial and error.

Question: Why do pooling layers make the network spatially invariant? Don't they just compress information? I thought convolutional layers do that, and the model does have those.

Cool video. Thanks.

But how do you adjust this for a specific purpose (like collecting all the coins, getting the lowest score, or speedrunning)?