Q-learning agent
Environment
Optional
Resolution
Returns an action.
Current states
Greedy rate
Action
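A minimal sketch of how an action might be chosen from the current state's scores and a greedy rate. It assumes the greedy rate is the probability of taking the best-scoring action, with a uniformly random action otherwise; the source does not say which convention it uses. The names `select_action`, `q_values`, and `rng` are hypothetical.

```python
import random


def select_action(q_values, greedy_rate, rng=random):
    """Pick an action index from a list of per-action scores.

    Assumption: greedy_rate is the probability of choosing the
    best-scoring action; otherwise a random action is returned.
    """
    if rng.random() < greedy_rate:
        # Exploit: index of the highest score (first one on ties).
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Explore: any action, chosen uniformly.
    return rng.randrange(len(q_values))
```

With `greedy_rate=1.0` the choice is always greedy, and with `greedy_rate=0.0` it is always random. If the agent instead treats the rate as an exploration probability, the comparison would be inverted.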
Returns a score.
Score values
Update model.
Current state
Next state
Reward
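The update step can be sketched with the standard tabular Q-learning rule, Q(s, a) += alpha * (r + gamma * max Q(s', .) - Q(s, a)). This is an illustration under assumptions, not the agent's actual code: the table layout, the explicit `action` argument, and the `alpha` and `gamma` defaults are all hypothetical, since the source lists only the current state, next state, and reward.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    Assumption: q_table maps each state to a list of per-action scores.
    """
    # Best score reachable from the next state (off-policy target).
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

For example, with `q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}`, calling `q_update(q_table, 0, 1, 1.0, 1, alpha=0.5, gamma=0.5)` gives a target of 1.0 + 0.5 * 2.0 = 2.0 and moves the entry halfway from 0.0 to 1.0.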