Monte Carlo agent
Environment
Optional
Resolution
Returns an action.
Current states
Greedy rate
Action
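The action-selection entry above takes the current state and a greedy rate. A minimal sketch of one common reading, assuming the greedy rate is the probability of picking the highest-scoring action, with a uniformly random action otherwise. The source does not say which convention it uses, and the names `select_action`, `q_values`, and `n_actions` are hypothetical:

```python
import random


def select_action(q_values, state, n_actions, greedy_rate, rng=random):
    """Pick an action for `state` (hypothetical helper, not the source's API).

    Assumption: `greedy_rate` is the probability of acting greedily;
    `q_values` maps (state, action) pairs to estimated scores.
    """
    if rng.random() < greedy_rate:
        # Exploit: choose the action with the highest estimated score.
        scores = [q_values.get((state, a), 0.0) for a in range(n_actions)]
        return max(range(n_actions), key=lambda a: scores[a])
    # Explore: choose an action uniformly at random.
    return rng.randrange(n_actions)
```

With `greedy_rate=1.0` the helper always exploits; with `greedy_rate=0.0` it always explores.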
Returns a score.
Score values
Reset agent.
Update model.
Next state
Reward
Whether the episode is done
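The update entry above receives the next state, the reward, and a done flag. One way such an update could work is an every-visit Monte Carlo estimate: buffer transitions until the episode ends, then average discounted returns per state-action pair. This is a sketch under assumptions, not the source's implementation; the class `MonteCarloSketch`, the method `record_action`, and the discount `gamma` are hypothetical:

```python
from collections import defaultdict


class MonteCarloSketch:
    """Hypothetical every-visit Monte Carlo value estimator."""

    def __init__(self, gamma=0.9):
        self.gamma = gamma              # assumed discount factor
        self.q = defaultdict(float)     # (state, action) -> mean return
        self.visits = defaultdict(int)  # (state, action) -> visit count
        self.episode = []               # buffered (state, action, reward)
        self.state = None
        self.action = None

    def reset(self, state):
        # Start a new episode from `state`, discarding any partial buffer.
        self.episode = []
        self.state = state
        self.action = None

    def record_action(self, action):
        # Remember the action taken in the current state.
        self.action = action

    def update(self, next_state, reward, done):
        # Buffer the transition; values change only when the episode ends.
        self.episode.append((self.state, self.action, reward))
        self.state = next_state
        if not done:
            return
        g = 0.0
        # Walk the episode backwards, accumulating the discounted return.
        for state, action, r in reversed(self.episode):
            g = r + self.gamma * g
            key = (state, action)
            self.visits[key] += 1
            # Incremental mean of all returns observed for this pair.
            self.q[key] += (g - self.q[key]) / self.visits[key]
        self.episode = []
```

Deferring the value update until `done` is true is what distinguishes this from a temporal-difference update, which would adjust estimates after every step.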