Policy gradient agent
Environment
Optional
Resolution
Returns an action.
Current states
Action
Returns a score.
Score values
Reset agent.
Update model.
Next states
Reward
Whether the epoch is done
Learning rate
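The outline above (an action method taking the current states, a score method returning score values, a reset, and an update taking states, action, next states, reward, a done flag and a learning rate) could be realised roughly as in the sketch below. This is only an illustration under stated assumptions, not the original implementation: the class name `PolicyGradientAgent`, the constructor arguments `n_features`, `n_actions`, `gamma` and `seed`, the linear softmax policy, and the choice of a REINFORCE-style update are all hypothetical, and the environment and resolution parameters listed above are not modelled.

```python
import math
import random


class PolicyGradientAgent:
    """Hypothetical linear-softmax agent with a REINFORCE-style update."""

    def __init__(self, n_features, n_actions, gamma=0.99, seed=0):
        # All constructor arguments are assumptions made for this sketch.
        self.n_actions = n_actions
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.episode = []

    def score(self, states):
        """Return score values: here, softmax action probabilities."""
        logits = [sum(w * x for w, x in zip(row, states)) for row in self.weights]
        top = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def action(self, states):
        """Return an action index sampled from the current policy."""
        return self.rng.choices(range(self.n_actions), weights=self.score(states))[0]

    def reset(self):
        """Reset the agent: clear stored transitions, keep learned weights."""
        self.episode = []

    def update(self, states, action, next_states, reward, done, learning_rate=0.01):
        """Store one transition; apply a gradient step once the epoch is done.

        next_states is accepted to match the outline but is unused, because
        this particular update only needs states, actions and rewards.
        """
        self.episode.append((list(states), action, reward))
        if not done:
            return
        # Accumulate the gradient over the whole episode before changing weights.
        grad = [[0.0] * len(row) for row in self.weights]
        ret = 0.0
        for s, a, r in reversed(self.episode):
            ret = r + self.gamma * ret  # discounted return from this step on
            probs = self.score(s)
            for k in range(self.n_actions):
                # d log pi(a|s) / d logit_k = 1[k == a] - pi(k|s)
                coeff = ret * ((1.0 if k == a else 0.0) - probs[k])
                for j, x in enumerate(s):
                    grad[k][j] += coeff * x
        for k, row in enumerate(grad):
            for j, g in enumerate(row):
                self.weights[k][j] += learning_rate * g
        self.reset()
```

In a training loop one would typically call `action` on the current states, step the environment, pass the resulting transition to `update`, and let the agent apply its gradient step when the done flag is set. Buffering transitions until the end of the episode is a design choice of this sketch; other policy gradient variants update at every step.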