A2C agent
Environment
Resolution of actions
Number of processes
Network layers
Optimizer of the network
Returns an action.
Current states
Action
Returns a score.
Score values
Update model.
Whether the epoch is done
Learning rate
Batch size
Loss value
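The update step above returns a loss value but does not say how it is computed. As a rough illustration only, here is a minimal pure-Python sketch of a typical A2C loss. It assumes the loss combines a policy term, a value term, and an entropy bonus. The function names, the `gamma`, `value_coef`, and `entropy_coef` parameters, and their defaults are hypothetical and do not come from this agent's code.

```python
def discounted_returns(rewards, dones, last_value, gamma=0.99):
    """Compute bootstrapped discounted returns for one rollout (hypothetical helper)."""
    returns = [0.0] * len(rewards)
    running = last_value  # value estimate of the state after the last step
    for t in reversed(range(len(rewards))):
        # Drop the bootstrap term when the episode ended at step t.
        running = rewards[t] + gamma * running * (1.0 - float(dones[t]))
        returns[t] = running
    return returns


def a2c_loss(log_probs, values, returns, entropies,
             value_coef=0.5, entropy_coef=0.01):
    """Combine policy, value, and entropy terms into one scalar (assumed form)."""
    n = len(returns)
    # Advantage: how much better the return was than the critic predicted.
    advantages = [r - v for r, v in zip(returns, values)]
    # Policy gradient term; advantages are treated as constants here.
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic regression term (mean squared advantage).
    value_loss = sum(a * a for a in advantages) / n
    # Mean entropy, subtracted to encourage exploration.
    entropy = sum(entropies) / n
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

In a real agent the same arithmetic would normally run on framework tensors so that gradients flow to the network through the log-probabilities and value estimates; plain lists are used here only to keep the sketch self-contained.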
A2C agent