In policy gradient methods, the policy is commonly represented using a:
Decision Tree
Neural Network
Overlook minor misbehaviors
Impose harsh punishments for any infraction

Reinforcement Learning Exercises are loading ...