What is the primary role of the reward function in policy gradient algorithms?
To define the goal of the task
To generate training data
To update the policy parameters
To estimate the policy's performance
