Part 3: Intro to Policy Optimization

Deriving the Simplest Policy Gradient
Implementing the Simplest Policy Gradient
Expected Grad-Log-Prob Lemma
Don’t Let the Past Distract You
Implementing Reward-to-Go Policy Gradient
Baselines in Policy Gradients
Other Forms of the Policy Gradient
Recap

Deriving the Simplest Policy Gradient

We consider the case of a stochastic, parameterized policy, $$\pi_{\theta}$$. Our objective $$J(\pi_{\theta})$$ is the expected cumulative reward..
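
As a rough aid to reading the objective, here is one common way to write it. This is only a sketch under assumptions: the trajectory $$\tau$$, the return $$R(\tau)$$, the horizon $$T$$, and the per-step reward $$r_t$$ are symbols introduced here for illustration and do not appear in the excerpt above, and the finite-horizon undiscounted return is assumed.

```latex
% Assumed notation (not from the excerpt): \tau is a trajectory sampled by
% running the policy \pi_\theta, and R(\tau) is its finite-horizon,
% undiscounted return.
J(\pi_{\theta}) = \mathbb{E}_{\tau \sim \pi_{\theta}} \left[ R(\tau) \right],
\qquad
R(\tau) = \sum_{t=0}^{T} r_t
```

Under this reading, improving the policy means adjusting $$\theta$$ so that trajectories with a higher return become more likely.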