양갱로그

Being Wary of What I Know

Reinforcement Learning 11

Shallow Minded - Specification gaming: the flip side of AI ingenuity

Title: Specification gaming: the flip side of AI ingenuity Date: April 21, 2020 URL: deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity Specification gaming: the flip side of AI ingenuity Specification gaming is a behaviour that satisfies the literal specification of an objective without achieving the intended outcome. We have all had experiences with specification gaming, ..

Machine Learning/Reinforcement Learning 2020.09.04

A Step-by-Step Close Look at Spinning Up, Key Paper: A3C

The paper for this post is Asynchronous Advantage Actor-Critic (A3C). A3C was introduced in the paper Asynchronous Methods for Deep Reinforcement Learning, which Google DeepMind presented at ICML. The paper is not entirely about A3C; it introduces several asynchronous methods, and A3C is the one among them that held SOTA on RL tasks at the time. The main features of A3C are as follows. Global Network/actor-learner Global Network: the network trained on the gradients received from each actor thread; it shares its parameters with the actors. Actor threads: the given Envi..

Machine Learning/Reinforcement Learning 2020.06.15

A Step-by-Step Close Look at Spinning Up, Key Paper: PER

Machine Learning/Reinforcement Learning 2019.12.09

A Step-by-Step Close Look at Spinning Up, Key Paper: DRQN

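The paper for this post is DRQN. Writing these posts as paper translations made them grow long, so this time I summarize the paper as I read it, so that I don't forget it. First, let's look at the DRQN architecture. DRQN is a DQN with an RNN+CNN structure, in which the first FC layer of DQN is replaced with an LSTM layer. LSTM To understand LSTM, I referred to a blog post. An RNN (Recurrent Neural Network) is a neural network that loops back on itself so that it keeps remembering information obtained at earlier steps. This figure helped me understand RNNs. You can see the inputs X arrive one after another and accumulate in A. An LSTM is structured as shown below. Each of its elements is called a gate; first, f is the forget gate, the gate that handles forgetting. sigm..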

Machine Learning/Reinforcement Learning 2019.11.20