
Q-Learning in RL

with OpenAI Gym


JOO SOON LEE
2018. 01. 16

Center for Healthcare Robotics


School of Integrated Technology
Internship Program of Intelligent Robot Technology
Gwangju Institute of Science and Technology (GIST)
Contents

- OpenAI Gym

- Q-Learning with Table

- Improvement
OpenAI Gym
- Library for Reinforcement Learning
: Toolkit for developing and comparing reinforcement learning
algorithms

- In an Anaconda environment:
(env)$ pip install gym
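
A minimal sketch of creating the environment in Python (the environment id "FrozenLake-v0" is an assumption based on the Gym releases available at the time):

    import gym

    env = gym.make("FrozenLake-v0")   # 4x4 frozen lake environment
    state = env.reset()               # returns the start state (0)
    env.render()                      # prints the current grid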
OpenAI Gym
- FrozenLake (4x4)

- The agent travels from S to G.
- When the agent falls into H (hole), the game is over with Reward : 0.
- Reaching G (goal) gives Reward : +1.

[Environment] Frozen Lake (4x4 grid, states numbered 0-15)

    S  F  F  F        0  1  2  3
    F  H  F  H        4  5  6  7
    F  F  F  H        8  9 10 11
    H  F  F  G       12 13 14 15

    (S : start, F : frozen, H : hole, G : goal)
[Agent] --( Action : Left / Down / Right / Up )--> [Environment]
[Environment] --( new State, Reward : +1 )--> [Agent]

=> Each step returns (new_state, reward, done, info)
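
A minimal sketch of this interaction loop with random actions (env.step() returning the four-value tuple is the Gym API of that era; the loop itself is only an illustration):

    import gym

    env = gym.make("FrozenLake-v0")
    state = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()                 # random Left/Down/Right/Up
        new_state, reward, done, info = env.step(action)   # one interaction step
        state = new_state
    print("episode finished, final reward:", reward)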


Q-function
[Agent]
State s_t --( Action a_t )--> State s_t+1 --( Action a_t+1 )--> State s_t+2

However, a reward is not given at every step,
so the Q-function (state-action value function) is needed.

Q(state, action)
  (1) State  (input)
  (2) Action (input)
  (3) Quality (reward)  (output)
Q-function

Q(state, action) : (1) State, (2) Action → (3) Quality (reward)

Ex)
  Q(s, Left)  = 0
  Q(s, Right) = 0.5
  Q(s, Up)    = 0
  Q(s, Down)  = 0.3

  → Action : Right = argmax Q(s, a) = π*(s)
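
A small sketch of this greedy choice with NumPy (the action order and Q values are simply the ones from the example above):

    import numpy as np

    q_values = np.array([0.0, 0.5, 0.0, 0.3])   # Q(s, Left), Q(s, Right), Q(s, Up), Q(s, Down)
    actions = ["Left", "Right", "Up", "Down"]

    best = np.argmax(q_values)                   # greedy policy pi*(s) = argmax_a Q(s, a)
    print(actions[best])                         # -> "Right"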
Finding, Learning the Q-function
The Q-function tells us which action can receive the highest reward.

* Assumption

For Q(s, a):
1. The agent is in state s.
2. When action a is taken, it receives reward r and moves to state s'.
3. Q(s', a') is already known in state s'.
4. Then Q(s, a) = r + max Q(s', a').
State, Action and Reward

[FrozenLake grid: an episode ends when the agent reaches a terminal state (H or G)]

Trajectory :  s0, a0, r1, s1, a1, r2, ... , s(n-1), a(n-1), rn, sn    (sn : terminal state)

Total Reward

  R   = r1 + r2 + r3 + ... + rn
  Rt  = rt + r(t+1) + r(t+2) + ... + rn = rt + R(t+1)
  Rt* = rt + max( R(t+1) )

  →  Q(s, a) = r + max Q(s', a')
Learning Q-function
Updating Q(s, a)

  Q̂(s, a) ← r + max Q̂(s', a')

[FrozenLake grid → Q-Table]
Learning a Q-Table with 16 (state) x 4 (action) entries
Q-Table Learning
1. Initialize the table with zeros : 16 (state) x 4 (action)
2. Update : Q̂(s, a) ← r + max Q̂(s', a')
3. Take random actions until the agent reaches the Goal
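
A minimal sketch of step 1, assuming the 16-state, 4-action FrozenLake environment:

    import numpy as np
    import gym

    env = gym.make("FrozenLake-v0")
    # one row per state, one column per action, all entries start at zero
    Q = np.zeros([env.observation_space.n, env.action_space.n])   # shape (16, 4)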
Q-Table Learning
Ex) Updating the table while the agent moves toward the Goal:

  Q(0, R)  = reward + max Q(1, a)  = 0 + 0 = 0
  Q(1, R)  = reward + max Q(2, a)  = 0 + 0 = 0
  ...
  Q(14, R) = reward + max Q(15, a) = 1 + 0 = 1

Only the move into the Goal gives a reward, so at first only Q(14, Right) becomes 1
in the 16 (state) x 4 (action) table.
Q-Table Learning

[Figure: after more episodes the value 1 propagates backward from the Goal, filling the Q-Table entries along a successful path]
Q-Table Learning – Algorithm
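
This slide originally showed the algorithm as code. A minimal sketch of the table-update loop described above (the episode count and the tie-breaking helper rargmax are illustrative assumptions, not taken from the slides):

    import numpy as np
    import gym

    def rargmax(values):
        # argmax with random tie-breaking: while every entry is still zero this
        # amounts to a random move (step 3), later it follows the maximum value
        best = np.flatnonzero(values == values.max())
        return np.random.choice(best)

    env = gym.make("FrozenLake-v0")
    Q = np.zeros([env.observation_space.n, env.action_space.n])   # step 1: 16 x 4 zeros

    for episode in range(2000):
        state = env.reset()
        done = False
        while not done:
            action = rargmax(Q[state, :])
            new_state, reward, done, _ = env.step(action)
            Q[state, action] = reward + np.max(Q[new_state, :])   # step 2: Q(s,a) <- r + max Q(s',a')
            state = new_state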
Q-Table Learning Result
[Result figures]
Q-Table Learning
Problem
: The agent never tries a different way because it always follows the maximum value,
so the Q-Table may not be updated completely.

Solution

: Sometimes the agent selects a random action instead of following the maximum value.

Exploit & Exploration


Q-Table Learning - Exactly
[ Random Action ]
1. Random Move
1) E-greedy
2) Decaying E-greedy

2. Random Noise
1) Random Noise
2) Decaying Noise
Q-Table Learning - Exactly
[ Random Action ]
1. Random Move
1) E-greedy
    # (np: numpy, Q: the Q-Table, env: the Gym environment from the earlier sketches)
    e = 0.1
    if np.random.rand(1) < e:
        action = env.action_space.sample()      # explore: random action
    else:
        action = np.argmax(Q[state, :])         # exploit: best known action

2) Decaying E-greedy

    for i in range(1000):
        e = 0.1 / (i + 1)                       # exploration rate decays with episode i
        if np.random.rand(1) < e:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state, :])
Q-Table Learning - Exactly
[ Random Action ]
2. Random Noise

   Ex)  Q(s, a) : [ 0.5  0.6  0.3 ]  +  noise : [ 0.1  0.2  0.14 ]

1) Random Noise

    # random noise (Gaussian here) added to each Q value before taking the argmax
    action = np.argmax(Q[state, :] + np.random.randn(env.action_space.n))

2) Decaying Noise

    # the added noise shrinks as the episode index i grows
    for i in range(1000):
        action = np.argmax(Q[state, :] + np.random.randn(env.action_space.n) / (i + 1))
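
A tiny check of the noisy argmax with the numbers from the example above:

    import numpy as np

    q = np.array([0.5, 0.6, 0.3])
    noise = np.array([0.1, 0.2, 0.14])
    print(np.argmax(q + noise))   # -> 1, the second action is still chosen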
Q-Table Learning - Exactly
Problem
: Too many paths can reach the Goal with the same value, so the agent cannot tell which path is better.

Solution

: Multiply a discount constant onto max Q(s', a') so that later rewards count less.

Discount constant

  Q̂(s, a) ← r + γ × max Q̂(s', a')

  γ : discount constant (< 1)
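
A sketch of the discounted update inside the training loop (γ = 0.99 matches one of the settings on the result slides; the decaying-noise action selection is the one sketched on the previous slide):

    import numpy as np
    import gym

    env = gym.make("FrozenLake-v0")
    Q = np.zeros([env.observation_space.n, env.action_space.n])
    gamma = 0.99                                      # discount constant < 1

    for i in range(2000):
        state = env.reset()
        done = False
        while not done:
            # decaying random noise added to the Q values before the argmax
            action = np.argmax(Q[state, :] + np.random.randn(env.action_space.n) / (i + 1))
            new_state, reward, done, _ = env.step(action)
            Q[state, action] = reward + gamma * np.max(Q[new_state, :])
            state = new_state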


Q-Table Learning – Exactly - Results
* Adding Noise
[Result figures]

* Adding Noise – various discount factors
[Result figures for Discount constant = 0.50, 0.75 and 0.99]


Q-Table Learning – Exactly - Results
* Random Move - Decaying E-greedy
[Result figures]

* Random Move - Decaying E-greedy – Larger 'e' : More random moves
[Result figures]
Thank you
for your attention.
