Go: 3,000 years old, 40M players, 10^170 positions
The Rules of Go: capture and territory
[Figure: two deep networks — a value network mapping a position s to a scalar evaluation v(s), and a policy network mapping a position s to move probabilities p(a|s)]
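Distilled to their interfaces, the two networks are just functions of the position. A deliberately tiny pure-Python sketch (the linear layers, toy sizes, and random weights are illustrative stand-ins, not the paper's convolutional architecture):

```python
import math
import random

random.seed(0)

def policy_network(features, weights):
    """Toy stand-in for p(a|s): one linear layer per move, then a softmax."""
    logits = [sum(f * w for f, w in zip(features, row)) for row in weights]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]   # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]

def value_network(features, weights):
    """Toy stand-in for v(s): a linear layer squashed into (-1, 1) by tanh."""
    return math.tanh(sum(f * w for f, w in zip(features, weights)))

n_moves, n_features = 5, 8                     # tiny toy sizes, not 19x19
features = [random.gauss(0, 1) for _ in range(n_features)]
policy_w = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_moves)]
value_w = [random.gauss(0, 1) for _ in range(n_features)]

p = policy_network(features, policy_w)         # move probabilities, sums to 1
v = value_network(features, value_w)           # scalar evaluation in (-1, 1)
```

The key contract is that the policy output is a proper distribution over legal moves while the value output is a single bounded score for the whole position.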
[Figure: search approaches — exhaustive search vs. Monte-Carlo rollouts]
[Figure: training pipeline — a supervised learning policy network trained on expert games, a reinforcement learning policy network improved through self-play, and a value network trained on self-play data]
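The three training stages can be sketched on a one-parameter toy "position" with two moves (everything here — the sigmoid policy, learning rates, and reward probabilities — is an illustrative assumption, not AlphaGo's actual training setup):

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Stage 1 -- supervised learning: fit the policy to expert moves.
# In this toy, experts play move A (coded as 1) 80% of the time.
theta = 0.0
for _ in range(2000):
    move = 1 if random.random() < 0.8 else 0
    prob = sigmoid(theta)
    theta += 0.1 * (move - prob)             # log-likelihood gradient step

# Stage 2 -- reinforcement learning: self-play rewards move A more often,
# so REINFORCE-style updates push the policy further toward it.
for _ in range(2000):
    prob = sigmoid(theta)
    move = 1 if random.random() < prob else 0
    reward = 1 if (move == 1) == (random.random() < 0.9) else -1
    theta += 0.01 * reward * (move - prob)   # policy-gradient update

# Stage 3 -- value network: regress a value estimate on self-play outcomes.
value = 0.0
for _ in range(2000):
    prob = sigmoid(theta)
    move = 1 if random.random() < prob else 0
    outcome = 1 if (move == 1) == (random.random() < 0.9) else -1
    value += 0.01 * (outcome - value)        # running regression toward E[outcome]
```

The ordering matters: supervised learning gives a reasonable starting policy, self-play sharpens it beyond the experts, and the value network is then fit to outcomes generated by the improved policy.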
[Figure: Monte-Carlo tree search in AlphaGo — each edge stores a prior probability P (from the policy network p) and an action value Q; leaf positions are evaluated by the value network v and by rollouts scored with the game scorer r]
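During search, these per-edge quantities drive move selection with a PUCT-style rule: pick the move maximizing Q(s,a) plus an exploration bonus that grows with the prior P and shrinks with the visit count N. A minimal sketch (the exact bonus form and the constant c_puct are assumptions for illustration):

```python
import math

def select_move(stats, c_puct=1.0):
    """PUCT-style selection: argmax over Q(s,a) + u(s,a), where
    u(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))."""
    total_visits = sum(s["N"] for s in stats.values())
    best_move, best_score = None, -float("inf")
    for move, s in stats.items():
        u = c_puct * s["P"] * math.sqrt(total_visits) / (1 + s["N"])
        score = s["Q"] + u
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# Toy node: three candidate moves with priors P, visit counts N, values Q.
stats = {
    "a": {"P": 0.5, "N": 10, "Q": 0.2},
    "b": {"P": 0.3, "N": 1,  "Q": 0.1},   # barely explored: large bonus u
    "c": {"P": 0.2, "N": 5,  "Q": 0.3},
}
print(select_move(stats))  # prints b
```

Note how move "b" wins despite the lowest Q: its small visit count keeps the exploration bonus large, which is exactly how the prior steers early search toward promising but untried moves.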
Deep Blue: handcrafted chess knowledge
AlphaGo: knowledge learned from expert games and self-play
[Figure: Elo ratings of Go programs, on a scale from beginner kyu (k) through amateur dan (d) to professional dan (1p–9p). Prior programs — GnuGo, Fuego, Pachi, Zen, Crazy Stone — span roughly 500–2000 Elo; Nature AlphaGo rates well into the professional range, winning 494/495 games against computer opponents.]
Seoul AlphaGo: deep reinforcement learning (as Nature AlphaGo) plus improved search.
[Figure: updated Elo ratings — on the same kyu/dan/professional scale, Seoul AlphaGo reaches roughly 4500 Elo and scores >50% against Nature AlphaGo with a 3-to-4-stone handicap. Caution: ratings based on self-play results.]
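Elo gaps on charts like these translate into expected win rates via the standard logistic Elo formula (this conversion is the usual Elo convention, not specific to these slides):

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 500-point gap implies roughly a 95% expected score for the stronger player.
print(round(expected_score(3000, 2500), 3))  # prints 0.947
```

This is why even a few hundred Elo points on these charts correspond to near-total dominance in head-to-head play.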
What's Next?
Team: Dave Silver, Aja Huang, Chris Maddison, Arthur Guez, Laurent Sifre, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Yutian Chen, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Tim Lillicrap, Maddy Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis, George van den Driessche
With thanks to: Lucas Baker, David Szepesvari, Malcolm Reynolds, Ziyu Wang,
Nando De Freitas, Mike Johnson, Ilya Sutskever, Jeff Dean, Mike Marty, Sanjay Ghemawat.