Demis Hassabis: Towards General Artificial Intelligence


TOMASO POGGIO:
I’m Tomaso Poggio. I am the Director of the
Center for Brains, Minds, and Machines, which is a
center between MIT and Harvard, located in BCS in Building 46. And I have the pleasure
of hosting Demis. I don’t need to
say much about him. If you look on Wikipedia
or Financial Times, there’s a very good
caricature of Demis. And you can find him everywhere. He was a chess child prodigy. He studied computer
science in Cambridge. He started a couple or more
of successful computer games companies. Then he became a neuroscientist,
got a PhD at UCL in London, and then I was lucky enough that
I can put on my CV that he was a post-doc of mine
for a brief period– between 2009 and 2010, I think. And then we saw each
other a couple of times. Once, he came to speak at one
of the symposia for the MIT 150th birthday. This was 2011. And we had one session of that
symposium, which was called “Brains, Minds, and Machines.” One section was titled “The
Marketplace for Intelligence.” And you spoke about DeepMind
that you had just started. And so DeepMind is an
amazing achievement. Demis managed to put together
a company, sell it to Google. The company is also
a great research lab, I would say the best
one in AI these days, with high-impact papers
in Nature and so on and achievements
like AlphaGo winning against what is
arguably the best player in the world, Lee Sedol. I was in Seoul for the
last game, the fifth game, and it was exciting
and historic. And it’s great to have Demis
here kind of telling us about what went on and what
was the background of it. Demis. [APPLAUSE] DEMIS HASSABIS: Thanks,
Tommy, for that very generous introduction. Thank you all for coming. It’s great being back at MIT. I always love coming back here
and seeing and catching up with old friends. So today, I’m going to
split my talk into two. The first half of it is going– I’m going to give you a kind of
whirlwind overview of how we’re approaching AI
development at DeepMind and the kind of philosophy
behind our approaches. And then the second half of the
talk will be all about AlphaGo and the sort of
combination of our work there and what we’re going
to do with it going forwards. So DeepMind– first of all,
it was founded in 2010, and we joined forces with
Google in an early part of 2014, so we’ve been there for
just over two years now. One of the ways we
think about DeepMind, and one of the ways
I’ve described it, is as a kind of
Apollo program effort for AI. Currently, we have more
than 200 research scientists and engineers, so it’s
a pretty large team now, and we’re growing all the time. So obviously, there’s
a lot of work going on, and I’m only going to be able to
touch on a small fraction of it today. So apart from experimenting
on AI, which is obviously the main purpose of
DeepMind, at least half of my job and half
of my time is spent on thinking about
how to organize the endeavor of science. And what we try
to do at DeepMind is try to create an optimal
environment for research to flourish in. And the way– I mean, that would be
a whole talk in itself. But just to sort of give
you a one-line summary, what we try to do is fuse the best
from Silicon Valley startup culture with the
best from academia. And so, you know,
we’ve tried to combine the kind of blue-sky
thinking that you get in interdisciplinary
research, you get in the best
academic places, with the focus and
energy and resources and pace of a top startup. And I think this fusion
has worked really well. So our mission, as
some of you have heard me state, the way I
kind of articulate that is in two steps. So step one, try and
fundamentally solve intelligence. And then if we
were to do that, I think step two kind
of follows naturally– try and use that technology
to solve everything else. Certainly, that’s
why I’ve always been obsessed with working
on AI since I can remember, because I truly
believe that it’s one of the most important things
that mankind could be working on and will end up being one of
the most powerful technologies we ever invent. So more prosaically, what we’re
trying to do at DeepMind– what we’re interested in
doing– is trying to build what we call
general-purpose learning algorithms. So the algorithms we create
and develop at DeepMind– you know, we’re only
interested in algorithms that can learn automatically
for themselves, from raw inputs and raw experience, and
they’re not handcrafted or preprogrammed in any way. The second important point
is this idea of generality, so the idea that a single set of
algorithms, or a single system, can operate out of the box
across a wide range of tasks. In fact, this sort of connects
with our operational definition of intelligence. I know that’s kind
of a big debate, and there isn’t really
a kind of consensus around what intelligence is. But operationally, we regard it
as the ability to perform well across a wide range of tasks. So we really emphasize this
flexibility and generality. So we call this type of
AI “artificial general intelligence”
internally at DeepMind. And the hallmark
of this kind of AI is that it’s
flexible and adaptive and possibly, you
could argue, inventive. I’m going to come back
to that at the end, once we’ve covered AlphaGo. And the key thing
about it is that it’s built from the ground up
to deal with the unexpected and to flexibly deal with
things that it’s never potentially seen before. So by contrast, obviously AI’s
a huge buzz word at the moment and is hugely popular, both
in academia and industry, but still a lot of what we find around us, or what’s labeled AI, is of this kind of
what I would call narrow AI. And that’s really
software that’s been handcrafted for
a particular purpose, and it’s special-cased
for that purpose. And often the problem with
those kinds of systems is that they’re hugely brittle. As soon as the users interact
with those systems in ways that the teams of
programmers didn’t expect, then obviously they just
catastrophically fail. Probably still the most famous
example of that kind of system is Deep Blue. And obviously, that was a hugely
impressive engineering feat back in the late 90s when it
beat Garry Kasparov at chess. But Deep Blue, you know, it’s
arguable whether it really exhibited intelligence,
in the sense that it wasn’t able to
do anything else at all, not even play strictly simpler
games like tic-tac-toe. It would have to be
preprogrammed again from scratch with
expert knowledge. So the way we think
about AI and intelligence is actually through the prism
of reinforcement learning. And most of you will
be probably familiar with reinforcement
learning, but I’m just going to cover it quickly
here in this cartoon diagram for those of you who
don’t know what it is. So you start off with
an agent or an avatar. It finds itself in some kind of
environment trying to achieve a goal in that environment. That environment
can be, obviously, the real world, in which case
the agent would be a robot. Or it could be a
virtual environment, which is what we mostly
use, in which case it’s a kind of
avatar of some sort. Now, the agent only interacts
with the environment in two ways. Firstly, it gets observations
through its sensory apparatus and reward signals. And we mostly use
vision, but we are looking to use other
modalities pretty soon. And the job of the agent
system is kind of twofold. Firstly, it’s got
to try and build as accurate a statistical model
as it can of the environment out there based on these
noisy, incomplete observations that it’s getting in real time. And once it’s built
the best model it can, then it has to decide
what action to take from the set of actions
that are available to it at that moment in time to
best get it incrementally towards its goal. So that’s basically the essence of reinforcement learning. And this diagram is very
simple but, of course, this hides huge complexities
and difficulties and challenges that would need to be
solved to fully solve what’s in this diagram. But we know that if we
could solve all the issues and challenges behind
this framework, then that would be enough
for general intelligence, human level general
intelligence. And we know that because
many animal systems, including humans, use
reinforcement learning as part of their learning apparatus. In fact, the dopamine neurons in the brain implement a form of TD learning.
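To make that loop concrete, here is a minimal tabular sketch of the agent–environment cycle just described, using Q-learning (one form of TD learning) as the update rule. The Gym-style env.reset()/env.step() interface, the epsilon-greedy exploration, and the helper names are illustrative assumptions, not anything specific to DeepMind’s systems.

```python
import random
from collections import defaultdict

def q_learning_episode(env, Q, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Run one episode, updating action values Q[(state, action)] in place."""
    state = env.reset()                                  # observation from the environment
    done = False
    while not done:
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)      # reward signal comes back
        # TD update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q

# Usage sketch: Q = defaultdict(float); running many episodes improves the greedy policy.
```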
So the second thing that we kind of
philosophically in terms of our approach [INAUDIBLE]
at the beginning was this idea of
grounded cognition. And this is the notion
that a true thinking machine has to be grounded in
a rich sensorimotor reality. But that doesn’t mean it
needs to be a physical robot. As long as you’re
strict with the inputs, you can use virtual
worlds and treat them– these avatars and these agents
in these virtual worlds– like virtual robots in the
sense that the only access they have to the game state is
via their sensory apparatus. So there’s no cheating in terms
of accessing the internal game code or the game state
underlying the game. We think, if you treat
games in that way, then they can be the perfect
platform for developing and testing AI algorithms. And that’s for many reasons. Firstly, you can create
unlimited training data. There’s no testing bias. In fact, I think one of the challenges of AI is actually creating
the right benchmarks. And very often, this
sort of turns out to be an afterthought for an
AI lab to build the benchmarks. And we think actually crafting
the right benchmarks is just as difficult, maybe
even more difficult, than coming up with
the algorithms. And games, of course, have
been built for other purposes– to entertain and
challenge human players– and they’ve been built
by games designers, so they weren’t built
for testing AI programs. So, in that sense,
they’re really independent in terms
of a testing/training ground for our AI ideas. Obviously you can run millions
of agents in parallel, and we do that on
the Google Cloud. And most games have scores,
so it is a convenient way to incrementally measure
your progress and improvement of your AI algorithms. And I think that’s very
important when you’re setting off on a very ambitious
goal and mission like we have, which may be multi-decades. It’s important to have
good incremental measures that you’re going in
the right direction. So this kind of
commitment then leads to this idea of
end-to-end learning agents and this notion of
starting with raw pixels and going all the way to
deciding on an action. At DeepMind, we’re interested in
that entire stack of problems, from perception to action. And I think we’ve, over
the last five years that DeepMind’s been
going, have pioneered this use of games
for AI research. And I see many other
research organizations now, and industrial groups, starting
to use games themselves for their own AI development. So I guess the first
big breakthrough that we had at
DeepMind was really starting this new field of
deep reinforcement learning. And this is the idea of
combining deep learning with reinforcement learning. And this allows
reinforcement learning to really work at scale and
tackle challenging problems. Until we came up with this
idea of deep reinforcement learning– RL, of course, as a
field, has been going for more than thirty years. But generally
speaking, up till then, it had only been applied to toy problems, little grid-world problems. And nothing really challenging or impressive had been done with it, so we wanted to take
that further and apply it to a really challenging domain. So initially we picked
the Atari 2600 platform, which is really the first iconic
games platform from the ’80s. And, conveniently, there’s
a nice open-source emulator which we took and improved. And then there are hundreds of
different classic Atari games available on this emulator. I’m just going to run you one
video in a second showing you how the agent performs in
these Atari environments. But before I do, just to sort
of confirm with you what you’re going to see, the agents
here only get the raw pixels as inputs. So the Atari screens are
200 by 150 pixels in size. There’s about 30,000
pixels per frame. And the goal here is simply
to maximize the score. Everything else is
learned from scratch. So the system is
not told anything about the rules or the controls, or even the fact that pixels in video
streams next to each other are correlated in time. It has to find all that
structure for itself. And then there’s this notion
again of generality– one system able to play all the
different Atari games out of the box. So we call this
system DQN, and we think it really is a kind
of general Atari player.
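As a rough illustration of the core idea behind DQN, here is a sketch of a convolutional Q-network over raw pixels and the one-step TD loss it is trained with. The layer sizes loosely follow the published architecture, but the replay buffer, frame preprocessing, exploration schedule, and target-network update schedule are all omitted; treat this as a simplified sketch rather than the actual DQN code.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Convolutional Q-network over stacked pixel frames (assumes 84x84 inputs)."""
    def __init__(self, n_actions, in_channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, pixels):
        return self.net(pixels)          # one Q-value per possible joystick action

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """One-step TD loss on a replay batch of (state, action, reward, next_state, done)."""
    s, a, r, s2, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)              # Q(s, a) actually taken
    with torch.no_grad():                                             # frozen target network
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    return nn.functional.smooth_l1_loss(q_sa, target)
```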
So this is a little medley of the same system out of the box, the same [INAUDIBLE] is playing
all these very different games, very different rule sets,
very different objectives, very different visuals out of
the box with the same settings and the same architecture. And it performs better than top
human players on more than half of the Atari games. And since our
“Nature” paper, we’ve now increased that to about
95% of the Atari games. And here’s the boxing where
it’s the red boxer here, and it does a bit of
sparring with the inbuilt AI and then eventually corners
it and just racks up an infinite number of points. So if you want to know
more about that work, you can see our “Nature”
paper from last year. And the actual code is
freely available as well, linked from the “Nature”
site, so you can play around with the DQN
algorithm yourselves. So those are two planks of our philosophy: grounded cognition and
reinforcement learning. A third sort of pillar, if
you like, of our approach is the use of
systems neuroscience. And as a neuroscientist
myself, you know, I think this is going to
play a very important part of understanding
what intelligence is and then trying to
recreate that artificially. But when I talk
about neuroscience, I really want to stress I’m
talking about systems neuroscience. And what we mean
by that is really the algorithms, the
representations, and the architectures
the brain uses rather than the actual low-level
synaptic details of how the neural substrate works. So we’re really talking
about this high level, this computational
level, if you like, of how the brain functions. Now, I haven’t got time to
really go into all the areas that we’re sort of using
neuroscience inspiration for, but suffice it to say, some
of the key areas that we’re working on– memory, attention, concepts,
planning, navigation, imagination– all these areas that
we’re pushing hard on now, it’s going beyond the
work we did for Atari. And actually, the area of
the brain that I studied for my PhD, the hippocampus– which is the center part
of the brain here in pink– is actually implicated in
many of these capabilities. So it seems like,
perhaps the notion of creating an artificial
hippocampus of some sort which mimics the functionality
of the hippocampus, might be a good plan. So I haven’t got time
to go through all of these different areas of
the work we’re doing here, but I’ll just touch on a couple
of the most interesting ones. So one big push that
we have at the moment is adding memory
to neural networks. And what we really want to
do is add very large amounts of controllable memory. So what we’ve done is
created this system, which we are dubbing the
Neural Turing Machine. And what it effectively is is
you take a classical computer, you train a recurrent
neural network on it from input-output examples, and
that recurrent neural network you can think of as like the CPU, effectively. And what we give this recurrent
neural network is a huge memory store, a kind of
KNN memory store, that it can learn to
access and control. And this whole system is
differentiable from end to end. So the recurring neural
network can learn what to do through gradient descent. And really, that is then all
the components of a Von Neumann machine that you need,
except here it’s all neural and it’s all been learned. So that’s why we call it
the Neural Turing Machine because it has all the aspects
you need for a true Turing machine. So here’s a little
cartoon diagram of what the Turing machine does. And you can think of this
input tape, and then the CPU, which is this recurrent neural
network that actually has LSTMs as part of it, and
then it’s trying to produce the right output. And then it has this huge
memory store to the side that it can learn to read and write vectors to.
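Here is a small sketch of the kind of differentiable, content-based memory read used in Neural Turing Machine-style architectures: the controller emits a key, attention weights are computed by similarity against every memory slot, and the read is a soft weighted mixture, so gradients flow end to end. The function and parameter names are illustrative, not DeepMind’s implementation.

```python
import torch
import torch.nn.functional as F

def content_read(memory, key, sharpness=5.0):
    """memory: (N, M) matrix of N slots; key: (M,) query emitted by the controller."""
    similarity = F.cosine_similarity(memory, key.unsqueeze(0), dim=1)   # (N,)
    weights = F.softmax(sharpness * similarity, dim=0)                  # soft address over slots
    read_vector = weights @ memory                                      # weighted mixture, (M,)
    return read_vector, weights

# Usage sketch: the controller RNN emits `key` each step, the read vector is fed
# back in as part of its next input, and gradients flow into both key and memory.
memory = torch.randn(128, 20, requires_grad=True)
read_vec, attention = content_read(memory, torch.randn(20))
```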
Now, with this kind of system, we can start moving towards
symbolic reasoning using these kinds of
neural systems, which is really one of the big holy
grails of what we want to do. And, of course, there are many unsolved classic problems in AI. One of the problems we apply
this Neural Turing Machine to has been inspired
by the Shrdlu class of problems, which are these
block worlds from the ’70s and ’80s. And the idea here
is to manipulate the blocks in some way
and answer questions about the scene. Like put the red pyramid
on the green cube. Or what’s next to
the blue square? So it has to both manipulate this world and also answer questions about it. Now, we’re not ready yet– Neural Turing Machines can’t
scale to the full complexity of the full Shrdlu problem. But we have cut it
down to a 2D version, a blocks world version,
where we can solve some quite interesting things. So we call this
Mini-Shrdlu, and it has aspects of Tower of Hanoi
and other problems in it. And the idea here is that
you’ve got this little blocks world that you’re
looking side on and all these different
colored blocks, and you’re given the
start configuration here on the left-hand
side and the goal configuration you want to reach. And what the system can do is
lift one block from one column and put it down on the
top of another column. That’s the only moves
you’re allowed to do. And it gets trained through
seeing many starting examples and end examples and doing trial
and error with reinforcement learning and improving
itself over time. And then, once it’s done its training, we then test it on new start positions and goal positions that it’s never seen before. And it has to try and solve these problems in an optimal number of moves.
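As a toy illustration of the environment just described (not the learning agent itself), here is a sketch of a blocks-world state and the single move rule: lift the top block of one column and drop it on another. The representation and helper names are assumptions made for this example.

```python
from typing import List, Tuple

State = Tuple[Tuple[str, ...], ...]   # e.g. (("red", "blue"), (), ("green",))

def legal_moves(state: State) -> List[Tuple[int, int]]:
    """All (source_column, target_column) pairs where the source has a block to lift."""
    return [(s, t) for s in range(len(state)) if state[s]
                   for t in range(len(state)) if t != s]

def apply_move(state: State, move: Tuple[int, int]) -> State:
    """Lift the top block of column `src` and place it on top of column `dst`."""
    src, dst = move
    cols = [list(c) for c in state]
    cols[dst].append(cols[src].pop())
    return tuple(tuple(c) for c in cols)

# Usage sketch: a learned policy would map (current state, goal state) to one of
# legal_moves(state), trained by trial and error with reinforcement learning.
start = (("red", "blue"), (), ("green",))
goal = (("red",), ("blue",), ("green",))
print(apply_move(start, (0, 1)) == goal)   # True: one move solves this tiny example
```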
So I’m just going to run this little video which will show you, going from that
start position on the left to end up on the goal position. I think this one’s
about twelve moves. It’s actually a
pretty hard task to do in an optimum number of moves. It’s really hard even
for humans to do this. And so now it’s solving pretty
interesting logic puzzles. Also, what we’ve been
using Neural Turing Machines to do recently
is solve graph problems, which, as you all know, are
a general class of problems. And we’ll be publishing
something pretty impressive, I think, in the later part
of this year on this topic to add to our arXiv paper that
we already published last year. Now, we’re also
experimenting with language as well. And we’ve incorporated a
cut-down version of language into these Shrdlu tasks. And here, the Neural
Turing Machine is reading a set
of constraints that are given to it in
code that you can see at the bottom of the screen. So here, each of the
blocks are numbered, and there are some
constraints that you want to satisfy with
the goal configuration. So, in this case, block three
should be down from block five, four up from two, one up from
four, and six down from three. And so it reads this in,
character by character, remembers these
instructions, and then starts executing the actions. And then it solves
the puzzle, and this is the end position
that satisfies all those constraints. Another thing
we’re moving to now is, there are still challenges
to overcome in Atari, but we’re also starting to
move towards 3D environments. So we’ve repurposed
the Quake III engine and added modifications to it. We call it Labyrinth. And we’re starting to tackle
all kinds of navigation problems and interesting 3D vision
problems within this kind of labyrinth-like environment. So I’ll just roll the
video of this agent finding its way through
the 3D environment, picking up these green
apples which are rewarding, and then trying to find
its way to the exit point. And again, all of this behavior
is learned just through– the only inputs are
the pixel inputs, and it has to learn
how to control itself in this 3-D environment and
find its way around and build maps of the world. So here, for an agent
like that, we’re starting to integrate some
of these different things together– deep reinforcement
learning with memory and 3D vision perception. So as we take this
forward, we’re thinking as one of our
goals over this next year is to create a rat-level
AI, so an AI agent that’s capable of doing all
the things a rat can do. And, you know, rats
are pretty smart, so it could do quite
a lot of things. So we’re looking at
the rat literature, actually, for
experimental ideas, experimental tests that we
can test our AI agent on. So now I want to switch
to AlphaGo, which is also part of these big pushes
that we’re doing into going beyond the Atari work. So one of the reasons
we took on AlphaGo is, we wanted to see how
well these neural network approaches could be meshed
with planning approaches. And Go is really the perfect
game to test that out with. So this is the game of Go for
those of you who don’t play. This is what a board looks like. It’s 19 by 19 grid, and there’s
two sides– black and white– taking turns. And you can place your stone– your piece, which
is called a stone– anywhere on an empty
vertex on the board. Now, the history of Go has got
a long and storied tradition in Asia. It’s more than 3,000 years old. Confucius wrote about
it 2,000 years ago. And he actually talked about
Go being one of the four arts you need to master
to be a true scholar. So it’s really regarded
in Asia up there with poetry and
calligraphy and art forms. There’s 40 million
active players today and more than
2,000 professionals who start going to Go school
before they’re teenagers, from around the age of
eight, nine, or ten. They go to special Go schools
instead of normal schools. And although the rules of Go
are incredibly simple– in fact, I’m going to teach you how
to play Go in two slides in a minute– they actually lead to
profound complexity. One way of quickly
illustrating that is that there are more than 10
to the power 170 possible board configurations. That’s more than there
are atoms in the universe by a large margin. So the two rules are–
rule one, the capture rule. Stones are captured
when they have no free vertices around
them, and these free vertices are called liberties. So let’s take a
look at our position from an early part of
a Go game, and let’s zoom into the bottom
right of the board to just illustrate
this first rule. So here, you can
see this white stone that’s surrounded by the
three black stones only has one remaining free vertex,
one remaining free liberty. So if black was to play
there, it would totally surround that white stone,
and that white stone would be captured and
removed from the board. And actually, big
groups of stones can be captured in this
way, not just one at a time. Whole large groups can be
captured if you surround all of their empty vertices. So that’s the first rule.
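To make the capture rule concrete, here is a small sketch that flood-fills a group of connected stones and collects its liberties; a group whose liberty count reaches zero is captured. The board representation (a dict from (x, y) to a color, with missing keys meaning empty vertices on a 19 by 19 grid) is just an assumption for this illustration.

```python
def group_and_liberties(board, start, size=19):
    """Flood-fill the group containing `start`; return (group, liberties)."""
    color = board[start]
    group, liberties, frontier = set(), set(), [start]
    while frontier:
        point = frontier.pop()
        if point in group:
            continue
        group.add(point)
        x, y = point
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nx < size and 0 <= ny < size):
                continue
            neighbor = board.get((nx, ny))
            if neighbor is None:
                liberties.add((nx, ny))          # empty vertex = a liberty
            elif neighbor == color:
                frontier.append((nx, ny))        # same color: part of the group
    return group, liberties

def captured_after_move(board, move, color):
    """Opposing stones left with zero liberties after `color` plays at `move`."""
    board = {**board, move: color}
    dead = set()
    x, y = move
    for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if board.get(neighbor) not in (None, color):
            group, liberties = group_and_liberties(board, neighbor)
            if not liberties:
                dead |= group
    return dead
```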
The second rule is called the ko rule. And that states that a repeated
board position is not allowed. So let’s imagine we’re
in this position now and it’s white to play. Now, white could
capture that black stone by playing here and taking
that black stone off the board. So now it’s black’s move, and
you might be wondering, well, can’t black just capture
back by replacing that stone and taking white? So what happens if
black was to play this? And this is not allowed because
if black was to play back there and remove the white
stone, now you’ll see that this
position we’re in now is identical to the
position we started with. So that’s not allowed. So that black move
is not allowed. Black would have to
play somewhere else first to break this symmetry
and then can go back and recapture that stone. And that’s it. That’s the rules of Go. And the idea of Go
is that you obviously want to take your opponent’s
pieces by surrounding it. But, actually, the main
thing you are trying to do is wall off parts of empty
territory on the board. And then at the end of the
game, when both players pass because they don’t think they can improve their positions any further, you count up the amount of territory you’ve got and you add the
prisoners that you’ve taken from your opponent. And the person with the
most points wins the game. So the rules of Go are
simple, but it’s pretty much the most profound
and elegant game I think that mankind
has ever devised. And I say that as
a chess player. You know, I think Go
is really the pinnacle of perfect information games. It’s definitely the
most complex game that certainly humans have spent
a significant amount of time mastering and play at a very
high professional level today. And because of this
huge complexity of Go, it’s been an outstanding
grand challenge for AI for more than twenty
years, especially since the Deep Blue match. And the other interesting
thing for us is that– and I’m going to come back
to this more in a minute– that if you ask top
Go players, they’ll tell you that they rely on their
intuition a lot to play Go. So Go really requires both
intuition and calculation to play well. And we thought that
mastering it, therefore, would involve combining
pattern recognition techniques with planning. So why is Go hard for
computers to play? Well, the huge complexity means
that brute force search is not tractable. And really, that breaks down
into two main challenges. Firstly, the search
space is really huge. There’s a branching
factor of more than 200 in an average position in Go. And the second point,
which is probably an even bigger
problem, is that it was thought to be impossible
to write an evaluation function to tell the computer
system who is winning in a mid-game position. And without that
evaluation function, it’s very difficult to
do efficient search. So I’m just going to unpack
these by comparing Go to chess, and you’ll see the difference. So in chess, in an
average position, there are about
20 possible moves. So the branching
factor in chess is 20. In Go, by contrast, as I just
mentioned, it’s more like 200. So there’s an order of magnitude larger branching factor. Plus, Go games tend to last two to three times longer than chess games.
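A quick back-of-envelope comparison shows how much worse the naive game tree is in Go. The branching factors come from the talk; the game lengths (around 80 plies for chess and around 200 moves for Go) are common rough figures used here purely for illustration.

```python
import math

def log10_tree_size(branching_factor, depth):
    """log10 of b^d: the order of magnitude of the naive game tree."""
    return depth * math.log10(branching_factor)

print(f"chess: ~10^{log10_tree_size(20, 80):.0f} positions in the naive tree")
print(f"go:    ~10^{log10_tree_size(200, 200):.0f} positions in the naive tree")
# Roughly 10^104 versus 10^460: brute force is hopeless in both games,
# but Go is vastly worse, which is why learned pruning matters so much.
```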
The evaluation function– Why is this so difficult for Go? Well, we still believe, actually, that it’s impossible to
handcraft a set of rules to tell the system
who’s winning. So you can’t really create
an expert system for Go, for evaluating a Go position. And the reasons are, there’s no
concept of materiality in Go. In chess, as a
first approximation, you can just count up the value
of the pieces on each side and that will tell you
roughly who’s winning. You can’t do that in Go because,
obviously, all the pieces are the same. Secondly, Go is a
constructive game, so the board starts
completely empty and you build up the
position move by move. So if you’re going to try and
evaluate a position halfway through or at the
beginning of the game, it’s very difficult
because it involves a huge amount of
prediction about what might happen in the future. If you contrast that
with chess, which is a kind of destructive
game, all the pieces start on the board
and, actually, the game gets simplified as you
move towards the endgame. The other issue
with Go is that it’s very susceptible to
local changes, very small local changes. So even moving one piece around
out of this mass of pieces can actually completely change
the evaluation of the position. So Go is really a game
about intuition actually rather than calculation. And because the
space of possibilities is so huge, I think it’s kind of
at the limit of what humans can actually cope with
and deal with and master. And, you know, I’ve talked to
a lot of top Go players now, and when you ask them why they played a brilliant move, quite often they’ll just tell you that it felt right, and they’ll use those words. If you ask a chess
grandmaster why they played a
particular move, they’ll usually be able to
tell you exactly the reasons behind that move. You know, I played this move
because I was expecting this, and if that happens, then
I’m going to do this. And they’ll be able to give
you a very explicit plan of why that move was good. And you can see
that Go definitely has a sort of
history and tradition of being intuitive
rather than calculating because it has notions of things
like the idea of a divine move. And actually, there are
some famous games in history that get names, and within those
games, there are famous moves. And those moves are
sometimes named as well. And if you talk to
a top Go player, they dream about one day, at
one point in their career, playing one of
these divine moves, a move so profound it’s almost
as if it was divinely inspired. And you can look that up online. They have some really
interesting stories from the Edo period in Japan of
these incredible games played in front of the shogun and
these divine moves being played, ghost moves. So how did we decide to tackle
this intuitive aspect of Go? Well, we turned to
deep neural networks. And, in fact, what we did is, we
used two deep neural networks. So I’m just going to take you
through the training pipeline here. We started off with
human expert data: we downloaded
about 100,000 games from internet Go servers
of strong amateurs playing each other. And we, first of all, trained
through supervised learning what we called a policy network. And this deep neural network,
what it was trained to do was to mimic the human players. So we gave it a position
from one of those games. And, obviously, we know what
the human player played. And we trained this
network to predict and play the move the human
player played. And after a whole
bunch of training, we could get it reasonably accurate. We got to about
60% accuracy in terms of predicting the move
that the human would play. But, obviously, we
don’t want to just mimic how human players
play, especially not just amateur players. We want to get better
than the human players. So this is where reinforcement
learning comes in. We then iterate this policy network through self-play, playing against itself many millions of times and incrementally improving
the weights in that network to slowly increase its win rate. So after millions of
games of self-play, this new policy network
has about an 80% win rate against the
original supervised learned policy network. Then we freeze
that network and we play that network against
itself 30 million times. And that generates
our new Go data set. And we take a position from
each of those 30 million games. And, obviously, we
have the position, and we also know the
outcome of the game. We know who finally
won, black or white. And then, with
that much data, we were finally able to crack
the holy grail of creating an evaluation function. So we created this second neural
network, the value network, which is a learned
evaluation function. So it learned to take
in board positions and try and accurately predict
who is winning and by how much.
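And here is a correspondingly minimal sketch of the value-network idea: regress from a board position to the eventual game outcome, squashed to the 0-to-1 convention described a little later in the talk. One position is sampled per self-play game so the targets are close to independent. Again, the small network is an illustrative stand-in, not the real architecture.

```python
import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Flatten(), nn.Linear(19 * 19, 256), nn.ReLU(),
                          nn.Linear(256, 1), nn.Sigmoid())   # output in (0, 1)
opt = torch.optim.Adam(value_net.parameters(), lr=1e-4)

def value_step(boards, outcomes):
    """outcomes: 0.0 if white eventually won, 1.0 if black did, one per position."""
    prediction = value_net(boards).squeeze(1)
    loss = nn.functional.mse_loss(prediction, outcomes)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```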
So after all of this training, which takes a lot of compute power,
we end up finally with two neural networks. The policy network takes a board position
[INAUDIBLE] as an input. And the output is a
probability distribution over the likelihood of each
of the moves in that position. So the green bars
here, and the height of the green bars
on the green board, represent the kind of
probability mass associated with each of the moves
possible from that position. And then the second network is this value network, here in pink. And, again, you take the
board position as an input. But, here, the
output of the network is just a single real
number between 0 and 1. And that indicates whether
white or black is winning and by how much. So if it was 0, that means white
would be completely winning. And 1, black would
be totally winning. And 0.5, the position
would be about equal. So we take those forwards–
but the neural networks are not enough on their own. We also need something
to do the planning. And for that, we turn to
Monte Carlo tree search to stitch this all
together, and it uses the neural networks to
make the search more efficient. So I’m just going to show you
how the search works here. So imagine that we’re in
the middle of pondering what to do in a particular position,
and imagine that position is at the root node of this tree
represented by the little mini Go board here. And perhaps we’ve done a
few minutes or a few seconds of planning already,
so we’ve already looked at a few different moves
represented by the other leaf nodes here. And what you do is, you’ve got
two important numbers here– Q is really the
current action value of the move, the estimate
of how good the move is. And P is the prior probability of the move from the
policy network in terms of how likely it is a
human would play that move. And let’s imagine
we’re following the most promising
path at the moment that we’ve found so far
in the bold arrows here that are coming down,
and we end up at a node, at a position that we
haven’t looked at so far. So what happens here
is, we expand the tree. And we do that by first
calling the policy network to find out which moves are
most probable in this position. So instead of having to look
at 200 possible moves, all the different possible
moves in this position, we just look at the
top three or four that the policy network
tells us are most likely. And so that expands
the tree there. And then once we’ve
expanded the tree, we evaluate the desirability
of that path in two ways. One is that we call
the value network, and that gives us an instant
estimate of the desirability of that position. And we also do a second
evaluation routine using Monte Carlo
rollouts, so we roll out maybe a few thousand
games to the end of the games and then we backup
the statistics of that back to this node. And what we’ve found is, that
by combining these two valuation strategies, we can get a really
accurate evaluation of how desirable that position is. And then, of course, that’s
one of the parameters we experiment with is the
mixing ratio between what the rollouts are telling us
and what the value network is telling us. And as we improved AlphaGo,
we trusted the value networks more and more. So I think now, the lambda
parameter’s about 0.8 in favor of trusting
the value network. And when we started on
this around last summer, it was about 0.5. So then, once you have that, you
back the Q value up the tree. And then once you’ve run out
of time, or your allocated time, you basically pick the move that has the highest Q value associated with it.
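Putting the pieces together, here is a compact sketch of one simulation of the tree search just described: selection by a score combining Q with the policy prior P, expansion using the policy network’s most probable moves, evaluation as a lambda-weighted mix of the value network and a rollout, and backup of the value along the path. The callables passed in (policy_priors, value_estimate, rollout, play) are placeholders for the components described in the talk, and details such as alternating the value’s sign between players are deliberately glossed over.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior          # P: prior probability from the policy network
        self.visits = 0
        self.total_value = 0.0      # sum of backed-up evaluations
        self.children = {}          # move -> Node

    def q(self):
        return self.total_value / self.visits if self.visits else 0.0

def select_child(node, c_puct=5.0):
    """Pick the child maximizing Q plus an exploration bonus weighted by P."""
    def score(item):
        _, child = item
        bonus = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q() + bonus
    return max(node.children.items(), key=score)

def simulate(root, state, policy_priors, value_estimate, rollout, play, lam=0.8):
    """One simulation: walk down, expand a leaf, evaluate it, back the value up."""
    path, node = [root], root
    while node.children:                          # selection
        move, node = select_child(node)
        state = play(state, move)
        path.append(node)
    for move, p in policy_priors(state):          # expansion: only the top policy moves
        node.children[move] = Node(prior=p)
    value = lam * value_estimate(state) + (1 - lam) * rollout(state)
    for visited in path:                          # backup
        visited.visits += 1
        visited.total_value += value

def best_move(root):
    """After many simulations, play the move with the highest Q, as described above."""
    return max(root.children.items(), key=lambda kv: kv[1].q())[0]
```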
So if we think about what these neural networks are doing then for us in terms
of the search, you could think of it in this way. Imagine that this is the search
tree from the current position. It’s totally intractable. It’s really huge. What we do is, we call
the policy network to really cut down the
width of that search, to narrow that down. And the value network really
cuts the depth of the search. So instead of having
to search all the way to the end of the game and
collect millions of statistics like that to be even
reasonably accurate, we can truncate the search
at any point we like and call the value network. So once we built
the AlphaGo system, it was time to evaluate how
strong it was and test it out. So the first thing
we did was play it against the commercially
best available Go programs out there. The two best ones are
Crazy Stone and Zen. They’ve won all the recent
computer Go competitions of the last few years. And they’ve reached to
about strong amateur level. So in Go, you start off in
this thing called kyu– K-Y-U– and your number goes down as you improve as an amateur. And then, as you get
better as a strong amateur, you get a dan rating
which goes from one dan to about six or seven dan. And then, finally, you
can become professional, and then the dan ratings
start again from one to nine. So really, these programs
were about the strength of strong amateurs,
a strong club player. And AlphaGo did incredibly
well against them. So in the 495 matches we
tried, it won all but one. And it had about a 75% win rate
against these other programs, even when they were given
a four-move head start, which is huge in Go. It’s called a
four-stone handicap. And this graph here
that I’m showing you is just the single machine
version of AlphaGo, and the distributed version was even stronger. And these rankings are quite
subjective, these Go rankings. So we actually created a
numerical ranking, an Elo ranking, that’s on the y-axis
on the left-hand side which is based on chess Elo ratings
and is purely statistical in terms of the win rates
of the different programs. And what we found is that
a gap of about 200 Elo points, or 250 Elo points,
translates to about an 80% win rate.
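For reference, the standard Elo expected-score formula reproduces that figure: a gap of roughly 240 points corresponds to about an 80% expected win rate.

```python
def elo_expected_score(rating_gap):
    """Expected score for the higher-rated player, given the rating difference."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))

print(round(elo_expected_score(240), 2))   # ~0.80
print(round(elo_expected_score(200), 2))   # ~0.76
```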
And AlphaGo was more than a thousand Elo points better than the other best programs. And so, this was
back in October, so this is not the
most recent version. And we beat all of
these other programs, so it was time to test ourselves
against some of the world’s top human players. So what we did back in
October was challenge this lovely guy called Fan
Hui who’s now based in France but was born and
grew up in China. He’s the current reigning
three-time European champion. He’s a two dan professional. He started playing go at
seven and turned professional in China at age 16. It’s very difficult to
turn professional in China, so he was a top, top player
before moving to France, and now he coaches the
national French team. And we played the challenge match in October,
and this is what happened. FAN HUI: I think after first
game maybe it don’t like fight, it like play slow. So it’s why begin
second game, I fight. It do mistake sometimes. This gives me confidence. I think maybe I’m right. It’s why for another game,
I fight all the time. Now it’s complicated,
now it’s complicated. But I lose all my games. DEMIS HASSABIS: So AlphaGo won
five nil, much to our surprise, and became the first program to
ever beat a professional at Go. And if you ask AI experts,
even the top programmers of these other programs,
even sort of a year before, they were predicting
this moment would be at least another decade away. So it’s about a decade
earlier than the top experts in the field expected,
and certainly a decade earlier
than the Go world thought it was going to happen. AUDIENCE: Was this the
distributed version or is it single version? DEMIS HASSABIS: This was
the distributed version. And this story ends well though,
he looks distraught here, but we ended up hiring him as a consultant on our team after this. And he joined our side
of the program afterwards. But one interesting point about
this is that he then came into the office for about a week every month– he was part of making sure we weren’t overfitting in our self-play, by carrying
on pitting our wits against him. And he felt that his
play had improved by playing against AlphaGo. And actually he went from
ranked about 600 in the world at that time in October, to in
January, February, like three or four months later, being
ranked 300 in the world. So it was– and he’ll tell you
that it really opened his mind, he said, it freed my mind from
the constraints of the 3,000 years of tradition to think in
a different way about the game. So it’s very interesting. So again if you want to read
the technical details of this, this is another Nature
paper, front cover, that was a couple of months ago. And I think it’s caused a
really big storm in the AI world and the Go world. So then it was
time to take a kind of ultimate challenge, which
was just a few weeks ago now. We started to challenge
Lee Sedol, who is an absolute legend at the game. I call him like the
Roger Federer of Go. And he’s been indisputably the
best player of the past decade, and he’s won 18 world titles. And he’s also famed
for his creative style and creative brilliance, so he
was the perfect player for us to pit our wits against. And we played him in early
March for a million dollar first prize in Korea. Now just before I
go to the results, I just want to make a side
note on compute power here, which I always
get asked about. So we use roughly the same
compute power for this match as we did for the Fan Hui match. So there’s around about
50-60 GPUs worth of compute. And you might ask, well,
why don’t we just use more? Well, actually, the strength of the program asymptotes quite quickly with more compute power. And one of the reasons is it’s actually quite hard to parallelize MCTS algorithms. They work much better,
more efficiently, if you do them sequentially. And if you batch them across
lots and lots of GPUs, you don’t actually get that much
more effectiveness out of it. And one measure of that is
that the distributed version– surprisingly, probably, to many of you– only wins about 75% of the time
against the single-machine version. So we played the match, and as many of you will have seen, we
actually won 4-1. And it was pretty
outstanding to us, because even the day
before the match, they interviewed
Lee Sedol, and he was saying he was confident
he was going to win five nil. And the whole Go
world thought there was no chance we could win. Obviously, they were looking
at the Fan Hui matches and trying to estimate– maybe
we’d improve 10-20% since then, and that would stand no
chance against Lee Sedol. But actually in the five months
that we had between the two matches, the new
version of AlphaGo could beat the old version
of AlphaGo 99.9% of the time. So it was astoundingly stronger. And actually it was an amazing
experience out there, and I’ll talk about the cultural
significance in a second, but one very nice thing is
the president of the Korean Go Association, in the
middle there, awarded us and AlphaGo with an
honorary 9-dan certificate. So it was really
beautiful, we have that framed up on the wall– for its creative play. And I just want to touch
on those themes actually about the creativity
and intuition. And that’s one of the reasons
I explained to you how to play Go, because I want
to just try and explain to you some of the significance
of what AlphaGo did. Now chess is really my main game, but I play Go well enough
to be able to appreciate what’s going on. Now probably the best
move that AlphaGo played in the whole
five game series, and maybe the Go world will
decide to name this move, is move 37 in game 2. And this is the position– AlphaGo was black, and
AlphaGo decided to play here. It’s called a shoulder hit
move, this move here in red. And it’s funny– I’m going to
try to explain to you why this is so amazing, this
move, by telling you a little bit about Go. So there’s two key lines in Go,
the third and the fourth line of the board, that’s the
critical lines in Go. So here’s the third line. Now if you play a stone on the
third line, what you’re really trying to do, you’re
telling your opponent is I’m interested
in taking territory on the side of the board. That’s a third line move means. A fourth line move, by
contrast, is the fourth line. What that means if you
play on the fourth line is I’m trying to take
influence and power into the center of the board. OK, so you’re going
to try and influence the center of the board,
and radiate that influence across the board. So that’s towards the center. And the beauty of Go, and
I think one of the reasons why it ended up evolving
to 19 by 19 board, is the playing on the
third and fourth lines and going for
territory or influence, is considered to be
perfectly balanced. Right? So the territory that you get
for playing the third line is about equal to
what the opponent gets by playing on the fourth
line, and getting power and influence into the center. The idea is that that influence
that you get and power you get, you store up for
later and eventually that will give you territory
somewhere else on the board. So that’s the classic 3,000
years of history of Go, and yet AlphaGo played
on the fifth line, to take influence toward
the center of the board. And so this is kind
of astounding– goes against 3,000
years of history of Go. And just to show you how
astounding that was to the Go fraternity, I just
want to show you a clip from the commentary,
the live commentary. There were lots of commentary channels commentating live. There were actually 14 live channels in China, it was on all the free national TV stations in Korea, and we also had an English-language channel via YouTube. And we had this
fantastic commentator called Michael Redmond who
was the only Westerner ever to get to 9-dan; he is the
only English-speaking person to ever get to 9-dan. And look at his reaction
to this move 37. And just to show you what turned out– about 50 moves after this move, that move here
influenced the fight over in the bottom left corner. Right? About 50 moves later. So you can’t calculate
that, because there’s too many possibilities. That was the influence of
the power of that move. So this is Michael
Redmond seeing this move. MICHAEL REDMOND: The Google
team was talking about is this evaluation– the value of– DEMIS HASSABIS: He doesn’t
even know where it is. CHRIS GARLOCK: That’s
a very surprising move. MICHAEL REDMOND: I
thought it was a mistake. CHRIS GARLOCK: Well, I
thought it was a click miss. MICHAEL REDMOND: If we were
online Go, we’d call it clicko. CHRIS GARLOCK: Yeah,
it’s a very strange move. Something like this would
be a more normal move. DEMIS HASSABIS: So I think
he means a misclick as opposed to a click miss. So he was thinking that our
operator, the person actually playing the moves for
AlphaGo, [INAUDIBLE] Wang is the lead programmer,
had actually entered the move wrong into the
machine, because it’s that surprising of a move. And this is what Lee
Sedol thought of it, he disappeared to the
bathroom for 15 minutes. So that’s his empty seat there. No one knew what
happened to him. So he just disappeared
for 15 minutes. So maybe it will be called the
face washing move or something, because they’re usually named
after something that happened. And actually later when we
investigated the statistics behind this, we found
that the policy network gave prior probability of
this move as less than one in 10,000. So AlphaGo overcame–
so it’s not just repeating what it’s seeing
in these professional games, because it would never have
thought to play this move. Later on, some of the
9-dan pros commented, it’s not a human move,
no human would ever have played this move. So it’s just really kind of
an original move, if you like. And one thing that we think
is going on here is that– AUDIENCE: Do you have
data that [INAUDIBLE]?? DEMIS HASSABIS:
No, we can’t yet. We need to build more
visualization tools to actually do that, we’re
building that at the moment. It’s pretty hard to know why– to explain why it’s
done that, for us. So here, in terms of
these surprising moves, I think this shows some
kind of originality. And what it might mean is–
when I talked to Michael Redmond about this, he said that AlphaGo has this
very light touch. It doesn’t commit
to territory early, and what we think is going
on is that AlphaGo really likes influence in the
center of the board. And it likes it
so much, and it’s so good at ultimately making
the influence pay later on in the game, that it
actually thinks fifth line’s influence is good enough. So this may cause a whole
rethink in the game of Go as to what’s an
acceptable trade. Then, I must say, the other
really spectacular move was played by Lee
Sedol in game 4. So we won the first three
games, and then Lee Sedol came back strongly, because
he’s an incredible games player. I’ve met many of the
best games players in the world, Garry
Kasparov and others, but I put Lee Sedol at
the top of all the games players I’ve met in terms of his
creativity and fighting spirit. And he won game 4. And he did it by playing this
incredible move, move 78. I haven’t got time to go
into why this is so special, but basically when we look
to the data on this as well, we found that AlphaGo thought
the probability of this move was also less than
one in 10,000. So it was totally
unexpected for AlphaGo, and that meant all the
pondering and search it’d done and up to
the move prior to this ended up having
to be thrown away. So it basically
had to start again as soon as this move
happened, and for some reason this caused some
misevaluation in the value net, which we’re still investigating. So the cultural impact of
this match was huge. We had 280 million
viewers, that’s more than like the Super Bowl. And 60 million viewers just
in China for the first game. We were being stopped
in the streets in Korea, and it was pretty crazy. And 35,000 press articles
literally every day, it was front page of all
the newspapers in Korea. And the thing I
liked most, actually, was that it popularized
Go in the West– there was a worldwide
shortage of Go boards for the last few weeks after– still now, I think. If you’re trying to
order a Go board, you might have trouble
because of this game, which is fantastic to see. The press coverage
was just insane. These are pictures
of the press room, just a scrum
of live TV cameras, 50 live TV cameras in the back. It was on all the
national TV stations, jumbo screens in the shopping
districts, it was pretty crazy. It was amazing to see. I think for Korea it was
the perfect match up of– they love technology, they
love AI, and they love Go, so for them it was
the perfect storm. And this is one interesting
thing that I want to show, is the rate of
progress of AlphaGo. So we started this project only
about just over 18 months ago, and the progress has been
relentless from the beginning. We found that with these techniques, which can improve themselves, you can create more data, then train new versions, and then those can create better, higher quality data. That virtuous
cycle had delivered about a one rank
improvement per month, which is pretty astounding. And the interesting thing
is we haven’t really seen any asymptote yet, so
we’re quite anxious to see how far this can go, and what optimal play, or near-optimal play, in Go looks like. How much further is there to go? And actually, I think, most
of the Go professionals are really interested in
this question as well. And I’m pretty sure that,
just like with Fan Hui, when we ultimately release AlphaGo
in some way to the public, I think it will improve
the standard of Go, and bring in whole new ideas. So after the heat of battle
I had a great dinner catch up with Lee Sedol, who’s also
an amazing and lovely guy, and we talked about the match. And he told me that it
was one of the greatest experiences of his life, and
the fact that it had totally– just the five
games he played had made him totally rejuvenated
for his passion for Go, and the ideas and creativity
about what could be done. AUDIENCE: How many games
a day does the machine play against itself? DEMIS HASSABIS: It’s playing
a few thousand a day, depends on how many
machines we use. AUDIENCE: So [INAUDIBLE] is
when human experience is at? DEMIS HASSABIS: Potentially. I mean these pros play
several thousand games a year, probably about 1,000-2000
when they’re training, so it’s quite a lot. Plus they read a lot about
all the ancient games. AUDIENCE: Do you think
the strong culture in Go has forced human play
into a corner instead of– DEMIS HASSABIS:
I don’t think so, because there are three
different schools of Go, the Japanese, the
Koreans, and the Chinese. And they’re very competitive
against each other. And they approach
the game differently, and I think that that creative
tension has forced them out of local maxima, I would say. So just to compare
Deep Blue with AlphaGo, just to be clear again
about the differences. So deep blue, again
not to take away from the immense
achievement that it was for its time, absolute
incredible, but it used handcrafted chess knowledge. By contrast, AlphaGo has
no handcrafted knowledge, all the knowledge it has it’s
learned from expert games and through self-play. Deep Blue did full width
search, pretty much looked at all the
alternatives, and that’s why it needed to crunch 200
million positions per second. By contrast, AlphaGo uses
these two neural networks to guide the search in a
highly selective manner. And that means we
only need to look at 100,000 positions
per second to deliver this kind of performance. So I just want to finish
by a couple of words on intuition and creativity. And this may be a little
bit controversial, so I don’t want to– I’m not saying this is the
full truth of the matter, or even fully
encompasses everything to do with intuition
and creativity, but I think these are
interesting thoughts. So we have to sort of
define a little bit what do we mean by intuition? And one way I’d like to–
at least the way I think about it for Go, is
this implicit knowledge that humans have acquired
through experience of playing Go, but
it’s not consciously accessible or expressible. Certainly not to communicate
to someone else, but not even to themselves. But we know this
knowledge is there, and we know it’s of very high quality, because we can
test the knowledge and verify it behaviorally. Obviously it’s the output of
the moves that the player plays. Secondly, what is creativity? And I’m sure
everyone in this room has their own pet definition. But I think, again, it
definitely encompasses the ability to synthesize
the knowledge you have and use that knowledge that
you’ve accumulated to produce novel or original ideas. And I think certainly,
at least within, albeit the constrained
domain of Go, I think AlphaGo has pretty
clearly demonstrated these two abilities. And obviously while playing
games is a lot of fun, and I believe the most efficient
way to go about AI development, obviously that’s not
the end goal for us. We want to apply the
technologies that we’ve built here as part of
AlphaGo, that we believe are pretty general purpose,
extend them, use components of them, and apply them to have
impacts on big challenging, real world problems. And we’re looking at
all sorts of areas at the moment, like
health care, robotics, and personal assistance. So I just want to thank
the amazing AlphaGo team who did all
this incredible work, really incredible engineering
and research efforts. And also again I
just want to stress all this work I’ve
shown you today is really less than a
tenth probably of the work that we’re doing at
DeepMind, and if you’re
all of our publications, they all are on our
website, and there’s about 70-80 publications there
now of all of our latest work. And of course, I must mention,
if you want to get involved, we are hiring both research
scientists and software engineers. Thanks for listening. [APPLAUSE] DEMIS HASSABIS: Yeah? TOMASO POGGIO: You have– DEMIS HASSABIS: Yeah, go for it. Do you want to use this? TOMASO POGGIO: For a second. Thank you. So let’s have a
couple of questions. Anybody? Yeah? OK. Let me– AUDIENCE: If groups of people
play together could they beat AlphaGo? DEMIS HASSABIS: I
think the question was, can groups of players
together beat AlphaGo? Maybe. So that’s something that
we might play in the future actually, is a group of top
professionals versus AlphaGo. And it’d be quite
interesting to see, because it’s known that
some of these top players are really good at opening
or middle game or end game, and you could
switch between them. And I’m sure they’d be
a lot stronger together. So maybe we’ll do
that towards the end of the year or next year. Yes, behind you. Yeah. AUDIENCE: You mentioned
earlier using visualization to better understand
why AlphaGo– DEMIS HASSABIS: Yeah. AUDIENCE: [INAUDIBLE]
Can you talk about that? DEMIS HASSABIS: Yeah– AUDIENCE: Can you
repeat the question? DEMIS HASSABIS:
Yes, so the question was using visualizations
to understand better how AlphaGo works. So we think this is a huge issue
with the whole deep learning field actually, how can
we better understand these black boxes that are
doing these amazing things, but quite opaquely. And I think what we need is
a whole new suite of analysis tools and statistical tools and
visualization tools to do that. And again I look to my
neuroscience background for inspiration,
for those of you used to fMRI or that kind of analysis, I think we need the
equivalent of kind of SPM for a virtual brain. So we actually have a project
called virtual brain analytics, which is around building
these kinds of tools so that we can better
understand what representations these networks are building. So hopefully in
the next year or so we’ll have something much more to say about that.
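To make the idea concrete, here is a minimal sketch of the kind of analysis such tools might start from, assuming PyTorch and a stand-in network rather than any actual DeepMind tooling: intermediate activations are captured with forward hooks and summarised per layer.

    import torch
    import torch.nn as nn

    # Stand-in convolutional network; in practice this would be a trained policy or value net.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
    )

    activations = {}

    def make_hook(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()   # record this layer's output
        return hook

    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            module.register_forward_hook(make_hook(name))

    x = torch.randn(1, 3, 19, 19)   # e.g. a 19x19 board-like input with 3 feature planes
    model(x)

    for name, act in activations.items():
        print(f"layer {name}: shape={tuple(act.shape)}, "
              f"mean={act.mean().item():.3f}, "
              f"frac>0={(act > 0).float().mean().item():.2f}")

Per-layer statistics like these are only the raw material; the visualization and statistical tools described here would sit on top of representations captured in this way.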
Yeah? AUDIENCE: So you mentioned that Deep Blue used sort of human
crafted moves, which sort of helped them. And then AlphaGo
didn’t have that, but it still learned from moves
and experiences of the game. DEMIS HASSABIS: Yeah. AUDIENCE: Is there any sort of
hope for completely reinforced learning– DEMIS HASSABIS: Yeah. AUDIENCE: In Go or
even in other agents. What is the– DEMIS HASSABIS: Yeah, so it’s a
really good question, actually. The question is, can we do away
with the supervised learning part, and just go all the way
from literally random, using reinforcement
learning, up to expert. We plan to do this
experiment actually. So we think it will be fine,
but it will take a lot longer to train, obviously,
without bootstrapping with the human expert play. So until now, we’ve
been just concentrating on trying to build the strongest
program we can in the fastest time. So we haven’t had time
to experiment with that, but there are a number of
experiments like that that we want to go back to and try. I will say that a very smart
master’s student from Imperial College London did do this for chess from scratch, and they got to International Master standard. So it seems like this is
definitely sort of possible. And actually we’ve hired him now, Matthew Lai, he’s called, and so he may be the person that ends up looking at this as well. So maybe someone
from near the back. Yeah? AUDIENCE: So [INAUDIBLE] DEMIS HASSABIS: Sorry? AUDIENCE: [INAUDIBLE] DEMIS HASSABIS: Yes. AUDIENCE: That
algorithm [INAUDIBLE]. DEMIS HASSABIS:
Yes, potentially. So we’re thinking about adding
learning into that part too. And also maybe there
are ways of doing away with some of that
[INAUDIBLE] search too; there are other ways of doing that search, more like imagination-based planning. So we’re thinking
about that as well. Maybe back there, yeah? AUDIENCE: [INAUDIBLE] DEMIS HASSABIS: So I think the
question, if I understand it correctly, is that if agents
play games well, is that AI? Is that what you’re
asking, or is that– AUDIENCE: Yes. Can AI [INAUDIBLE] DEMIS HASSABIS: Well,
I mean, obviously, our thesis is that this will work. But I think you have to be
careful how you build the AI. There are many ways you
could build AI for games that would not be generalizable. So I think that’s been
the history with commercial games, generally. I’ve also helped make lots of commercial games which have AI in them, and usually the built-in AI is a special case: usually it’s finite state machines or something specific to the game. And it utilizes all kinds
of game state information that if you were
just using perception you wouldn’t have access to. So I think you
have to be careful that you use games
in the right way, and you treat the agent
really as a virtual robot, with all that that entails in terms of what it has access to. And I think as long as
you’re careful with that, then it’s fine. And one way we
enforce this is we have a whole separate
team, an evaluation team, of amazing programmers,
most of them are ex-games programmers,
who build the environments and the APIs to the
environments and so on. And they’re entirely separate
from the algorithm development teams. And the only way the AIs can interface with the games is by these very thin APIs. So we know there’s no way, even if a researcher were to be lax about this, that the agents can access things they’re not supposed to.
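As an illustration of this thin-API idea, here is a minimal sketch, assuming a Gym-style reset/step interface and a toy stand-in game rather than DeepMind’s actual evaluation infrastructure: the agent can only call reset and step, receives observations and rewards, and never touches the simulator’s hidden state.

    import random

    class ToyGuessingGame:
        """Internal simulator: holds hidden state the agent must never read directly."""
        def new_episode(self):
            self._secret = random.randint(0, 9)   # hidden state
            self._steps = 0

        def apply(self, action):
            self._steps += 1
            return 1.0 if action == self._secret else 0.0

        def observation(self):
            # Only coarse, perception-like information is rendered out.
            return {"steps_taken": self._steps}

        def is_over(self):
            return self._steps >= 10

    class ThinEnv:
        """The thin API: agents interact only through reset() and step()."""
        def __init__(self, game):
            self._game = game

        def reset(self):
            self._game.new_episode()
            return self._game.observation()

        def step(self, action):
            reward = self._game.apply(action)
            return self._game.observation(), reward, self._game.is_over()

    def run_episode(env, policy):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        return total

    if __name__ == "__main__":
        env = ThinEnv(ToyGuessingGame())
        random_policy = lambda obs: random.randint(0, 9)   # the agent sees only obs
        print("episode return:", run_episode(env, random_policy))

The point of the separation is that the agent-side code has no handle on the game object at all, which is what keeps the evaluation honest.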
Pick from the left. So we’ll just go around, any questions? Yeah, here. AUDIENCE: Why does AlphaGo improve? Is it [INAUDIBLE], self-training, or do you tweak it? DEMIS HASSABIS: Well,
we’re doing both, actually. So there’s self-training
in the sense that self-play is producing high-quality data and the system is tweaking itself through this deep reinforcement learning, and we’re
also actively doing tons of research in terms
of new architectures or parameters and other things. So it’s all of the above. So we really threw
everything at it.
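As a concrete illustration of the pure reinforcement-learning question raised earlier, here is a minimal sketch, assuming a tabular softmax policy for one-pile Nim rather than anything AlphaGo-scale: the policy starts from random play, plays against itself, and is updated with REINFORCE from win/loss outcomes alone, with no supervised bootstrapping from human games.

    import numpy as np

    PILE, ACTIONS, LR, GAMES = 12, 3, 0.1, 20000
    logits = np.zeros((PILE + 1, ACTIONS))          # policy parameters: state x action

    def sample_action(pile):
        legal = min(ACTIONS, pile)
        z = logits[pile, :legal]
        p = np.exp(z - z.max()); p /= p.sum()
        return int(np.random.choice(legal, p=p))    # action i means "take i+1 stones"

    def self_play_game():
        pile, player, history = PILE, 0, {0: [], 1: []}
        while pile > 0:
            a = sample_action(pile)
            history[player].append((pile, a))
            pile -= a + 1
            if pile == 0:
                winner = player                     # taking the last stone wins
            player = 1 - player
        return winner, history

    for _ in range(GAMES):
        winner, history = self_play_game()
        for player, moves in history.items():
            ret = 1.0 if player == winner else -1.0  # win/loss signal only
            for pile, a in moves:
                legal = min(ACTIONS, pile)
                z = logits[pile, :legal]
                p = np.exp(z - z.max()); p /= p.sum()
                grad = -p
                grad[a] += 1.0                      # d log pi(a|s) / d logits
                logits[pile, :legal] += LR * ret * grad

    # After training, the policy should tend to leave the opponent a multiple of 4 stones.
    for pile in range(1, PILE + 1):
        best = int(np.argmax(logits[pile, :min(ACTIONS, pile)])) + 1
        print(f"pile {pile:2d}: take {best}")

Even in this toy setting the loop has the shape described in the answers above: self-play generates the training data, and the outcome signal alone drives the policy updates; the open question is how far that scales without the human-expert bootstrap.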
