
Richard Sutton

Joe Rasmussen edited this page Jan 17, 2026 · 1 revision

Sutton is a key bloke in the reinforcement learning (RL) approach to AI. He is also physically based super-close to @Madusha Kumarasiri, @Hemal Ekanayake, and @Hasitha Chinthaka at the University of Alberta, Edmonton.

University of Alberta directory record for Sutton

Sutton’s personal assistant is Beverly Balaski, balaski@ualberta.ca

Dwarkesh Patel interview of Richard Sutton - are LLMs a dead end?

Sutton is a winner of the Turing Award (2024, shared with Andrew Barto)

Sutton is studying/has studied:

  • Reinforcement Learning

  • AlphaGo and AlphaZero

  • Markov Decision Process

  • Temporal Credit Assignment
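
The last three topics on this list fit together, and a toy example shows how. Below is a minimal sketch (my own illustration, not Sutton's code) of TD(0) on the five-state random walk used in Sutton & Barto's textbook: the only reward arrives at the far right end of the walk, yet the temporal-difference update passes credit for it back through the earlier states — temporal credit assignment in miniature.

```python
import random

# Five-state random walk: states A..E, every episode starts in the
# middle (C). Stepping off the right end pays reward 1; off the left
# end pays 0. TD(0) learns the value of each state from experience.

def td0_random_walk(episodes=5000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    V = [0.5] * 5            # value estimates for states A..E
    for _ in range(episodes):
        s = 2                # start in the middle state (C)
        while True:
            s2 = s + rng.choice([-1, 1])
            if s2 < 0:       # off the left end: reward 0, terminal
                V[s] += alpha * (0.0 - V[s])
                break
            if s2 > 4:       # off the right end: reward 1, terminal
                V[s] += alpha * (1.0 - V[s])
                break
            # TD(0): nudge V[s] toward r + V[s2] (here r = 0, gamma = 1)
            V[s] += alpha * (V[s2] - V[s])
            s = s2
    return V

print([round(v, 2) for v in td0_random_walk()])
```

The true values are 1/6 … 5/6, so with enough episodes the estimates should land somewhere near [0.17, 0.33, 0.5, 0.67, 0.83], jittering a little because the step size stays constant.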

Notes

The Bitter Lesson, 2019

In a classic two-page paper, Sutton makes a case that, in 70 years of AI research, models that use input from bodies of existing human knowledge have repeatedly been dead-ends. He gives examples where this approach has been out-competed by models that interact directly with whatever problem they have in their reward function (Reinforcement Learning) and use compute to figure out an approach for themselves.

Part of the argument is temporal: That, over the same time period that a human team would spend trying to systematise a body of human knowledge and feed it into a computer, you could just let the computer have a direct crack at the problem from zero pre-training (ie AlphaZero). Comparing these two approaches, Sutton argues that Moore’s law has repeatedly delivered the second approach to the finish line faster. This is the ‘bitter lesson’ … the depressing effect on all those teams that spent a decade or two of their lives trying to work out how to stuff human knowledge into a box, only to be overtaken by an AI-zero novelty. Such teams typically end up accusing their opponent of cheating!
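
The “direct crack at the problem” approach can be sketched at its smallest scale. This is my own illustrative toy, not anything from Sutton’s paper: a tabular Q-learning agent that is given no knowledge of the task at all — only a reward signal — and works out a corridor-walking policy purely by trial and error.

```python
import random

# A Q-learning agent in a 1-D corridor of n states. The agent starts
# at state 0 and the goal is state n-1; the only feedback is a reward
# of 1 on reaching the goal. No human knowledge of the task is built
# in -- the policy emerges from interaction alone.

def q_learn_corridor(n=8, episodes=2000, alpha=0.2, gamma=0.95,
                     epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q[s][a]: action 0 = step left, action 1 = step right
    Q = [[0.0, 0.0] for _ in range(n)]
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            # epsilon-greedy: mostly exploit, sometimes explore
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n - 1 else 0.0   # reward only at the goal
            # Q-learning update: bootstrap from the best next action
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learn_corridor()
# The learned policy steps right in every non-terminal state
print(all(q[1] > q[0] for q in Q[:-1]))
```

The point of the sketch is the shape of the method, not the task: swap the corridor for Go and the table for a neural network and you have, in caricature, the AlphaZero recipe the notes above describe.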

Super-interesting thesis. @Joe-Rasmussen's thoughts:

  1. Joe is reminded of the example in Richard Dawkins' Greatest Show on Earth, where a team led by John Endler conducts a beautiful, subtle set of experiments that tease out the interplay of sexual selection and natural selection in guppies. The example demonstrates the multiple layers of abstraction that we might call ‘intelligence’:

    • Endler figures out some things about guppies, but that’s not his real objective …

    • … his real objective is to use the guppy example to provide an eloquent elucidation of the interplay of sexual and predatory selection pressure

      • At the next level of abstraction, Dawkins is employing the example in his irritating campaign against religion

        • And at yet another level, Dawkins is making a kind of Karl Popper observation about knowledge itself - the whole institutional framework of getting it, probing it, further tweaking the good bits and discarding the rubbish
    • In this ‘Dawkins’ case, what would the AI-Zero’s reinforcement learning reward mechanism be? “Write books that popularise science and sell a million copies?”

  2. I’d love to discuss with Sutton: What if we can turn the internet into a neural net? The Village Links project has (A) a network of computational processing units (including humans) (B) connections that form and break dynamically (C) connection strengths.

  3. Given point (2) above, what is the reward function?

  • I want to argue that we are misled when we think about this question in terms of the macroscopic entities that we observe … particularly AIs and people, but really any entity. In the case of people, we are much better served to think about their reward functions in terms of selfish genes … the people are just macroscopic emergent properties of those selfish genes.
  • Now surely in this context we can see that the other entities - AIs, servers, databases and so on - are just emergent macroscopic outputs of selfish memes … and indeed in this context we can see that the gene is a special case of a meme.
