
Bitter lesson

The bitter lesson is a claim in artificial intelligence that, in the long run, simpler systems that can scale with available computational power will outperform more complex systems that integrate domain-specific human knowledge, because they take better advantage of Moore's law. The principle was proposed and named in a 2019 essay by Richard Sutton[1] and is now widely accepted.[citation needed]

The essay

Sutton gives several examples that, in retrospect, illustrate the lesson:

  - In computer chess, the brute-force search of IBM's Deep Blue, which defeated world champion Garry Kasparov in 1997, prevailed over approaches built on human chess knowledge.
  - In computer Go, search combined with learning from self-play likewise overtook methods based on human knowledge of the game.
  - In speech recognition, statistical methods such as hidden Markov models, and later deep learning, displaced systems built on human knowledge of words and phonemes.
  - In computer vision, learned features displaced hand-engineered ones such as edge detectors and SIFT-style descriptors.

Sutton concludes that time is better invested in finding simple, scalable solutions that can take advantage of Moore's law rather than in adding ever more complex human insights, and calls this the "bitter lesson". He also cites two general-purpose techniques that have been shown to scale effectively: search and learning. The lesson is considered "bitter" because it is less anthropocentric than many researchers expected, and they have consequently been slow to accept it.
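The contrast can be made concrete with a toy sketch, which is not from Sutton's essay; the game, the players, and all names below are illustrative assumptions. A generic Monte Carlo search player, whose only adjustable resource is a per-move playout budget standing in for computation, faces a fixed hand-coded heuristic in a simple subtraction game (players alternately take 1 to 3 stones; whoever takes the last stone wins). In the spirit of the lesson, the generic player's win rate should tend to climb as its budget grows, while the hand-written rule cannot improve:

    import random

    MAX_TAKE = 3

    def heuristic_move(pile):
        # Hand-coded "domain knowledge": grab as many stones as allowed.
        # Plausible-looking but imperfect (perfect play leaves the opponent
        # a multiple of 4).
        return min(MAX_TAKE, pile)

    def rollout(pile, my_turn, rng):
        # Finish the game with uniformly random moves; True means "I" took
        # the last stone and won.
        while True:
            k = rng.randint(1, min(MAX_TAKE, pile))
            pile -= k
            if pile == 0:
                return my_turn
            my_turn = not my_turn

    def mc_move(pile, playouts, rng):
        # Generic method: score each legal move by random playouts and pick
        # the best empirical win rate. More playouts means more compute,
        # with no new game knowledge.
        best_k, best_rate = 1, -1.0
        for k in range(1, min(MAX_TAKE, pile) + 1):
            if pile - k == 0:
                return k  # taking the last stone wins outright
            wins = sum(rollout(pile - k, False, rng) for _ in range(playouts))
            if wins / playouts > best_rate:
                best_k, best_rate = k, wins / playouts
        return best_k

    def play(pile, mc_first, playouts, rng):
        # Play one game; return True if the Monte Carlo player wins.
        mc_turn = mc_first
        while True:
            k = mc_move(pile, playouts, rng) if mc_turn else heuristic_move(pile)
            pile -= k
            if pile == 0:
                return mc_turn  # whoever takes the last stone wins
            mc_turn = not mc_turn

    if __name__ == "__main__":
        rng = random.Random(0)
        for playouts in (1, 4, 16, 64, 256):
            games = 200
            wins = sum(
                play(rng.randint(8, 30), g % 2 == 0, playouts, rng)
                for g in range(games)
            )
            print(f"{playouts:4d} playouts/move: win rate {wins / games:.0%}")

The sketch is only meant to exhibit the trend Sutton describes: the search player gets stronger purely by spending more computation per move, without being given any further knowledge of the game, whereas the heuristic's strength is fixed at whatever its designer knew.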

Influence

The essay was published on Sutton's website incompleteideas.net in 2019, and has received hundreds of formal citations according to Google Scholar. Some of these provide alternative statements of the same principle; for example, the 2022 paper "A Generalist Agent" from Google DeepMind summarized the lesson as:[2]

Historically, generic models that are better at leveraging computation have also tended to overtake more specialized domain-specific approaches, eventually.

The principle is further referenced in many books on artificial intelligence. For example, From Deep Learning to Rational Machines draws a connection to long-standing debates in artificial intelligence, such as Moravec's paradox and the contrast between neats and scruffies.[3]

Other work has sought to apply and validate the principle in new domains. For example, the 2022 paper "Beyond the Imitation Game" applies the principle to large language models, concluding that "it is vitally important that we understand their capabilities and limitations" in order to "avoid devoting research resources to problems that are likely to be solved by scale alone".[4] In 2024, "Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR Proceedings" examined further evidence from the field of computer vision and pattern recognition, concluding that the previous twenty years of work in the field show "a strong adherence to the core principles of the 'bitter lesson'".[5]

References

  1. ^ Sutton, Rich (March 13, 2019). "The Bitter Lesson". www.incompleteideas.net. Retrieved September 7, 2025.
  2. ^ Reed, Scott; et al. (2022). "A Generalist Agent". Transactions on Machine Learning Research. ISSN 2835-8856. arXiv:2205.06175. Retrieved September 7, 2025.
  3. ^ Buckner, Cameron J. (December 11, 2023). From Deep Learning to Rational Machines: What the History of Philosophy Can Teach Us about the Future of Artificial Intelligence. Oxford University Press. doi:10.1093/oso/9780197653302.001.0001. ISBN 9780197653302.
  4. ^ Srivastava, Aarohi; et al. "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models". The Fourteenth International Conference on Learning Representations.
  5. ^ Yousefi, Mojtaba; Collins, Jack. "Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR Proceedings". Proceedings of the 1st Workshop on NLP for Science (NLP4Science). Association for Computational Linguistics. pp. 175–187. Retrieved September 7, 2025.