In preparation for my teaching duties, I found myself revisiting a seminal piece in the realm of AI research: The Bitter Lesson, authored by Richard Sutton in 2019. This led me to ponder which of his insights would stand the test of time and if any aspects might appear misguided today. At the end of this discussion, I will delve into the economic ramifications of his arguments.
Sutton draws from a rich tapestry of AI history to elucidate a "bitter" truth: researchers often mistakenly believe that the next leap in intelligence will stem from specialized human expertise. Recent developments indicate that methods leveraging computational scaling consistently outperform those dependent on human knowledge. A prime illustration can be found in the realm of computer chess, where brute-force strategies on specialized hardware eclipsed knowledge-driven techniques. Sutton cautions that many in the field are reluctant to embrace this lesson, as embedding knowledge feels gratifying; however, genuine breakthroughs arise from the relentless expansion of computational power. In AI, scaling translates to enlarging models and training them on more extensive datasets with increased computational resources.
The essence of The Bitter Lesson lies not in any single algorithm but in the necessity of intellectual humility: advancements in AI have stemmed from the acknowledgment that persistently scaled, general-purpose learning surpasses our best efforts to hard-code intelligence. The significance of Sutton’s assertions extends beyond theoretical discourse, particularly as we navigate what has been termed "The Scaling Era" by Dwarkesh Patel, a period that shows no signs of abating.
Guests on EconTalk have entertained extreme predictions, ranging from AI saving humanity to its potential destruction. Such drastic forecasts inherently presume that AI capabilities will continue to advance. While AI has indeed made rapid strides since Sutton’s 2019 insights, there is no natural law dictating that this trend must persist. Some skeptics even argue that AI capabilities may be plateauing, citing persistent hallucinations in advanced models as evidence of their limitations.
If scaling is indeed the pathway to greater intelligence, then we might expect AI to exceed expectations as we augment hardware resources. This hypothesis is currently under examination, with projections suggesting that US private investment in AI may surpass $100 billion annually, marking one of the largest technological gambles in history. Let us scrutinize Sutton's thesis in the light of recent advancements.
Three compelling pieces of evidence support Sutton’s assertion regarding the efficacy of scaling. First, the evolution of game-playing AI serves as a pristine natural experiment. AlphaZero mastered chess and Go through self-play, devoid of human strategies or openings, and outperformed prior systems that relied on domain-specific expertise. Its triumph was rooted in scale and computational power, precisely as Sutton predicted.
Second, the field of natural language processing (NLP), which focuses on enabling computers to comprehend and generate human language, mirrors this trend. Earlier NLP systems leaned heavily on linguistically informed rules and symbolic structures, whereas OpenAI’s GPT-3 and its successors utilize generic architectures trained on vast datasets and immense computational resources. Performance enhancements correlate more consistently with scale than with architectural ingenuity.
The third example lies in the domain of computer vision. Techniques that involved manually engineered feature pipelines, where programmers crafted algorithms to detect edges and shapes, were sidelined once convolutional neural networks (inspired by the visual cortex) could be trained at scale. Accuracy in computer vision has improved significantly as both datasets and computational resources have increased.
While Sutton's argument pertains to the scalability of methods, this scalability becomes evident only when capital investments alleviate computational constraints.
The pace of AI advancement reflects not merely technological potential but also the unprecedented mobilization of financial capital. The average individual using ChatGPT for mundane tasks might not fully grasp the significance of "scaling." One reason for underestimating the speed of progress may stem not only from a misunderstanding of the technology but also from a failure to anticipate the substantial influx of funding.
This situation is akin to the Manhattan Project. People doubted the feasibility of the project not because it contradicted physical laws, but because it seemed prohibitively costly. Niels Bohr famously remarked it would require "turning the whole country into a factory." And yet, we achieved it. We are repeating that process today, as we transform the country into a hub for AI innovation. Without this financial backing, progress would be considerably slower.
However, neither doomsayers nor utopians will find their prophecies fulfilled if we are nearing the limits of either the power of scaling or our ability to sustain that scaling. Is the bitter lesson applicable as we navigate the landscape of 2026 and beyond? This inquiry carries weight for contemporary unemployment and future existential risks.
Recent economic research offers a nuanced perspective on these developments. In a January 2026 paper, economist Joshua Gans proposes a model of "artificial jagged intelligence." Gans notes that generative AI systems exhibit uneven performance across tasks that seem closely related; they may excel at one prompt while producing confidently incorrect answers on another with minimal alterations in wording or context. Anyone who has relied on ChatGPT for a work task only to witness it fabricate a plausible yet false statement is familiar with this inconsistency.
What makes Gans's analysis economically intriguing is his exploration of scaling laws. In his model, increasing scale (represented by the density of known points in a knowledge landscape) reduces average gaps and enhances mean quality in a roughly linear manner. This finding bolsters Sutton's thesis: more computational resources do translate to improved average performance. However, the jaggedness remains, and errors persist. Scaling enhances average performance without eradicating unexpected failures or long-tail issues.
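To make that intuition concrete, here is a deliberately stylized numerical sketch (my own illustration, not Gans's formal model). The only assumptions encoded are the two qualitative claims above: average quality improves steadily with scale, while prompt-level jaggedness does not shrink.

```python
# Stylized illustration (not Gans's formal model): average quality rises with scale,
# but prompt-specific "jaggedness" is held fixed, so confident failures persist.
import numpy as np

rng = np.random.default_rng(42)
N_TASKS = 100_000
FAIL_THRESHOLD = 0.5            # below this, treat the answer as a confident failure

for scale in [1, 10, 100, 1_000]:             # a stand-in for compute/data scale
    base = 0.55 + 0.08 * np.log10(scale)      # mean quality rises with each tenfold increase
    jagged = rng.normal(0.0, 0.25, N_TASKS)   # prompt-specific variation scaling leaves intact
    quality = np.clip(base + jagged, 0.0, 1.0)
    fail_rate = (quality < FAIL_THRESHOLD).mean()
    print(f"scale {scale:>5}: mean quality {quality.mean():.2f}, failure rate {fail_rate:.1%}")
```

In this toy run, the mean climbs with every tenfold increase in scale, yet the failure rate, while declining, remains in double digits even at the largest scale: the pattern Gans's jaggedness result describes.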
Gans frames AI adoption as an information challenge: users prioritize local reliability (will the AI assist me in my tasks?), yet typically only observe broad, global quality indicators (benchmark scores). This disconnect creates tangible economic friction. A legal assistant might trust an AI that excels in 95% of contract reviews, only to be caught off guard by a confidently erroneous interpretation of a seemingly straightforward clause. The experienced errors, as Gans illustrates, are exacerbated by what statisticians refer to as the "inspection paradox." Users encounter mistakes precisely in the scenarios where they require the most support.
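The sampling-bias intuition behind that point can be illustrated with another small sketch of my own, using made-up numbers: benchmarks average over all tasks, but if users disproportionately delegate the tasks they find hardest, and the model also errs more often on exactly those, the error rate people experience exceeds the headline benchmark figure.

```python
# Illustration with made-up numbers: the experienced error rate exceeds the benchmark
# average when usage is weighted toward harder tasks, where the AI errs more often.
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
difficulty = rng.uniform(0.0, 1.0, N)          # task difficulty
p_error = 0.02 + 0.28 * difficulty             # the AI errs more often on harder tasks
errors = rng.random(N) < p_error

benchmark_rate = errors.mean()                 # unweighted average, like a benchmark score
usage_weight = difficulty                      # delegation is weighted toward hard tasks
experienced_rate = np.average(errors, weights=usage_weight)

print(f"benchmark error rate:   {benchmark_rate:.1%}")    # about 16%
print(f"experienced error rate: {experienced_rate:.1%}")  # noticeably higher, about 21%
```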
Gans's 2026 paper does not explicitly reference or challenge Sutton, but it can be interpreted as addressing a structural limitation that persists even when adhering to the Bitter Lesson. Scaling is effective, yet the economic advantages of scaling may be tempered by the ongoing unpredictability that scaling does not resolve.
This limitation carries significant implications for how businesses adopt AI: they cannot merely rely on benchmark performance but must invest in human oversight and domain-specific evaluations. Consequently, AI adoption creates demand for human judgment rather than heralding the demise of human employment.
While Sutton’s insights about the trajectory of AI are sound, it is crucial not to strip his perspective of its context. Scaling alone is insufficient; merely increasing scale is unlikely to propel us toward superintelligence. Models still necessitate human insight and structure to reach their full potential in business applications. Techniques such as RLHF (Reinforcement Learning from Human Feedback), where human evaluators assess AI outputs to guide the model in learning helpful and safe responses, are essential for infusing human values into AI systems. The evolution from earlier architectures to GPT-4 involved more than just amplifying data inputs.
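For readers curious what "human evaluators assess AI outputs" means mechanically, here is a minimal sketch of the preference-learning step at the core of RLHF. It is an illustration under simplified assumptions (a linear reward model over made-up response features), not any lab's production pipeline: raters pick the better of two candidate responses, and a reward model is fit so the preferred response scores higher; that learned reward then guides the subsequent fine-tuning of the model.

```python
# Minimal preference-learning sketch (simplified illustration, not a production RLHF
# pipeline): fit a linear reward model so human-preferred responses score higher than
# rejected ones, using the standard pairwise logistic (Bradley-Terry) objective.
import numpy as np

rng = np.random.default_rng(0)
DIM, N_PAIRS, LR, STEPS = 16, 2_000, 0.1, 500

# Hypothetical response features; in practice these would come from model embeddings.
chosen   = rng.normal(0.3, 1.0, (N_PAIRS, DIM))   # responses the raters preferred
rejected = rng.normal(0.0, 1.0, (N_PAIRS, DIM))   # responses the raters rejected

w = np.zeros(DIM)                                  # linear reward model: r(x) = w @ x
for _ in range(STEPS):
    margin = (chosen - rejected) @ w               # r(chosen) - r(rejected)
    p = 1.0 / (1.0 + np.exp(-margin))              # modeled P(rater prefers "chosen")
    grad = ((p - 1.0)[:, None] * (chosen - rejected)).mean(axis=0)
    w -= LR * grad                                 # gradient step on the -log p loss

accuracy = ((chosen - rejected) @ w > 0).mean()
print(f"reward model ranks the preferred response higher on {accuracy:.0%} of pairs")
```

In a full RLHF setup, the learned reward would then be used to fine-tune the language model itself, which is exactly where human judgment re-enters a scaled system.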
Furthermore, we cannot simply keep "scaling up" indefinitely. Real-world constraints, such as energy costs and data limitations, are ever-present. Therefore, for AI to achieve significant advancements, it will require not just brute force but also efficiency and algorithmic ingenuity. Human insight remains relevant, albeit in a transformed role, one that focuses on shaping, constraining, and guiding scaled learning systems.
In conclusion, we must acknowledge Sutton’s contribution: scaling is indeed effective. However, the efficacy of that scaling hinges on human insight regarding how to structure and implement these systems. Economists will recognize this pattern: capital and labor are complementary, even when the capital manifests as GPUs and the labor entails crafting loss functions.
Gans's work adds a crucial economic perspective: while scaling enhances average AI performance, the unpredictable and jagged nature of that performance incurs real costs for adopters. Businesses and individuals must navigate a landscape where AI is both increasingly capable and persistently unreliable in ways that are challenging to forecast. The economic benefits of AI investment depend not solely on raw capabilities but also on creating institutions and fostering human expertise to manage this unpredictability.
The bitter lesson may very well be that while scaling is a potent tool, the sweet corollary is that human ingenuity remains an indispensable component for future progress.
[1] In AI research, compute refers to the total amount of computation, usually quantified in floating-point operations (FLOPs), used to train or run a model.
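For a sense of the magnitudes this footnote gestures at, here is a rough back-of-the-envelope using the widely cited approximation that training a dense transformer takes roughly six floating-point operations per parameter per training token; the model size and token count below are hypothetical round numbers, not figures for any particular model.

```python
# Rough back-of-the-envelope for the footnote, using the common approximation that
# training a dense transformer costs about 6 FLOPs per parameter per training token.
# The parameter and token counts are hypothetical round numbers, not any real model.
params = 70e9            # 70 billion parameters (hypothetical)
tokens = 2e12            # 2 trillion training tokens (hypothetical)
train_flops = 6 * params * tokens
print(f"approximate training compute: {train_flops:.1e} FLOPs")   # about 8.4e23 FLOPs
```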

