Intro. [Recording date: March 25, 2025.]
Russ Roberts: It’s March 25th, 2025, and joining me today is Dwarkesh Patel, a podcaster and author you can find on YouTube and Substack at Dwarkesh.com. He co-authored The Scaling Era: An Oral History of AI, 2019-2025 with Gavin Leech, which is our focal point for today’s discussion, among other intriguing topics. Dwarkesh, it’s a pleasure to have you on EconTalk.
Dwarkesh Patel: Thanks for having me, Russ! I’ve been a long-time fan, probably even before I started my podcast, so it’s exciting to finally converse with you.
Russ Roberts: I appreciate that. I admire your work as well, and I’m looking forward to diving into it.
Russ Roberts: In the early sections of your book, which I should mention is published by Stripe Press—known for their beautiful works—you emphasize the need to reevaluate the last six years, from 2019 to now. Why is that perspective important? What are we overlooking?
Dwarkesh Patel: There’s a prevalent view, even among researchers, that the major advancements in AI stem from breakthroughs in algorithms. While that’s certainly part of the story, the more significant backdrop involves substantial trends in computational power and data accumulation. These algorithmic breakthroughs are not isolated; they emerge from an evolutionary process in which increased computational capacity allows for experimentation with many ideas. Without the compute advances, we wouldn’t even have discovered that the transformer architecture outperforms previous models.
For instance, the evolution from GPT-2 to GPT-3 and then to GPT-4 is a narrative of escalating compute demands. This raises fundamental questions about intelligence itself: how does simply applying vast amounts of compute to diverse data lead to the emergence of intelligent agents? The trend of compute quadrupling annually, with frontier training runs now costing hundreds of millions of dollars, a far cry from the academic pastime this was a decade ago, is the crucial narrative that often gets lost.
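To make that trend concrete, here is a back-of-the-envelope calculation. The 4x-per-year figure is the rough rate cited above; everything else is simple arithmetic for illustration, not a measured statistic.

```python
# Rough illustration of the compute trend described above: if frontier
# training compute grows ~4x per year, the 2019-2025 window alone spans
# roughly three orders of magnitude. The growth rate is the conversational
# estimate above, not a measured figure.
growth_per_year = 4
years = 2025 - 2019          # the "scaling era" the book covers

total_growth = growth_per_year ** years
print(f"~{growth_per_year}x/year over {years} years = {total_growth:,}x more compute")
# ~4x/year over 6 years = 4,096x more compute
```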
Russ Roberts: I should have noted earlier that you’re a computer science major, so your insights here are invaluable. Could you explain what a transformer is? It’s a pivotal component of this technology.
Dwarkesh Patel: The transformer is an architecture developed by Google researchers in 2017, and it serves as the foundational breakthrough behind models like ChatGPT. What sets it apart from earlier architectures is its capacity for parallel training. This means that with vast GPU clusters, transformers scale more effectively than their predecessors, allowing us to apply more compute to achieve higher levels of intelligence.
Another significant advancement was coupling this architecture with a straightforward training process focused on predicting the next word. Initially, it seemed simplistic, yet it allows the model to recognize increasingly complex patterns over time, culminating in a system capable of passing the Turing Test and assisting in various tasks.
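A minimal sketch of the two ingredients Dwarkesh describes: a transformer trained in parallel over whole sequences, with next-word prediction as the entire objective. This assumes PyTorch; the vocabulary size, dimensions, and random "text" are placeholders, not any lab's actual setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, seq_len = 1000, 64, 32

class TinyTransformerLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(seq_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        x = self.embed(tokens) + self.pos(positions)
        # Causal mask: each position attends only to earlier positions, yet
        # every position's prediction is computed in one parallel pass.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.head(self.encoder(x, mask=mask))

model = TinyTransformerLM()
tokens = torch.randint(0, vocab_size, (8, seq_len))  # stand-in for real text
logits = model(tokens[:, :-1])                       # predict token t+1 from tokens up to t
loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()  # "predict the next word" is the whole training objective
print(loss.item())
```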
Russ Roberts: You used the term “intelligent” in quotes, which merits deeper discussion. At the end of the first chapter, you note that the book’s knowledge cut-off is November 2024, so anything after that falls outside its scope. In today’s rapidly evolving landscape, that’s practically ancient history.
Dwarkesh Patel: Absolutely. Since then, significant breakthroughs like inference scaling have emerged, including models like o1 and o3, as well as DeepSeek’s reasoning model. This represents a departure from previous understandings. We once believed that merely expanding model size (think of the leap from GPT-3.5 to GPT-4) would drive progress. However, the incremental improvements seen with models like GPT-4.5, which is better but not drastically so, suggest we need more targeted training approaches: training models on specific tasks and scoring their performance on quantifiable challenges, rather than simply predicting the next word of text. The ability to automate tasks effectively hinges on the models’ performance in real-world applications, which is where the true economic value lies.
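A toy sketch of what training against "quantifiable challenges" can look like: grade a model's final answer against a checkable result and use that grade as a reward signal. The model call and problems below are placeholder stand-ins, not a real API or any lab's pipeline.

```python
import random

problems = [
    {"prompt": "What is 17 * 23?", "answer": "391"},
    {"prompt": "What is 2**10?",   "answer": "1024"},
]

def sample_model_answer(prompt: str) -> str:
    """Stand-in for sampling an answer from an actual language model."""
    return random.choice(["391", "1024", "I'm not sure"])

def reward(model_answer: str, reference: str) -> float:
    # Verifiable reward: 1.0 if the final answer matches exactly, else 0.0.
    return 1.0 if model_answer.strip() == reference else 0.0

for p in problems:
    answer = sample_model_answer(p["prompt"])
    print(f"{p['prompt']!r} -> {answer!r}, reward={reward(answer, p['answer'])}")
    # A reinforcement-learning step would then nudge the model toward
    # higher-reward answers; that update is omitted here.
```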
Russ Roberts: I appreciate your kind words about my research. Admittedly, my current focus is more on the intricacies of self-perception and ego, rather than cutting-edge AI. But thank you for the compliment.
Russ Roberts: I’ve developed a fondness for Claude. There’s speculation that it performs better with Hebrew than other LLMs, but I can’t verify that given my limited Hebrew skills. The embarrassing truth is, I like Claude primarily because of its elegant typeface—it’s visually appealing on my phone.
Some tools are clearly more adept than others for specific tasks. Are we aware of these discrepancies? Do industry professionals have a grasp of why certain models excel in particular areas?
I assume some are better at coding, while others might excel in deep research or processing complex meanings. For everyday users, are there noticeable differences, and can we pinpoint the reasons behind them?
Dwarkesh Patel: I would argue that regular users might have a clearer perspective than the AI researchers themselves. Looking ahead, I wonder what the long-term trajectory will be. Currently, many models seem quite similar and are even converging in capabilities. Companies are not only replicating each other’s products but also mimicking product names: Gemini has Deep Research, and OpenAI offers a feature by the same name.
Over time, it’s plausible that distinctions will emerge as labs pursue different objectives. For example, Anthropic seems to be focusing on developing fully autonomous software engineers, while others might prioritize consumer adoption or enterprise applications. However, at this stage, my impression is that the models feel quite homogenous.
Russ Roberts: I share that sentiment. In translation, for instance, a bilingual individual might have preferences, but it’s intriguing to contemplate how these models perform in personal contexts. Personally, I utilize them for brainstorming my thoughts on various issues, tutoring, and translations. I recently asked Claude to clarify the concept of a transformer, and it provided helpful insights. I even find myself seeking travel advice from Claude, which seems irrational given the wealth of travel sites available. Yet, I trust its suggestions.
What about you? How do you engage with these tools in your personal life?
Dwarkesh Patel: Primarily for research. As a podcaster, I dedicate considerable time prepping for each guest, seeking connections between ideas and understanding their significance. Engaging with these models helps clarify my confusions.
I’ve also experimented with integrating LLMs into my podcasting workflow for tasks like clip selection and automation. Their utility has been moderate, but they are indispensable for research. The pivotal question remains: once these models can effectively use computers, will that unlock a new level of value for users?
Russ Roberts: Could you elaborate on that?
Dwarkesh Patel: Currently, some labs have released features for operating a computer, but the performance is lacking. The models struggle with practical tasks like booking flights or planning events, things a high school student could handle. In contrast, an LLM capable of solving advanced math and reasoning problems still falters in these everyday scenarios. It raises fundamental questions: why can these models excel at abstract tasks yet stumble in practical applications? It suggests that our understanding of intelligence is still evolving.
Russ Roberts: Until we grasp these models more thoroughly, it seems challenging to address this issue. You posed a question to Dario Amodei, the CEO of Anthropic: “What fundamentally explains why scaling works? Why does feeding vast compute into a broad data set result in the emergence of intelligence?” His response was candid: “The truth is we still don’t know. It’s largely a contingent empirical fact, observable in the data, yet we lack a satisfying explanation.” This uncertainty presents a significant barrier to enhancing the models’ capabilities, particularly in their ability to function as virtual assistants in practical scenarios.
Dwarkesh Patel: Indeed, this is a pressing question we will likely explore in the coming years. I also asked Dario about the implications of having an LLM that retains vast knowledge—if a human with such memory could draw insightful connections, why can’t LLMs leverage this advantage to generate innovative solutions? There are theories addressing this, but many questions linger.
Russ Roberts: You titled your book The Scaling Era, implying there’s another phase on the horizon. Can you speculate what that might be called?
Dwarkesh Patel: Perhaps the RL era? In essence, scaling refers to the exponential growth of these systems: moving from GPT-2 to GPT-3, and then from GPT-3 to GPT-4, each step required a staggering amount of compute, roughly 100 times more than the generation before. Efficiency improvements can shift that ratio, but the overarching trend remains: significant compute investment is essential for each advance.
Looking forward, we can expect this pattern to persist, especially with large compute clusters coming online following the surge in demand triggered by ChatGPT’s success. The real question is: how much compute will be necessary to achieve breakthroughs in reasoning, agency, and beyond?
Regarding artificial general intelligence (AGI)—
Russ Roberts: AGI—
Dwarkesh Patel: Exactly. There will come a time when AGI operates at least as efficiently as a human brain. A human brain consumes about 20 watts, while a single H100 might draw around 1,000 watts, and running a model takes many of them. We understand that it’s physically feasible for human-level intelligence to operate at that low energy threshold, and it’s conceivable that future advances will push efficiency even further. However, before we reach that point, we may develop AGI that requires immense infrastructure and capital investment, resulting in a somewhat clunky system. It might also rely on inference scaling, where reasoning for longer yields better answers. For instance, OpenAI has shown on visual reasoning puzzles that performance improves significantly when the model is allowed extended reasoning. That raises the question: what do deeper levels of processing unlock?
Ultimately, the first AGI we create may not be the most efficient model possible; it will be one driven by the immense value of having such a system, despite its potential inefficiencies.
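Tying together two rough figures from the conversation, roughly 100x more compute per model generation and roughly 4x annual growth in compute, gives a back-of-the-envelope cadence for new generations. Both numbers are order-of-magnitude estimates, not precise measurements.

```python
import math

compute_per_generation = 100  # rough multiplier per jump, e.g., GPT-3 -> GPT-4
growth_per_year = 4           # rough annual growth in frontier training compute

years_per_generation = math.log(compute_per_generation) / math.log(growth_per_year)
print(f"~{years_per_generation:.1f} years between generations at these rates")
# ~3.3 years between generations at these rates
```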
Russ Roberts: Can you identify another technology where trial and error has been so crucial? I previously interviewed Matt Ridley, who discussed how often experts trail behind those who experiment. The Wright brothers, for example, were bicycle mechanics who achieved flight through persistence and experimentation. It’s fascinating that we have incredibly sophisticated computer scientists developing complex transformer architectures without fully understanding how they function. This duality of intense intellectual rigor paired with reliance on trial and error is quite remarkable. Can you think of other technologies that exhibit this phenomenon?
Dwarkesh Patel: Most technologies embody this idea that individual brilliance is often overstated in favor of cumulative, incremental improvements. Many breakthroughs aren’t about singular events but arise from refining optimizers or enhancing hardware capabilities. The transformative potential we’re witnessing now could not have been realized in the 1990s, even if similar ideas were conceived. The difference lies in the scaled-up capabilities that have enabled us to unlock AI’s full potential.
[More to come, 20:25]