Introducing Dia: Nari Labs’ New AI Model for Generating Podcast-Style Clips
The world of synthetic speech tools is rapidly expanding, with numerous players entering the market to meet growing demand. One such entrant is Nari Labs, a Korea-based startup co-founded by Toby Kim. Despite lacking extensive AI expertise, Kim and his fellow co-founder have developed an AI model named Dia that is now openly available for use.
Inspired by Google’s NotebookLM, Kim and his team set out to create a model that offered more control over generated voices and greater flexibility in script customization. After just three months of learning about speech AI, they utilized Google’s TPU Research Cloud program to train Dia, which boasts an impressive 1.6 billion parameters.
Parameters are the internal variables a model uses to make predictions; a higher count generally signals a more capable model, though it does not by itself guarantee quality. Available on platforms like Hugging Face and GitHub, Dia can run on most modern PCs with at least 10GB of VRAM. It allows users to generate dialogue from scripts, customize speakers' tones, and even insert nonverbal cues like coughs and laughs.
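To make the script-driven workflow concrete, here is a minimal sketch of how such a dialogue script might be assembled. It assumes the bracketed speaker-tag and parenthesized nonverbal-cue conventions shown in Dia's public repository (e.g. [S1]/[S2] and (laughs)); the helper functions and names below are hypothetical illustrations, not Nari Labs' API, and the actual model call is omitted.

```python
import re

# Hypothetical helper for assembling a Dia-style dialogue script.
# Assumed conventions (from Dia's repository, not this article):
# bracketed speaker tags like [S1]/[S2] and parenthesized
# nonverbal cues such as (laughs) or (coughs).
def build_script(turns):
    """turns: list of (speaker_number, text) pairs -> one script string."""
    return " ".join(f"[S{n}] {text}" for n, text in turns)

def speakers_in(script):
    """Return distinct speaker numbers in order of first appearance."""
    seen = []
    for tag in re.findall(r"\[S(\d+)\]", script):
        if tag not in seen:
            seen.append(tag)
    return seen

if __name__ == "__main__":
    script = build_script([
        (1, "Welcome back to the show."),
        (2, "Thanks for having me. (laughs)"),
        (1, "Let's dive right in."),
    ])
    print(script)
    print(speakers_in(script))  # two distinct speakers
```

A string in this shape would then be handed to the model for audio generation; the point of the sketch is simply that speaker identity and nonverbal cues live inline in the text, which is what gives users the script-level control described above.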
In a brief test conducted by JS, Dia performed admirably, generating realistic two-way conversations on various topics. The quality of the voices produced by Dia rivals that of other tools on the market, and its voice cloning feature is notably user-friendly.
However, like many voice generators, Dia lacks robust safeguards against misuse. Nari Labs warns against using the model for impersonation or deceptive purposes but disclaims responsibility for any misuse. Additionally, the source of the data used to train Dia remains undisclosed, raising questions about potential copyright infringement.
Despite these concerns, Nari Labs has ambitious plans for Dia, aiming to develop a synthetic voice platform with a social aspect and expand language support beyond English. Kim envisions a future where Dia and its successors revolutionize the way we interact with AI-generated voices.