Diffbot, a Silicon Valley company renowned for maintaining a vast index of web knowledge, has unveiled a groundbreaking AI model aimed at tackling a major challenge in the field: factual accuracy.
The newly released model, a fine-tuned version of Meta’s Llama 3.3, is an open-source implementation of a technique known as Graph Retrieval-Augmented Generation (GraphRAG).
Unlike traditional AI models that rely on static training data, Diffbot’s LLM leverages real-time information from the company’s Knowledge Graph, a dynamic database with over a trillion interconnected facts.
“Our belief is that general-purpose reasoning will eventually be distilled into approximately 1 billion parameters,” stated Mike Tung, CEO of Diffbot, in an interview with VentureBeat. “The goal is not to store knowledge in the model but to enable it to effectively query external knowledge sources.”
How It Works
Diffbot’s Knowledge Graph is an automated database that continuously crawls the web, categorizing entities such as people, companies, products, and articles. The database is refreshed every few days with millions of new facts, which keeps it current. When answering, the model queries the Knowledge Graph instead of relying only on what it memorized during training, which makes its answers more accurate and easier to verify than those of traditional LLMs.
For instance, when asked about a recent news event, the model can retrieve the latest updates from the web, extract relevant facts, and cite the original sources, ensuring up-to-date and reliable information.
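The retrieve-then-cite flow described above can be illustrated with a minimal Python sketch. It is not Diffbot’s implementation: the toy in-memory graph, the `Fact` type, the `retrieve` and `build_prompt` helpers, and the example.com URLs are all hypothetical, and the naive substring matching stands in for whatever entity lookup a real knowledge graph would use.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str  # URL the fact was extracted from (its provenance)


# Toy stand-in for a knowledge graph; every entry here is made up.
TOY_GRAPH = [
    Fact("Acme Corp", "headquartered in", "Springfield", "https://example.com/acme-about"),
    Fact("Acme Corp", "chief executive", "Jane Doe", "https://example.com/acme-news"),
    Fact("Globex", "headquartered in", "Shelbyville", "https://example.com/globex"),
]


def retrieve(question: str, graph: list[Fact]) -> list[Fact]:
    """Return facts whose subject is mentioned in the question (naive matching)."""
    q = question.lower()
    return [f for f in graph if f.subject.lower() in q]


def build_prompt(question: str, facts: list[Fact]) -> str:
    """Place the retrieved facts, numbered for citation, ahead of the question."""
    lines = [
        f"[{i}] {f.subject} {f.predicate} {f.obj} (source: {f.source})"
        for i, f in enumerate(facts, start=1)
    ]
    context = "\n".join(lines) if lines else "(no facts found)"
    return (
        "Answer using only the facts below and cite them by number.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    question = "Who runs Acme Corp?"
    # The resulting prompt would be passed to a language model of your choice.
    print(build_prompt(question, retrieve(question, TOY_GRAPH)))
```

The point of the design is that the facts and their sources live outside the model, so an answer can be traced back to a URL and the data can be refreshed without retraining.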
How Diffbot’s Knowledge Graph Outperforms Traditional AI in Fact Retrieval
In benchmark tests, Diffbot’s model scored 81% on FreshQA, a benchmark of up-to-date factual knowledge, surpassing ChatGPT and Gemini, and 70.36% on MMLU-Pro. Moreover, Diffbot has made its model fully open-source, enabling companies to customize and deploy it on their own hardware, addressing concerns about data privacy and vendor lock-in.
Open-Source AI’s Impact on Data Handling
Diffbot’s release comes at a crucial juncture in AI development, as large language models face growing scrutiny for generating false information. By grounding AI systems in factual knowledge, Diffbot offers an alternative path forward, emphasizing accuracy and verifiability over model size.
Industry experts believe that Diffbot’s Knowledge Graph-based approach is particularly valuable for enterprise applications where precision and auditability are paramount. Major companies like Cisco, DuckDuckGo, and Snapchat already benefit from Diffbot’s data services.
The model is now available on GitHub and can be tested through a public demo on diffy.chat. Organizations can deploy the smaller 8-billion-parameter version on a single Nvidia A100 GPU, while the full 70-billion-parameter version requires two Nvidia H100 GPUs.
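A rough back-of-the-envelope estimate shows why these hardware figures are plausible. The sketch below assumes 16-bit weights (2 bytes per parameter), which the article does not state, and counts only the weights, ignoring the extra memory needed for activations and the key-value cache. The helper name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Assumes 2 bytes per parameter (16-bit weights) unless told otherwise.
    """
    return params_billions * bytes_per_param


if __name__ == "__main__":
    for size in (8, 70):
        print(f"{size}B parameters: about {weight_memory_gb(size):.0f} GB of weights")
```

Under that assumption the 8-billion-parameter version needs about 16 GB for its weights, which fits on one A100 (sold in 40 GB and 80 GB variants), while the 70-billion-parameter version needs about 140 GB, more than a single 80 GB H100 holds but within the 160 GB that two provide.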
Looking ahead, Tung envisions a future where AI’s focus shifts from larger models to better organization and access to human knowledge. By moving towards explicit knowledge repositories with data provenance, the AI industry can overcome challenges related to accuracy and transparency.
Diffbot’s release challenges the notion that bigger models are always better in AI, offering a compelling alternative that prioritizes factual accuracy and real-time information retrieval. Whether this approach reshapes the industry remains to be seen, but it highlights the importance of precision over sheer size in the AI landscape.