© 2024 americanfocus.online – All Rights Reserved.

Meta unleashes Llama API running 18x faster than OpenAI: Cerebras partnership delivers 2,600 tokens per second

Last updated: April 29, 2025 2:16 pm

Meta today announced a partnership with Cerebras Systems to power its new Llama API. The collaboration gives developers access to inference speeds up to 18 times faster than traditional GPU-based solutions.

The announcement came at Meta’s inaugural LlamaCon developer conference in Menlo Park and positions the company to compete directly with OpenAI, Anthropic, and Google in the rapidly expanding AI inference service market. Developers in this market typically pay per token processed, so speed and efficiency are crucial competitive factors.

Julie Shin Choi, Chief Marketing Officer at Cerebras, expressed excitement about the partnership, stating, “Meta has chosen Cerebras to collaborate in delivering the ultra-fast inference required to serve developers through their new Llama API. This marks our first CSP hyperscaler partnership, and we are thrilled to provide ultra-fast inference to all developers.”

This partnership marks Meta’s entry into the business of selling AI computation, turning its widely used open-source Llama models into a commercial service. While Meta’s Llama models have amassed over one billion downloads, the company had not previously offered first-party cloud infrastructure for developers to build applications on these models.

James Wang, a senior executive at Cerebras, highlighted the significance of Meta’s move into the AI inference business, noting the exponential growth in demand for tokens among developers building AI applications. With the Llama API, Meta is opening a new revenue stream from its AI investments while maintaining its commitment to open models.


The speed advantage comes from Cerebras’ specialized AI chips. The Cerebras system delivers over 2,600 tokens per second for Llama 4 Scout, the figure behind the claim of up to 18 times the speed of GPU-based services. That speed makes new applications practical that previously were not, including real-time agents, low-latency voice systems, interactive code generation, and instant multi-step reasoning.
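To put the throughput figure in perspective, the short Python sketch below converts tokens per second into wall-clock generation time. It uses only the two numbers quoted in this article (2,600 tokens per second and the "up to 18 times" speedup). The implied baseline of roughly 144 tokens per second is derived by dividing one by the other and is an assumption for illustration, not a measured benchmark. The function and constant names are hypothetical, and the calculation ignores network and queueing latency.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decode rate (no network or queueing delay)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Figures quoted in the article.
CEREBRAS_TPS = 2600.0   # tokens per second for Llama 4 Scout on Cerebras
SPEEDUP = 18.0          # "up to 18 times faster" claim

# Assumption: the baseline rate implied by the two figures above (about 144 tokens/s).
BASELINE_TPS = CEREBRAS_TPS / SPEEDUP


def compare(num_tokens: int) -> tuple[float, float]:
    """Return (fast_seconds, baseline_seconds) for a response of num_tokens."""
    return (
        generation_seconds(num_tokens, CEREBRAS_TPS),
        generation_seconds(num_tokens, BASELINE_TPS),
    )


if __name__ == "__main__":
    for n in (500, 2000, 10000):
        fast, slow = compare(n)
        print(f"{n:>6} tokens: {fast:6.2f}s at 2,600 tok/s vs {slow:6.2f}s at the implied baseline")
```

Under these assumptions, a multi-step agent that chains several 2,000-token responses would spend under a second on each step at the quoted rate, compared with well over ten seconds per step at the implied baseline, which is why the article singles out agents and voice systems.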

The Llama API represents a pivotal shift in Meta’s AI strategy, from being primarily a model provider to becoming a full-service AI infrastructure company. The API also offers tools for fine-tuning and evaluation, starting with the Llama 3.3 8B model, so developers can generate data, train on it, and test the quality of their custom models.

Cerebras will serve Meta’s new API from its network of data centers located throughout North America. Meta also has a partnership with Groq, which gives developers another high-performance inference option beyond traditional GPU-based solutions.

Meta’s entry into the inference API market with superior performance metrics has the potential to disrupt the established order dominated by industry leaders like OpenAI and Google. By combining the popularity of its open-source models with faster inference capabilities, Meta is positioning itself as a formidable competitor in the commercial AI space.

The Llama API is currently available as a limited preview, with plans for a broader rollout in the coming weeks and months. Developers interested in accessing the ultra-fast Llama 4 inference can request early access by selecting Cerebras from the model options within the Llama API.

Meta’s decision to utilize specialized silicon signifies a shift towards prioritizing speed and efficiency in AI applications. In the evolving landscape of AI technology, the ability to process information quickly is becoming increasingly crucial, and Meta’s partnership with Cerebras is a significant step towards meeting this demand.
