© 2024 americanfocus.online – All Rights Reserved.

Meta unleashes Llama API running 18x faster than OpenAI: Cerebras partnership delivers 2,600 tokens per second

Last updated: April 29, 2025 2:16 pm

Meta today announced a partnership with Cerebras Systems to power its new Llama API, giving developers access to inference speeds up to 18 times faster than traditional GPU-based solutions.

The announcement came at Meta’s inaugural LlamaCon developer conference in Menlo Park and positions the company to compete directly with OpenAI, Anthropic, and Google in the fast-growing AI inference service market. Developers pay for inference by the token to power their applications, so speed and efficiency are key competitive factors.

Julie Shin Choi, Chief Marketing Officer at Cerebras, expressed excitement about the partnership, stating, “Meta has chosen Cerebras to collaborate in delivering the ultra-fast inference required to serve developers through their new Llama API. This marks our first CSP hyperscaler partnership, and we are thrilled to provide ultra-fast inference to all developers.”

The partnership marks Meta’s entry into the business of selling AI computation, turning its widely used open-source Llama models into a commercial service. Although the Llama models have been downloaded more than one billion times, Meta had not previously offered first-party cloud infrastructure for developers to build applications on them.

James Wang, a senior executive at Cerebras, highlighted the significance of Meta’s move into the AI inference business, noting the exponential growth in demand for tokens among developers building AI applications. With the Llama API, Meta is opening a new revenue stream from its AI investments while maintaining its commitment to open models.

The speed advantage comes from Cerebras’ specialized AI chips. The Cerebras system delivers over 2,600 tokens per second for Llama 4 Scout, which is the basis for the claim of up to 18 times the speed of GPU-based services. At that rate, applications that were previously impractical become feasible, including real-time agents, low-latency voice systems, interactive code generation, and instant multi-step reasoning.
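To make the throughput figures concrete, here is a minimal back-of-the-envelope sketch. It uses only the two numbers quoted in the article (2,600 tokens per second and the 18x speedup); the baseline rate is derived by dividing one by the other, and the five-step agent workload is a hypothetical example, not something Meta or Cerebras published. It also ignores network latency, queueing, and prompt processing time.

```python
# Figures taken from the article; everything else is an illustrative assumption.
CEREBRAS_TOKENS_PER_SEC = 2600  # reported rate for Llama 4 Scout
SPEEDUP = 18                    # "up to 18 times faster" headline figure


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate num_tokens at a steady rate (ignores network and queueing)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


# Implied baseline if the 18x figure is applied to the 2,600 tokens/s rate.
baseline_tokens_per_sec = CEREBRAS_TOKENS_PER_SEC / SPEEDUP

# Hypothetical workload: an agent making 5 sequential calls of 500 output tokens each.
steps, tokens_per_step = 5, 500
fast = steps * generation_seconds(tokens_per_step, CEREBRAS_TOKENS_PER_SEC)
slow = steps * generation_seconds(tokens_per_step, baseline_tokens_per_sec)

print(f"implied baseline:  {baseline_tokens_per_sec:.0f} tokens/s")
print(f"at 2,600 tokens/s: {fast:.2f} s")
print(f"at baseline:       {slow:.2f} s")
```

Under these assumptions the five-step chain finishes in just under a second at the quoted rate, versus roughly 17 seconds at the implied baseline, which illustrates why sequential, multi-step workloads benefit most from faster generation.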

The Llama API represents a pivotal shift in Meta’s AI strategy, from being primarily a model provider to becoming a full-service AI infrastructure company. The API offers tools for fine-tuning and evaluation, starting with the Llama 3.3 8B model, so developers can generate data, train on it, and test the quality of their custom models.

Cerebras will power Meta’s new service through its network of data centers located throughout North America. Meta has also partnered with Groq, giving developers a further high-performance inference option beyond traditional GPU-based solutions.

Meta’s entry into the inference API market with superior performance metrics has the potential to disrupt the established order dominated by industry leaders like OpenAI and Google. By combining the popularity of its open-source models with faster inference capabilities, Meta is positioning itself as a formidable competitor in the commercial AI space.

The Llama API is currently available as a limited preview, with plans for a broader rollout in the coming weeks and months. Developers interested in accessing the ultra-fast Llama 4 inference can request early access by selecting Cerebras from the model options within the Llama API.

Meta’s decision to utilize specialized silicon signifies a shift towards prioritizing speed and efficiency in AI applications. In the evolving landscape of AI technology, the ability to process information quickly is becoming increasingly crucial, and Meta’s partnership with Cerebras is a significant step towards meeting this demand.
