
Meta unleashes Llama API running 18x faster than OpenAI: Cerebras partnership delivers 2,600 tokens per second

Last updated: April 29, 2025 2:16 pm

Meta today announced a strategic partnership with Cerebras Systems to power its new Llama API. The collaboration gives developers access to inference speeds up to 18 times faster than traditional GPU-based solutions.

The announcement was made at Meta’s inaugural LlamaCon developer conference in Menlo Park, positioning the company to compete directly with OpenAI, Anthropic, and Google in the rapidly expanding AI inference service market. In that market developers pay for the tokens their applications consume, which makes speed and efficiency crucial competitive factors.

Julie Shin Choi, Chief Marketing Officer at Cerebras, expressed excitement about the partnership, stating, “Meta has chosen Cerebras to collaborate in delivering the ultra-fast inference required to serve developers through their new Llama API. This marks our first CSP hyperscaler partnership, and we are thrilled to provide ultra-fast inference to all developers.”

This partnership marks Meta’s entry into the business of selling AI computation, turning its widely used open-source Llama models into a commercial service. Although the Llama models have amassed over one billion downloads, Meta had not previously offered first-party cloud infrastructure for developers who build applications on them.

James Wang, a senior executive at Cerebras, highlighted the significance of Meta’s move into the AI inference business, noting the exponential growth in demand for tokens by developers building AI applications. With the introduction of the Llama API, Meta is pioneering a new revenue stream from its AI investments while maintaining a commitment to open models.


The speed advantage provided by Cerebras’ specialized AI chips is a game-changer for Meta’s offering. The Cerebras system delivers over 2,600 tokens per second for Llama 4 Scout, significantly outperforming competitors in the market. This speed increase enables the development of new applications that were previously impractical, including real-time agents, low-latency voice systems, interactive code generation, and instant multi-step reasoning.
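To make the throughput figure concrete, the following is a rough back-of-the-envelope sketch, not a benchmark. It uses only the two numbers quoted in this article (2,600 tokens per second and "up to 18 times faster") and assumes, purely for illustration, that the 18x figure applies exactly and that generation runs at a steady rate with no network or prompt-processing overhead. The response lengths are hypothetical.

```python
# Back-of-the-envelope latency estimate from the throughput figures quoted in the article.
CEREBRAS_TOKENS_PER_SEC = 2600  # "over 2,600 tokens per second" for Llama 4 Scout
SPEEDUP = 18                    # "up to 18 times faster"; treated as exact here (an assumption)


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate num_tokens at a steady rate (ignores network and prompt processing)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


# Baseline throughput implied by the 18x claim, roughly 144 tokens per second.
baseline_tokens_per_sec = CEREBRAS_TOKENS_PER_SEC / SPEEDUP

for num_tokens in (100, 1000, 10000):  # hypothetical response lengths
    fast = generation_seconds(num_tokens, CEREBRAS_TOKENS_PER_SEC)
    slow = generation_seconds(num_tokens, baseline_tokens_per_sec)
    print(f"{num_tokens:>6} tokens: {fast:6.2f} s vs {slow:6.2f} s")
```

Under these assumptions, a 1,000-token reply takes roughly 0.4 seconds instead of roughly 7 seconds, which is the difference between a response that feels instantaneous and one that interrupts a voice or agent workflow.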

The Llama API represents a pivotal shift in Meta’s AI strategy, from being primarily a model provider to being a full-service AI infrastructure company. Meta is offering tools for fine-tuning and evaluation, starting with the Llama 3.3 8B model, so that developers can generate data, train on it, and test the quality of their custom models using the API.

Cerebras will power Meta’s new service through its network of data centers located throughout North America. A separate partnership with Groq gives developers another high-performance inference option beyond traditional GPU-based solutions.

Meta’s entry into the inference API market with superior performance metrics has the potential to disrupt the established order dominated by industry leaders like OpenAI and Google. By combining the popularity of its open-source models with faster inference capabilities, Meta is positioning itself as a formidable competitor in the commercial AI space.

The Llama API is currently available as a limited preview, with plans for a broader rollout in the coming weeks and months. Developers interested in accessing the ultra-fast Llama 4 inference can request early access by selecting Cerebras from the model options within the Llama API.

Meta’s decision to utilize specialized silicon signifies a shift towards prioritizing speed and efficiency in AI applications. In the evolving landscape of AI technology, the ability to process information quickly is becoming increasingly crucial, and Meta’s partnership with Cerebras is a significant step towards meeting this demand.
