Nvidia Licenses Groq's AI Inference Technology in Reported $20 Billion Deal

On December 24, 2025, Nvidia announced a non-exclusive licensing agreement for Groq's inference technology, integrating Groq's low-latency Language Processing Unit (LPU) designs into Nvidia's AI architecture. As part of the deal, Nvidia is hiring Groq founder and CEO Jonathan Ross, president Sunny Madra, and key members of the engineering team to scale the technology. Groq continues to operate independently, running its GroqCloud service under new CEO Simon Edwards (previously its finance chief), while Nvidia gains IP rights without acquiring the company outright. Reports value the transaction at approximately $20 billion in cash, Nvidia's largest deal to date.

Significance

The deal marks a strategic consolidation in AI inference, where Groq's LPU is claimed to run LLM inference roughly 10x faster than GPUs at one-tenth the energy. Nvidia neutralizes a direct competitor while bolstering its portfolio against rivals such as Cerebras and AMD in the growing inference market, which is projected to dominate AI workloads as models move from training into deployment.

People Behind It

Jonathan Ross, Groq's founder and CEO and a veteran of Google's TPU effort, joins Nvidia along with president Sunny Madra and key engineering staff. Simon Edwards, previously Groq's finance chief, takes over as Groq's CEO and continues to run the independent company and its GroqCloud service.

Why Nvidia Is Doing This Deal

Nvidia faces intensifying competition in inference despite its dominance in training. Groq's architecture addresses latency bottlenecks, so the license removes a challenger, folds proven low-latency technology into Nvidia's stack, and secures key talent as the industry shifts from training models to deploying them. The licensing-plus-hiring structure also sidesteps the regulatory scrutiny of a full acquisition while strengthening Nvidia's ecosystem.

Groq's Language Processing Unit (LPU) Architecture

Groq's Language Processing Unit (LPU) is a specialized AI accelerator designed specifically for low-latency, high-throughput inference in large language models (LLMs) and other sequential workloads. Unlike general-purpose GPUs, the LPU employs a deterministic, software-defined architecture optimized for predictable execution.

Core Design Principles

  1. Deterministic execution: the compiler schedules every operation and data movement ahead of time, so runtime behavior, and therefore latency, is predictable rather than dependent on caches, dynamic scheduling, or speculative execution (sketched below).
  2. Software-defined hardware: control logic that GPUs implement in silicon, such as scheduling and arbitration, is pushed into the compiler, leaving more die area for compute and on-chip memory.
  3. On-chip SRAM as primary memory: keeping weights and activations in fast on-chip memory avoids the external-memory bandwidth bottlenecks that dominate sequential token generation.
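
To make the determinism point concrete, the toy sketch below models compile-time scheduling: each operation's cost is fixed and known, so total per-token latency is computed before anything runs. Everything here (operation names, cycle counts, the compile_schedule helper) is hypothetical and for illustration only; it is not Groq's actual toolchain.

```python
# Illustrative sketch of compiler-scheduled ("deterministic") execution.
# All operation names and cycle counts are invented for illustration.
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    cycles: int  # fixed, known cost per op (assumption for this sketch)

def compile_schedule(ops):
    """Assign every op a fixed start cycle ahead of time.

    Because each op's cost is known and nothing is scheduled dynamically,
    total latency is determined entirely at compile time.
    """
    schedule, cursor = [], 0
    for op in ops:
        schedule.append((cursor, op))
        cursor += op.cycles
    return schedule, cursor  # cursor == total, predictable latency

# A toy decode step for one token: load weights, multiply, activate.
token_step = [Op("load_weights", 4), Op("matmul", 12), Op("activation", 2)]

schedule, latency = compile_schedule(token_step)
for start, op in schedule:
    print(f"cycle {start:>3}: {op.name}")
print(f"per-token latency known before execution: {latency} cycles")
```
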
Key Advantages Over Traditional Architectures

The design prioritizes sequential token generation in LLMs, delivering consistently low per-token latency, throughput in the hundreds of tokens per second, and up to 10x better energy efficiency than GPUs for inference, per Groq's claims. It supports mixed-precision formats and, thanks to precise numerics, avoids trading output quality for latency.
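
As a rough back-of-the-envelope check on what such figures imply per token, the snippet below converts a throughput number and an efficiency multiple into per-token latency and energy. All input values are hypothetical placeholders, not measured Groq or GPU results.

```python
# Back-of-the-envelope sketch: translating throughput and efficiency claims
# into per-token figures. All input values are hypothetical placeholders.

tokens_per_second = 300          # "hundreds of tokens per second"
gpu_joules_per_token = 0.5       # invented GPU baseline, not a measurement
claimed_efficiency_gain = 10     # "up to 10x better energy efficiency"

per_token_latency_ms = 1000 / tokens_per_second
lpu_joules_per_token = gpu_joules_per_token / claimed_efficiency_gain

print(f"per-token latency: {per_token_latency_ms:.1f} ms")   # ~3.3 ms
print(f"energy per token:  {lpu_joules_per_token:.2f} J "
      f"(vs {gpu_joules_per_token:.2f} J GPU baseline)")
```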

Groq's approach, rooted in founder Jonathan Ross's prior work on Google's TPU, represents a shift toward domain-specific hardware for production AI deployment.

References

  1. Groq Official Announcement – https://groq.com/newsroom/groq-and-nvidia-enter-non-exclusive-inference-technology-licensing-agreement-to-accelerate-ai-inference-at-global-scale
  2. CNBC Exclusive Report – https://www.cnbc.com/2025/12/24/nvidia-buying-ai-chip-startup-groq-for-about-20-billion-biggest-deal.html
  3. Reuters Coverage – https://www.reuters.com/business/nvidia-buy-ai-chip-startup-groq-about-20-billion-cnbc-reports-2025-12-24/