By Volker Politz  |  September 1st, 2025

Cervell and the Changing Shape of AI Infrastructure

AI infrastructure is no longer defined by how much raw compute you can deliver, but by how efficiently you can run it at scale. Hyperscalers are under pressure from ballooning inference costs, memory bottlenecks, and the thermal and power ceilings of their datacenters. Adding TOPS alone doesn’t solve the problem. What matters is keeping pipelines full, racks utilized, and workloads flexible across different tiers of deployment.

That’s the philosophy behind Cervell, Semidynamics’ new NPU. It combines CPU, vector, and tensor processing in a single, fully programmable RISC-V design. Standard configurations can scale from 1 TOPS to 256 TOPS, enabling the same architecture to operate in everything from edge sensors to high-throughput datacenter racks. For hyperscalers, this means reducing the sprawl of accelerator types and deploying a common architecture that flexes with the workload instead of locking into a narrow niche.

Tackling the Datacenter Bottleneck

The real choke point in large-scale AI isn't compute; it's memory. Recommendation engines and trillion-parameter language models spend most of their cycles waiting for data. Cervell addresses this with Semidynamics' Gazillion Misses™ subsystem, which keeps up to 128 memory requests in flight and sustains over 60 bytes per cycle. By keeping inference engines fed, it avoids the stalls that drive up costs and waste rack space.
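Why does the number of in-flight requests matter so much? Little's law puts an upper bound on the bandwidth a core can extract from memory: sustained bytes per cycle equals outstanding requests times request size divided by memory latency. The sketch below illustrates this with assumed figures for cache-line size and DRAM latency (they are illustrative, not Cervell specifications):

```python
# Little's law: sustained bandwidth = in-flight requests x request size / latency.
# LINE and LATENCY below are generic illustrative values, not Cervell specs.

def sustained_bandwidth_bytes_per_cycle(outstanding_requests, line_size_bytes, latency_cycles):
    """Upper bound on memory bandwidth a core can extract (Little's law)."""
    return outstanding_requests * line_size_bytes / latency_cycles

LINE = 64       # bytes per memory request (a typical cache line)
LATENCY = 120   # assumed DRAM round-trip time in core cycles

few = sustained_bandwidth_bytes_per_cycle(8, LINE, LATENCY)     # a typical core
many = sustained_bandwidth_bytes_per_cycle(128, LINE, LATENCY)  # Gazillion-class concurrency

print(f"{few:.1f} B/cycle with 8 outstanding misses")    # ~4.3 B/cycle
print(f"{many:.1f} B/cycle with 128 outstanding misses") # ~68.3 B/cycle
```

Under these assumptions, a core limited to a handful of outstanding misses stalls at a few bytes per cycle no matter how fast its datapath is, while 128 concurrent requests are enough to sustain the 60+ bytes per cycle figure quoted above.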

This advantage becomes even more pronounced with a KV cache (key-value cache), now standard in transformer-based models. A KV cache reduces computation by storing intermediate attention results, but it significantly increases DRAM traffic, often to several times the model size, which makes a local SRAM buffer unviable for this purpose. The memory subsystem quickly becomes the new bottleneck. Gazillion Misses is designed for exactly this challenge: it absorbs the higher volume of DRAM accesses without starving the compute pipeline, so Cervell maintains high utilization even when running large-scale LLM inference with heavy KV cache workloads.
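A quick back-of-the-envelope calculation shows why the KV cache outgrows on-chip SRAM. For a decoder-only transformer, the cache holds one key tensor and one value tensor per layer, sized by heads, head dimension, sequence length, and batch. The model shape below is a generic 7B-class configuration chosen for illustration, not a specific product:

```python
# Back-of-the-envelope KV cache sizing for a decoder-only transformer.
# The model shape is a generic illustrative configuration, not a specific product.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # Factor of 2: a separate key tensor and value tensor per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# 32 layers, 32 KV heads of dimension 128, 8K context, batch of 8, FP16 elements.
size = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                      seq_len=8192, batch=8, bytes_per_elem=2)
print(f"{size / 2**30:.1f} GiB of KV cache")  # prints "32.0 GiB"
```

Tens of gigabytes of cache for a model whose weights are only ~14 GB in FP16: the traffic lands squarely on DRAM, which is exactly the access pattern a deep-concurrency memory subsystem is built to absorb.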

Because Cervell is based on RISC-V, it also avoids the lock-in of closed architectures. Hyperscalers can easily add custom operators and library functions to suit proprietary AI frameworks. This openness of the RISC-V ISA is becoming critical as the industry shifts toward chiplet-based and heterogeneous systems where independent software control at system level is as important as raw hardware performance.

AI infrastructure is fragmenting fast, and the economics are unforgiving. Fixed-function accelerators can’t meet the diversity of workloads ahead. Cervell is designed to be programmable, scalable, and memory-first, making it a practical choice for datacenter operators and cloud providers who need to control inference costs, sustain high utilization, and maintain flexibility across product lines. In a world where infrastructure has to scale without breaking, Cervell provides a silicon architecture built for the real economics of AI.

Semidynamics will be exhibiting in the RISC-V Pavilion at AI Infra Summit 2025. If you’d like to see our technology in action, discuss datacenter deployments, or explore customization options, we’re scheduling onsite meetings throughout the event.

Please reach out to Laura Batlle at laura.batlle@semidynamics.com to book a time.