
Vector Unit
Programmable acceleration for AI

The Vector Unit operates on multiple data elements simultaneously, delivering high-throughput parallel compute for AI and HPC workloads.

What is a Vector Unit?

A Vector Unit is composed of several "Vector Cores", each roughly equivalent to a GPU core, that perform multiple calculations in parallel. Each Vector Core has arithmetic units capable of addition, subtraction, fused multiply-add, division, square root, and logic operations.
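The following minimal Python sketch shows what "multiple calculations in parallel" means for one instruction. It implements an element-wise integer multiply-accumulate with the semantics of the RVV `vmacc.vv` instruction (`vd[i] = vs1[i] * vs2[i] + vd[i]`, unmasked). The helper names and the round-robin split of elements across cores are hypothetical illustrations, not the actual hardware mapping.

```python
from typing import List


def vmacc_lane(acc: int, a: int, b: int, sew: int) -> int:
    """One element: a * b + acc, wrapped to SEW bits as RVV integer ops wrap."""
    return (a * b + acc) & ((1 << sew) - 1)


def vector_vmacc(vd: List[int], vs1: List[int], vs2: List[int],
                 n_cores: int, sew: int = 32) -> List[int]:
    """Element-wise vd[i] = vs1[i] * vs2[i] + vd[i] (unmasked vmacc.vv)."""
    n = len(vd)
    if len(vs1) != n or len(vs2) != n:
        raise ValueError("operand lengths must match")
    out = [0] * n
    # Hypothetical round-robin assignment: core c handles elements c, c + n_cores, ...
    # In hardware every core works on its elements in the same cycle; this loop
    # only reproduces the result, not the timing.
    for core in range(n_cores):
        for i in range(core, n, n_cores):
            out[i] = vmacc_lane(vd[i], vs1[i], vs2[i], sew)
    return out


print(vector_vmacc([1] * 8, [1, 2, 3, 4, 5, 6, 7, 8], [2] * 8, n_cores=4))
# [3, 5, 7, 9, 11, 13, 15, 17]
```

Results wrap at SEW bits, so with `sew=32` the product `2**31 * 2` wraps to 0.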

Parallel Efficiency

Processes numerous computations simultaneously to maximize speed

Functional Versatility

Handles a wide array of tasks, from basic math to complex logic

Massive Throughput

Leverages GPU-like architecture to manage heavy data loads effectively

Key Benefits

Optimized performance through massive bandwidth, flexible data widths, and full RISC-V Vector (RVV) 1.0 compliance


Up to 2048-bit data path

The Vector Unit supports high memory bandwidth through its native data path of up to 2048 bits

FP & Int, 8b to 64b

Supports integer and floating-point formats from 8 up to 64 bits, including bfloat16
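bfloat16 keeps float32's sign bit and 8-bit exponent but only the top 7 mantissa bits, so it is exactly the upper half of a float32. The minimal Python sketch below shows that relationship. The round-to-nearest-even rounding and the helper names are assumptions for illustration; this page does not describe how the Vector Unit's own conversion instructions round.

```python
import math
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Round x (as float32) to bfloat16 bits using round-to-nearest-even (assumed)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if math.isnan(x):
        # Keep the sign and return a quiet NaN.
        return ((bits >> 16) & 0x8000) | 0x7FC0
    # Add just under half an ULP, plus 1 if the kept LSB is odd (ties to even).
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(h: int) -> float:
    """Widening is exact: place the 16 bits in the upper half of a float32."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


print(hex(f32_to_bf16_bits(1.0)))                  # 0x3f80
print(bf16_bits_to_f32(f32_to_bf16_bits(3.14159)))  # 3.140625
```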

DLEN up to 2048b

Data path width (DLEN) can be scaled to your needs, from 128 up to 2048 bits

VLEN up to 4096b

Vector register length (VLEN) can be scaled to your needs, from 128 up to 4096 bits

AI-Ready

Tensor instructions seamlessly integrated

RVV 1.0

Implements the complete RISC-V Vector Extension (RVV) 1.0 specification
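Code written for RVV 1.0 is vector-length agnostic: a loop asks the hardware how many elements (`vl`) it may process per iteration, so the same binary runs on any VLEN. The minimal Python sketch below emulates that strip-mining pattern for `y = a*x + y`. `vsetvl` here is a hypothetical stand-in for the `vsetvli` instruction; it returns `min(AVL, VLMAX)`, which is one of the values the specification permits.

```python
from typing import List, Tuple


def vsetvl(avl: int, vlen: int, sew: int, lmul: int = 1) -> int:
    """Hypothetical stand-in for vsetvli: VLMAX = LMUL * VLEN / SEW, vl = min(AVL, VLMAX)."""
    vlmax = lmul * vlen // sew
    return min(avl, vlmax)


def saxpy(a: float, x: List[float], y: List[float],
          vlen: int = 512, sew: int = 32) -> Tuple[List[float], int]:
    """y = a*x + y, processed in strips of at most VLMAX elements."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    y = list(y)
    i, strips = 0, 0
    while i < len(x):
        vl = vsetvl(len(x) - i, vlen, sew)
        # On hardware this inner loop would be a single vector instruction.
        for j in range(i, i + vl):
            y[j] = a * x[j] + y[j]
        i += vl
        strips += 1
    return y, strips


result, strips = saxpy(2.0, [float(k) for k in range(40)], [1.0] * 40)
print(strips)  # 3: strips of 16 + 16 + 8 elements with VLEN=512, SEW=32
```

With a larger VLEN, the same loop simply runs fewer strips.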

Programmable acceleration for AI

Our vector core can be tailored to support different data types (FP64, FP32, FP16, BF16, INT64, INT32, INT16 or INT8), depending on the customer’s target application domain. The width in bits of the largest supported data type defines the vector core width, or ELEN.
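As a minimal Python sketch of the ELEN rule above, the hypothetical helper `elen_for` takes the selected data types and returns the width of the widest one. The type widths are the standard sizes of these formats.

```python
# Widths in bits of the data types listed above.
TYPE_WIDTH_BITS = {
    "FP64": 64, "FP32": 32, "FP16": 16, "BF16": 16,
    "INT64": 64, "INT32": 32, "INT16": 16, "INT8": 8,
}


def elen_for(selected_types):
    """Hypothetical helper: ELEN is the width of the largest selected data type."""
    unknown = set(selected_types) - set(TYPE_WIDTH_BITS)
    if unknown:
        raise ValueError(f"unsupported types: {sorted(unknown)}")
    return max(TYPE_WIDTH_BITS[t] for t in selected_types)


print(elen_for(["FP32", "BF16", "INT8"]))  # 32
print(elen_for(["FP64", "INT8"]))          # 64
```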

You can build a Vector Unit by combining 4, 8, 16 or 32 vector cores, covering a very wide range of power-performance-area (PPA) trade-offs.

Once these choices are made, the total Vector Unit data path width (DLEN) is ELEN × the number of vector cores. We support DLEN configurations from 128b to 2048b.
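The DLEN formula can be checked with a short Python sketch. The hypothetical helper `dlen_bits` multiplies ELEN by the core count and rejects results outside the stated 128b to 2048b range. Whether every in-range ELEN and core-count combination is actually offered is an assumption; this page only states the range.

```python
VECTOR_CORE_COUNTS = (4, 8, 16, 32)  # core counts listed above
DLEN_MIN_BITS, DLEN_MAX_BITS = 128, 2048


def dlen_bits(elen: int, n_cores: int) -> int:
    """Hypothetical helper: DLEN = ELEN x cores, checked against 128b..2048b."""
    if n_cores not in VECTOR_CORE_COUNTS:
        raise ValueError(f"core count must be one of {VECTOR_CORE_COUNTS}")
    dlen = elen * n_cores
    if not DLEN_MIN_BITS <= dlen <= DLEN_MAX_BITS:
        raise ValueError(f"DLEN={dlen}b is outside {DLEN_MIN_BITS}b..{DLEN_MAX_BITS}b")
    return dlen


for elen in (32, 64):
    print(elen, [dlen_bits(elen, n) for n in VECTOR_CORE_COUNTS])
# 32 [128, 256, 512, 1024]
# 64 [256, 512, 1024, 2048]
```

For example, ELEN=16 with 4 cores would give 64b, below the supported minimum, so the sketch rejects it.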

Our Vector Unit is equipped with a high-performance cross-vector-core network that provides high-bandwidth, all-to-all connectivity between the vector cores, even in the largest, 32-core configuration.

This network serves the RISC-V Vector instructions that move data between vector cores, such as vrgather and the vslide family.
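To show why these instructions need cross-core connectivity, the minimal Python sketch below implements the RVV 1.0 semantics of `vrgather.vv` (`vd[i] = vs1[i] >= VLMAX ? 0 : vs2[vs1[i]]`) and `vslidedown` (`vd[i] = vs2[i + offset]`, or 0 past VLMAX), unmasked. It then counts how many destination elements read a source element held by a different core. The element-to-core layout in `core_of` is a hypothetical assumption, not the documented hardware mapping.

```python
from typing import List, Optional, Tuple

Result = Tuple[List[int], List[Optional[int]]]  # (vd, source index per element)


def core_of(i: int, sew: int, elen: int, n_cores: int) -> int:
    """Hypothetical layout: each core holds ELEN/SEW consecutive elements of
    each DLEN-wide slice, and successive slices wrap around the cores."""
    return (i * sew // elen) % n_cores


def vrgather(vs2: List[int], vs1: List[int], vlmax: int) -> Result:
    """vd[i] = vs2[vs1[i]] if vs1[i] < VLMAX else 0 (vl = len(vs1))."""
    vd, sources = [], []
    for idx in vs1:
        in_range = idx < vlmax
        vd.append(vs2[idx] if in_range else 0)
        sources.append(idx if in_range else None)
    return vd, sources


def vslidedown(vs2: List[int], offset: int, vl: int, vlmax: int) -> Result:
    """vd[i] = vs2[i + offset] if i + offset < VLMAX else 0, for i < vl."""
    vd, sources = [], []
    for i in range(vl):
        src = i + offset
        in_range = src < vlmax
        vd.append(vs2[src] if in_range else 0)
        sources.append(src if in_range else None)
    return vd, sources


def cross_core_moves(sources: List[Optional[int]], sew: int, elen: int, n_cores: int) -> int:
    """Count destination elements whose source lives on another core."""
    return sum(1 for dst, src in enumerate(sources)
               if src is not None
               and core_of(src, sew, elen, n_cores) != core_of(dst, sew, elen, n_cores))


# 4 cores, ELEN=SEW=64, VLEN=DLEN=256: one element per core, VLMAX=4.
vs2 = [10, 20, 30, 40]
vd, src = vslidedown(vs2, offset=1, vl=4, vlmax=4)
print(vd, cross_core_moves(src, sew=64, elen=64, n_cores=4))  # [20, 30, 40, 0] 3
```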

The second key configuration choice in the Vector Unit is the number of bits in each vector register (VLEN), which can also be tailored to the customer’s needs.
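VLEN determines how many elements a single register holds. RVV 1.0 defines this as VLMAX = LMUL × VLEN / SEW. The Python sketch below tabulates it for a few VLEN choices. For simplicity it handles only integer LMUL values; fractional LMUL is left out of this sketch.

```python
def vlmax(vlen: int, sew: int, lmul: int = 1) -> int:
    """Elements per register group: VLMAX = LMUL * VLEN / SEW (RVV 1.0)."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch only handles integer LMUL of 1, 2, 4 or 8")
    return lmul * vlen // sew


for vlen in (128, 512, 4096):
    print(vlen, {sew: vlmax(vlen, sew) for sew in (8, 16, 32, 64)})
# 128 {8: 16, 16: 8, 32: 4, 64: 2}
# 512 {8: 64, 16: 32, 32: 16, 64: 8}
# 4096 {8: 512, 16: 256, 32: 128, 64: 64}
```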

While most other vendors assume that VLEN equals DLEN (a 1X ratio), we offer 2X, 4X and 8X ratios. When VLEN is larger than DLEN, each vector operation takes VLEN/DLEN cycles to execute. This helps tolerate long memory latencies and reduces power.

For example, with VLEN=2048 and DLEN=512 (a 4X ratio), each vector arithmetic operation takes 4 clock cycles to execute.
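The cycle count in this example follows directly from the ratio. The minimal Python sketch below computes it under two assumptions: the data path processes DLEN bits per cycle, and a register group of LMUL registers is processed back to back. Treating 1X as an accepted ratio alongside 2X, 4X and 8X is also an assumption.

```python
# Assumption: 1X accepted alongside the 2X, 4X and 8X ratios described above.
SUPPORTED_RATIOS = (1, 2, 4, 8)


def cycles_per_vector_op(vlen: int, dlen: int, lmul: int = 1) -> int:
    """Cycles for one vector arithmetic instruction, assuming DLEN bits are
    processed per cycle and LMUL registers are processed back to back."""
    if vlen % dlen != 0 or vlen // dlen not in SUPPORTED_RATIOS:
        raise ValueError(f"VLEN/DLEN must be one of {SUPPORTED_RATIOS}")
    return lmul * (vlen // dlen)


print(cycles_per_vector_op(2048, 512))  # 4
```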