Can be licensed alone or paired with Atrevido and Avispado.
The VPU supports large memory capacity with its 64-bit native data path.
FP & Int, 8b to 64b
Supports all integer and floating-point formats from 8 to 64 bits, including bfloat16.
DLEN up to 2048b
Data path length can be scaled to your needs, from 128 up to 2048 bits.
VLEN up to 4096b
Vector length can be scaled to your needs, from 128 up to 4096 bits.
AI-Ready
Tensor instructions seamlessly integrated.
Implements the complete RISC-V Vector 1.0 Specification.
What is a Vector Unit?
A Vector Unit is composed of several "Vector Cores", roughly equivalent to a GPU Core, that perform multiple calculations in parallel. Each Vector Core has arithmetic units capable of performing addition, subtraction, fused multiply-add, division, square root, and logic operations.
Semidynamics Vector Unit
Our vector core can be tailored to support different data types: FP64, FP32, FP16, BF16, INT64, INT32, INT16 or INT8, depending on the customer’s target application domain.
The largest data type size in bits defines the vector core width or ELEN. Customers then select the number of vector cores to be implemented within the Vector Unit, either 4, 8, 16 or 32 cores, catering for a very wide range of power-performance-area trade-off options.
Once these choices are made, the total Vector Unit data path width, or DLEN, is ELEN × the number of vector cores. We support DLEN configurations from 128b to 2048b.
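As a quick illustration of the DLEN arithmetic, here is a minimal Python sketch. The function name `dlen_bits` and the range check are assumptions made for this example; they are not part of any Semidynamics tool or configuration flow.

```python
# Hypothetical helper: computes DLEN from the two customer choices described above.
ALLOWED_VECTOR_CORES = (4, 8, 16, 32)   # core counts listed in the text
DLEN_MIN, DLEN_MAX = 128, 2048          # supported DLEN range stated in the text


def dlen_bits(elen_bits: int, num_vector_cores: int) -> int:
    """Return DLEN = ELEN x number of vector cores, in bits."""
    if num_vector_cores not in ALLOWED_VECTOR_CORES:
        raise ValueError(f"unsupported vector core count: {num_vector_cores}")
    dlen = elen_bits * num_vector_cores
    # Assumption: combinations outside the stated 128b-2048b range are rejected.
    if not DLEN_MIN <= dlen <= DLEN_MAX:
        raise ValueError(f"DLEN of {dlen}b is outside {DLEN_MIN}b-{DLEN_MAX}b")
    return dlen


print(dlen_bits(64, 8))    # 512
print(dlen_bits(64, 32))   # 2048
print(dlen_bits(32, 4))    # 128
```

For instance, a 64-bit ELEN with 32 vector cores gives the largest supported data path of 2048 bits.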
Our Vector Unit is equipped with a high-performance, cross-vector-core network that provides all-to-all connectivity between the vector cores at high bandwidth, even for the very large, 32-vector core option.
The cross-vector-core unit is used for specific instructions in the RISC-V standard that shuffle data between the different vector cores, such as vrgather, vslide, etc.
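To show why these instructions need data to move between vector cores, the sketch below models the element-level behaviour of a gather and a slide-down in plain Python. It is a simplified reading of the RISC-V Vector semantics that assumes vl equals VLMAX and ignores masking and tail policies. The function names are illustrative and say nothing about how the hardware network is built.

```python
# Simplified element-level models; lists stand in for vector registers.

def vrgather_vv(vs2: list[int], vs1: list[int]) -> list[int]:
    """vd[i] = vs2[vs1[i]], or 0 when the index is out of range (>= VLMAX)."""
    vlmax = len(vs2)
    return [vs2[idx] if idx < vlmax else 0 for idx in vs1]


def vslidedown(vs2: list[int], offset: int) -> list[int]:
    """vd[i] = vs2[i + offset], or 0 when the source element is past VLMAX."""
    vlmax = len(vs2)
    return [vs2[i + offset] if i + offset < vlmax else 0 for i in range(vlmax)]


print(vrgather_vv([10, 20, 30, 40], [3, 0, 9, 1]))  # [40, 10, 0, 20]
print(vslidedown([1, 2, 3, 4], 1))                  # [2, 3, 4, 0]
```

In both cases a destination element can come from any source position, so an element held by one vector core may be needed by any other.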
We also offer a second key choice in the Vector Unit: the number of bits in each vector register (known as VLEN) can also be tailored to the customer's needs.
While most other vendors assume that VLEN is equal to DLEN (i.e., 1X ratio), we offer 2X, 4X and 8X ratios. When the VLEN is larger than the DLEN, a vector operation uses multiple cycles to execute. This is a great feature for tolerating large memory latencies and for reducing power.
For example, when VLEN=2048 and DLEN=512, each vector arithmetic operation will take 4 clocks to execute.
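The relationship between the VLEN:DLEN ratio and execution time can be sketched as follows. This is a rough model that assumes one DLEN-wide slice is processed per clock, with a single vector register per operand and no pipeline start-up latency. The name `clocks_per_vector_op` is hypothetical.

```python
# Hypothetical helper: clocks needed for one vector arithmetic operation.
SUPPORTED_RATIOS = (1, 2, 4, 8)   # VLEN:DLEN ratios mentioned in the text


def clocks_per_vector_op(vlen_bits: int, dlen_bits: int) -> int:
    """Return the number of DLEN-wide passes needed to cover one VLEN-bit register."""
    ratio, remainder = divmod(vlen_bits, dlen_bits)
    if remainder != 0 or ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported VLEN/DLEN combination: {vlen_bits}/{dlen_bits}")
    return ratio


print(clocks_per_vector_op(2048, 512))   # 4
print(clocks_per_vector_op(4096, 512))   # 8
print(clocks_per_vector_op(512, 512))    # 1
```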