FuriosaAI RNGD – Tensor Contraction Processor


FuriosaAI’s second-generation Neural Processing Unit (NPU), RNGD, is a chip designed for high-performance deep learning inference. It supports Large Language Models (LLMs), multimodal LLMs, vision models, and other deep learning models.

FuriosaAI RNGD

RNGD is based on the Tensor Contraction Processor (TCP) architecture, is fabricated on TSMC’s 5nm process node, and operates at 1.0 GHz. It offers 512 TOPS of INT8 performance and 1024 TOPS of INT4 performance. RNGD is configured with two HBM3 modules providing 1.5 TB/s of memory bandwidth, and supports PCIe Gen5 x16. For multi-tenant environments such as Kubernetes, a single RNGD chip can be partitioned to work as 2, 4, or 8 individual NPUs, each fully isolated with its own cores and memory bandwidth. RNGD also supports Single Root I/O Virtualization (SR-IOV) and virtualization for multi-instance NPUs.
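To put the headline figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted above (512/1024 TOPS, 1.0 GHz, 1.5 TB/s). The function names are hypothetical, and the even split of throughput and bandwidth across partitions is an assumption made for illustration: the text says only that each NPU gets its own cores and memory bandwidth, not how they are divided.

```python
# Figures quoted in the paragraph above.
INT8_TOPS = 512            # tera-operations per second, INT8
INT4_TOPS = 1024           # tera-operations per second, INT4
HBM3_BANDWIDTH_TBPS = 1.5  # memory bandwidth in TB/s
CLOCK_HZ = 1.0e9           # 1.0 GHz

# Partition counts named in the text (1 = the unpartitioned chip).
SUPPORTED_PARTITIONS = (1, 2, 4, 8)


def ops_per_cycle(tops, clock_hz=CLOCK_HZ):
    """Operations the whole chip must complete per clock cycle to reach `tops`."""
    return tops * 1e12 / clock_hz


def per_partition_share(num_partitions):
    """Per-NPU share of throughput and bandwidth.

    ASSUMPTION: resources are divided evenly between partitions.
    The source does not state the exact split.
    """
    if num_partitions not in SUPPORTED_PARTITIONS:
        raise ValueError(f"unsupported partition count: {num_partitions}")
    return {
        "int8_tops": INT8_TOPS / num_partitions,
        "int4_tops": INT4_TOPS / num_partitions,
        "bandwidth_tbps": HBM3_BANDWIDTH_TBPS / num_partitions,
    }


if __name__ == "__main__":
    print(f"INT8 ops per cycle: {ops_per_cycle(INT8_TOPS):,.0f}")
    for n in SUPPORTED_PARTITIONS:
        print(n, per_partition_share(n))
```

At 1.0 GHz, 512 TOPS works out to 512,000 INT8 operations per clock cycle across the chip. Under the even-split assumption, each of 8 partitions would have 64 INT8 TOPS and about 0.19 TB/s of memory bandwidth.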

Please refer to the following to learn more about the TCP architecture and RNGD:
