Description

Gemmini is an open-source generator for systolic-array architectures, allowing for systematic exploration of the accelerator design space. However, the lack of existing software support for Gemmini in machine-learning frameworks (e.g., PyTorch or TensorFlow) poses a significant bottleneck to neural network model evaluation and serving.

To address these limitations, we present the design and implementation of a Gemmini backend for Microsoft's ONNX Runtime engine. By extending ONNX Runtime's support for heterogeneous architectures and model graph transformations, the Gemmini backend accelerates the primary computational kernels (matrix multiplications, convolutions, and pooling) while ensuring interoperability between the channel layouts expected by Gemmini and the rest of ONNX Runtime. We benchmark our implementation on a broad set of networks including ResNet, BERT, and Mask R-CNN; our results show that the Gemmini backend is a performant drop-in replacement for accelerating real-world workloads.
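To illustrate the channel-layout interoperability issue the backend must handle, the sketch below converts activations between NCHW (the layout ONNX models conventionally use) and an NHWC-style layout, which we assume here for illustration as the layout a systolic-array backend like Gemmini expects. The function names and the choice of NHWC are hypothetical and not taken from the report itself; the point is only that layout-sensitive kernels require explicit transposes at the boundary between the accelerated subgraph and the rest of the runtime.

```python
import numpy as np

def nchw_to_nhwc(x: np.ndarray) -> np.ndarray:
    """Convert an activation tensor from NCHW to NHWC layout.

    Hypothetical helper: assumes the accelerator-side kernels
    consume channels-last data, while ONNX Runtime's graph
    carries channels-first tensors.
    """
    return np.transpose(x, (0, 2, 3, 1))

def nhwc_to_nchw(x: np.ndarray) -> np.ndarray:
    """Inverse conversion, applied when handing results back
    to the rest of the ONNX Runtime graph."""
    return np.transpose(x, (0, 3, 1, 2))

# A batch of 2 images, 3 channels, 4x5 spatial extent.
x = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
y = nchw_to_nhwc(x)
assert y.shape == (2, 4, 5, 3)
# The round trip must be lossless, or accelerated and
# non-accelerated nodes would disagree on tensor contents.
assert np.array_equal(nhwc_to_nchw(y), x)
```

In a real backend these transposes are inserted (or elided, when adjacent accelerated nodes share a layout) during the graph-transformation pass rather than performed eagerly on every tensor.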
