Specifications
Highly scalable and efficient second-generation ML inference processor
Build premium AI solutions at low cost across multiple market segments with the second-generation, highly scalable, and efficient NPU. The Ethos-N78 enables new immersive applications with a 2.5x increase in single-core performance, scalable from 1 to 10 TOP/s, and provides the flexibility to optimize ML capability across more than ninety configurations.

Arm Ethos-N78 premium ML inference processor
| Category | Feature | Details |
| --- | --- | --- |
| Key features | Performance | 10, 5, 2, or 1 TOP/s |
| | MACs (8x8) | 4096, 2048, 1024, or 512 |
| | Data types | Int-8 and Int-16 |
| | Network support | CNN and RNN |
| | Efficient convolution | Winograd support delivers 2.25x peak performance over baseline |
| | Sparsity | Yes |
| | Secure mode | TEE or SEE |
| | Multicore capability | 8 NPUs in a cluster, 64 NPUs in a mesh |
| Memory system | Embedded SRAM | 384 KB – 4 MB |
| | Bandwidth reduction | Enhanced compression |
| | Main interface | 1x AXI4 (128-bit), ACE-5 Lite |
| Development platform | Neural frameworks | TensorFlow, TensorFlow Lite, Caffe2, PyTorch, MXNet, ONNX |
| | Inference deployment | Ahead-of-time compiled with TVM; online interpreted with Arm NN; Android Neural Networks API (NNAPI) |
| | Software components | Arm NN, neural compiler, driver and support library |
| | Debug and profile | Heterogeneous layer-by-layer visibility in Arm Development Studio Streamline |
| | Evaluation and early prototyping | Ethos-N Static Performance Analyzer (SPA), Arm Juno FPGA systems, Cycle Models |
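The 2.25x Winograd figure quoted in the specifications comes from fast convolution algebra: Winograd's F(2,3) produces two outputs of a 3-tap filter with four multiplies instead of six, and the 2D variant F(2x2, 3x3) needs 16 multiplies where direct convolution needs 36, so 36/16 = 2.25. A minimal 1D sketch in Python, for illustration only (not the NPU's actual datapath):

```python
# Winograd F(2,3): two outputs of a 3-tap 1D convolution in 4 multiplies
# instead of the direct method's 6. The 2D variant F(2x2, 3x3) needs
# 16 multiplies where direct convolution needs 36, hence 36/16 = 2.25x
# peak throughput. Illustrative sketch only, not the NPU datapath.

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs, 4 multiplies."""
    # Input transform (B^T d): additions only
    t0, t1, t2, t3 = d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]
    # Filter transform (G g): precomputable once per filter
    u0, u3 = g[0], g[2]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    # Element-wise products: the only four multiplies
    m0, m1, m2, m3 = t0 * u0, t1 * u1, t2 * u2, t3 * u3
    # Output transform (A^T m): additions only
    return [m0 + m1 + m2, m1 - m2 - m3]

def direct(d, g):
    """Direct 3-tap correlation: six multiplies for two outputs."""
    return [sum(g[k] * d[i + k] for k in range(3)) for i in range(2)]

print(winograd_f23([1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]))  # [4.5, 6.0]
print(direct([1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]))        # [4.5, 6.0]
```

The input and output transforms are additions only, and the filter transform can be computed once per network; the element-wise stage is where the multiplier savings come from.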
Key features
Increased performance
Improves user experience with 2.5x increased single-core performance, scalable from 1 to 10 TOP/s and beyond through many-core technologies
Higher efficiency
Up to 40% lower DRAM bandwidth (MB/inference) and up to 25% higher area efficiency (inferences/s/mm²) enable demanding neural networks to run in diverse solutions
Extended configurability
Target multiple markets with the flexibility to optimize ML capability across 90+ configurations, supported by the Ethos-N Static Performance Analyzer
Unified software and tools
Develop, deploy and debug with the Arm AI platform using online or offline compilation and Arm Development Studio Streamline
Key benefits
- Deploys efficient AI at low cost in multiple markets through performance scalability and extensive configurability
- Extends battery life for AI workloads with up to 40% lower DRAM traffic (MB/inference) through compression, clustering, and cascading
- Enables early performance feedback with the Ethos-N Static Performance Analyzer (SPA) and Arm Development Studio Streamline
- Supports a comprehensive security solution alongside Arm SMMU and CryptoCell IP
- Enables pre-silicon network performance tuning with interactive speed, bandwidth, and utilization reports
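Among the DRAM-traffic reducers named in the benefits above, weight clustering is the easiest to illustrate: constrain weights to a small shared palette so each weight is stored as a narrow index rather than a full value. A rough, stdlib-only sketch of the general technique (not the Ethos-N's actual compression format):

```python
# Sketch of weight "clustering" as a bandwidth-reduction idea: constrain
# weights to a small shared palette, then store a narrow index per weight
# instead of a full value. Illustrative only; this shows the general
# technique, not the Ethos-N's actual on-the-wire compression format.

def cluster_weights(weights, palette):
    """Map each weight to the index of the nearest palette entry."""
    return [min(range(len(palette)), key=lambda i: abs(palette[i] - w))
            for w in weights]

weights = [0.12, -0.48, 0.51, 0.02, -0.51, 0.49, 0.0, 0.11]
palette = [-0.5, 0.0, 0.1, 0.5]            # 4 entries -> 2-bit indices
idx = cluster_weights(weights, palette)    # starts [2, 0, 3, 1, ...]
decoded = [palette[i] for i in idx]        # approximation used at inference
# Storage per weight drops from 8 bits to 2 bits; the palette is stored once.
```

In a real flow the palette would be fit to the weight distribution (e.g. by k-means) during network optimization, and the index stream would then be further entropy-coded.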
Ethos-N comparison
| Feature | Ethos-N78 | Ethos-N77 | Ethos-N57 | Ethos-N37 |
| --- | --- | --- | --- | --- |
| Performance (at 1 GHz) | 10, 5, 2, or 1 TOP/s | 4 TOP/s | 2 TOP/s | 1 TOP/s |
| MACs/cycle (8x8) | 4096, 2048, 1024, or 512 | 2048 | 1024 | 512 |
| Efficient convolution | Winograd support delivers 2.25x peak performance over baseline | Same | Same | Same |
| Configurability | 90+ design options | Single product offering | Single product offering | Single product offering |
| Network support | CNN and RNN | CNN and RNN | CNN and RNN | CNN and RNN |
| Data types | Int-8 and Int-16 | Int-8 and Int-16 | Int-8 and Int-16 | Int-8 and Int-16 |
| Sparsity | Yes | Yes | Yes | Yes |
| Secure mode | TEE or SEE | TEE or SEE | TEE or SEE | TEE or SEE |
| Multicore capability | 8 NPUs in a cluster, 64 NPUs in a mesh | Same | Same | Same |
| Embedded SRAM | 384 KB – 4 MB | 1–4 MB | 512 KB | 512 KB |
| Bandwidth reduction | Enhanced compression | Extended compression technology, layer/operator fusion, clustering, and workload tiling | Same as Ethos-N77 | Same as Ethos-N77 |
| Main interface | 1x AXI4 (128-bit), ACE-5 Lite | Same | Same | Same |
| Neural frameworks | TensorFlow, TensorFlow Lite, Caffe2, PyTorch, MXNet, ONNX | Same | Same | Same |
| Inference deployment | Ahead-of-time compiled with TVM; online interpreted with Arm NN; Android Neural Networks API (NNAPI) | Same | Same | Same |
| Software components | Arm NN, compiler and support library, driver | Same | Same | Same |
| Debug and profile | Heterogeneous layer-by-layer visibility in Arm Development Studio Streamline | Same | Same | Same |
| Evaluation and early prototyping | Ethos-N Static Performance Analyzer (SPA), Arm Juno FPGA systems, Cycle Models | Same | Same | Same |
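Both tables list Int-8 and Int-16 as the supported data types. Networks trained in floating point are typically mapped onto such integer types with affine quantization, real ≈ scale × (q − zero_point). A minimal sketch of that mapping, for illustration only (in practice the deployment tools listed above, such as TVM or Arm NN, handle this):

```python
# Hedged sketch of affine int-8 quantization, the common scheme for
# deploying float-trained networks to int-8 inference hardware:
#   real_value ~= scale * (q - zero_point), with q clamped to [-128, 127].
# Illustrative only; deployment toolchains perform this in practice.

def quantize(xs, scale, zero_point):
    """Map float values to clamped int-8 codes."""
    return [max(-128, min(127, round(x / scale) + zero_point)) for x in xs]

def dequantize(qs, scale, zero_point):
    """Recover approximate float values from int-8 codes."""
    return [scale * (q - zero_point) for q in qs]

q = quantize([0.0, 0.5, -1.0], scale=0.0625, zero_point=0)  # [0, 8, -16]
print(q, dequantize(q, scale=0.0625, zero_point=0))
```

Int-16 works the same way with a wider code range, trading storage and bandwidth for lower quantization error on sensitive layers.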