Skip to Main Content
EventHero

Blog

Deeper Learning with oneDNN 3.7: Latest Neural Network Library Release

March 31, 2025

The latest version 3.7 of the oneAPI Deep Neural Network Library (oneDNN) has been released, an evolution of deep learning performance optimization. The release introduces substantial performance enhancements for Intel, ARM, NVIDIA, and other platforms.

 

As an open-source cross-platform performance library, oneDNN provides essential building blocks for deep learning applications across diverse hardware architectures. oneDNN is a critical component in the modern deep learning ecosystem and is used by frameworks including PyTorch and TensorFlow. Version 3.7 brings new capabilities to developers.

 

Why oneDNN?

 

oneDNN (oneAPI Deep Neural Network Library) serves as a foundational layer for many popular deep learning frameworks and applications. As part of the UXL Foundation and implementing the oneAPI specification, oneDNN provides highly optimized primitives for deep neural network operations that can be leveraged by higher-level software, including ONNX, MATLAB, PaddlePaddle, and Apache MXNet.

 

While initially developed with Intel hardware in mind, the library has expanded its scope to support multiple architectures, including ARM, RISC-V, POWER, AMD and NVIDIA. This cross-platform approach aligns with the UXL Foundation’s goal of creating a unified programming interface that delivers consistent performance regardless of the underlying hardware.

 

 

Key Enhancements in oneDNN 3.7

 

The latest release focuses on several areas of improvement:

 

 

These changes make it easier for developers to identify and resolve performance issues, validate their implementations, and adopt the latest features of the library. For developers and organizations building deep learning applications, the improvements in oneDNN 3.7 offer opportunities to extract maximum performance from available hardware while maintaining portability across platforms. As models continue to grow in size and complexity, libraries like oneDNN become increasingly important.

 

With its integration into popular frameworks and applications, the improvements in oneDNN 3.7 will benefit a wide range of users, from research institutions to enterprise deployments. oneDNN 3.7 introduces significant performance improvements for Intel processors, including:

 

 

These optimizations ensure that deep learning workloads can fully leverage the advanced instruction sets available in modern Intel processors, delivering significant performance gains for compute-intensive operations.

 

The release also incorporates many improvements for Intel Graphics cards including the upcoming Xe3 architecture and Arc GPUs for convolution operations..

 

oneDNN 3.7 extends support beyond Intel hardware with significant improvements for other architectures:

 

 

oneDNN 3.7 introduces new functionalities:

 

 

Technical Deep Dive

 

1. Precision and Performance Tradeoffs

 

One of the most notable aspects of oneDNN 3.7 is its expanded support for lower-precision operations. The introduction of 4-bit floating-point data types and enhanced INT8 operations reflect the industry’s continuing move toward precision-optimized computing, where the appropriate numerical precision is used for each task to maximize performance without sacrificing model accuracy.

 

This approach is particularly evident in the improved support for BF16 (Brain Floating Point) operations across multiple platforms. BF16 has emerged as an important format for deep learning, offering a balance between the precision of FP32 and the performance benefits of reduced precision.

 

2. Graph API Enhancements

 

The Graph API receives significant attention in this release with optimizations for several important neural network patterns:

 

 

These optimizations are particularly relevant for transformer-based models that rely heavily on attention mechanisms and multi-layer perceptron (MLP) blocks. By optimizing these specific patterns, oneDNN 3.7 can deliver substantial performance improvements for modern large language models and other transformer architectures.

 

3. Memory Management Improvements

 

oneDNN 3.7 brings important usability enhancements in memory management, particularly for the SYCL runtime. Memory objects on CPU engines are now reference-counted, eliminating the need to explicitly maintain their lifetime during primitive execution. This change aligns the behavior between CPU and GPU engines, simplifying development and reducing the risk of memory-related errors.

 

Additionally, improved support for large tensor sizes in convolution, matmul, and reduction primitives on Intel GPUs addresses the growing demands of larger models and datasets.

 

4. Usability and Developer Experience

 

Intel has made several improvements to enhance the developer experience:

 

 

GitHub – uxlfoundation/oneDNN: oneAPI Deep Neural Network Library (oneDNN)