Developer Resources

Open Source Tools

Start building today using oneAPI open source tools

oneAPI allows developers to choose accelerators based on what works best for their overall solution. Global support for industry-leading technology makes open source oneAPI a sure path for the future, enabling organizations to migrate their solutions to diverse hardware and move forward with confidence. The following tools, which you can use to build your own applications and projects, leverage oneAPI open source elements.

Languages

Julia

Compiler for data parallel programming in Julia

oneAPI.jl provides Julia support for the oneAPI unified programming model. The package is verified on the current implementation of this interface, which is part of Intel Compute Runtime, and is available only on Linux. The current version supports most of the oneAPI Level Zero interface, has good kernel programming capabilities, and fully implements the GPUArrays.jl array interfaces. This results in a full-featured GPU array type.

DPCTL

Python bindings for SYCL classes

Data Parallel Control (DPCTL) is a Python library that allows users to control the execution placement of a compute kernel on an XPU. The library is built on the SYCL standard and provides Python bindings for a subset of the standard runtime classes, allowing users to query platforms, discover and represent devices and sub-devices, and construct contexts and queues. The library helps authors of Python-native extensions written in C, Cython, or pybind11 access DPCTL objects representing SYCL devices, queues, memory, and tensors.
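
The device-query workflow described above can be sketched as follows. This is a minimal illustration rather than an official example: the helper name `list_sycl_devices` is hypothetical, and the import is guarded so the sketch still runs (returning an empty list) on a machine where dpctl is not installed.

```python
def list_sycl_devices():
    """Return one description string per SYCL device visible to dpctl.

    Hypothetical helper: returns an empty list when dpctl is not installed,
    so the sketch stays runnable without a SYCL runtime.
    """
    try:
        import dpctl
    except ImportError:
        return []
    # dpctl.get_devices() returns SyclDevice objects for the available devices.
    return [f"{dev.name} ({dev.device_type})" for dev in dpctl.get_devices()]


if __name__ == "__main__":
    devices = list_sycl_devices()
    if devices:
        for line in devices:
            print(line)
    else:
        print("dpctl is not installed or no SYCL devices were found")
```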

Numba

Compiler for data parallel programming in Python

Data Parallel Extension for Numba (numba-dpex) is an open source stand-alone extension for the Numba Python JIT compiler. numba-dpex provides a SYCL-like API for kernel programming in Python. The API allows expressing portable data-parallel kernels in Python and then JIT-compiling them for different hardware targets, such as CPUs and integrated and discrete GPUs.

AdaptiveCpp

Library-based implementation of SYCL

AdaptiveCpp (formerly known as hipSYCL / Open SYCL) is an independent, community-driven compiler for C++-based heterogeneous programming models targeting CPUs and GPUs from all major vendors. AdaptiveCpp lets applications adapt themselves to all the hardware found in the system, even at runtime.

DPC++

C++ compiler (clang++/LLVM based) with support for SYCL

DPC++ is an LLVM-based compiler project that implements compiler and runtime support for the latest SYCL standard. The project is hosted in the SYCL branch and synced with the tip of the LLVM upstream main branch on a regular basis.

Deep Learning

PaddlePaddle

Deep learning framework

PaddlePaddle is an industrial platform with advanced technologies and rich features that cover core deep learning frameworks, basic model libraries, end-to-end development kits, tools and components, as well as service platforms.

Apache MXNet

Deep learning framework

Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph-optimization layer on top of that makes symbolic execution fast and memory-efficient. MXNet is portable, lightweight, and scalable to many GPUs and machines. Note: This project has been retired, and the repository is now read-only.

ONNX Runtime

Deep learning framework

ONNX Runtime is a cross-platform inference and training machine-learning accelerator that enables faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, and others. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms. ONNX Runtime training can accelerate the model training time on multinode NVIDIA GPUs for transformer models with a one-line addition for existing PyTorch training scripts.

PyTorch

Deep learning framework

PyTorch is a Python package that provides tensor computation (like NumPy) with strong GPU acceleration and deep neural networks built on a tape-based autograd system. You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.
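
As a small illustration of the tape-based autograd system mentioned above, the sketch below differentiates a simple quadratic. The helper name `grad_of_quadratic` is hypothetical, and the import is guarded so the function returns `None` when PyTorch is not installed.

```python
def grad_of_quadratic(x0):
    """Return d/dx (x**2 + 2*x) at x0 via autograd, or None without PyTorch."""
    try:
        import torch
    except ImportError:
        return None
    # requires_grad=True asks autograd to record operations on this tensor.
    x = torch.tensor(float(x0), requires_grad=True)
    y = x ** 2 + 2 * x
    # backward() walks the recorded operations in reverse to compute dy/dx.
    y.backward()
    return x.grad.item()


if __name__ == "__main__":
    print(grad_of_quadratic(3.0))
```

Analytically the derivative is 2x + 2, so at x = 3 the function returns 8.0 when PyTorch is available.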


TensorFlow

Deep learning framework

TensorFlow is an end-to-end open source platform for machine learning (ML). It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications. TensorFlow provides stable Python and C++ APIs, as well as APIs for other languages that come without a backward-compatibility guarantee.

oneCCL

oneAPI Collective Communications Library

oneAPI Collective Communications Library (oneCCL) provides an efficient implementation of communication patterns used in deep learning. It is integrated into the Horovod distributed training framework and PyTorch machine learning framework. oneCCL is governed by the UXL Foundation and is an implementation of the oneAPI specification.


oneDNN

oneAPI Deep Neural Network Library

oneAPI Deep Neural Network Library (oneDNN) is an open source cross-platform performance library of basic building blocks for deep learning applications. It is intended for deep learning applications and framework developers interested in improving application performance on CPUs and GPUs. oneDNN is governed by the UXL Foundation and is an implementation of the oneAPI specification.


Data Science

Scikit-learn-intelex

Accelerated Scikit-learn

The Extension for Scikit-learn is a free software AI accelerator designed to deliver 10-100X acceleration for your existing scikit-learn code. The acceleration is achieved with vector instructions, AI-specific memory optimizations, threading, and other optimizations.
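
A minimal sketch of how existing scikit-learn code can opt in to the extension, assuming the `sklearnex` package and its `patch_sklearn()` entry point. The helper names `enable_sklearn_acceleration` and `cluster_labels` are hypothetical, and both imports are guarded so the sketch runs even when neither package is installed.

```python
def enable_sklearn_acceleration():
    """Patch scikit-learn with the accelerated implementations if available."""
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False
    # Patching must happen before estimators are imported from sklearn.
    patch_sklearn()
    return True


def cluster_labels(points, n_clusters=2):
    """Run ordinary scikit-learn KMeans; returns None without scikit-learn."""
    try:
        from sklearn.cluster import KMeans
    except ImportError:
        return None
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return model.fit_predict(points).tolist()


if __name__ == "__main__":
    print("accelerated:", enable_sklearn_acceleration())
    print(cluster_labels([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]))
```

The scikit-learn code in `cluster_labels` is unchanged whether or not the patch was applied, which is the point of the design.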

Modin

Accelerated pandas

Modin is a drop-in replacement for pandas. While pandas is single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your cores. Modin works especially well on larger datasets, where pandas becomes painfully slow or runs out of memory. Modin also comes with additional APIs to improve the user experience.
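
The drop-in pattern can be sketched as follows: only the import changes, and the DataFrame code stays the same. The helper `import_first_available` is hypothetical and exists only so the sketch falls back to plain pandas (or does nothing) when Modin is not installed. Modin also needs an execution engine such as Ray or Dask, which is assumed to be set up separately.

```python
import importlib


def import_first_available(*module_names):
    """Return the first importable module among module_names, or None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def column_sum_demo():
    """Build a small DataFrame with whichever pandas-compatible API is found."""
    # Prefer Modin; fall back to pandas because the API is the same.
    pd = import_first_available("modin.pandas", "pandas")
    if pd is None:
        return None
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
    return pd.__name__, float(df["a"].sum())


if __name__ == "__main__":
    print(column_sum_demo())
```

In a real project the usual form is simply `import modin.pandas as pd` in place of `import pandas as pd`.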

oneDAL

oneAPI Data Analytics Library

oneAPI Data Analytics Library (oneDAL) is a C++ and DPC++ library (powering the Extension for Scikit-learn in Python) that implements accelerated machine learning routines for tabular data (e.g., linear regression, K-means clustering, random forests) for CPUs, GPUs, and multinode distributed setups. Acceleration on CPUs is achieved by leveraging SIMD instructions and exploiting cache structures of modern hardware, while GPU acceleration leverages the SYCL framework and the oneMKL library. oneDAL is governed by the UXL Foundation and is an implementation of the oneAPI specification.


Libraries

MPICH

High-performance implementation of MPI

MPICH is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard from Argonne National Laboratory. This implementation provides all MPI functions and features required by the standard, with comprehensive support for parallel computing applications.

Level Zero

Low-level runtime for oneAPI

The Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver is an open source project providing compute API support (Level Zero, OpenCL) for Intel graphics hardware architectures (Intel® HD Graphics, Intel® Iris® Xe Graphics).

dpnp

NumPy-like API accelerated with SYCL

Data Parallel Extension for NumPy (dpnp) is a Python library that implements a subset of NumPy and can be executed on any data-parallel device. The subset is a drop-in replacement of core NumPy functions and numerical data types. dpnp is the core part of a larger family of data-parallel Python libraries and tools for programming on XPUs.

Ginkgo

High-performance linear algebra library, integrated into significant applications in the scientific domain such as deal.II, MFEM, OpenFOAM, HyTeG, Sundials, XGC, HiOp, and OpenCARP.

Ginkgo is a high-performance linear algebra library for many-core systems, with a focus on solutions for sparse linear systems. It is implemented in modern C++ (you will need at least a C++17-compliant compiler to build it), with GPU kernels implemented in CUDA for NVIDIA devices, HIP for AMD devices, and SYCL/DPC++ for Intel devices and other supported hardware.

oneMath

oneAPI Math Library

oneMath is an open source implementation of the oneMath specification. It can work with multiple devices using multiple libraries (backends) underneath. The oneMath project was previously referred to as oneMKL Interfaces. oneMath is governed by the UXL Foundation and is an implementation of the oneAPI specification.


oneTBB

oneAPI Threading Building Blocks

oneTBB is a flexible C++ library that simplifies adding parallelism to complex applications, even if you are not a threading expert. The library lets you write parallel programs that take full advantage of multicore performance. Such programs are portable, composable, and have future-proof scalability. oneTBB provides functions, interfaces, and class templates that you use to parallelize and scale your code. oneTBB is governed by the UXL Foundation and is an implementation of the oneAPI specification.


oneDPL

oneAPI Data Parallel C++ Library

oneDPL works with the Intel® oneAPI DPC++/C++ Compiler to provide high-productivity APIs that can reduce Data Parallel C++ (DPC++) programming effort across devices for high-performance parallel applications.


Tools

HPCToolkit

Profiling toolkit from Rice University

HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to GPU-accelerated supercomputers. HPCToolkit provides accurate measurements of a program’s work, resource consumption, and inefficiency, correlates these metrics with the program’s source code, works with multilingual, fully optimized binaries, has very low measurement overhead, and scales to large parallel systems. HPCToolkit’s measurements provide support for analyzing a program’s execution cost, inefficiency, and scaling characteristics both within and across nodes of a parallel system.