Specification

Technical Overview

A common developer experience across accelerator architectures

oneAPI is an open, multiarchitecture, multivendor programming model that lets developers use a single modern codebase across accelerators for faster application performance, more productivity, and greater innovation. The oneAPI specification includes six core elements for creating parallel applications.

Parallel application development is a combination of API programming, where the parallel algorithm is hidden behind an API provided by the system, and direct programming, where the application programmer writes the parallel algorithm.

API programming

When using API programming, a developer implements performance-critical sections of the program with library calls. Well-defined and mature problem domains have high-performance solutions packaged as libraries.

oneAPI defines a set of APIs for the most commonly used data-parallel domains.
oneAPI platforms provide library implementations across a variety of accelerators.

Where possible, the API is based on established standards like BLAS. API programming enables programmers to achieve high performance across a diverse set of accelerators with minimal coding and tuning.
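The difference between calling a library and writing the algorithm yourself can be sketched in plain Python. This is an illustration only, not oneAPI code. It assumes the standard-library function `math.hypot` as a stand-in for a tuned library routine (it plays the role of a BLAS-style Euclidean norm), and `norm_api` and `norm_direct` are hypothetical names chosen for this sketch.

```python
import math


def norm_api(values):
    """API style: the algorithm is hidden behind a library call."""
    # math.hypot accepts any number of coordinates (Python 3.8+).
    return math.hypot(*values)


def norm_direct(values):
    """Direct style: the programmer writes the algorithm explicitly."""
    total = 0.0
    for v in values:
        total += v * v
    return math.sqrt(total)


vector = [3.0, 4.0, 12.0]
print(norm_api(vector), norm_direct(vector))
```

Both functions compute the same quantity. In the API style the caller inherits whatever tuning the library provides, while the direct version leaves every implementation choice to the programmer.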

Direct programming

Some problem domains are not well suited to API programming because no standard solution exists or because solutions require a level of customization that cannot be easily implemented in a library. In this case, a developer uses direct programming and must explicitly code the parallel algorithm.

oneAPI’s programming model is based on data parallelism, where the same computation is performed on each data element, and parallelism of the application scales as the data scales.

By letting programmers express parallelism directly, the data-parallel model makes it possible to write highly efficient algorithms for parallel architectures productively.

Data-parallel algorithms are used for many of the most computationally demanding problems, including scientific computing, artificial intelligence, and visualization, and can be efficiently mapped to a diverse set of architectures: multi-core CPUs, GPUs, systolic arrays, and FPGAs.
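The data-parallel pattern described above can be sketched in plain Python. This is not oneAPI or SYCL code. It assumes a standard-library thread pool as a stand-in for an accelerator, and the names `saxpy_kernel` and `saxpy_parallel` are hypothetical. The point is the structure: one small kernel applied independently to every element, so the available parallelism grows with the data.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_kernel(a, x_i, y_i):
    """Per-element computation: the same work for every index."""
    return a * x_i + y_i


def saxpy_parallel(a, xs, ys, max_workers=4):
    """Apply saxpy_kernel to every (x_i, y_i) pair; elements are independent."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so result[i] corresponds to element i.
        return list(pool.map(lambda x_i, y_i: saxpy_kernel(a, x_i, y_i), xs, ys))


result = saxpy_parallel(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)
```

Because no element depends on another, a runtime is free to distribute the work across however many execution units the hardware offers. Threads in CPython will not speed up this arithmetic, so the sketch shows the shape of the computation rather than its performance.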

Start building and contributing

The UXL Foundation encourages collaboration on the oneAPI specification and compatible oneAPI implementations across the ecosystem.

Join UXL to participate in the working groups and special interest groups that evolve and expand the oneAPI specification and the oneAPI open source projects for accelerated computing.



Start building with open source tools that use oneAPI elements


dpnp

NumPy-like API accelerated with SYCL

Data Parallel Extension for NumPy (dpnp) is a Python library that implements a subset of NumPy and can be executed on any data-parallel device. The subset is a drop-in replacement for core NumPy functions and numerical data types. dpnp is the core part of a larger family of data-parallel Python libraries and tools for programming on XPUs.

DPCTL

Python bindings for SYCL classes

Data Parallel Control (DPCTL) is a Python library that allows users to control the execution placement of a compute kernel on an XPU. The library is built on the SYCL standard and provides Python bindings for a subset of the standard runtime classes, allowing users to query platforms, discover and represent devices and sub-devices, and construct contexts and queues. The library helps authors of Python-native extensions written in C, Cython, or pybind11 access DPCTL objects representing SYCL devices, queues, memory, and tensors.

DPC++

C++ compiler (clang++/LLVM based) with support for SYCL

DPC++ is an LLVM-based compiler project that implements compiler and runtime support for the latest SYCL standard. The project is hosted in the SYCL branch and synced with the tip of the LLVM upstream main branch on a regular basis.

PyTorch

Deep learning framework

PyTorch is a Python package that provides tensor computation (like NumPy) with strong GPU acceleration and deep neural networks built on a tape-based autograd system. You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.

oneDPL

Apache MXNet

Deep learning framework

Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph-optimization layer on top of that makes symbolic execution fast and memory-efficient. MXNet is portable, lightweight, and scalable to many GPUs and machines. NOTE: This project has been retired, and the repository is now read-only.

Ginkgo

High-performance linear algebra library, integrated into significant applications in the scientific domain such as deal.II, MFEM, OpenFOAM, HyTeG, Sundials, XGC, HiOp, and OpenCARP.

Ginkgo is a high-performance linear algebra library for many-core systems, with a focus on solutions for sparse linear systems. It is implemented using modern C++ (you will need at least a C++17 compliant compiler to build it), with GPU kernels implemented in CUDA for NVIDIA devices, HIP for AMD devices, and SYCL/DPC++ for Intel devices and other supported hardware.

AdaptiveCpp

Library-based implementation of SYCL

AdaptiveCpp (formerly known as hipSYCL / Open SYCL) is an independent, community-driven compiler for C++-based heterogeneous programming models targeting CPUs and GPUs from all major vendors. AdaptiveCpp lets applications adapt themselves to all the hardware found in the system, even at runtime.
