Use cases¶
Use cases inform the requirements for, and design choices made in, this array API standard. This section first discusses what types of use cases are considered, and then works out a few concrete use cases in more detail.
Types of use cases¶
Packages that currently depend on a specific array library and would like to support multiple such libraries (e.g. for GPU or distributed array support, for improved performance, or for reaching a wider user base).
Writing new libraries/tools that wrap multiple array libraries.
Projects that implement new types of arrays with, e.g., hardware-specific optimizations or auto-parallelization behavior, and need an API to put on top that is familiar to end users.
End users that want to switch from one library to another without learning about all the small differences between those libraries.
Concrete use cases¶
Use case 1: add hardware accelerator and distributed support to SciPy
Use case 2: simplify einops by removing the backend system
Use case 3: adding a Python API to xtensor
Use case 4: make JIT compilation of array computations easier and more robust
Use case 1: add hardware accelerator and distributed support to SciPy¶
When surveying a representative set of advanced users and research software engineers in 2019 (for this NSF proposal), the single most common pain point brought up about SciPy was performance.
SciPy heavily relies on NumPy (its only non-optional runtime dependency). NumPy provides an array implementation that’s in-memory, CPU-only and single-threaded. Common performance-related wishes users have are:
parallel algorithms (can be multi-threaded or multiprocessing based)
support for distributed arrays (with Dask in particular)
support for GPUs and other hardware accelerators (shortened to just “GPU” in the rest of this use case)
Some parallelism can be supported in SciPy: it has a workers keyword (similar to scikit-learn's n_jobs keyword) that allows specifying the use of parallelism in some algorithms. However, SciPy itself will not directly start depending on a GPU or distributed array implementation, or contain (e.g.) CUDA code - that's not maintainable given the resources for development.
However, there is a way to provide distributed or GPU support. Part of the
solution is provided by NumPy’s “array protocols” (see gh-1), which allow
dispatching to other array implementations. The main problem then becomes how
to know whether this will work with a particular distributed or GPU array
implementation - given that there are zero other array implementations that
are even close to providing full NumPy compatibility - without adding that
array implementation as a dependency.
It’s clear that SciPy functionality that directly relies on compiled extensions (C, C++, Cython, Fortran) can’t easily be run on an array library other than NumPy (see C API for more details about this topic). Pure Python code can work, though. There are two main possibilities:
Testing with another package, manually or in CI, and simply providing a list of functionality that is found to work, then making ad-hoc fixes to expand the set that works.
Starting to rely on a well-defined subset of the NumPy API (or a new NumPy-like API), for which compatibility is guaranteed.
Option (2) seems strongly preferable, and that “well-defined subset” is what an API standard should provide. Testing will still be needed to ensure there are no critical corner cases or bugs between array implementations; however, that is then a very tractable task.
As a concrete example, consider the spectral analysis functions in scipy.signal. All of those functions (e.g., periodogram, spectrogram, csd, welch, stft, istft) are pure Python - with the exception of lombscargle, which is ~40 lines of Cython - and use NumPy function calls, array attributes and indexing. The beginning of each function could be changed to retrieve the module that implements the array API standard for the given input array type, and then functions from that module could be used instead of NumPy functions.
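As an illustration, here is a minimal sketch (not actual SciPy code) of what that retrieval could look like, assuming the input array exposes the standard’s __array_namespace__ method; the helper and function names here are hypothetical:

import numpy as np

def _get_namespace(x):
    # Hypothetical helper: return the module implementing the array API
    # standard for `x`, falling back to NumPy for plain ndarrays.
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np

def demeaned(x):
    # Sketch of a pure-Python routine: every array operation goes through
    # `xp`, so the result has the same type (and device) as `x`.
    xp = _get_namespace(x)
    return x - xp.mean(x)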
If the user has another array type, say a CuPy or PyTorch array x on their GPU, doing:
from scipy import signal
signal.welch(x)
will result in:
# For CuPy
ValueError: object __array__ method not producing an array
# For PyTorch
TypeError: can't convert cuda:0 device type tensor to numpy.
and therefore the user will have to explicitly convert to and from a numpy.ndarray (which is quite inefficient):
# For CuPy
x_np = cupy.asnumpy(x)
freq, Pxx = (cupy.asarray(res) for res in signal.welch(x_np))
# For PyTorch
x_np = x.cpu().numpy()
# Note: ends up with tensors on CPU, may still have to move them back
freq, Pxx = (torch.tensor(res) for res in signal.welch(x_np))
This code will look a little different for each array library. The end goal here is to be able to write this instead as:
freq, Pxx = signal.welch(x)
and have freq, Pxx be arrays of the same type and on the same device as x.
Note
This type of use case applies to many other libraries, from scikit-learn and scikit-image to domain-specific libraries like AstroPy and scikit-bio, to code written for a single purpose or user.
Use case 2: simplify einops by removing the backend system¶
einops is a library that provides flexible tensor operations and supports many array libraries (NumPy, TensorFlow, PyTorch, CuPy, MXNet, JAX).
Most of the code in einops is split over two files:
einops.py contains the functions it offers as public API (rearrange, reduce, repeat).
_backends.py contains the glue code needed to support that many array libraries.
The amount of code in each of those two files is almost the same (~550 LoC each).
The typical pattern in einops.py is:
def some_func(x):
    ...
    backend = get_backend(x)
    shape = backend.shape(x)
    result = backend.reduce(x)
    ...
With a standard array API, the _backends.py
glue layer could almost completely disappear,
because the purpose it serves (providing a unified interface to array operations from each
of the supported backends) is already addressed by the array API standard.
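For comparison, here is a minimal sketch of the same pattern without a backend layer, assuming the input array provides the standard’s __array_namespace__ method (the function body is illustrative, not actual einops code):

import numpy as np

def some_func(x):
    # Retrieve the module implementing the array API standard for `x`;
    # no per-library backend class is needed anymore.
    xp = x.__array_namespace__() if hasattr(x, "__array_namespace__") else np
    shape = x.shape               # array attribute defined by the standard
    result = xp.sum(x, axis=-1)   # reduction via the namespace, standing in
    return shape, result          # for the backend.shape/backend.reduce calls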
Hence the complete einops code base could be close to 50% smaller, and easier to maintain or add to.
Note
Other libraries that have a similar backend system to support many array libraries include TensorLy, the (now discontinued) multi-backend version of Keras, Unumpy and EagerPy. Many end users and organizations will also have such glue code - it tends to be needed whenever one tries to support multiple array types in a single API.
Use case 3: adding a Python API to xtensor¶
xtensor is a C++ array library that is NumPy-inspired and provides lazy arrays. It has Python (and Julia and R) bindings, however it does not have a Python array API.
Xtensor aims to follow NumPy closely; however, it only implements a subset of NumPy’s functionality and documents some API differences in Notable differences with NumPy.
Note that other libraries document similar differences, see for example this page for JAX and this page for TensorFlow.
Each time an array library author designs a new API, they have to choose (a) what subset of NumPy makes sense to implement, and (b) where to deviate because NumPy’s API for a particular function is suboptimal or the semantics don’t fit their execution model.
This array API standard aims to provide an API that can be readily adopted, without having to make the above-mentioned choices.
Note
XND is another array library, written in C, that still needs a Python API. Array implementations in other languages are often in a similar situation, and could translate this array API standard 1:1 to their language.
Use case 4: make JIT compilation of array computations easier and more robust¶
Numba is a Just-In-Time (JIT) compiler for numerical functions in Python; it is NumPy-aware. PyPy is an implementation of Python with a JIT at its core; its NumPy support relies on running NumPy itself through a compatibility layer (cpyext), while a previous attempt to implement NumPy support directly was unsuccessful.
Other array libraries may have an internal JIT (e.g., TensorFlow, PyTorch, JAX, MXNet) or work with an external JIT like XLA or VTA.
Numba currently has to jump through some hoops to accommodate NumPy’s casting rules and may not attain full compatibility with NumPy in some cases - see, e.g., this or this example issue regarding (array) scalar return values.
An explicit suggestion from a Numba developer for this array API standard was:
for JIT compilers (e.g. Numba) it will be important, that the type of the returned value(s) depends only on the types of the input but not on the values.
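To make that requirement concrete, here is a small illustration of value-dependent behavior under NumPy’s legacy value-based casting rules for Python scalars - behavior the standard’s type promotion rules avoid, and that NumPy itself moved away from with NEP 50 in NumPy 2.0 - so the exact output depends on the NumPy version:

import numpy as np

a = np.zeros(3, dtype=np.int8)

# Under NumPy 1.x legacy casting rules, the result dtype depends on the
# *value* of the Python scalar, not only on the operand dtypes:
print((a + 1).dtype)    # int8
print((a + 300).dtype)  # int16 on NumPy 1.x, since 300 does not fit in int8;
                        # NumPy 2.0 (NEP 50) changed this behavior.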
A concrete goal for this use case is to have better matching between JIT-compiled and non-JIT execution. Here is an example from the Numba code base, the need for which should be avoided in the future:
def check(x, y):
    got = cfunc(x, y)
    np.testing.assert_array_almost_equal(got, pyfunc(x, y))
    # Check the power operation conserved the input's dtype
    # (this is different from Numpy, whose behaviour depends on
    # the *values* of the arguments -- see PyArray_CanCastArrayTo).
    self.assertEqual(got.dtype, x.dtype)