2023 release of the Array API Standard

The 2023 revision of the array API standard has been finalized and is ready for adoption by conforming array libraries.

Another year, another revision of the Array API Standard! We’re proud to announce the release of the 2023 revision of the Array API Standard. As was the case for 2022 revision, this release required extensive discussion and collaboration among array libraries and their downstream stakeholders as we continued reaching consensus on unified API design and behavior. We’re particularly excited to share that this year marked a significant milestone in our efforts to facilitate array interoperation within the PyData ecosystem, as we witnessed accelerated adoption of the standard, especially among downstream libraries, such as SciPy and scikit-learn.

Brief Background

For those who are not yet familiar with the Consortium and the Array API Standard, a bit of background. Our aim is to standardize the fundamental building blocks of scientific computation: multi-dimensional arrays (a.k.a. tensors). The PyData ecosystem has a rich set of libraries for working with arrays, including NumPy, CuPy, Dask, PyTorch, JAX, TensorFlow, oneAPI, and beyond. Historically, interoperation among array libraries has been challenging due to divergent API designs and subtle variation in behavior such that code written for one array library cannot be readily ported to another array library. To address these challenges, the Consortium for Python Data API Standards was established to facilitate coordination among array and dataframe library maintainers, sponsoring organizations, and key stakeholders and to provide a transparent and inclusive process–with input from the broader Python community–for standardizing array API design.

Soon after formation of the Consortium in May 2020, we released an initial draft of the array API specification and sought input from the broader PyData ecosystem during an extended community review period. Throughout 2021, we engaged in a tight feedback loop with array API adopters to refine and improve the initial draft specification.

During this time, we reached three key milestones. First, we introduced a data interchange protocol based on DLPack to facilitate zero-copy memory exchange between array libraries. Second, we standardized a core set of API designs for array creation, mutation, and element-wise computation. Third, we introduced “extensions”, which are defined as coherent sets of functionality that are commonly implemented across array libraries, but which conforming array libraries may choose not to implement. The first extension we included in the specification was the linalg extension, which defines a set of linear algebra APIs for computing eigenvalues, performing singular value decomposition, solving a system of linear equations, and other linear algebra operations.

Building on the success of the 2021 revision of the Array API Standard, we worked throughout 2022 on a subsequent specification revision with two key objectives: standardize complex number support and standardize an extension for Fast Fourier Transforms (FFTs). These efforts culminated in the 2022 revision of the Array API Standard, along with significant advancements in tooling to support specification adoption. Importantly, we released 1) a comprehensive portable test suite built on Pytest and Hypothesis for testing Array API Standard compliance and 2) an array compatibility layer which provides a small wrapper around existing array libraries to ensure Array API Standard compliant behavior.

With the 2022 revision out of the way, we summarized our work to date, publishing in SciPy Proceedings the paper “Python Array API Standard: Toward Array Interoperability in the Scientific Python Ecosystem”. Needless to say, it was a busy three years!

2023 Revision

Not wanting to rest on our laurels, immediately after tagging the 2022 release we got busy working on the 2023 revision with a singular goal: eliminate any and all barriers to adoption. While achieving buy-in from array libraries across the ecosystem marked a significant achievement, what is critical for the long-term success of this collective effort is driving adoption among downstream libraries, such as SciPy, scikit-learn, and others, in order to achieve our stated goal of facilitating interoperability among array libraries.

To this end, we solicited feedback from downstream adopters regarding missing APIs, pain points, and general blind spots. During our discussions, we made three key observations. First, for a small subset of APIs, the behavior required by the standard did not match the reality on the ground, and we needed to revise the standard in order to ensure array libraries and their consumers could both achieve compliance and maintain backward compatibility. Second, we noticed a common set of operations which downstream adopters kept needing and for which they were implementing inefficient workarounds, thus making these operations excellent candidates for standardization. And lastly, we found that downstream adopters needed robust and portable mechanisms for inspecting library and device capabilities.

Breaking Changes

To address our first observation, we made two breaking changes to the 2022 revision of the standard. First, we revised the guidance for type promotion in prod, sum, and linalg.trace such that, by default, input arrays having floating-point data types are not upcasted to higher precision. The previous guidance reflected the concern that summation of large arrays having low precision could easily lead to overflow. While this concern is certainly valid for arrays having integer data types (e.g., int8 and int16), this is less of a concern for floating-point data types which can typically handle a larger range of values and have a natural overflow value in infinity.

Second, we revised the guidance for portable input and output data types in FFT APIs. One of the specification’s overriding design principles is requiring users to be explicit about their intent. In the 2022 revision, we failed to fully adhere to this principle in the FFT APIs, leading to ambiguity of acceptable return types and the potential for undesired automatic upcasting of real-valued arrays to complex-valued arrays. We thus sought to correct this deficiency and subsequently backported the changes to the 2022 revision.

New Additions

To address our second observation, we identified and standardized several new APIs to ensure portable behavior among conforming array libraries.

  • clip: clamps each element of an array to a specified range.
  • copysign: composes a floating-point value from a magnitude and sign.
  • cumulative_sum: calculates the cumulative sum.
  • hypot: computes the square root of the sum of squares.
  • maximum: computes the maximum value for each element of an array relative to the respective element in another array.
  • minimum: computes the minimum value for each element of an array relative to the respective element in another array.
  • moveaxis: moves array axes to new positions.
  • repeat: repeats each element of an array a specified number of times.
  • searchsorted: finds insertion positions such that sorted order would be preserved.
  • signbit: determines whether the sign bit is set for each element in an array.
  • tile: constructs an array by tiling another array.
  • unstack: splits an array into a sequence of arrays along a given axis.

Inspection APIs

To address our third observation, we recognized that downstream library adopters needed more robust mechanisms for determining library and associated device capabilities. For libraries such as SciPy and scikit-learn who want to support array objects from multiple libraries, having a set of standardized top-level APIs is not sufficient. In order to devise concise mitigation strategies and gracefully handle varying hardware capabilities, having a means for reliably ascertaining device heterogeneity is critical. Accordingly, we worked to standardize inspection APIs to allow answering the following questions:

  • does a library support boolean indexing and data-dependent output shapes?
  • how can one portably obtain a library’s list of supported devices?
  • what is a library’s default device?
  • what data types does a library support?
  • what are a library’s default data types?
  • what data types does a specific device support?

After considerable discussion and coordination among array libraries and downstream stakeholders, we coalesced around an inspection API namespace

info = xp.__array_namespace_info__()

with the following initial set of APIs:

  • capabilities: returns a dictionary of array library capabilities.
  • default_device: returns the default device.
  • default_types: returns a dictionary containing default data types.
  • dtypes: returns a dictionary containing supported data types specific to a given device.
  • devices: returns a list of supported devices.

While these APIs may seem trivial on their surface, the reality is that array libraries have often lacked easy and portable programmatic access to data type and device information. We thus consider this outcome significant progress, and we’re particularly eager to hear from downstream library authors what other capabilities they would find useful to query.

Facilitating Array API Adoption

As mentioned above, 2023 was all about adoption, and adoption requires buy-in from both array libraries and the downstream consumers of those libraries. Adoption thus faces two key challenges. First, to facilitate development, array libraries need a robust mechanism for determining whether they are specification compliant. Second, while array libraries work to become fully specification compliant, downstream libraries need to be able to target a stable compatibility layer in order to smooth over subtle differences in array library behavior.

Test Suite

To address the first challenge, we’ve continued to develop a comprehensive portable test suite built on Pytest and Hypothesis for testing Array API Standard compliance. In addition to the 2022 revision, the test suite has been updated to support the most recent 2023 revision.

Compatibility Layer

To address the second challenge, we’ve continued work on an array compatibility layer which provides a small wrapper around existing array libraries to ensure Array API Standard compliant behavior. We’re proud to announce that, in addition to support for NumPy, CuPy, and PyTorch, we’ve added support for Dask and JAX.

To get started, install from PyPI

pip install array-api-compat

and take it for a spin! If you encounter any issues, please be sure to let us know over on the library issue tracker.

Adoption Milestones

Array libraries, such as NumPy, CuPy, PyTorch, JAX, and oneAPI, have continued work toward achieving full API compliance, which is a significant milestone in and of itself. But it’s all for naught if array library consumers are not able to reap the benefits of standardization. Needless to say, we’ve seen significant uptake of the Array API Standard among downstream libraries. In particular, both SciPy and sckit-learn have added experimental support, thus enabling support for both CPU and GPU tensors and marking a big win for end users. For the curious reader, we discussed some of the performance benefits in our recent paper published in SciPy Proceedings (2023).

NumPy

One development that is especially noteworthy is the adoption of the Array API Standard in the main namespace of NumPy 2.0. When we originally formed the Consortium and began the work of standardization, we didn’t know exactly how array libraries would prefer to adopt an eventual array API standard. Would they adopt it in their main namespace? Or would they prefer to avoid potentially breaking backward compatibility and implement in a strictly compliant sub-namespace?

We wrote the specification with both possibilities in mind. NumPy and its kin went down the sub-namespace path, while libraries such as PyTorch opted for their main namespace. Well, after a few years of experimentation, the NumPy community decided that they liked the standard so much that relegating a strictly compliant implementation to a sub-namespace was not enough, and subsequently sought to apply the API design principles not just to standardized APIs in their main namespace, but across all of NumPy. This is a significant win for portability, and we’re excited for the benefits NumPy 2.0 will bring to downstream libraries and the PyData ecosystem at large.

The Road Ahead

Phew! That’s a lot, and you’ve made it this far! So what’s in store for 2024?! Glad you asked. Nothing too different from the year before. We’re planning on staying the course, focusing on adoption, and continuing to address the gaps and pain points identified by downstream libraries.

In addition to normal specification work, we’re particularly keen on developing more robust tools for specification compliance and monitoring. Based on feedback we’ve received from downstream libraries, there’s still a lack of transparency around which APIs are supported and what are the potential edge cases. We have some ideas for how to increase visibility and will have more to share in the months to come.

Long story short, we’re excited for the year ahead, and we’d love to get your feedback! To provide feedback on the Array API Standard, please open issues or pull requests on https://github.com/data-apis/array-api, and come participate in our public discussions.

Cheers!

Published by in Consortium and Standardization and tagged APIs, arrays, community, consortium and standard using 2017 words.