
DataFrame

To write a dataframe-agnostic function, the steps you'll want to follow are:

  1. Opt in to the Dataframe API by calling __dataframe_consortium_standard__ on your dataframe.
  2. Express your logic using methods from the Dataframe API. You may want to look at the official examples for inspiration.
  3. If you need to return a dataframe to the user in its original library, use the DataFrame.dataframe property.
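To make the three steps concrete without depending on any particular library, here is a toy sketch of the opt-in pattern. `ToyFrame`, `ToyStandardFrame`, `n_columns`, and `toy_func` are hypothetical names invented for illustration. Only the `__dataframe_consortium_standard__` method name and the `.dataframe` property come from the steps above. A real library returns its own standard-compliant object rather than anything like `ToyStandardFrame`.

```python
# Toy illustration of the opt-in pattern. ToyFrame and ToyStandardFrame are
# hypothetical classes, NOT part of pandas, Polars, or dataframe-api-compat.


class ToyStandardFrame:
    """Hypothetical wrapper exposing a tiny 'standard' API around a native frame."""

    def __init__(self, native):
        self._native = native

    def n_columns(self):
        # Hypothetical method standing in for real methods from the spec.
        return len(self._native.data)

    @property
    def dataframe(self):
        # Step 3: hand back the object from the user's original library.
        return self._native


class ToyFrame:
    """Hypothetical stand-in for a library's native dataframe class."""

    def __init__(self, data):
        self.data = data

    def __dataframe_consortium_standard__(self, *, api_version=None):
        # Step 1: the library opts the object in by returning a standard-compliant wrapper.
        return ToyStandardFrame(self)


def toy_func(df):
    df_s = df.__dataframe_consortium_standard__(api_version="2023.11-beta")  # step 1
    n = df_s.n_columns()  # step 2: only methods of the wrapper are used
    return df_s.dataframe, n  # step 3: unwrap back to the native object


frame = ToyFrame({"a": [1, 1, 2], "b": [4, 5, 6]})
native, n = toy_func(frame)
print(native is frame, n)
```

The key design point is that the dataframe-agnostic function never touches the native object's own methods; it only talks to whatever the opt-in call returns, and unwraps at the very end.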

Let's try writing a simple example.

Example 1: group-by and mean

Make a Python file t.py with the following content:

def func(df):
    # 1. Opt in to the API Standard
    df_s = df.__dataframe_consortium_standard__(api_version='2023.11-beta')
    # 2. Use methods from the API Standard spec
    df_s = df_s.group_by('a').mean()
    # 3. Return a dataframe from the user's original library
    return df_s.dataframe
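`group_by('a').mean()` groups rows by the values in column `a` and averages every remaining column within each group. As a library-free sanity check on the numbers in the outputs that follow, here is the same computation in plain Python over a dict of lists. `group_by_mean` is a hypothetical helper for illustration only, not part of the Dataframe API.

```python
from collections import defaultdict


def group_by_mean(data, key):
    """Hypothetical helper: group a dict-of-lists by `key`, average the other columns."""
    groups = defaultdict(lambda: defaultdict(list))
    for i, k in enumerate(data[key]):
        for col, values in data.items():
            if col != key:
                groups[k][col].append(values[i])
    result = {key: []}
    for k in sorted(groups):  # sorted for a deterministic result in this sketch
        result[key].append(k)
        for col, vals in groups[k].items():
            result.setdefault(col, []).append(sum(vals) / len(vals))
    return result


print(group_by_mean({'a': [1, 1, 2], 'b': [4, 5, 6]}, 'a'))
# {'a': [1, 2], 'b': [4.5, 6.0]}
```

Group `a == 1` holds `b` values 4 and 5, giving 4.5; group `a == 2` holds only 6, giving 6.0.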
Let's try it out:

import pandas as pd
from t import func

df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df))
   a    b
0  1  4.5
1  2  6.0

import polars as pl

df = pl.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df))
naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)

AGGREGATE
    [col("b").mean()] BY [col("a")] FROM
  DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"

If you look at the two outputs, you'll see that:

  • For pandas, the output is a pandas.DataFrame.
  • But for Polars, the output is a polars.LazyFrame.

This is because the Dataframe API has only a single DataFrame class, so for Polars all operations are done lazily in order to make full use of Polars' query engine. If you want a polars.DataFrame instead, it is the caller's responsibility to call .collect on the result. Check the modified example below:

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df))
   a    b
0  1  4.5
1  2  6.0

import polars as pl

df = pl.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df).collect())
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 4.5 │
│ 2   ┆ 6.0 │
└─────┴─────┘
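If a caller wants eager results regardless of which library the input came from, one option is a small wrapper around the returned object. `maybe_collect` is a hypothetical helper, not part of the Dataframe API or dataframe-api-compat, and it assumes the only lazy type you need to handle is `polars.LazyFrame`.

```python
def maybe_collect(result):
    """Hypothetical helper: materialise a Polars LazyFrame, return anything else unchanged."""
    try:
        import polars as pl
    except ImportError:
        # Without Polars installed, `result` cannot be a Polars LazyFrame.
        return result
    if isinstance(result, pl.LazyFrame):
        return result.collect()
    return result
```

With this, `print(maybe_collect(func(df)))` prints an eager frame for both the pandas and the Polars inputs above. Importing Polars inside the function keeps it an optional dependency, so the helper also works in environments where only pandas is installed.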