Column object

A conforming implementation of the dataframe API standard must provide and support a column object having the following methods, attributes, and behavior.

class Column(*args, **kwargs)

Column object.

Note that this column object is not meant to be instantiated directly by users of the library implementing the dataframe API standard. Rather, columns should be created via constructor functions, or retrieved from an already-created dataframe via DataFrame.col().

The parent dataframe (which can be retrieved via the parent_dataframe() property) plays a key role here:

  • If two columns were retrieved from the same dataframe, then they can be combined and compared at will.

  • If two columns were retrieved from different dataframes, then there is no guarantee about how or whether they can be combined and compared; this may vary across implementations.

  • If two columns are both “free-standing” (i.e. not retrieved from a dataframe but constructed directly from a 1D array or sequence), then they can be combined and compared with each other. Note, however, that there’s no guarantee about whether they can be compared or combined with columns retrieved from a different dataframe; this may vary across implementations.

__add__(other: Self | AnyScalar) Self

Add other column or scalar to this column.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

Returns:

Column

__and__(other: Self | bool | Scalar) Self

Apply logical ‘and’ to other Column (or scalar) and this Column.

Nulls should follow Kleene Logic.

Parameters:

other (Self or bool) – If Column, must have same length.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

Raises:

ValueError – If self or other is not boolean.
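
The Kleene rule above can be sketched in pure Python, with None standing in for null (kleene_and is an illustrative name, not part of the Standard): a null operand yields null unless the other operand already decides the result.

```python
# Pure-Python sketch of Kleene 'and'; None stands in for null.

def kleene_and(a, b):
    """Three-valued 'and': False dominates, then null (None), then True."""
    if a is False or b is False:
        return False  # False 'and' anything is False, even null
    if a is None or b is None:
        return None   # null propagates when the result is undecided
    return True
```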

__column_namespace__() Namespace

Return an object that has all the Dataframe Standard API functions on it.

Returns:

namespace (Any) – An object representing the dataframe API namespace. It should have every top-level function defined in the specification as an attribute. It may contain other public names as well, but it is recommended to only include those names that are part of the specification.

__divmod__(other: Self | AnyScalar) tuple[Column, Column]

Return quotient and remainder of integer division. See divmod builtin.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

tuple[Column, Column] – Quotient and remainder, respectively.

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.
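
The elementwise behaviour mirrors the divmod builtin applied row by row. A pure-Python sketch over plain lists (column_divmod is an illustrative name):

```python
# Illustrative sketch: elementwise divmod over two equal-length sequences,
# mirroring the pair of columns __divmod__ returns.

def column_divmod(lhs, rhs):
    quotients, remainders = [], []
    for a, b in zip(lhs, rhs):
        q, r = divmod(a, b)  # floor quotient and remainder, as the builtin
        quotients.append(q)
        remainders.append(r)
    return quotients, remainders
```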

__eq__(other: Self | AnyScalar) Self

Compare for equality.

Nulls should follow Kleene Logic.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

__floordiv__(other: Self | AnyScalar) Self

Floor-divide this column by other column or scalar.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

__ge__(other: Self | AnyScalar) Self

Compare for “greater than or equal to” other.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

__gt__(other: Self | AnyScalar) Self

Compare for “greater than” other.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

__invert__() Self

Invert truthiness of (boolean) elements.

Raises:

ValueError – If the Column is not boolean.

__iter__() NoReturn

Iterate over elements.

This is intentionally “poisoned” to discourage inefficient code patterns.

Raises:

NotImplementedError

__le__(other: Self | AnyScalar) Self

Compare for “less than or equal to” other.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

__lt__(other: Self | AnyScalar) Self

Compare for “less than” other.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

__mod__(other: Self | AnyScalar) Self

Return modulus of this column by other (% operator).

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

__mul__(other: Self | AnyScalar) Self

Multiply other column or scalar with this column.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

__ne__(other: Self | AnyScalar) Self

Compare for non-equality.

Nulls should follow Kleene Logic.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

__or__(other: Self | bool | Scalar) Self

Apply logical ‘or’ to other Column (or scalar) and this column.

Nulls should follow Kleene Logic.

Parameters:

other (Self or bool) – If Column, must have same length.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

Raises:

ValueError – If self or other is not boolean.

__pow__(other: Self | AnyScalar) Self

Raise this column to the power of other.

Integer dtype to the power of non-negative integer dtype is integer dtype. Integer dtype to the power of float dtype is float dtype. Float dtype to the power of integer dtype or float dtype is float dtype.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.
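
The dtype rule above can be written out as a tiny helper. This is purely illustrative (pow_result_dtype is not a Standard function, and dtypes are represented here as plain strings):

```python
# Sketch of the __pow__ result-dtype rule: any float operand makes the
# result float; integer ** non-negative integer stays integer.

def pow_result_dtype(base: str, exponent: str) -> str:
    if base == "float" or exponent == "float":
        return "float"
    return "int"
```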

__radd__(other: Self | AnyScalar) Self
__rand__(other: Self | bool) Self
__rfloordiv__(other: Self | AnyScalar) Self
__rmod__(other: Self | AnyScalar) Self
__rmul__(other: Self | AnyScalar) Self
__ror__(other: Self | bool) Self

Return other | self.

__rpow__(other: Self | AnyScalar) Self
__rsub__(other: Self | AnyScalar) Self
__rtruediv__(other: Self | AnyScalar) Self
__sub__(other: Self | AnyScalar) Self

Subtract other column or scalar from this column.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

__truediv__(other: Self | AnyScalar) Self

Divide this column by other column or scalar. True division, returns floats.

Parameters:

other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.

Returns:

Column

Notes

other’s parent DataFrame must be the same as self’s; otherwise, the operation is unsupported and may vary across implementations.

all(*, skip_nulls: bool | Scalar = True) Scalar

Reduction returns a bool.

Raises:

ValueError – If column is not boolean.

any(*, skip_nulls: bool | Scalar = True) Scalar

Reduction returns a bool.

Raises:

ValueError – If column is not boolean.

cast(dtype: DType) Self

Cast column to the specified dtype.

The following is not specified and may vary across implementations:

  • Cross-kind casting (e.g. integer to string, or to float)

  • Behaviour in the case of overflows

property column: Any

Return underlying (not-necessarily-Standard-compliant) column.

If a library only implements the Standard, then this can return self.

cumulative_max() Self

Cumulative operation; returns a Column.

Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.

cumulative_min() Self

Cumulative operation; returns a Column.

Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.

cumulative_prod() Self

Cumulative operation; returns a Column.

Must be supported for numerical and datetime data types. The returned value has the same dtype as the column.

cumulative_sum() Self

Cumulative operation; returns a Column.

Must be supported for numerical and datetime data types. The returned value has the same dtype as the column.
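
The semantics correspond to a running total: each output element is the sum of all inputs up to and including that position. A pure-Python sketch over a plain list:

```python
# Sketch of cumulative_sum semantics using itertools.accumulate.
from itertools import accumulate

def cumulative_sum(values):
    return list(accumulate(values))
```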

day() Self

Return ‘day’ component of each element of Date and Datetime columns.

For example, return 2 for 1981-01-02T12:34:56.123456.

The returned column should be of integer dtype (signed or unsigned).
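
A conforming implementation computes this columnwise; the same idea applies to the other datetime component accessors below (hour, minute, second, month, year). A pure-Python sketch using stdlib datetimes (day here is an illustrative helper, not the Column method itself):

```python
# Sketch of the 'day' accessor over a sequence of Python datetimes.
from datetime import datetime

def day(values):
    return [v.day for v in values]
```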

property dtype: DType

Return data type of column.

fill_nan(value: float | NullType | Scalar, /) Self

Fill floating point nan values with the given fill value.

Parameters:

value (float or null) – Value used to replace any nan values in the column. Must be of the Python scalar type matching the dtype of the column (or be null).

fill_null(value: AnyScalar, /) Self

Fill null values with the given fill value.

Parameters:

value (Scalar) – Value used to replace any null values in the column. Must be of the Python scalar type matching the dtype of the column.
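
The distinction between fill_nan (above) and fill_null matters: one targets floating-point NaN, the other targets missing values. A pure-Python sketch, with None standing in for null (both function names are illustrative):

```python
# Sketch of the two fill operations; None stands in for null.
import math

def fill_null(values, fill):
    return [fill if v is None else v for v in values]

def fill_nan(values, fill):
    # Only replaces floating-point NaN; nulls pass through untouched.
    return [fill if isinstance(v, float) and math.isnan(v) else v for v in values]
```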

filter(mask: Self) Self

Select a subset of rows corresponding to a mask.

Parameters:

mask (Self) –

Returns:

Column

Notes

Some participants preferred a weaker type Arraylike[bool] for mask, where ‘Arraylike’ denotes an object adhering to the Array API standard.

get_value(row_number: int) Scalar

Select the value at a row number, similar to ndarray.__getitem__(<int>).

Parameters:

row_number (int) – Row number of value to return.

Returns:

Scalar – Depends on the dtype of the Column, and may vary across implementations.

hour() Self

Return ‘hour’ component of each element of Date and Datetime columns.

For example, return 12 for 1981-01-02T12:34:56.123456.

The returned column should be of integer dtype (signed or unsigned).

is_in(values: Self) Self

Indicate whether the value at each row matches any value in values.

Parameters:

values (Self) – Contains values to compare against. May include float('nan') and null, in which case 'nan' and null will respectively return True even though float('nan') == float('nan') isn’t True. The dtype of values must match the current column’s dtype.

Returns:

Column
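
The NaN special case above (matching even though nan != nan) can be sketched in pure Python, with None standing in for null (is_in here is an illustrative helper over plain lists):

```python
# Sketch of is_in semantics, including the NaN and null special cases.
import math

def _is_nan(v):
    return isinstance(v, float) and math.isnan(v)

def is_in(column, values):
    has_nan = any(_is_nan(v) for v in values)
    has_null = any(v is None for v in values)
    plain = {v for v in values if not _is_nan(v) and v is not None}
    out = []
    for v in column:
        if _is_nan(v):
            out.append(has_nan)   # NaN matches NaN despite nan != nan
        elif v is None:
            out.append(has_null)  # null matches null
        else:
            out.append(v in plain)
    return out
```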

is_nan() Self

Check for nan entries.

Returns:

Column

See also

is_null

Notes

This only checks for ‘NaN’ entries; it does not include ‘missing’ or ‘null’ entries. In particular, it does not check for np.timedelta64('NaT').

is_null() Self

Check for ‘missing’ or ‘null’ entries.

Returns:

Column

See also

is_nan

Notes

Does not include NaN-like entries. May optionally include ‘NaT’ values (if present in an implementation), but note that the Standard makes no guarantees about them.

iso_weekday() Self

Return ISO weekday for each element of Date and Datetime columns.

Note that Monday=1, …, Sunday=7.

The returned column should be of integer dtype (signed or unsigned).

len() Scalar

Return the number of rows.

max(*, skip_nulls: bool | Scalar = True) Scalar

Reduction returns a scalar.

Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.

mean(*, skip_nulls: bool | Scalar = True) Scalar

Reduction returns a scalar.

Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.

median(*, skip_nulls: bool | Scalar = True) Scalar

Reduction returns a scalar.

Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.

microsecond() Self

Return number of microseconds since last second.

For example, return 123456 for 1981-01-02T12:34:56.123456.

Only supported for Date and Datetime columns. The returned column should be of integer dtype (signed or unsigned).

min(*, skip_nulls: bool | Scalar = True) Scalar

Reduction returns a scalar.

Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.

minute() Self

Return ‘minute’ component of each element of Date and Datetime columns.

For example, return 34 for 1981-01-02T12:34:56.123456.

The returned column should be of integer dtype (signed or unsigned).

month() Self

Return ‘month’ component of each element of Date and Datetime columns.

For example, return 1 for 1981-01-02T12:34:56.123456.

The returned column should be of integer dtype (signed or unsigned).

n_unique(*, skip_nulls: bool = True) Scalar

Return number of unique values.

Notes

If the original column contains multiple 'NaN' values, then they only count as one distinct value. Likewise for null values (if skip_nulls=False).
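
The counting rule can be sketched in pure Python, with None standing in for null (n_unique here is an illustrative helper over plain lists):

```python
# Sketch of n_unique: all NaNs collapse to one distinct value, and nulls
# count once, only when skip_nulls=False.
import math

def n_unique(values, *, skip_nulls=True):
    seen = set()
    saw_nan = saw_null = False
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            saw_nan = True
        elif v is None:
            saw_null = True
        else:
            seen.add(v)
    return len(seen) + saw_nan + (saw_null and not skip_nulls)
```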

property name: str

Return name of column.

property parent_dataframe: DataFrame | None

Return parent DataFrame, if present.

For example, if we have the following

df: DataFrame
column = df.col('a')

then column.parent_dataframe should return df.

On the other hand, if we had:

column = column_from_1d_array(...)

then column.parent_dataframe should return None.

persist() Self

Hint that computation prior to this point should not be repeated.

This is intended as a hint, rather than as a directive. Implementations which do not separate lazy vs eager execution may ignore this method and treat it as a no-op.

Note

This method may trigger execution. If necessary, it should be called at most once per dataframe, and as late as possible in the pipeline.

prod(*, skip_nulls: bool | Scalar = True) Scalar

Reduction returns a scalar.

Must be supported for numerical data types. The returned value has the same dtype as the column.

rename(name: str | Scalar) Self

Rename column.

Parameters:

name (str) – New name for column.

Returns:

Column – New column; this does not operate in-place.

second() Self

Return ‘second’ component of each element.

For example, return 56 for 1981-01-02T12:34:56.123456.

Only supported for Date and Datetime columns. The returned column should be of integer dtype (signed or unsigned).

shift(offset: int | Scalar) Self

Shift values by offset positions, filling missing values with null.

For example, if the original column contains values [1, 4, 2], then:

  • .shift(1) will return [null, 1, 4],

  • .shift(-1) will return [4, 2, null].

Parameters:

offset (int) – How many positions to shift by.
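
The example above can be sketched over a plain list, with None standing in for null (shift here is an illustrative helper, not the Column method itself):

```python
# Sketch of shift semantics: positive offsets push values toward the end,
# negative offsets toward the start; vacated positions become null (None).

def shift(values, offset):
    n = len(values)
    if offset >= 0:
        return [None] * min(offset, n) + values[: max(n - offset, 0)]
    return values[-offset:] + [None] * min(-offset, n)
```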

slice_rows(start: int | None, stop: int | None, step: int | None) Self

Select a subset of rows corresponding to a slice.

Parameters:
  • start (int or None) –

  • stop (int or None) –

  • step (int or None) –

Returns:

Column
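
The three parameters correspond directly to Python's builtin slice semantics, as a sketch over a plain list shows:

```python
# Sketch: slice_rows maps one-to-one onto a Python slice object.

def slice_rows(values, start, stop, step):
    return values[slice(start, stop, step)]
```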

sort(*, ascending: bool = True, nulls_position: Literal['first', 'last'] = 'last') Self

Sort column.

If you need the indices which would sort the column, use sorted_indices.

Parameters:
  • ascending (bool) – If True, sort in ascending order. If False, sort in descending order.

  • nulls_position ({'first', 'last'}) – Whether null values should be placed at the beginning or at the end of the result. Note that the position of NaNs is unspecified and may vary based on the implementation.

Returns:

Column

sorted_indices(*, ascending: bool = True, nulls_position: Literal['first', 'last'] = 'last') Self

Return row numbers which would sort column.

If you need to sort the Column, use sort().

Parameters:
  • ascending (bool) – If True, sort in ascending order. If False, sort in descending order.

  • nulls_position ({'first', 'last'}) – Whether null values should be placed at the beginning or at the end of the result. Note that the position of NaNs is unspecified and may vary based on the implementation.

Returns:

Column
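
The relationship between the two keywords can be sketched over a plain list, with None standing in for null (sorted_indices here is an illustrative helper):

```python
# Sketch of sorted_indices: sort the non-null positions by value, then
# place the null positions at whichever end nulls_position requests.

def sorted_indices(values, *, ascending=True, nulls_position="last"):
    non_null = [i for i, v in enumerate(values) if v is not None]
    nulls = [i for i, v in enumerate(values) if v is None]
    order = sorted(non_null, key=lambda i: values[i], reverse=not ascending)
    return nulls + order if nulls_position == "first" else order + nulls
```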

std(*, correction: float = 1, skip_nulls: bool | Scalar = True) Scalar

Reduction returns a scalar.

Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.

Parameters:
  • correction – Degrees of freedom adjustment. Setting this parameter to a value other than 0 has the effect of adjusting the divisor during the calculation of the standard deviation according to N-correction, where N corresponds to the total number of elements over which the standard deviation is computed. When computing the standard deviation of a population, setting this parameter to 0 is the standard choice (i.e., the provided column contains data constituting an entire population). When computing the corrected sample standard deviation, setting this parameter to 1 is the standard choice (i.e., the provided column contains data sampled from a larger population; this is commonly referred to as Bessel’s correction). Fractional (float) values are allowed. Default: 1.

  • skip_nulls – Whether to skip null values.
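
The N-correction divisor described above can be sketched for the numerical case (std here is an illustrative pure-Python helper over a plain list, ignoring nulls and datetime dtypes):

```python
# Sketch of the standard deviation with a degrees-of-freedom correction:
# divide the sum of squared deviations by N - correction.
import math

def std(values, *, correction=1.0):
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - correction))
```

With correction=1 this is the corrected sample standard deviation (Bessel's correction); with correction=0 it is the population standard deviation.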

sum(*, skip_nulls: bool | Scalar = True) Scalar

Reduction returns a scalar.

Must be supported for numerical and datetime data types. The returned value has the same dtype as the column.

take(indices: Self) Self

Select a subset of rows, similar to ndarray.take.

Parameters:

indices – Positions of rows to select.

to_array() Any

Convert to array-API-compliant object.

The resulting array will have the corresponding dtype from the Array API:

  • Bool() -> ‘bool’

  • Int8() -> ‘int8’

  • Int16() -> ‘int16’

  • Int32() -> ‘int32’

  • Int64() -> ‘int64’

  • UInt8() -> ‘uint8’

  • UInt16() -> ‘uint16’

  • UInt32() -> ‘uint32’

  • UInt64() -> ‘uint64’

  • Float32() -> ‘float32’

  • Float64() -> ‘float64’

Null values are not supported and must be filled prior to conversion.

Returns:

Any – An array-API-compliant object.

Notes

While numpy arrays are not yet array-API-compliant, implementations may choose to return a numpy array (for numpy prior to 2.0), with the understanding that consuming libraries would then use the array-api-compat package to convert it to a Standard-compliant array.

unique_indices(*, skip_nulls: bool | Scalar = True) Self

Return indices corresponding to unique values in Column.

Returns:

Column – Indices corresponding to unique values.

Notes

There are no ordering guarantees. In particular, if there are multiple indices corresponding to the same unique value, there is no guarantee about which one will appear in the result. If the original Column contains multiple 'NaN' values, then only a single index corresponding to those values will be returned. Likewise for null values (if skip_nulls=False). To get the unique values, you can do col.take(col.unique_indices()).
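
One valid result is the index of each value's first occurrence, which a pure-Python sketch over plain lists can show (ignoring the NaN and null special cases above; unique_indices and take are illustrative helpers):

```python
# Sketch: first-occurrence indices are one valid unique_indices result;
# the Standard makes no guarantee about which index or order you get.

def unique_indices(values):
    seen = set()
    out = []
    for i, v in enumerate(values):
        if v not in seen:
            seen.add(v)
            out.append(i)
    return out

def take(values, indices):
    return [values[i] for i in indices]
```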

unix_timestamp(*, time_unit: str | Scalar = 's') Self

Return number of seconds / milliseconds / microseconds since the Unix epoch.

The Unix epoch is 00:00:00 UTC on 1 January 1970.

Parameters:

time_unit – Time unit to use. Must be one of ‘s’, ‘ms’, or ‘us’.

Returns:

Column – Integer data type. For example, if the date is 1970-01-02T00:00:00.123456, and the time_unit is 's', then the result should be 86400, and not 86400.123456. Information smaller than the given time unit should be discarded.
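
The truncation rule in the example can be sketched for a single stdlib datetime, interpreted as UTC (unix_timestamp here is an illustrative scalar helper, not the Column method itself):

```python
# Sketch of the truncation rule: integer arithmetic in microseconds, then
# floor division discards anything smaller than the requested time unit.
import calendar
from datetime import datetime

_DIVISOR = {"s": 1_000_000, "ms": 1_000, "us": 1}

def unix_timestamp(dt, time_unit="s"):
    # whole seconds since the epoch (naive datetime interpreted as UTC)
    seconds = calendar.timegm(dt.timetuple())
    micros = seconds * 1_000_000 + dt.microsecond
    return micros // _DIVISOR[time_unit]
```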

var(*, correction: float | Scalar = 1, skip_nulls: bool | Scalar = True) Scalar

Reduction returns a scalar.

Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.

Parameters:
  • correction – Degrees of freedom adjustment: 0 for the population variance, 1 for the corrected sample variance. See Column.std for a more detailed description.

  • skip_nulls – Whether to skip null values.

year() Self

Return ‘year’ component of each element of Date and Datetime columns.

For example, return 1981 for 1981-01-02T12:34:56.123456.

The returned column should be of (signed) integer dtype.