Column object¶
A conforming implementation of the dataframe API standard must provide and support a column object having the following methods, attributes, and behavior.
- class Column(*args, **kwargs)¶
Column object.
Note that this column object is not meant to be instantiated directly by users of the library implementing the dataframe API standard. Rather, use constructor functions or an already-created dataframe object retrieved via DataFrame.col().
The parent dataframe (which can be retrieved via the parent_dataframe property) plays a key role here:
If two columns were retrieved from the same dataframe, then they can be combined and compared at will.
If two columns were retrieved from different dataframes, then there is no guarantee about how or whether they can be combined and compared; this may vary across implementations.
If two columns are both “free-standing” (i.e. not retrieved from a dataframe but constructed directly from a 1D array or sequence), then they can be combined and compared with each other. Note, however, that there’s no guarantee about whether they can be compared or combined with columns retrieved from a different dataframe; this may vary across implementations.
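For illustration, a minimal sketch of these rules (df, and the column names 'a' and 'b', are hypothetical; df is assumed to be a conforming dataframe):
a = df.col('a')
b = df.col('b')
c = a + b      # same parent dataframe: combining is well-defined
mask = a > b   # comparisons likewise return a new (boolean) column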
- __add__(other: Self | AnyScalar) Self ¶
Add other column or scalar to this column.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __and__(other: Self | bool | Scalar) Self ¶
Apply logical ‘and’ to other Column (or scalar) and this Column.
Nulls should follow Kleene Logic.
- Parameters:
other (Self or bool) – If Column, must have same length.
- Returns:
Column
- Raises:
ValueError – If self or other is not boolean.
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
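As a rough illustration of Kleene logic here (a sketch; flag_a and flag_b are assumed boolean columns from the same dataframe):
both = flag_a & flag_b
# Element-wise, nulls behave as "unknown":
#   True  & null -> null   (the result depends on the unknown value)
#   False & null -> False  (False regardless of the unknown value)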
- __column_namespace__() Namespace ¶
Return an object that has all the Dataframe Standard API functions on it.
- Returns:
namespace (Any) – An object representing the dataframe API namespace. It should have every top-level function defined in the specification as an attribute. It may contain other public names as well, but it is recommended to only include those names that are part of the specification.
- __divmod__(other: Self | AnyScalar) tuple[Column, Column] ¶
Return quotient and remainder of integer division. See divmod builtin.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
tuple[Column, Column]
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
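A minimal usage sketch (col is assumed to be an integer column retrieved from a conforming dataframe):
quotient, remainder = divmod(col, 3)
# element-wise: col == quotient * 3 + remainder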
- __eq__(other: Self | AnyScalar) Self ¶
Compare for equality.
Nulls should follow Kleene Logic.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __floordiv__(other: Self | AnyScalar) Self ¶
Floor-divide this column by other column or scalar.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __ge__(other: Self | AnyScalar) Self ¶
Compare for “greater than or equal to” other.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __gt__(other: Self | AnyScalar) Self ¶
Compare for “greater than” other.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __invert__() Self ¶
Invert truthiness of (boolean) elements.
- Raises:
ValueError – If the Column is not boolean.
- __iter__() NoReturn ¶
Iterate over elements.
This is intentionally “poisoned” to discourage inefficient code patterns.
- Raises:
NotImplementedError –
- __le__(other: Self | AnyScalar) Self ¶
Compare for “less than or equal to” other.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __lt__(other: Self | AnyScalar) Self ¶
Compare for “less than” other.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __mod__(other: Self | AnyScalar) Self ¶
Return modulus of this column by other (% operator).
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __mul__(other: Self | AnyScalar) Self ¶
Multiply other column or scalar with this column.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __ne__(other: Self | AnyScalar) Self ¶
Compare for non-equality.
Nulls should follow Kleene Logic.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __or__(other: Self | bool | Scalar) Self ¶
Apply logical ‘or’ to other Column (or scalar) and this column.
Nulls should follow Kleene Logic.
- Parameters:
other (Self or Scalar) – If Column, must have same length.
- Returns:
Column
- Raises:
ValueError – If self or other is not boolean.
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __pow__(other: Self | AnyScalar) Self ¶
Raise this column to the power of other.
Integer dtype to the power of non-negative integer dtype is integer dtype. Integer dtype to the power of float dtype is float dtype. Float dtype to the power of integer dtype or float dtype is float dtype.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
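A sketch of the dtype rules above (col_int and col_float are assumed integer and float columns from the same dataframe):
col_int ** 2          # integer ** non-negative integer -> integer dtype
col_int ** 0.5        # integer ** float -> float dtype
col_float ** col_int  # float ** integer -> float dtype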
- __radd__(other: Self | AnyScalar) Self ¶
- __rand__(other: Self | bool) Self ¶
- __rfloordiv__(other: Self | AnyScalar) Self ¶
- __rmod__(other: Self | AnyScalar) Self ¶
- __rmul__(other: Self | AnyScalar) Self ¶
- __ror__(other: Self | bool) Self ¶
Return other | self.
- __rpow__(other: Self | AnyScalar) Self ¶
- __rsub__(other: Self | AnyScalar) Self ¶
- __rtruediv__(other: Self | AnyScalar) Self ¶
- __sub__(other: Self | AnyScalar) Self ¶
Subtract other column or scalar from this column.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __truediv__(other: Self | AnyScalar) Self ¶
Divide this column by other column or scalar. True division, returns floats.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- all(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a bool.
- Raises:
ValueError – If column is not boolean.
- any(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a bool.
- Raises:
ValueError – If column is not boolean.
- cast(dtype: DType) Self ¶
Cast column to the specified dtype.
The following is not specified and may vary across implementations:
Cross-kind casting (e.g. integer to string, or to float)
Behaviour in the case of overflows
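A minimal sketch (assuming the namespace obtained via __column_namespace__() exposes the Int64 dtype constructor listed under to_array()):
ns = col.__column_namespace__()
col_i64 = col.cast(ns.Int64())  # same-kind cast; cross-kind casts and overflow behaviour may vary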
- property column: Any¶
Return underlying (not-necessarily-Standard-compliant) column.
If a library only implements the Standard, then this can return self.
- cumulative_max() Self ¶
Reduction returns a Column.
Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.
- cumulative_min() Self ¶
Reduction returns a Column.
Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.
- cumulative_prod() Self ¶
Reduction returns a Column.
Must be supported for numerical and datetime data types. The returned value has the same dtype as the column.
- cumulative_sum() Self ¶
Reduction returns a Column.
Must be supported for numerical and datetime data types. The returned value has the same dtype as the column.
- day() Self ¶
Return ‘day’ component of each element of Date and Datetime columns.
For example, return 2 for 1981-01-02T12:34:56.123456.
Return column should be of integer dtype (signed or unsigned).
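A small sketch of the datetime components (ts is assumed to be a Datetime column containing 1981-01-02T12:34:56.123456):
ts.day()    # 2
ts.month()  # 1
ts.year()   # 1981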
- property dtype: DType¶
Return data type of column.
- fill_nan(value: float | NullType | Scalar, /) Self ¶
Fill floating point nan values with the given fill value.
- Parameters:
value (float or null) – Value used to replace any nan in the column with. Must be of the Python scalar type matching the dtype of the column (or be null).
- fill_null(value: AnyScalar, /) Self ¶
Fill null values with the given fill value.
- Parameters:
value (Scalar) – Value used to replace any null values in the column with. Must be of the Python scalar type matching the dtype of the column.
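A minimal sketch of filling missing data (col is assumed to be a float column; the fill value 0.0 is illustrative):
cleaned = col.fill_nan(0.0)        # replace floating-point NaN values
complete = cleaned.fill_null(0.0)  # replace remaining null (missing) values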
- filter(mask: Self) Self ¶
Select a subset of rows corresponding to a mask.
- Parameters:
mask (Self) –
- Returns:
Column
Notes
Some participants preferred a weaker type Arraylike[bool] for mask, where ‘Arraylike’ denotes an object adhering to the Array API standard.
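A minimal usage sketch (col is assumed numeric; the mask is derived from the same parent dataframe):
positive = col.filter(col > 0)  # keep only the rows where the mask is True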
- get_value(row_number: int) Scalar ¶
Select the value at a row number, similar to ndarray.__getitem__(<int>).
- Parameters:
row_number (int) – Row number of value to return.
- Returns:
Scalar – Depends on the dtype of the Column, and may vary across implementations.
- hour() Self ¶
Return ‘hour’ component of each element of Date and Datetime columns.
For example, return 12 for 1981-01-02T12:34:56.123456.
Return column should be of integer dtype (signed or unsigned).
- is_in(values: Self) Self ¶
Indicate whether the value at each row matches any value in values.
- Parameters:
values (Self) – Contains values to compare against. May include float('nan') and null, in which case ‘nan’ and null will respectively return True even though float('nan') == float('nan') isn’t True. The dtype of values must match the current column’s dtype.
- Returns:
Column
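A minimal usage sketch (values_col is assumed to be a Column of the same dtype as col, e.g. a free-standing one built via column_from_1d_array):
mask = col.is_in(values_col)  # boolean Column: True where col's value appears in values_col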
- is_nan() Self ¶
Check for nan entries.
- Returns:
Column
See also
is_null
Notes
This only checks for ‘NaN’. Does not include ‘missing’ or ‘null’ entries. In particular, does not check for np.timedelta64('NaT').
- is_null() Self ¶
Check for ‘missing’ or ‘null’ entries.
- Returns:
Column
See also
is_nan
Notes
Does not include NaN-like entries. May optionally include ‘NaT’ values (if present in an implementation), but note that the Standard makes no guarantees about them.
- iso_weekday() Self ¶
Return ISO weekday for each element of Date and Datetime columns.
Note that Monday=1, …, Sunday=7.
Return column should be of integer dtype (signed or unsigned).
- len() Scalar ¶
Return the number of rows.
- max(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.
- mean(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.
- median(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.
- microsecond() Self ¶
Return number of microseconds since last second.
For example, return 123456 for 1981-01-02T12:34:56.123456.
Only supported for Date and Datetime columns. Return column should be of integer dtype (signed or unsigned).
- min(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.
- minute() Self ¶
Return ‘minute’ component of each element of Date and Datetime columns.
For example, return 34 for 1981-01-02T12:34:56.123456.
Return column should be of integer dtype (signed or unsigned).
- month() Self ¶
Return ‘month’ component of each element of Date and Datetime columns.
For example, return 1 for 1981-01-02T12:34:56.123456.
Return column should be of integer dtype (signed or unsigned).
- n_unique(*, skip_nulls: bool = True) Scalar ¶
Return number of unique values.
Notes
If the original column(s) contain multiple ‘NaN’ values, then they only count as one distinct value. Likewise for null values (if skip_nulls=False).
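A small worked sketch of the documented behaviour (col is assumed to hold [1, 1, 2, null, null]):
col.n_unique(skip_nulls=False)  # 3 - the repeated nulls count as a single distinct value
col.n_unique()                  # 2 - with skip_nulls=True (the default), nulls are skipped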
- property name: str¶
Return name of column.
- property parent_dataframe: DataFrame | None¶
Return parent DataFrame, if present.
For example, if we have the following
df: DataFrame
column = df.col('a')
then column.parent_dataframe should return df.
On the other hand, if we had:
column = column_from_1d_array(...)
then column.parent_dataframe should return None.
- persist() Self ¶
Hint that computation prior to this point should not be repeated.
This is intended as a hint, rather than as a directive. Implementations which do not separate lazy vs eager execution may ignore this method and treat it as a no-op.
Note
This method may trigger execution. If necessary, it should be called at most once per dataframe, and as late as possible in the pipeline.
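A minimal sketch of using the hint late in a pipeline (col is assumed to be derived from a conforming dataframe):
col = col.persist()   # hint: avoid recomputing the work above for each reduction
total = col.sum()
largest = col.max()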
- prod(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical data types. The returned value has the same dtype as the column.
- rename(name: str | Scalar) Self ¶
Rename column.
- Parameters:
name (str) – New name for column.
- Returns:
Column – New column - this does not operate in-place.
- second() Self ¶
Return ‘second’ component of each element.
For example, return 56 for 1981-01-02T12:34:56.123456.
Only supported for Date and Datetime columns. Return column should be of integer dtype (signed or unsigned).
- shift(offset: int | Scalar) Self ¶
Shift values by offset positions, filling missing values with null.
For example, if the original column contains values [1, 4, 2], then:
.shift(1) will return [null, 1, 4],
.shift(-1) will return [4, 2, null].
- Parameters:
offset (int) – How many positions to shift by.
- slice_rows(start: int | None, stop: int | None, step: int | None) Self ¶
Select a subset of rows corresponding to a slice.
- Parameters:
start (int or None) –
stop (int or None) –
step (int or None) –
- Returns:
Column
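A minimal sketch (standard slice semantics are assumed, as with Python's slice(start, stop, step)):
first_ten = col.slice_rows(0, 10, None)       # rows 0 through 9
every_other = col.slice_rows(None, None, 2)   # rows 0, 2, 4, ...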
- sort(*, ascending: bool = True, nulls_position: Literal['first', 'last'] = 'last') Self ¶
Sort column.
If you need the indices which would sort the column, use sorted_indices.
- Parameters:
ascending (bool) – If True, sort in ascending order. If False, sort in descending order.
nulls_position ({'first', 'last'}) – Whether null values should be placed at the beginning or at the end of the result. Note that the position of NaNs is unspecified and may vary based on the implementation.
- Returns:
Column
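A minimal usage sketch:
descending = col.sort(ascending=False, nulls_position='last')
# or, keeping the row numbers that would sort the column:
order = col.sorted_indices()
reordered = col.take(order)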
- sorted_indices(*, ascending: bool = True, nulls_position: Literal['first', 'last'] = 'last') Self ¶
Return row numbers which would sort column.
If you need to sort the Column, use sort().
- Parameters:
ascending (bool) – If True, sort in ascending order. If False, sort in descending order.
nulls_position ({'first', 'last'}) – Whether null values should be placed at the beginning or at the end of the result. Note that the position of NaNs is unspecified and may vary based on the implementation.
- Returns:
Column
- std(*, correction: float = 1, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.
- Parameters:
correction – Degrees of freedom adjustment. Setting this parameter to a value other than 0 has the effect of adjusting the divisor during the calculation of the standard deviation according to N-correction, where N corresponds to the total number of elements over which the standard deviation is computed. When computing the standard deviation of a population, setting this parameter to 0 is the standard choice (i.e., the provided column contains data constituting an entire population). When computing the corrected sample standard deviation, setting this parameter to 1 is the standard choice (i.e., the provided column contains data sampled from a larger population; this is commonly referred to as Bessel’s correction). Fractional (float) values are allowed. Default: 1.
skip_nulls – Whether to skip null values.
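A small numeric check of the correction parameter (a sketch, assuming col holds [1.0, 2.0, 3.0]):
# mean = 2.0, squared deviations sum to 2.0
col.std(correction=1)  # sqrt(2.0 / (3 - 1)) = 1.0   (corrected sample standard deviation)
col.std(correction=0)  # sqrt(2.0 / 3)       ≈ 0.816 (population standard deviation)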
- sum(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical and datetime data types. The returned value has the same dtype as the column.
- take(indices: Self) Self ¶
Select a subset of rows, similar to ndarray.take.
- Parameters:
indices – Positions of rows to select.
- to_array() Any ¶
Convert to array-API-compliant object.
The resulting array will have the corresponding dtype from the Array API:
Bool() -> ‘bool’
Int8() -> ‘int8’
Int16() -> ‘int16’
Int32() -> ‘int32’
Int64() -> ‘int64’
UInt8() -> ‘uint8’
UInt16() -> ‘uint16’
UInt32() -> ‘uint32’
UInt64() -> ‘uint64’
Float32() -> ‘float32’
Float64() -> ‘float64’
Null values are not supported and must be filled prior to conversion.
- Returns:
Any – An array-API-compliant object.
Notes
While numpy arrays are not yet array-API-compliant, implementations may choose to return a numpy array (for numpy prior to 2.0), with the understanding that consuming libraries would then use the array-api-compat package to convert it to a Standard-compliant array.
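A minimal sketch (any nulls must be filled first; the fill value 0.0 is illustrative):
arr = col.fill_null(0.0).to_array()
# arr follows the Array API; e.g. a Float64() column yields a 'float64' array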
- unique_indices(*, skip_nulls: bool | Scalar = True) Self ¶
Return indices corresponding to unique values in Column.
- Returns:
Column – Indices corresponding to unique values.
Notes
There are no ordering guarantees. In particular, if there are multiple indices corresponding to the same unique value, there is no guarantee about which one will appear in the result. If the original Column contains multiple ‘NaN’ values, then only a single index corresponding to those values will be returned. Likewise for null values (if skip_nulls=False). To get the unique values, you can do col.take(col.unique_indices()).
- unix_timestamp(*, time_unit: str | Scalar = 's') Self ¶
Return number of seconds / milliseconds / microseconds since the Unix epoch.
The Unix epoch is 00:00:00 UTC on 1 January 1970.
- Parameters:
time_unit – Time unit to use. Must be one of ‘s’, ‘ms’, or ‘us’.
- Returns:
Column – Integer data type. For example, if the date is 1970-01-02T00:00:00.123456, and the time_unit is 's', then the result should be 86400, and not 86400.123456. Information smaller than the given time unit should be discarded.
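A minimal sketch (ts is assumed to be a Datetime column):
seconds = ts.unix_timestamp()               # integer seconds since 1970-01-01T00:00:00 UTC
millis = ts.unix_timestamp(time_unit='ms')  # integer milliseconds; sub-unit information is discarded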
- var(*, correction: float | Scalar = 1, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.
- Parameters:
correction – Correction to apply to the result. For example, 1 for the corrected sample variance and 0 for the population variance. See Column.std for a more detailed description.
skip_nulls – Whether to skip null values.
- year() Self ¶
Return ‘year’ component of each element of Date and Datetime columns.
For example, return 1981 for 1981-01-02T12:34:56.123456.
Return column should be of (signed) integer dtype.