Column object¶
A conforming implementation of the dataframe API standard must provide and support a column object having the following methods, attributes, and behavior.
- class Column(*args, **kwargs)¶
Column object.
Note that this column object is not meant to be instantiated directly by users of the library implementing the dataframe API standard. Rather, use constructor functions or an already-created dataframe object retrieved via DataFrame.col().
The parent dataframe (which can be retrieved via the parent_dataframe property) plays a key role here:
If two columns were retrieved from the same dataframe, then they can be combined and compared at will.
If two columns were retrieved from different dataframes, then there is no guarantee about how or whether they can be combined and compared; this may vary across implementations.
If two columns are both “free-standing” (i.e. not retrieved from a dataframe but constructed directly from a 1D array or sequence), then they can be combined and compared with each other. Note, however, that there’s no guarantee about whether they can be compared or combined with columns retrieved from a different dataframe; this may vary across implementations.
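For illustration, a minimal sketch of these rules (df, and the column names 'a' and 'b', are hypothetical; df is assumed to be a conforming dataframe):
a = df.col('a')
b = df.col('b')
c = a + b      # same parent dataframe: combining is well-defined
mask = a > b   # comparisons likewise return a new (boolean) column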
- __add__(other: Self | AnyScalar) Self ¶
Add other column or scalar to this column.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __and__(other: Self | bool | Scalar) Self ¶
Apply logical ‘and’ to other Column (or scalar) and this Column.
Nulls should follow Kleene Logic.
- Parameters:
other (Self or bool) – If Column, must have same length.
- Returns:
Column
- Raises:
ValueError – If self or other is not boolean.
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
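As a rough illustration of Kleene logic here (a sketch; flag_a and flag_b are assumed boolean columns from the same dataframe):
both = flag_a & flag_b
# Element-wise, nulls behave as "unknown":
#   True  & null -> null   (the result depends on the unknown value)
#   False & null -> False  (False regardless of the unknown value)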
- __column_namespace__() Namespace ¶
Return an object that has all the Dataframe Standard API functions on it.
- Returns:
namespace (Any) – An object representing the dataframe API namespace. It should have every top-level function defined in the specification as an attribute. It may contain other public names as well, but it is recommended to only include those names that are part of the specification.
- __divmod__(other: Self | AnyScalar) tuple[Column, Column] ¶
Return quotient and remainder of integer division. See divmod builtin.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
tuple[Column, Column]
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
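A minimal usage sketch (col is assumed to be an integer column retrieved from a conforming dataframe):
quotient, remainder = divmod(col, 3)
# element-wise: col == quotient * 3 + remainder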
- __eq__(other: Self | AnyScalar) Self ¶
Compare for equality.
Nulls should follow Kleene Logic.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __floordiv__(other: Self | AnyScalar) Self ¶
Floor-divide this column by other column or scalar.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __ge__(other: Self | AnyScalar) Self ¶
Compare for “greater than or equal to” other.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __gt__(other: Self | AnyScalar) Self ¶
Compare for “greater than” other.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __invert__() Self ¶
Invert truthiness of (boolean) elements.
- Raises:
ValueError – If the Column is not boolean.
- __iter__() NoReturn ¶
Iterate over elements.
This is intentionally “poisoned” to discourage inefficient code patterns.
- Raises:
NotImplementedError –
- __le__(other: Self | AnyScalar) Self ¶
Compare for “less than or equal to” other.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __lt__(other: Self | AnyScalar) Self ¶
Compare for “less than” other.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __mod__(other: Self | AnyScalar) Self ¶
Return modulus of this column by other (% operator).
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __mul__(other: Self | AnyScalar) Self ¶
Multiply other column or scalar with this column.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __ne__(other: Self | AnyScalar) Self ¶
Compare for non-equality.
Nulls should follow Kleene Logic.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __or__(other: Self | bool | Scalar) Self ¶
Apply logical ‘or’ to other Column (or scalar) and this column.
Nulls should follow Kleene Logic.
- Parameters:
other (Self or Scalar) – If Column, must have same length.
- Returns:
Column
- Raises:
ValueError – If self or other is not boolean.
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __pow__(other: Self | AnyScalar) Self ¶
Raise this column to the power of other.
Integer dtype to the power of non-negative integer dtype is integer dtype. Integer dtype to the power of float dtype is float dtype. Float dtype to the power of integer dtype or float dtype is float dtype.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
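A sketch of the dtype rules above (col_int and col_float are assumed integer and float columns from the same dataframe):
col_int ** 2          # integer ** non-negative integer -> integer dtype
col_int ** 0.5        # integer ** float -> float dtype
col_float ** col_int  # float ** integer -> float dtype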
- __radd__(other: Self | AnyScalar) Self ¶
- __rand__(other: Self | bool) Self ¶
- __rfloordiv__(other: Self | AnyScalar) Self ¶
- __rmod__(other: Self | AnyScalar) Self ¶
- __rmul__(other: Self | AnyScalar) Self ¶
- __ror__(other: Self | bool) Self ¶
Return other | self.
- __rpow__(other: Self | AnyScalar) Self ¶
- __rsub__(other: Self | AnyScalar) Self ¶
- __rtruediv__(other: Self | AnyScalar) Self ¶
- __sub__(other: Self | AnyScalar) Self ¶
Subtract other column or scalar from this column.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- __truediv__(other: Self | AnyScalar) Self ¶
Divide this column by other column or scalar. True division, returns floats.
- Parameters:
other (Self or Scalar) – If Column, must have same length. “Scalar” here is defined implicitly by what scalar types are allowed for the operation by the underlying dtypes.
- Returns:
Column
Notes
other’s parent DataFrame must be the same as self’s - else, the operation is unsupported and may vary across implementations.
- all(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a bool.
- Raises:
ValueError – If column is not boolean.
- any(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a bool.
- Raises:
ValueError – If column is not boolean.
- cast(dtype: DType) Self ¶
Cast column to the specified dtype.
The following is not specified and may vary across implementations:
Cross-kind casting (e.g. integer to string, or to float)
Behaviour in the case of overflows
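A minimal sketch (assuming the namespace obtained via __column_namespace__() exposes the Int64 dtype constructor listed under to_array()):
ns = col.__column_namespace__()
col_i64 = col.cast(ns.Int64())  # same-kind cast; cross-kind casts and overflow behaviour may vary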
- property column: Any¶
Return underlying (not-necessarily-Standard-compliant) column.
If a library only implements the Standard, then this can return self.
- cumulative_max() Self ¶
Reduction returns a Column.
Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.
- cumulative_min() Self ¶
Reduction returns a Column.
Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.
- cumulative_prod() Self ¶
Reduction returns a Column.
Must be supported for numerical and datetime data types. The returned value has the same dtype as the column.
- cumulative_sum() Self ¶
Reduction returns a Column.
Must be supported for numerical and datetime data types. The returned value has the same dtype as the column.
- day() Self ¶
Return ‘day’ component of each element of Date and Datetime columns.
For example, return 2 for 1981-01-02T12:34:56.123456.
Return column should be of integer dtype (signed or unsigned).
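A small sketch of the datetime components (ts is assumed to be a Datetime column containing 1981-01-02T12:34:56.123456):
ts.day()    # 2
ts.month()  # 1
ts.year()   # 1981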
- property dtype: DType¶
Return data type of column.
- fill_nan(value: float | NullType | Scalar, /) Self ¶
Fill floating point nan values with the given fill value.
- Parameters:
value (float or null) – Value used to replace any nan in the column with. Must be of the Python scalar type matching the dtype of the column (or be null).
- fill_null(value: AnyScalar, /) Self ¶
Fill null values with the given fill value.
- Parameters:
value (Scalar) – Value used to replace any null values in the column with. Must be of the Python scalar type matching the dtype of the column.
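A minimal sketch of filling missing data (col is assumed to be a float column; the fill value 0.0 is illustrative):
cleaned = col.fill_nan(0.0)        # replace floating-point NaN values
complete = cleaned.fill_null(0.0)  # replace remaining null (missing) values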
- filter(mask: Self) Self ¶
Select a subset of rows corresponding to a mask.
- Parameters:
mask (Self) –
- Returns:
Column
Notes
Some participants preferred a weaker type Arraylike[bool] for mask, where ‘Arraylike’ denotes an object adhering to the Array API standard.
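A minimal usage sketch (col is assumed numeric; the mask is derived from the same parent dataframe):
positive = col.filter(col > 0)  # keep only the rows where the mask is True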
- get_value(row_number: int) Scalar ¶
Select the value at a row number, similar to ndarray.__getitem__(<int>).
- Parameters:
row_number (int) – Row number of value to return.
- Returns:
Scalar – Depends on the dtype of the Column, and may vary across implementations.
- hour() Self ¶
Return ‘hour’ component of each element of Date and Datetime columns.
For example, return 12 for 1981-01-02T12:34:56.123456.
Return column should be of integer dtype (signed or unsigned).
- is_in(values: Self) Self ¶
Indicate whether the value at each row matches any value in values.
- Parameters:
values (Self) – Contains values to compare against. May include float('nan') and null, in which case ‘nan’ and null will respectively return True even though float('nan') == float('nan') isn’t True. The dtype of values must match the current column’s dtype.
- Returns:
Column
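A minimal usage sketch (values_col is assumed to be a Column of the same dtype as col, e.g. a free-standing one built via column_from_1d_array):
mask = col.is_in(values_col)  # boolean Column: True where col's value appears in values_col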
- is_nan() Self ¶
Check for nan entries.
- Returns:
Column
See also
is_null
Notes
This only checks for ‘NaN’. Does not include ‘missing’ or ‘null’ entries. In particular, does not check for np.timedelta64('NaT').
- is_null() Self ¶
Check for ‘missing’ or ‘null’ entries.
- Returns:
Column
See also
is_nan
Notes
Does not include NaN-like entries. May optionally include ‘NaT’ values (if present in an implementation), but note that the Standard makes no guarantees about them.
- iso_weekday() Self ¶
Return ISO weekday for each element of Date and Datetime columns.
Note that Monday=1, …, Sunday=7.
Return column should be of integer dtype (signed or unsigned).
- len() Scalar ¶
Return the number of rows.
- max(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.
- mean(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.
- median(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.
- microsecond() Self ¶
Return number of microseconds since last second.
For example, return 123456 for 1981-01-02T12:34:56.123456.
Only supported for Date and Datetime columns. Return column should be of integer dtype (signed or unsigned).
- min(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Any data type that supports comparisons must be supported. The returned value has the same dtype as the column.
- minute() Self ¶
Return ‘minute’ component of each element of Date and Datetime columns.
For example, return 34 for 1981-01-02T12:34:56.123456.
Return column should be of integer dtype (signed or unsigned).
- month() Self ¶
Return ‘month’ component of each element of Date and Datetime columns.
For example, return 1 for 1981-01-02T12:34:56.123456.
Return column should be of integer dtype (signed or unsigned).
- n_unique(*, skip_nulls: bool = True) Scalar ¶
Return number of unique values.
Notes
If the original column(s) contain multiple ‘NaN’ values, then they only count as one distinct value. Likewise for null values (if skip_nulls=False).
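A small worked sketch of the documented behaviour (col is assumed to hold [1, 1, 2, null, null]):
col.n_unique(skip_nulls=False)  # 3 - the repeated nulls count as a single distinct value
col.n_unique()                  # 2 - with skip_nulls=True (the default), nulls are skipped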
- property name: str¶
Return name of column.
- property parent_dataframe: DataFrame | None¶
Return parent DataFrame, if present.
For example, if we have the following
df: DataFrame
column = df.col('a')
then column.parent_dataframe should return df.
On the other hand, if we had:
column = column_from_1d_array(...)
then column.parent_dataframe should return None.
- persist() Self ¶
Hint that computation prior to this point should not be repeated.
This is intended as a hint, rather than as a directive. Implementations which do not separate lazy vs eager execution may ignore this method and treat it as a no-op.
Note
This method may trigger execution. If necessary, it should be called at most once per dataframe, and as late as possible in the pipeline.
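A minimal sketch of using the hint late in a pipeline (col is assumed to be derived from a conforming dataframe):
col = col.persist()   # hint: avoid recomputing the work above for each reduction
total = col.sum()
largest = col.max()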
- prod(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical data types. The returned value has the same dtype as the column.
- rename(name: str | Scalar) Self ¶
Rename column.
- Parameters:
name (str) – New name for column.
- Returns:
Column – New column - this does not operate in-place.
- second() Self ¶
Return ‘second’ component of each element.
For example, return 56 for 1981-01-02T12:34:56.123456.
Only supported for Date and Datetime columns. Return column should be of integer dtype (signed or unsigned).
- shift(offset: int | Scalar) Self ¶
Shift values by offset positions, filling missing values with null.
For example, if the original column contains values [1, 4, 2], then:
.shift(1) will return [null, 1, 4],
.shift(-1) will return [4, 2, null].
- Parameters:
offset (int) – How many positions to shift by.
- slice_rows(start: int | None, stop: int | None, step: int | None) Self ¶
Select a subset of rows corresponding to a slice.
- Parameters:
start (int or None) –
stop (int or None) –
step (int or None) –
- Returns:
Column
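A minimal sketch (standard slice semantics are assumed, as with Python's slice(start, stop, step)):
first_ten = col.slice_rows(0, 10, None)       # rows 0 through 9
every_other = col.slice_rows(None, None, 2)   # rows 0, 2, 4, ...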
- sort(*, ascending: bool = True, nulls_position: Literal['first', 'last'] = 'last') Self ¶
Sort column.
If you need the indices which would sort the column, use sorted_indices.
- Parameters:
ascending (bool) – If True, sort in ascending order. If False, sort in descending order.
nulls_position ({'first', 'last'}) – Whether null values should be placed at the beginning or at the end of the result. Note that the position of NaNs is unspecified and may vary based on the implementation.
- Returns:
Column
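A minimal usage sketch:
descending = col.sort(ascending=False, nulls_position='last')
# or, keeping the row numbers that would sort the column:
order = col.sorted_indices()
reordered = col.take(order)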
- sorted_indices(*, ascending: bool = True, nulls_position: Literal['first', 'last'] = 'last') Self ¶
Return row numbers which would sort column.
If you need to sort the Column, use sort().
- Parameters:
ascending (bool) – If True, sort in ascending order. If False, sort in descending order.
nulls_position ({'first', 'last'}) – Whether null values should be placed at the beginning or at the end of the result. Note that the position of NaNs is unspecified and may vary based on the implementation.
- Returns:
Column
- std(*, correction: float = 1, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.
- Parameters:
correction – Degrees of freedom adjustment. Setting this parameter to a value other than 0 has the effect of adjusting the divisor during the calculation of the standard deviation according to N-correction, where N corresponds to the total number of elements over which the standard deviation is computed. When computing the standard deviation of a population, setting this parameter to 0 is the standard choice (i.e., the provided column contains data constituting an entire population). When computing the corrected sample standard deviation, setting this parameter to 1 is the standard choice (i.e., the provided column contains data sampled from a larger population; this is commonly referred to as Bessel’s correction). Fractional (float) values are allowed. Default: 1.
skip_nulls – Whether to skip null values.
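A small numeric check of the correction parameter (a sketch, assuming col holds [1.0, 2.0, 3.0]):
# mean = 2.0, squared deviations sum to 2.0
col.std(correction=1)  # sqrt(2.0 / (3 - 1)) = 1.0   (corrected sample standard deviation)
col.std(correction=0)  # sqrt(2.0 / 3)       ≈ 0.816 (population standard deviation)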
- sum(*, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical and datetime data types. The returned value has the same dtype as the column.
- take(indices: Self) Self ¶
Select a subset of rows, similar to ndarray.take.
- Parameters:
indices – Positions of rows to select.
- to_array() Any ¶
Convert to array-API-compliant object.
The resulting array will have the corresponding dtype from the Array API:
Bool() -> ‘bool’
Int8() -> ‘int8’
Int16() -> ‘int16’
Int32() -> ‘int32’
Int64() -> ‘int64’
UInt8() -> ‘uint8’
UInt16() -> ‘uint16’
UInt32() -> ‘uint32’
UInt64() -> ‘uint64’
Float32() -> ‘float32’
Float64() -> ‘float64’
Null values are not supported and must be filled prior to conversion.
- Returns:
Any – An array-API-compliant object.
Notes
While numpy arrays are not yet array-API-compliant, implementations may choose to return a numpy array (for numpy prior to 2.0), with the understanding that consuming libraries would then use the array-api-compat package to convert it to a Standard-compliant array.
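A minimal sketch (any nulls must be filled first; the fill value 0.0 is illustrative):
arr = col.fill_null(0.0).to_array()
# arr follows the Array API; e.g. a Float64() column yields a 'float64' array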
- unique_indices(*, skip_nulls: bool | Scalar = True) Self ¶
Return indices corresponding to unique values in Column.
- Returns:
Column – Indices corresponding to unique values.
Notes
There are no ordering guarantees. In particular, if there are multiple indices corresponding to the same unique value, there is no guarantee about which one will appear in the result. If the original Column contains multiple ‘NaN’ values, then only a single index corresponding to those values will be returned. Likewise for null values (if skip_nulls=False). To get the unique values, you can do col.take(col.unique_indices()).
- unix_timestamp(*, time_unit: str | Scalar = 's') Self ¶
Return number of seconds / milliseconds / microseconds since the Unix epoch.
The Unix epoch is 00:00:00 UTC on 1 January 1970.
- Parameters:
time_unit – Time unit to use. Must be one of ‘s’, ‘ms’, or ‘us’.
- Returns:
Column – Integer data type. For example, if the date is 1970-01-02T00:00:00.123456, and the time_unit is 's', then the result should be 86400, and not 86400.123456. Information smaller than the given time unit should be discarded.
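A minimal sketch (ts is assumed to be a Datetime column):
seconds = ts.unix_timestamp()               # integer seconds since 1970-01-01T00:00:00 UTC
millis = ts.unix_timestamp(time_unit='ms')  # integer milliseconds; sub-unit information is discarded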
- var(*, correction: float | Scalar = 1, skip_nulls: bool | Scalar = True) Scalar ¶
Reduction returns a scalar.
Must be supported for numerical and datetime data types. Returns a float for numerical data types, and datetime (with the appropriate timedelta format string) for datetime dtypes.
- Parameters:
correction – Correction to apply to the result. For example, 1 for the corrected sample variance and 0 for the population variance. See Column.std for a more detailed description.
skip_nulls – Whether to skip null values.
- year() Self ¶
Return ‘year’ component of each element of Date and Datetime columns.
For example, return 1981 for 1981-01-02T12:34:56.123456.
Return column should be of (signed) integer dtype.