pyspark.pandas.groupby.GroupBy.quantile

GroupBy.quantile(q=0.5, accuracy=10000)

Return group values at the given quantile.

New in version 3.4.0.

Parameters
q : float, default 0.5 (50% quantile)

Value between 0 and 1 providing the quantile to compute.

accuracy : int, optional

Accuracy of the approximation; defaults to 10000. A larger value means better accuracy. The relative error is 1.0 / accuracy. This is a pandas-on-Spark specific parameter.
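As a rough illustration of the 1.0 / accuracy relationship, the plain-Python sketch below tabulates the relative error for a few accuracy values. The helper names `relative_error` and `max_rank_error` are hypothetical and are not part of pyspark. Reading the relative error as a bound on how far the returned value's rank can drift within a group of `n_rows` rows is an assumption made here for illustration, not something this page states.

```python
def relative_error(accuracy: int) -> float:
    """Relative error implied by an accuracy setting: 1.0 / accuracy."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return 1.0 / accuracy


def max_rank_error(n_rows: int, accuracy: int) -> float:
    """Assumed interpretation: rank drift of roughly n_rows / accuracy."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return n_rows / accuracy


if __name__ == "__main__":
    n_rows = 1_000_000  # hypothetical group size
    for accuracy in (100, 10_000, 1_000_000):
        print(accuracy, relative_error(accuracy), max_rank_error(n_rows, accuracy))
```

Under this reading, the default accuracy of 10000 on a group of one million rows would correspond to a rank drift of about 100 positions.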

Returns
pyspark.pandas.Series or pyspark.pandas.DataFrame

The return type is determined by the caller of the GroupBy object.

Notes

quantile in pandas-on-Spark uses a distributed percentile approximation algorithm, unlike pandas, so the result might differ from pandas. The interpolation parameter is not supported yet.
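To see why the results can differ, the sketch below uses plain pandas only. With an even number of values per group, the default linear interpolation in pandas returns a value between two observations, while a method that returns an observed value gives a different answer. Here `interpolation="lower"` is only a stand-in for "pick an observed value"; it is not the Spark approximation algorithm, and the sketch makes no claim about which value pandas-on-Spark returns for this data.

```python
import pandas as pd

# A single group with an even number of values, so the median falls
# between two observations.
pdf = pd.DataFrame({"key": ["a", "a", "a", "a"], "val": [1, 2, 3, 4]})

grouped = pdf.groupby("key")["val"]

# pandas default: interpolate linearly between the two middle values.
linear = grouped.quantile(0.5)

# Stand-in for a method that returns one of the observed values.
lower = grouped.quantile(0.5, interpolation="lower")

print(linear["a"])  # interpolated value between 2 and 3
print(lower["a"])   # an observed value from the group
```

For groups with an odd number of values, such as those in the Examples section, the exact median is itself an observed value, so this particular discrepancy does not arise.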

Examples

>>> df = ps.DataFrame([
...     ['a', 1], ['a', 2], ['a', 3],
...     ['b', 1], ['b', 3], ['b', 5]
... ], columns=['key', 'val'])

Group by one column and return the quantile of the remaining columns in each group.

>>> df.groupby('key').quantile()
     val
key
a    2.0
b    3.0