pyspark.RDD.reduce

RDD.reduce(f: Callable[[T, T], T]) → T

Reduces the elements of this RDD using the specified binary operator, which must be commutative and associative: elements are combined within each partition first, and the per-partition results are then merged in an arbitrary order (see the subtraction sketch under Examples). Currently reduces partitions locally.

New in version 0.7.0.

Parameters
f : function

the reduce function

Returns
T

the aggregated result

Examples

>>> from operator import add
>>> sc.parallelize([1, 2, 3, 4, 5]).reduce(add)
15
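
Because elements are combined within each partition before the partition results are merged, a function that is not commutative and associative can return different results for different partitionings. A sketch, assuming the same doctest SparkContext sc used above:

>>> from operator import sub
>>> sc.parallelize([1, 2, 3, 4], 1).reduce(sub)  # one partition: ((1-2)-3)-4
-8
>>> sc.parallelize([1, 2, 3, 4], 2).reduce(sub)  # two partitions: (1-2)-(3-4)
0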
>>> sc.parallelize((2 for _ in range(10))).map(lambda x: 1).cache().reduce(add)
10
>>> sc.parallelize([]).reduce(add)
Traceback (most recent call last):
    ...
ValueError: Can not reduce() empty RDD
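
Conceptually, and only as a simplified sketch rather than Spark's actual implementation, reduce behaves like applying functools.reduce within each partition and then once more across the per-partition results. The helper below, sketch_reduce, is hypothetical and not part of PySpark:

>>> from functools import reduce as local_reduce
>>> def sketch_reduce(partitions, f):
...     # Reduce each non-empty partition locally, then combine the
...     # per-partition results with the same function.
...     parts = [local_reduce(f, p) for p in partitions if p]
...     if not parts:
...         raise ValueError("Can not reduce() empty RDD")
...     return local_reduce(f, parts)
...
>>> sketch_reduce([[1, 2], [3, 4, 5]], add)
15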