pyspark.streaming.DStream.countByWindow#

DStream.countByWindow(windowDuration, slideDuration)[source]#

Return a new DStream in which each RDD has a single element generated by counting the number of elements in a window over this DStream. windowDuration and slideDuration are as defined in the window() operation.

This is equivalent to window(windowDuration, slideDuration).count(), but will be more efficient if window is large.