pyspark.sql.functions.slice#

pyspark.sql.functions.slice(x, start, length)[source]#

Array function: Returns a new array column containing a slice of each input array, starting at the given index and containing at most the given number of elements. Indices are 1-based and may be negative to count from the end of the array.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
x : Column or str

Input array column or column name to be sliced.

start : Column, str, or int

The start index for the slice operation. If negative, the index is counted from the end of the array.

length : Column, str, or int

The length of the slice, i.e. the number of elements in the resulting array.

Returns
Column

A new Column of array type, where each value is a slice of the corresponding array from the input column.

Examples

Example 1: Basic usage of the slice function.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ['x'])
>>> df.select(sf.slice(df.x, 2, 2)).show()
+--------------+
|slice(x, 2, 2)|
+--------------+
|        [2, 3]|
|           [5]|
+--------------+

Example 2: Slicing with negative start index.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ['x'])
>>> df.select(sf.slice(df.x, -1, 1)).show()
+---------------+
|slice(x, -1, 1)|
+---------------+
|            [3]|
|            [5]|
+---------------+

Example 3: Slice function with column inputs for start and length.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, 3], 2, 2), ([4, 5], 1, 3)], ['x', 'start', 'length'])
>>> df.select(sf.slice(df.x, df.start, df.length)).show()
+-----------------------+
|slice(x, start, length)|
+-----------------------+
|                 [2, 3]|
|                 [4, 5]|
+-----------------------+
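The 1-based, end-relative indexing shown in the examples above can be mirrored in plain Python. This is a minimal sketch for illustration only; `slice_like` is a hypothetical helper, not part of PySpark, and Spark's native implementation additionally handles nulls and type checking.

```python
# Pure-Python sketch of slice() indexing semantics (hypothetical helper,
# not part of PySpark; Spark evaluates slice natively on array columns).
def slice_like(arr, start, length):
    """Return up to `length` elements of `arr` starting at 1-based index `start`."""
    if start == 0:
        raise ValueError("start index must not be 0")
    # Convert the 1-based (or negative, end-relative) start to a 0-based index.
    i = start - 1 if start > 0 else len(arr) + start
    if i < 0 or i >= len(arr):
        return []
    return arr[i:i + length]

print(slice_like([1, 2, 3], 2, 2))   # → [2, 3]
print(slice_like([1, 2, 3], -1, 1))  # → [3]
```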