pyspark.sql.functions.array_insert
- pyspark.sql.functions.array_insert(arr, pos, value)
Array function: Inserts an item into a given array at a specified array index. Array indices start at 1; a negative index counts from the end of the array. If the index lies beyond the array size, the array is extended with null elements up to that position: at the end for a positive index, at the front for a negative index.
New in version 3.4.0.
- Parameters
- Returns
Column
an array of values that includes the newly inserted value
Notes
Supports Spark Connect.
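The index rules described above can be illustrated without a Spark session. The following is a minimal pure-Python sketch, not the Spark implementation: `array_insert_model` is a hypothetical helper name, Python `None` stands in for SQL NULL, and the handling of negative positions assumes the behavior shown in the examples on this page (position -1 places the value after the last element). Rejecting position 0 is likewise an assumption that follows from the 1-based indexing.

```python
from typing import Any, List, Optional


def array_insert_model(
    arr: Optional[List[Any]], pos: int, value: Any
) -> Optional[List[Any]]:
    """Hypothetical model of the array_insert index rules (not part of PySpark)."""
    # A NULL input array yields NULL.
    if arr is None:
        return None
    # Assumption: indices are 1-based, so 0 is treated as invalid here.
    if pos == 0:
        raise ValueError("position 0 is invalid; indices start at 1 or -1")

    n = len(arr)
    if pos > 0:
        if pos <= n + 1:
            # Insert inside the array or directly after its last element.
            return arr[: pos - 1] + [value] + arr[pos - 1 :]
        # Position beyond the end: pad with None, then place the value.
        return arr + [None] * (pos - 1 - n) + [value]

    # Negative position: count from the end, where -1 means "after the last".
    idx = n + pos + 1
    if idx >= 0:
        return arr[:idx] + [value] + arr[idx:]
    # Position before the start: value first, then None padding, then the array.
    return [value] + [None] * (-idx) + arr


print(array_insert_model(["a", "b", "c"], 2, "d"))   # ['a', 'd', 'b', 'c']
print(array_insert_model(["a", "b", "c"], -2, "d"))  # ['a', 'b', 'd', 'c']
print(array_insert_model(["a", "b", "c"], 5, "e"))   # ['a', 'b', 'c', None, 'e']
```

The first three calls mirror Examples 1 to 3 below; real Spark results for other inputs may differ between versions, so treat this only as a reading aid for the description.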
Examples
Example 1: Inserting a value at a specific position
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(['a', 'b', 'c'],)], ['data'])
>>> df.select(sf.array_insert(df.data, 2, 'd')).show()
+------------------------+
|array_insert(data, 2, d)|
+------------------------+
|            [a, d, b, c]|
+------------------------+
Example 2: Inserting a value at a negative position
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(['a', 'b', 'c'],)], ['data'])
>>> df.select(sf.array_insert(df.data, -2, 'd')).show()
+-------------------------+
|array_insert(data, -2, d)|
+-------------------------+
|             [a, b, d, c]|
+-------------------------+
Example 3: Inserting a value at a position greater than the array size
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(['a', 'b', 'c'],)], ['data'])
>>> df.select(sf.array_insert(df.data, 5, 'e')).show()
+------------------------+
|array_insert(data, 5, e)|
+------------------------+
|      [a, b, c, NULL, e]|
+------------------------+
Example 4: Inserting a NULL value
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StringType
>>> df = spark.createDataFrame([(['a', 'b', 'c'],)], ['data'])
>>> df.select(sf.array_insert(df.data, 2, sf.lit(None).cast(StringType()))
...   .alias("result")).show()
+---------------+
|         result|
+---------------+
|[a, NULL, b, c]|
+---------------+
Example 5: Inserting a value into a NULL array
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import ArrayType, IntegerType, StructType, StructField
>>> schema = StructType([
...   StructField("data", ArrayType(IntegerType()), True)
... ])
>>> df = spark.createDataFrame([(None,)], schema=schema)
>>> df.select(sf.array_insert(df.data, 1, 5)).show()
+------------------------+
|array_insert(data, 1, 5)|
+------------------------+
|                    NULL|
+------------------------+