pyspark.sql.functions.to_csv#

pyspark.sql.functions.to_csv(col, options=None)[source]#

CSV Function: Converts a column containing a StructType into a CSV string. Throws an exception in the case of an unsupported type.

New in version 3.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

Name of column containing a struct.

options : dict, optional

Options to control the conversion. Accepts the same options as the CSV datasource. See Data Source Option for the version you use.

Returns
Column

A CSV string converted from the given StructType.

Examples

Example 1: Converting a simple StructType to a CSV string

>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice'))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show()
+-------------+
|to_csv(value)|
+-------------+
|      2,Alice|
+-------------+

Example 2: Converting a complex StructType to a CSV string

>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show(truncate=False)
+-------------------------+
|to_csv(value)            |
+-------------------------+
|2,Alice,"[100, 200, 300]"|
+-------------------------+

Example 3: Converting a StructType with null values to a CSV string

>>> from pyspark.sql import Row, functions as sf
>>> from pyspark.sql.types import StructType, StructField, IntegerType, StringType
>>> data = [(1, Row(age=None, name='Alice'))]
>>> schema = StructType([
...   StructField("key", IntegerType(), True),
...   StructField("value", StructType([
...     StructField("age", IntegerType(), True),
...     StructField("name", StringType(), True)
...   ]), True)
... ])
>>> df = spark.createDataFrame(data, schema)
>>> df.select(sf.to_csv(df.value)).show()
+-------------+
|to_csv(value)|
+-------------+
|       ,Alice|
+-------------+

Example 4: Converting a StructType with different data types to a CSV string

>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', isStudent=True))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show()
+-------------+
|to_csv(value)|
+-------------+
| 2,Alice,true|
+-------------+