pyspark.sql.functions.to_csv
- pyspark.sql.functions.to_csv(col, options=None)
CSV Function: Converts a column containing a StructType into a CSV string. Throws an exception in the case of an unsupported type.
New in version 3.0.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- col: Column or str
Name of column containing a struct.
- options: dict, optional
Options to control the conversion. Accepts the same options as the CSV datasource. See Data Source Option for the version you use.
- Returns
Column
A CSV string converted from the given StructType.
Examples
Example 1: Converting a simple StructType to a CSV string
>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice'))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show()
+-------------+
|to_csv(value)|
+-------------+
|      2,Alice|
+-------------+
Example 2: Converting a complex StructType to a CSV string
>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show(truncate=False)
+-------------------------+
|to_csv(value)            |
+-------------------------+
|2,Alice,"[100, 200, 300]"|
+-------------------------+
Example 3: Converting a StructType with null values to a CSV string
>>> from pyspark.sql import Row, functions as sf
>>> from pyspark.sql.types import StructType, StructField, IntegerType, StringType
>>> data = [(1, Row(age=None, name='Alice'))]
>>> schema = StructType([
...   StructField("key", IntegerType(), True),
...   StructField("value", StructType([
...     StructField("age", IntegerType(), True),
...     StructField("name", StringType(), True)
...   ]), True)
... ])
>>> df = spark.createDataFrame(data, schema)
>>> df.select(sf.to_csv(df.value)).show()
+-------------+
|to_csv(value)|
+-------------+
|       ,Alice|
+-------------+
Example 4: Converting a StructType with different data types to a CSV string
>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', isStudent=True))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show()
+-------------+
|to_csv(value)|
+-------------+
| 2,Alice,true|
+-------------+