pyspark.sql.DataFrameReader.csv
- DataFrameReader.csv(path, schema=None, sep=None, encoding=None, quote=None, escape=None, comment=None, header=None, inferSchema=None, ignoreLeadingWhiteSpace=None, ignoreTrailingWhiteSpace=None, nullValue=None, nanValue=None, positiveInf=None, negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None, maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None, columnNameOfCorruptRecord=None, multiLine=None, charToEscapeQuoteEscaping=None, samplingRatio=None, enforceSchema=None, emptyValue=None, locale=None, lineSep=None, pathGlobFilter=None, recursiveFileLookup=None, modifiedBefore=None, modifiedAfter=None, unescapedQuoteHandling=None)
Loads a CSV file and returns the result as a DataFrame.
This function will go through the input once to determine the input schema if inferSchema is enabled. To avoid going through the entire data once, disable the inferSchema option or specify the schema explicitly using schema.
New in version 2.0.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- path : str or list
string, or list of strings, for input path(s), or RDD of Strings storing CSV rows.
- schema : pyspark.sql.types.StructType or str, optional
an optional pyspark.sql.types.StructType for the input schema or a DDL-formatted string (for example, col0 INT, col1 DOUBLE).
- Other Parameters
- Extra options
For the extra options, refer to Data Source Option for the version you use.
Examples
Write a DataFrame into a CSV file and read it back.
>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="csv") as d:
...     # Write a DataFrame into a CSV file
...     df = spark.createDataFrame([{"age": 100, "name": "Hyukjin Kwon"}])
...     df.write.mode("overwrite").format("csv").save(d)
...
...     # Read the CSV file as a DataFrame with 'nullValue' option set to 'Hyukjin Kwon'.
...     spark.read.csv(d, schema=df.schema, nullValue="Hyukjin Kwon").show()
+---+----+
|age|name|
+---+----+
|100|NULL|
+---+----+