SparkException: Parquet column cannot be converted in file

Got this error while reading the NYC yellow taxi trip data…


24/04/09 16:15:12 INFO FileScanRDD: Reading File path: /nyc_yellow_cab_data/2015/12/yellow_tripdata_2015-12.parquet, range: 0-134217728, partition values: [empty row]
24/04/09 16:15:12 WARN TaskSetManager: Lost task 4.0 in stage 2.0 (TID 101) (192.168.0.2 executor driver): org.apache.spark.SparkException: Parquet column cannot be converted in file file:nyc_yellow_cab_data/2015/04/yellow_tripdata_2015-04.parquet.

Column: [congestion_surcharge], Expected: double, Found: INT32.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:868)



The parquet directory holding the data contains files with different schemas.

i.e. one of the files above stores congestion_surcharge as DOUBLE, while another stores it as INT32.

Hence the mismatch. The fix is to rewrite the offending files so that every file uses the same type for the column.

You can try setting mergeSchema = true, but in this case it won't work: schema merging reconciles the schemas at planning time, yet each file is still decoded with its own physical type, so the reader still fails when it finds INT32 in a file while expecting a double.
