SparkException: Parquet column cannot be converted in file

Got this error while reading the NYC yellow taxi trip data…


24/04/09 16:15:12 INFO FileScanRDD: Reading File path: /nyc_yellow_cab_data/2015/12/yellow_tripdata_2015-12.parquet, range: 0-134217728, partition values: [empty row]
24/04/09 16:15:12 WARN TaskSetManager: Lost task 4.0 in stage 2.0 (TID 101) (192.168.0.2 executor driver): org.apache.spark.SparkException: Parquet column cannot be converted in file file:nyc_yellow_cab_data/2015/04/yellow_tripdata_2015-04.parquet.

Column: [congestion_surcharge], Expected: double, Found: INT32.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:868)



The parquet directory holding the data contains files with different schemas.

i.e. one of the files above stores congestion_surcharge as DOUBLE, while another stores it as INT32.

Hence the mismatch. The fix is to rewrite the offending files so that every file uses the same type for the column.

You can try setting mergeSchema = true, but in this case it won't work: schema merging reconciles the schemas at planning time, yet each file is still decoded with its own physical type, so the reader still fails when it finds INT32 in a file while expecting a double.
