Transforming complex STRUCT type to another struct type / Data type using SPARK

Rajat Kumar
Aug 17, 2021

Spark does not allow directly converting a struct or other complex-type column into another data type or another complex struct type.

However, Spark supports a few features that we can leverage to modify a STRUCT type, or we can write a custom UDF.

Prerequisite:

An understanding of Spark's built-in functions from_json and to_json.

Now let's say we have a DataFrame with the schema below:
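As a quick refresher on the two functions named above, here is a minimal sketch of a JSON round trip on a small struct column. It is written for spark-shell or a Scala script; the column names ("id", "tag", "item") and the local session settings are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, struct, to_json}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

// Local session for illustration only (in spark-shell this reuses the existing one)
val spark = SparkSession.builder().master("local[1]").appName("json-roundtrip").getOrCreate()
import spark.implicits._

// Hypothetical one-row DataFrame with a struct column named "item"
val demo = Seq((1L, "a")).toDF("id", "tag")
  .select(struct(col("id"), col("tag")).as("item"))

// to_json serializes the struct column into a JSON string column
val asJson = demo.select(to_json(col("item")).as("json"))

// from_json parses the string back into a struct, using the schema we pass in
val itemSchema = new StructType().add("id", LongType).add("tag", StringType)
val parsed = asJson.select(from_json(col("json"), itemSchema).as("item"))

parsed.printSchema()
```

The point to notice is that from_json builds its result from the schema argument, not from the shape of the original column, which is what the rest of this post relies on.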

ORIGINAL DATA TYPE for “mymixes_recommendations” column :

import org.apache.spark.sql.types._

// Array of structs; the trailing "true" is containsNull
val customSchema = new ArrayType(new StructType()
  .add("cluster_id", LongType)
  .add("title", ArrayType(StringType))
  .add("itemIds", ArrayType(StringType))
  .add("productTag", StringType), true)

TARGETED SCHEMA for “mymixes_recommendations” column :

i.e. add a "rating" field to the struct nested inside the array type.

// Same element struct as before, plus the new "rating" field
val requiredSchema = new ArrayType(new StructType()
  .add("cluster_id", LongType)
  .add("title", ArrayType(StringType))
  .add("itemIds", ArrayType(StringType))
  .add("productTag", StringType)
  .add("rating", StringType), true)
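To make the difference between the two schemas explicit, the following sketch builds the target element struct by appending to the original one and then lists the fields that were added. Deriving one schema from the other is a choice made here for illustration, not something the approach requires; the names originalElement, requiredElement and addedFields are hypothetical.

```scala
import org.apache.spark.sql.types.{ArrayType, LongType, StringType, StructType}

// Element struct of the original array column
val originalElement = new StructType()
  .add("cluster_id", LongType)
  .add("title", ArrayType(StringType))
  .add("itemIds", ArrayType(StringType))
  .add("productTag", StringType)

// StructType is immutable: add returns a new StructType with the field appended
val requiredElement = originalElement.add("rating", StringType)

val customSchema = ArrayType(originalElement, containsNull = true)
val requiredSchema = ArrayType(requiredElement, containsNull = true)

// Fields present in the target but missing from the source
val addedFields = requiredElement.fieldNames.diff(originalElement.fieldNames)
println(addedFields.mkString(", "))   // rating
println(requiredSchema.simpleString)
```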

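Spark cannot simply cast the original schema to the required schema, since the target struct has one more field than the source.

So we will use JSON as an intermediate format, using Spark's built-in to_json and from_json functions: serialize the column to a JSON string, then parse it back with the required schema.

import org.apache.spark.sql.functions.{col, from_json, to_json}

df.select(col("user_id"), col("day"),
  from_json(to_json(col("recommendations")), requiredSchema).as("recommendations"))
  .printSchema()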

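OUTPUT SCHEMA after above operation :

The "recommendations" column is now an array of structs that carries the extra "rating" field of type string. Because the original data has no "rating" key, from_json fills that field with null in every row.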
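For readers who want to try the conversion end to end, here is a self-contained sketch meant for spark-shell or a Scala script. The sample row, the Rec case class and the local session settings are assumptions made for illustration; only the column names and the requiredSchema follow the post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, to_json}
import org.apache.spark.sql.types.{ArrayType, LongType, StringType, StructType}

// Hypothetical case class, used only to build a sample row with the original struct shape
case class Rec(cluster_id: Long, title: Seq[String], itemIds: Seq[String], productTag: String)

// Local session for illustration only (in spark-shell this reuses the existing one)
val spark = SparkSession.builder().master("local[1]").appName("struct-via-json").getOrCreate()
import spark.implicits._

// Made-up sample data: one user with one recommendation
val df = Seq(
  ("u1", "2021-08-17", Seq(Rec(1L, Seq("Mix A"), Seq("i1", "i2"), "tagA")))
).toDF("user_id", "day", "recommendations")

// Target schema: the original element struct plus a "rating" field
val requiredSchema = ArrayType(new StructType()
  .add("cluster_id", LongType)
  .add("title", ArrayType(StringType))
  .add("itemIds", ArrayType(StringType))
  .add("productTag", StringType)
  .add("rating", StringType), containsNull = true)

// Serialize the array of structs to JSON, then parse it back with the target schema
val converted = df.select(
  col("user_id"),
  col("day"),
  from_json(to_json(col("recommendations")), requiredSchema).as("recommendations"))

converted.printSchema()
// Existing fields keep their values; the new field has no source data
converted.select(col("recommendations").getItem(0).getField("rating").as("rating")).show()
```

Once the column has the new shape, the "rating" field can be populated in a later step; this sketch only covers the schema change.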