Spark SQL UDF returning a Scala immutable Map with df.withColumn()


I have a case class:

case class MyCaseClass(city: String, extras: Map[String, String])

and a user-defined function that returns a scala.collection.immutable.Map:

def extrasUdf = spark.udf.register(
  "extras_udf",
  (age: Int, name: String) => Map("age" -> age.toString, "name" -> name)
)

but it breaks with an exception:

import spark.implicits._

spark.read.options(...).load(...)
  .select('city, 'age, 'name)
  .withColumn("extras", extrasUdf('age, 'name))
  .drop('age)
  .drop('name)
  .as[MyCaseClass]

I think I should use Spark SQL's MapType(DataTypes.StringType, DataTypes.IntegerType), but I can't find a working example...

And it works if I use scala.collection.Map, but I need an immutable Map.

There are many problems with your code:

  • You are using def extrasUdf =, which creates a function for registering a UDF, as opposed to creating/registering a UDF once. Use val extrasUdf = instead (see the first sketch after this list).

  • You are mixing value types in your Map (String and Int), which makes the Map a Map[String, Any], since Any is the common superclass of String and Int. Spark does not support Any. You can do at least two things: (a) switch to using a map of strings (with age.toString, in which case you don't need a UDF at all, as you can use map()), or (b) switch to using named structs with named_struct() (again, without the need for a UDF). Both options are sketched after this list. As a rule, only write a UDF if you cannot do what you need with the existing functions. I prefer to look at the Hive documentation because the Spark docs are rather sparse.

  • Also, keep in mind that type specification in Spark schemas (e.g., MapType) is different from Scala types (e.g., Map[_, _]) and separate from how types are represented internally and mapped between Scala and Spark data structures (see the last sketch after this list). In other words, this has nothing to do with mutable vs. immutable collections.
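
For the first point, a minimal sketch of registering the UDF once with val, assuming a SparkSession named spark is already in scope:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Register once and keep the returned UserDefinedFunction, so it can be
// called directly in DataFrame expressions such as withColumn().
val extrasUdf = spark.udf.register(
  "extras_udf",
  (age: Int, name: String) => Map("age" -> age.toString, "name" -> name)
)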
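
For the second point, both alternatives can be expressed with built-in functions; df below is a placeholder for your DataFrame with age and name columns:

import org.apache.spark.sql.functions._

// (a) Build a Map[String, String] column with the built-in map() function.
// Arguments are alternating key/value column expressions; casting age to
// string keeps all values the same type, so no Any appears anywhere.
val withMap = df.withColumn(
  "extras",
  map(lit("age"), col("age").cast("string"), lit("name"), col("name"))
)

// (b) Build a named struct instead, which preserves age as an integer.
val withStruct = df.withColumn(
  "extras",
  expr("named_struct('age', age, 'name', name)")
)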
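
For the last point, a rough illustration of the two separate layers, mirroring the case class from the question:

import org.apache.spark.sql.types._

// Spark-side type specification: metadata describing how the column is
// stored and serialized; it carries no notion of mutable vs. immutable.
val extrasSchema = StructType(Seq(
  StructField("city", StringType),
  StructField("extras", MapType(StringType, StringType))
))

// Scala-side type: an ordinary (immutable) Map in the case class; the
// Dataset encoder translates between this and the MapType above.
case class MyCaseClass(city: String, extras: Map[String, String])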

Hope this helps!

