I have a case class:

    case class MyCaseClass(city: String, extras: Map[String, String])
and a user-defined function which returns a scala.collection.immutable.Map:

    def extrasUdf = spark.udf.register(
      "extras_udf",
      (age: Int, name: String) => Map("age" -> age.toString, "name" -> name)
    )

but it breaks with an exception:

    import spark.implicits._

    spark.read.options(...).load(...)
      .select('city, 'age, 'name)
      .withColumn("extras", extrasUdf('age, 'name))
      .drop('age)
      .drop('name)
      .as[MyCaseClass]
Should I use Spark SQL's MapType(DataTypes.StringType, DataTypes.IntegerType)? I can't find a working example...

And it works if I use scala.collection.Map, but I need the immutable Map.
There are many problems with your code:
First, you are using def extrasUdf =, which creates a function for registering a UDF, as opposed to creating/registering a UDF once. Use val extrasUdf = instead (see the sketch below).
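
A minimal sketch of the val-based registration, assuming the same spark session and function body as in the question:

    // A value, not a def: the UDF is created and registered exactly once
    // and can then be reused as a column expression.
    val extrasUdf = spark.udf.register(
      "extras_udf",
      (age: Int, name: String) => Map("age" -> age.toString, "name" -> name)
    )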
instead.you mixing value types in map (
string
,int
), makes mapmap[string, any]
any
common superclass ofstring
,int
. spark not supportany
. can @ least 2 things: (a) switch using string map (withage.tostring
, in case don't need udf can usemap()
) or (b) switch using named structs usingnamed_struct()
(again, without need udf). rule, write udf if cannot need existing functions. prefer @ hive documentation because spark docs rather sparse.also, keep in mind type specification in spark schema (e.g.,
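
Since you mention you can't find a working example, here is a rough sketch of both options using only built-in functions; df is a stand-in for the DataFrame loaded and selected in the question (city, age, name columns), and spark.implicits._ is assumed to be in scope:

    import org.apache.spark.sql.functions.{col, expr, lit, map}

    // (a) an all-string map built with the built-in map() function -- no UDF needed
    val withMapExtras = df
      .withColumn("extras", map(lit("age"), col("age").cast("string"), lit("name"), col("name")))
      .drop("age", "name")
      .as[MyCaseClass]

    // (b) a named struct instead of a map, via the SQL function named_struct();
    // note that with a struct you would model extras as a nested case class
    // rather than a Map[String, String]
    val withStructExtras = df
      .withColumn("extras", expr("named_struct('age', age, 'name', name)"))
      .drop("age", "name")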

Also, keep in mind that the type specification in a Spark schema (e.g., MapType) is different from Scala types (e.g., Map[_, _]); it is a separate matter how types are represented internally and mapped between Scala and Spark data structures. In other words, this has nothing to do with mutable vs. immutable collections.
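
To make the distinction concrete, a small sketch: MapType(StringType, StringType) describes the extras column in a DataFrame schema, while Map[String, String] is the Scala type the encoder maps that column to when you call .as[MyCaseClass]:

    import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

    // Schema-side description of the data: Spark SQL types.
    val schema = StructType(Seq(
      StructField("city", StringType),
      StructField("extras", MapType(StringType, StringType))
    ))

    // Scala-side representation the encoder maps it to:
    // MyCaseClass(city: String, extras: Map[String, String]).
    // Mutable vs. immutable Map is a deserialization detail, not part of the schema.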
Hope this helps!