I'm working on a data processing application using Spark that needs to transform and combine many sources (10-20), resulting in one file. The sources have a common key to join on. Is the best approach here to join them one by one, materializing each operation to Parquet files? Or to join multiple DataFrames at once? I've run into performance issues with the latter and would like to know what the best practices are.
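For concreteness, here is a minimal sketch of the second option (joining all the DataFrames in one chain), assuming PySpark and a shared key column named `id`. The helper `join_all`, the sample columns, and the output path are hypothetical and not taken from the actual application.

```python
from functools import reduce


def join_all(frames, key, how="inner"):
    """Fold a list of DataFrames into one by joining each on the shared key."""
    if not frames:
        raise ValueError("need at least one DataFrame")
    # Passing the key as a column name (not a column expression) keeps a
    # single key column in the result instead of one copy per source.
    return reduce(lambda left, right: left.join(right, on=key, how=how), frames)


def demo():
    # Imported here so join_all itself has no hard dependency on pyspark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("join-sketch").getOrCreate()
    )
    # Three tiny stand-in sources; the real job would read 10-20 inputs.
    a = spark.createDataFrame([(1, "x"), (2, "y")], ["id", "a_val"])
    b = spark.createDataFrame([(1, 10), (2, 20)], ["id", "b_val"])
    c = spark.createDataFrame([(1, True), (2, False)], ["id", "c_val"])

    joined = join_all([a, b, c], key="id")
    joined.explain()  # print the single plan Spark builds for the whole chain
    # Hypothetical output location; only the final result is written.
    joined.write.mode("overwrite").parquet("/tmp/joined_output")
    spark.stop()
```

The first option would instead write each intermediate result to Parquet and read it back before the next join, which trades extra I/O for a shorter plan at every step.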
Update: After switching to Spark 2.0, I noticed that joins on many tables are more reliable and more performant.