Apache Spark multiple joins on data frames

I'm working on a data processing application using Spark that needs to transform and combine many sources (10-20), resulting in one file. The sources have a common key to join on. Is the best approach here to join them one by one, materializing each operation to Parquet files? Or to join multiple DataFrames at once? I've run into performance issues with the latter and would like to know what the best practices are.

Update: after switching to Spark 2.0, I noticed that joins on many tables are more reliable and more performant.
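
For concreteness, the two options in the question could be sketched in PySpark roughly as below. This is only an illustration under assumptions, not the code from the application: the function names `join_all` and `join_stepwise`, the `tmp_dir` parameter, and the default inner join are all hypothetical, and the inputs are assumed to be DataFrames that share one key column.

```python
from functools import reduce


def join_all(frames, key, how="inner"):
    """Option 1: chain all joins into a single lazy plan.

    Joining by column name (on=key) keeps one copy of the key column
    in the result rather than one per input.
    """
    return reduce(lambda left, right: left.join(right, on=key, how=how), frames)


def join_stepwise(spark, frames, key, tmp_dir, how="inner"):
    """Option 2: join one source at a time, writing each step to Parquet.

    Reading the step back gives the next join a fresh, short plan.
    `tmp_dir` is a hypothetical scratch location chosen by the caller.
    """
    result = frames[0]
    for i, frame in enumerate(frames[1:], start=1):
        path = f"{tmp_dir}/step_{i}"
        # Materialize this join before moving on to the next source.
        result.join(frame, on=key, how=how).write.mode("overwrite").parquet(path)
        result = spark.read.parquet(path)
    return result
```

With an existing `SparkSession` named `spark` and a list `sources` of DataFrames keyed on `id`, the first option would be `join_all(sources, key="id")` followed by a single write, and the second `join_stepwise(spark, sources, key="id", tmp_dir="/tmp/joins")`. Which one performs better likely depends on the Spark version and the data sizes.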
