I am new to Apache Spark. I created several RDDs and DataFrames and cached them, and now I want to unpersist some of them using the command below:

rddname.unpersist()

but I can't remember their names. I used sc.getPersistentRDDs, but the output does not include the names. I also used the browser to view the cached RDDs, but again there is no name information. Am I missing something?
@dikei's answer is correct, but I believe what you are looking for is sc.getPersistentRDDs:
scala> val rdd1 = sc.makeRDD(1 to 100)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:27

scala> val rdd2 = sc.makeRDD(10 to 1000)
rdd2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at makeRDD at <console>:27

scala> rdd2.cache.setName("rdd_2")
res0: rdd2.type = rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
res1: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27)

scala> rdd1.cache.setName("foo")
res2: rdd1.type = foo ParallelCollectionRDD[0] at makeRDD at <console>:27

scala> sc.getPersistentRDDs
res3: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27)
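Once the RDDs carry names, you can look one up and unpersist it by name instead of by variable. A minimal sketch, assuming an active SparkContext sc and the name "foo" from the session above:

// Find every persisted RDD whose name is "foo" and unpersist it.
// RDD.name may be null if setName was never called, so guard against that.
sc.getPersistentRDDs
  .values
  .filter(rdd => rdd.name == "foo")
  .foreach(rdd => rdd.unpersist())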
Now let's add one more RDD (rdd3, created the same way as the others) and give it a name as well, but without caching it:
scala> rdd3.setname("bar") # res4: rdd3.type = bar parallelcollectionrdd[2] @ makerdd @ <console>:27 scala> sc.getpersistentrdds # res5: scala.collection.map[int,org.apache.spark.rdd.rdd[_]] = map(1 -> rdd_2 parallelcollectionrdd[1] @ makerdd @ <console>:27, 0 -> foo parallelcollectionrdd[0] @ makerdd @ <console>:27)
As we can see, rdd3 does not show up in the map: setting a name does not persist an RDD, only cache/persist does.
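If the goal is simply to free everything without knowing any names, a sketch along these lines should work, again assuming the SparkContext sc (the println is just for visibility):

// Iterate over every currently persisted RDD and unpersist each one,
// regardless of whether a name was ever set.
sc.getPersistentRDDs.foreach { case (id, rdd) =>
  println(s"Unpersisting RDD $id (name: ${rdd.name})")
  rdd.unpersist()
}

For cached DataFrames, df.unpersist() works the same way, and sqlContext.clearCache() clears everything cached through the SQL context (the exact entry point depends on your Spark version).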