python - Pulling data from Neo4j using PySpark


I have time series stored as a graph (using a time tree structure, similar to this) in a Neo4j server instance, version 2.3.6 (so REST interface only, no Bolt). I am trying to perform analytics on these time series in a distributed way, using PySpark.

Now, I am aware of the existing projects that connect Spark with Neo4j, such as the ones listed here. The problem is that these focus on creating an interface to work with graphs. In my case graphs are not relevant, since my Neo4j Cypher queries are meant to produce arrays of values. Everything downstream handles these arrays as time series; again, not as a graph.

My question is: has anybody successfully queried a REST-only Neo4j instance in parallel using PySpark, and if yes, how did you do it? The py2neo library seemed like a good candidate until I realized that the connection object cannot be shared across partitions (or if it can, I do not know how). Right now I'm considering having my Spark jobs run independent REST queries on the Neo4j server, but I wanted to see how the community may have solved this problem.
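To make the "independent REST queries" idea concrete, here is a minimal sketch of what I have in mind, not a tested solution. It assumes the transactional Cypher endpoint of a Neo4j 2.x server (`/db/data/transaction/commit`), a placeholder host and port, and a hypothetical data model in the example statement (`Series`, `HAS_VALUE`, `value`, `timestamp` are made up for illustration). All function names are my own. Authentication is left out; if it is enabled on the server, an `Authorization` header would also be needed. Each partition opens its own HTTP requests, so nothing like a connection object has to be shared across partitions.

```python
import json
import urllib.request

# Assumption: transactional Cypher endpoint of a Neo4j 2.x server.
# Host and port are placeholders.
NEO4J_URL = "http://localhost:7474/db/data/transaction/commit"

# Hypothetical statement; labels, relationship types and properties are
# invented for illustration. Neo4j 2.x uses the {name} parameter syntax.
EXAMPLE_STATEMENT = (
    "MATCH (s:Series {id: {id}})-[:HAS_VALUE]->(v) "
    "RETURN v.value ORDER BY v.timestamp"
)


def build_payload(statement, parameters):
    """Wrap one Cypher statement in the body the transactional endpoint expects."""
    return {"statements": [{"statement": statement, "parameters": parameters}]}


def parse_rows(response):
    """Extract plain row lists from a transactional endpoint response."""
    if response.get("errors"):
        raise RuntimeError(response["errors"])
    return [item["row"] for result in response["results"] for item in result["data"]]


def post_json(url, payload):
    """POST a JSON body and decode the JSON reply (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))


def fetch_partition(series_ids, statement, url=NEO4J_URL, post=post_json):
    """Run one query per series id and yield (id, values) pairs.

    Intended to be passed to RDD.mapPartitions, so every partition issues
    its own independent HTTP requests.
    """
    for series_id in series_ids:
        response = post(url, build_payload(statement, {"id": series_id}))
        # Keep only the first returned column of each row as the series values.
        yield series_id, [row[0] for row in parse_rows(response)]


def run_on_spark(sc, series_ids, statement, num_slices=8):
    """Distribute the ids over num_slices partitions and collect the results."""
    rdd = sc.parallelize(series_ids, num_slices)
    return rdd.mapPartitions(lambda part: fetch_partition(part, statement)).collect()
```

With an existing `SparkContext` named `sc`, this would be called as `run_on_spark(sc, list_of_ids, EXAMPLE_STATEMENT)`. The number of partitions effectively caps how many queries hit the Neo4j server at once, so it probably needs tuning against what a single server can handle.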

Best, Aurélien

