Pushing Spark Streaming RDDs to Neo4j - Scala

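I need to set up a connection from Spark Streaming to the Neo4j graph database. The RDD is of the type ((is, I), (am, Hello), (sam, happy), ...). I need to create an edge in Neo4j between each pair of words.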
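One way to express "an edge between each pair of words" is a parameterised Cypher statement per pair. The sketch below is an assumption, not something from the thread: it treats each tuple as (source, target), and the label `Word`, the relationship type `NEXT`, and the parameter names `src`/`dst` are hypothetical. It uses the `{name}` parameter syntax seen elsewhere in this thread (Neo4j 2.x); newer Neo4j versions use `$name` instead. `MERGE` is used rather than `CREATE` so that replaying a pair (for example after a Spark task retry) does not duplicate nodes or edges.

```scala
// Sketch: turn word pairs into parameterised Cypher statements.
// Label `Word`, relationship type `NEXT` and parameter names are hypothetical.
object PairStatements {
  // MERGE only creates a node/relationship if it does not exist yet.
  val EdgeQuery: String =
    "MERGE (a:Word {text: {src}}) " +
      "MERGE (b:Word {text: {dst}}) " +
      "MERGE (a)-[:NEXT]->(b)"

  // One (query, parameters) tuple per pair; the words are passed as
  // parameters instead of being concatenated into the query text.
  def toStatements(pairs: Seq[(String, String)]): Seq[(String, Map[String, String])] =
    pairs.map { case (src, dst) => (EdgeQuery, Map("src" -> src, "dst" -> dst)) }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(("is", "I"), ("am", "Hello"), ("sam", "happy"))
    toStatements(pairs).foreach { case (query, params) => println(s"$params -> $query") }
  }
}
```

Each tuple could then be handed to whatever Neo4j client ends up being used (AnormCypher, a REST call, etc.).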

In the Spark Streaming documentation, I found

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection) // return to the pool for future reuse
  }
}

for pushing data to an external database.
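The documentation pattern can be exercised without Spark or Neo4j. In this sketch `FakeConnection` is a hypothetical stand-in for a real Neo4j client, `ConnectionPool` is a minimal hand-written pool (not a library class), and `foreachPartition` is simulated by looping over a `Seq` of iterators. It also adds `try`/`finally` so the connection is returned to the pool even if a send fails.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a real Neo4j client connection.
final class FakeConnection(val id: Int) {
  val sent: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]
  def send(record: String): Unit = sent += record
}

// A Scala `object` is initialized lazily on first access, once per JVM,
// so on a cluster every executor would get its own pool.
object ConnectionPool {
  private var idle: List[FakeConnection] = Nil
  private var created = 0

  // Reuse an idle connection if there is one, otherwise open a new one.
  def getConnection(): FakeConnection = synchronized {
    idle match {
      case c :: rest => idle = rest; c
      case Nil       => created += 1; new FakeConnection(created)
    }
  }

  def returnConnection(c: FakeConnection): Unit = synchronized { idle = c :: idle }

  def createdCount: Int = synchronized { created }
}

object ForeachPartitionDemo {
  // Mirrors rdd.foreachPartition: the body runs once per partition,
  // so a connection is borrowed per partition, not per record.
  def pushPartitions(partitions: Seq[Iterator[String]]): Unit =
    partitions.foreach { partitionOfRecords =>
      val connection = ConnectionPool.getConnection()
      try partitionOfRecords.foreach(record => connection.send(record))
      finally ConnectionPool.returnConnection(connection)
    }

  def main(args: Array[String]): Unit = {
    pushPartitions(Seq(Iterator("is", "I"), Iterator("am", "Hello", "sam", "happy")))
    println(s"connections created: ${ConnectionPool.createdCount}")
  }
}
```

Because the partitions here are processed one after another, a single connection is opened and reused for all of them. Keeping the pool in an `object` also means nothing connection-related has to be serialized from the driver to the executors.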

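I am doing this in Scala, and I'm confused about how to approach it. I found AnormCypher and the Neo4jScala wrapper. Can I use these to get the job done? If so, how? If not, are there better alternatives?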

Thanks, everyone.

I did an experiment with AnormCypher, like this:

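import org.anormcypher._
import org.apache.spark.{SparkConf, SparkContext}

implicit val connection = Neo4jREST.setServer("localhost", 7474, "/db/data/")
val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)
val logData = sc.textFile(FILE, 4).cache()  // FILE: path to the input text file
val count = logData
  .flatMap(_.split("\\s+"))  // split on whitespace; split("") would yield single characters
  .filter(_.nonEmpty)        // drop empty tokens from leading whitespace
  .map(w =>
    Cypher("CREATE (:Word {text:{text}})")  // one Word node per token
      .on("text" -> w).execute()            // execute() returns a Boolean success flag
  )
  .filter(identity)          // keep successful writes; `.filter(_)` does not compile here
  .count()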

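Neo4j 2.2.x has excellent concurrent write performance, which you can take advantage of from Spark. So the more concurrent threads you have writing to Neo4j, the better. It is even better if you can batch 100 to 1000 statements per request.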
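Batching inside a partition can be done with the standard library's `Iterator.grouped`. The sketch below is plain Scala; `Statement` is a hypothetical case class, and the batch size of 1000 is just the upper end of the range suggested above. In a real job the loop would sit inside `rdd.foreachPartition`, and each batch would go out as one request, for example in the `statements` array of Neo4j 2.x's transactional endpoint `/db/data/transaction/commit`.

```scala
// Sketch: group a partition's statements into batches, one request per batch.
object StatementBatching {
  // One parameterised Cypher statement (hypothetical model, not a library type).
  final case class Statement(query: String, params: Map[String, String])

  // Splits the iterator into batches of at most `batchSize` statements.
  // Iterator.grouped is lazy, so a large partition is never held in memory at once.
  def batches(records: Iterator[Statement], batchSize: Int): Iterator[Seq[Statement]] = {
    require(batchSize > 0, "batchSize must be positive")
    records.grouped(batchSize)
  }

  def main(args: Array[String]): Unit = {
    val query = "MERGE (a:Word {text: {src}}) MERGE (b:Word {text: {dst}}) MERGE (a)-[:NEXT]->(b)"
    val records = Iterator.tabulate(2500)(i => Statement(query, Map("src" -> s"w$i", "dst" -> s"w${i + 1}")))
    // Each batch stands for one HTTP request to Neo4j.
    batches(records, 1000).foreach(b => println(s"would send ${b.size} statements in one request"))
  }
}
```

With 2500 statements and a batch size of 1000, this produces three requests of 1000, 1000 and 500 statements instead of 2500 separate round trips.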