JanusGraph Study Notes

title: JanusGraph Study Notes
date: 2019-07-15 10:09:30
tags: ['JanusGraph','Study Notes','Graph Database']
categories: Backend

JanusGraph is a distributed, open-source graph database.

JanusGraph Server

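Start with the default storage backend and search engine configuration: bin/janusgraph.sh start

Start with a custom storage backend and search engine: bin/gremlin-server.sh ./conf/gremlin-server/gremlin-server.yaml

Contents of gremlin-server.yaml: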

host: localhost
port: 8182
scriptEvaluationTimeout: 120000
# ws+http
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
#http
#channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer
#ws
#channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  # storage backend and search engine configuration
  graph: conf/gremlin-server/socket-janusgraph-hbase-server.properties
}
..............

Sending a request over HTTP

POST / HTTP/1.1
Host: localhost:8182
Content-Type: application/json
User-Agent: PostmanRuntime/7.19.0
Accept: */*
Cache-Control: no-cache
Postman-Token: c8c2454d-8daa-4ac4-ad4c-1d9ad784a6bb,c6e81a95-2862-4cfb-a8ef-3cbcfa8c482c
Accept-Encoding: gzip, deflate
Content-Length: 36
Connection: keep-alive

{"gremlin": "g.V().limit(1).next()"}
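The same request can be built from a script instead of Postman. The following is only a minimal sketch using the Python standard library; it assumes the server listens on localhost:8182 with an HTTP-capable channelizer, and the helper name `build_gremlin_request` is made up for illustration.

```python
import json
import urllib.request


def build_gremlin_request(query, host="localhost", port=8182):
    """Build (but do not send) a POST request carrying a Gremlin script."""
    # The request body is a JSON object with the script under the "gremlin" key.
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_gremlin_request("g.V().limit(1).next()")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending it requires a running server, so it is left commented out:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```

The encoded body is 36 bytes long, which matches the Content-Length header in the captured request.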

Contents of socket-janusgraph-hbase-server.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
# HBase table name
storage.hbase.table=janusgraph_demo_x
storage.hostname=3-142-bigdata-2.jianganwei.com,3-141-bigdata-1.jianganwei.com,3-143-bigdata-3.jianganwei.com
storage.batch-loading=true
ids.block-size=100000000
storage.hbase.ext.zookeeper.znode.parent=/hbase-unsecure
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

Authentication and authorization are also supported; see the official documentation.

Gremlin

Connecting to a remote database from the console

:remote connect tinkerpop.server conf/remote.yaml

remote.yaml :

hosts: [localhost]
port: 8182
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}

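Basic statements

graph = JanusGraphFactory.open('conf/gremlin-server/socket-janusgraph-hbase-server.properties') opens the database connection
g = graph.traversal() obtains a traversal source instance
V(): queries vertices; usually the first step of a graph query, and many kinds of steps can follow it. For example, g.V() queries all vertices and g.V('v_id') queries one specific vertex;
E(): queries edges; usually the first step of a graph query, and many kinds of steps can follow it;
id(): gets the id of a vertex or edge. Example: g.V().id() returns the ids of all vertices;
label(): gets the label of a vertex or edge. Example: g.V().label() returns the labels of all vertices.
key() / value(): get the key or the value of a property.
properties(): gets the properties of a vertex or edge; it can be combined with key() and value() to get property names or values. Example: g.V().properties('name') returns the name property of all vertices;
valueMap(): gets the properties of a vertex or edge as a Map; similar to properties();
values(): gets the property values of a vertex or edge. For example, g.V().values() is equivalent to g.V().properties().value()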

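Traversal (vertex-based)

out(label): visits the adjacent vertices in the OUT direction along the given edge labels (with zero labels, edges of every type are followed; with one or more labels, edges matching any of them are followed; the same applies below);
in(label): visits the adjacent vertices in the IN direction along the given edge labels;
both(label): visits the adjacent vertices in both directions along the given edge labels;
outE(label): visits the incident edges in the OUT direction with the given edge labels;
inE(label): visits the incident edges in the IN direction with the given edge labels;
bothE(label): visits the incident edges in both directions with the given edge labels;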
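To make the direction and label rules above concrete, here is a small sketch that mimics these steps on an in-memory edge list. It is a toy model in plain Python, not the Gremlin API; the data and the function names (`out`, `in_`, `both`, `out_e`, `in_e`) are made up for illustration.

```python
# Toy edge list: (source vertex, edge label, target vertex). Made-up data.
EDGES = [
    ("marko", "knows", "josh"),
    ("marko", "created", "lop"),
    ("josh", "created", "lop"),
]


def _label_ok(label, labels):
    # No labels given means "any edge label", as in out() without arguments.
    return not labels or label in labels


def out_e(vertex, *labels):
    """Edges leaving the vertex (like outE)."""
    return [e for e in EDGES if e[0] == vertex and _label_ok(e[1], labels)]


def in_e(vertex, *labels):
    """Edges arriving at the vertex (like inE)."""
    return [e for e in EDGES if e[2] == vertex and _label_ok(e[1], labels)]


def out(vertex, *labels):
    """Vertices at the far end of outgoing edges (like out)."""
    return [dst for _, _, dst in out_e(vertex, *labels)]


def in_(vertex, *labels):
    """Vertices at the far end of incoming edges (like in)."""
    return [src for src, _, _ in in_e(vertex, *labels)]


def both(vertex, *labels):
    """Neighbours in either direction (like both)."""
    return out(vertex, *labels) + in_(vertex, *labels)


print(out("marko"))           # all outgoing neighbours
print(out("marko", "knows"))  # only along "knows" edges
print(in_("lop"))             # who points at lop
print(both("josh"))           # neighbours in both directions
```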

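Traversal (edge-based)

outV(): visits the out-vertex of an edge, i.e. the vertex where the edge starts;
inV(): visits the in-vertex of an edge, i.e. the target vertex that the arrow points to;
bothV(): visits both vertices of an edge;
otherV(): visits the partner vertex of an edge, i.e. the vertex at the other end relative to the vertex the traversal came from;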
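The difference between inV(), outV() and otherV() is easiest to see on a single edge. The sketch below is a plain-Python toy, not the Gremlin API; the `Edge` type and the `other_v` helper are hypothetical names used only here.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    out_v: str  # vertex where the edge starts
    label: str
    in_v: str   # vertex the arrow points to


def both_v(edge):
    """Both endpoints, start vertex first (like bothV)."""
    return [edge.out_v, edge.in_v]


def other_v(edge, base):
    """The endpoint that is not the vertex we arrived from (like otherV)."""
    return edge.in_v if base == edge.out_v else edge.out_v


call = Edge("alice", "CALL", "bob")
print(call.out_v, call.in_v)   # fixed by the edge direction
print(other_v(call, "alice"))  # depends on where the traversal came from
print(other_v(call, "bob"))
```

outV() and inV() depend only on the edge direction, while otherV() depends on which vertex the traversal arrived from.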

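Filtering

has(key,value): filters vertices or edges by property name and value;
has(label, key, value): filters vertices or edges by label plus property name and value;
has(key,predicate): filters vertices or edges by applying a condition to the given property. Example: g.V().has('age', gt(20)) returns the vertices whose age is greater than 20;
hasLabel(labels…): filters vertices or edges by label; matching any one label in the list is enough;
hasId(ids…): filters vertices or edges by id; matching any one id in the list is enough;
hasKey(keys…): filters by property key; in TinkerPop this step applies to properties, e.g. g.V().properties().hasKey('age');
hasValue(values…): filters by property value; in TinkerPop this step applies to properties, e.g. g.V().properties().hasValue('marko');
has(key): passes if the element has a property with this key; similar in intent to hasKey(key), which works on properties;
hasNot(key): the opposite of has(key);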
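The three forms has(key), has(key,value) and has(key,predicate) can be sketched on plain dictionaries. This is a toy model in Python rather than the Gremlin API; the data and the helper names `has`, `has_not` and `gt` are illustrative assumptions, and the label is stored next to the properties purely for brevity.

```python
# Toy vertices as plain dicts. Made-up data.
VERTICES = [
    {"id": 1, "label": "person", "name": "marko", "age": 29},
    {"id": 2, "label": "person", "name": "vadas", "age": 27},
    {"id": 3, "label": "software", "name": "lop"},
]

_NO_CONDITION = object()


def gt(threshold):
    """Predicate factory mirroring Gremlin's gt()."""
    return lambda value: value > threshold


def has(elements, key, condition=_NO_CONDITION):
    """Keep elements that own `key` and, if given, match the value or predicate."""
    kept = []
    for element in elements:
        if key not in element:
            continue  # an element without the key never passes
        value = element[key]
        if condition is _NO_CONDITION:
            kept.append(element)      # has(key)
        elif callable(condition):
            if condition(value):      # has(key, predicate)
                kept.append(element)
        elif value == condition:      # has(key, value)
            kept.append(element)
    return kept


def has_not(elements, key):
    """Keep elements that do not own `key` (like hasNot)."""
    return [element for element in elements if key not in element]


print([v["name"] for v in has(VERTICES, "age", gt(27))])
print([v["name"] for v in has(VERTICES, "name", "lop")])
print([v["name"] for v in has_not(VERTICES, "age")])
```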

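Branching (if-else)

choose(predicate,true,false): evaluates the condition in the first argument; traversers that satisfy it take the true branch and the rest take the false branch. It can also be combined with option(). Examples: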

g.V().hasLabel('person').choose(values('age').is(lte(30)),__.in(),__.out()).values('name')
g.V().hasLabel('person').choose(values('age')).option(27, __.in()).option(32, __.out()).values('name')

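Loops

repeat(): specifies the steps to execute repeatedly;
times(): specifies how many times to repeat, e.g. 3 times;
until(): specifies the condition that ends the loop, e.g. keep going until a friend with a given name is found;
loops(): the current loop count, useful for capping the number of iterations, e.g. at most 3.
The relative position of repeat() and until() determines the loop semantics:
- repeat() + until(): equivalent to do-while;
- until() + repeat(): equivalent to while-do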
```
g.V(1).repeat(out()).until(outE().count().is(0)).path().by('name')
g.V('1').repeat(out()).until(loops().is(3)).path()
```
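The do-while versus while-do distinction can be sketched with ordinary loops. This is a simplified Python model that follows a single traverser along a made-up chain a -> b -> c; real Gremlin traversals carry many traversers at once, and the function names here are hypothetical.

```python
# Made-up chain: each vertex has exactly one successor.
NEXT = {"a": "b", "b": "c"}


def repeat_until(start, step, stop):
    """Like repeat(step).until(stop): run the step first, then test (do-while)."""
    current = start
    while True:
        current = step(current)
        if stop(current):
            return current


def until_repeat(start, step, stop):
    """Like until(stop).repeat(step): test first, then run the step (while-do)."""
    current = start
    while not stop(current):
        current = step(current)
    return current


def step(vertex):
    return NEXT[vertex]


def stop(vertex):
    # Both the start and the end of the chain satisfy the condition.
    return vertex in {"a", "c"}


print(repeat_until("a", step, stop))  # steps at least once before testing
print(until_repeat("a", step, stop))  # tests before the first step
```

Because the start vertex already satisfies the condition, the while-do form returns it untouched, while the do-while form always takes at least one step.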

Match

Pattern matching, similar to MATCH in Neo4j; used together with as().

g.V().match(__.as('a').out('created').as('b'),__.as('b').has('name', 'lop'),__.as('b').in('created').as('c'),__.as('c').has('age', 29)).select('a','c').by('name')

project

project() builds a map for each incoming item; each by() computes the value for one of the projected keys, in order.

g.V().groupCount().by(label()).unfold().project('k','v').by(map{it->it.get().getKey()}).by(map{it->it.get().getValue()}).order().by(select('v'))
g.V().groupCount().by(label()).unfold().project('k','v','v1').by(map{it->it.get().getValue()}).by(map{it->it.get().getKey()}).by(map{it->it.get().getValue()})
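The pipeline used here (count per label, unfold into entries, project each entry into a map with keys k and v, then order by v) can be sketched in plain Python. This is an analogy rather than the Gremlin API; the label list and the helper name `count_by_label` are made up.

```python
from collections import Counter


def count_by_label(labels):
    """Mimic groupCount().by(label()).unfold().project('k','v')...order().by(select('v'))."""
    counts = Counter(labels)                    # groupCount().by(label())
    rows = [{"k": label, "v": count}            # unfold().project('k','v')
            for label, count in counts.items()]
    rows.sort(key=lambda row: row["v"])         # order().by(select('v')), ascending
    return rows


labels = ["person", "person", "software", "person", "software", "place"]
for row in count_by_label(labels):
    print(row)
```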

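Paths

path(): the path traversed so far
simplePath(): filters out traversers whose path contains a cycle (a repeated element)
cyclicPath(): filters out traversers whose path does not contain a cycle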
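Both filters reduce to one question: does the path visit any element more than once? A minimal sketch in plain Python, with paths modelled as lists of vertex names (the data and the helper names are made up):

```python
def is_simple(path):
    """A path is simple when no element appears in it twice."""
    return len(set(path)) == len(path)


def simple_path(paths):
    """Keep only paths without a repeated element (like simplePath)."""
    return [path for path in paths if is_simple(path)]


def cyclic_path(paths):
    """Keep only paths with a repeated element (like cyclicPath)."""
    return [path for path in paths if not is_simple(path)]


paths = [
    ["marko", "josh", "lop"],
    ["marko", "josh", "marko"],  # returns to its starting vertex
]
print(simple_path(paths))
print(cyclic_path(paths))
```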

shortestPath(): shortest path

g.V().shortestPath().with(ShortestPath.edges, Direction.IN).with(ShortestPath.target, __.has('name','josh'))

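Aggregation

sum(): sums all the numbers in the traversal;
max(): returns the largest of all the numbers in the traversal;
min(): returns the smallest of all the numbers in the traversal;
mean(): returns the average of all the numbers in the traversal;
count(): counts the total number of items in the traversal.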

or

or() is also a filter step.

g.V().or(out().has('test','test'),in().has('test','test'))

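Note that this returns the vertices that satisfy any of the conditions inside or(), not the vertices reached through in() or out(). It is equivalent to the following statement: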

g.V().where(out().has('test','test').or().in().has('test','test'))

and() and not() work in a similar way.
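The point that or() keeps the starting vertex rather than the neighbour that matched can be checked on a toy graph. The sketch below is plain Python with made-up data and helper names; it only models the filter semantics described in this section.

```python
# Made-up graph: directed edges plus per-vertex properties.
EDGES = [("a", "b"), ("b", "c"), ("d", "a")]
PROPS = {"a": {}, "b": {"test": "test"}, "c": {}, "d": {}}


def out(vertex):
    return [dst for src, dst in EDGES if src == vertex]


def in_(vertex):
    return [src for src, dst in EDGES if dst == vertex]


def neighbour_matches(neighbours):
    """True if any neighbour carries the property test='test'."""
    return any(PROPS[n].get("test") == "test" for n in neighbours)


def or_filter(vertices):
    """Keep a vertex if its out-neighbours OR its in-neighbours match."""
    return [v for v in vertices
            if neighbour_matches(out(v)) or neighbour_matches(in_(v))]


print(or_filter(["a", "b", "c", "d"]))
```

Vertex b, the one that actually carries the property, is not in the result; only its neighbours a and c pass the filter.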

union

Merges multiple results, for example when you need both the in() and the out() neighbours of a vertex:

g.V().union(__.in(),out())

Of course, both() could be used here instead; union() is needed for more complex combinations.
See the official documentation for more syntax.

Bulk import

JanusGraph itself does not ship with an import tool, so we use an open-source tool from GitHub: https://github.com/IBM/janusgraph-utils

Clone the tool's repo and build it:

git clone https://github.com/IBM/janusgraph-utils.git
cd janusgraph-utils/
mvn package

Generate the template files from the configuration:

./run.sh gencsv csv-conf/twitter-like-w-date.json /tmp

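The output includes schema.json, which describes the field types, indexes and so on, and datamapper.json, which describes which file holds each vertex and edge type and how the fields map.

Importing the data

Set the environment variables: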

export JANUSGRAPH_HOME=/opt/app/janusgraph-0.3.1-hadoop2
export PATH=$PATH:$JANUSGRAPH_HOME/bin

./run.sh import ~/janusgraph/conf/janusgraph-cql-es.properties /tmp /tmp/schema.json /tmp/datamapper.json

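Bulk import, the official way

For the official import method in the latest version, make sure the Hadoop environment variables are configured.

Environment variables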

export HADOOP_CONF_DIR=/etc/hadoop/conf
export CLASSPATH=$HADOOP_CONF_DIR
bin/gremlin.sh

Configuration file

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
## input data format
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
## input file location
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
........

:load data/grateful-dead-janusgraph-schema.groovy
graph = JanusGraphFactory.open('conf/gremlin-server/jianganwei-janusgraph-hbase-server.properties')
defineGratefulDeadSchema(graph)
graph.close()
hdfs.copyFromLocal('data/grateful-dead.kryo','data/grateful-dead.kryo')
graph = GraphFactory.open('conf/hadoop-graph/hadoop-graphson.properties')
blvp = BulkLoaderVertexProgram.build().writeGraph('conf/gremlin-server/jianganwei-janusgraph-hbase-server.properties').create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()

Clearing data

:i org.janusgraph.core.util.JanusGraphCleanup
graph = JanusGraphFactory.open('conf/gremlin-server/jianganwei-janusgraph-hbase-server.properties')
graph.close()
JanusGraphCleanup.clear(graph)

Viewing the schema

graph = JanusGraphFactory.open('conf/gremlin-server/jianganwei-janusgraph-hbase-server.properties')
mgmt = graph.openManagement()
mgmt.printSchema()

Adding a vertex

tx = graph.newTransaction()
vertex = tx.addVertex("Customer")
vertex.property("id_no",510800200000000000)
vertex.property("cust_no","123")
vertex.property("cust_name","李四")
vertex.property("gender","男")
vertex.property("create_time",new java.util.Date())
tx.commit()

Adding an edge

left=g.V(7372800004272).next()
right=g.V(8216).next()
edge=left.addEdge("CALL", right)
edge.property("content","我好想你")

Creating a mixed index

indexBuilder = mgmt.buildIndex("call_content_index", Edge.class)
indexBuilder.indexOnly(mgmt.getEdgeLabel("CALL"))
// unique() removed: JanusGraph only allows uniqueness on composite indexes
indexBuilder.addKey(mgmt.getPropertyKey("content"), Mapping.STRING.asParameter())
indexBuilder.buildMixedIndex("search")

Creating a composite (exact-match) index

indexBuilder = mgmt.buildIndex("test_set_index", Vertex.class)
indexBuilder.indexOnly(mgmt.getVertexLabel("Customer"))
indexBuilder.unique();
indexBuilder.addKey(mgmt.getPropertyKey("test_set"));
indexBuilder.buildCompositeIndex();

Adding a property key

mgmt.makePropertyKey("content").dataType(java.lang.String.class).cardinality(Cardinality.SINGLE).make()

Querying data through the mixed index

 graph.indexQuery("call_content_index","你").edgeTotals()