JanusGraph with HBase backend: graph traversal is slow #3335

Kukant · 2022-11-22T10:23:33Z


We are using JanusGraph with an HBase backend to store large data-lineage graphs. The basic use case is to find a node and then run an impact analysis by recursively traversing all nodes affected by it.

I am currently getting about 620 edge traversals per second, which I consider quite slow.

Here is the Gremlin query:

g.V().has('name', 'xxx').
repeat(
 outE('flows_into').dedup().inV()
).
until(
 or(
  outE('flows_into').count().is(0),
  cyclicPath()
 )
).
path().
unfold().
dedup().
group().by(label).by(count())
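To make the intent of this query concrete, here is a minimal in-memory sketch in Python of the impact analysis it is meant to perform. It collects every vertex and flows_into edge downstream of a start vertex, then counts the distinct elements by label. The toy graph, vertex ids and labels are hypothetical, and this is not JanusGraph code. It only illustrates that a visited set keeps the work proportional to the number of reachable edges, and that cycles stop the walk without a cyclicPath() check.

```python
from collections import Counter, deque

# Hypothetical toy lineage graph: vertex id -> vertex label,
# and outgoing 'flows_into' edges as (edge_id, target_vertex_id).
VERTEX_LABELS = {
    "src": "table", "stg": "data_stage", "dw": "table",
    "rpt": "report", "dash": "report",
}
FLOWS_INTO = {
    "src": [("e1", "stg")],
    "stg": [("e2", "dw")],
    "dw": [("e3", "rpt"), ("e4", "dash"), ("e5", "stg")],  # e5 closes a cycle
    "rpt": [],
    "dash": [],
}


def impact_label_counts(start):
    """Count distinct downstream vertices (by label) and flows_into edges.

    Each vertex is expanded at most once (visited set), so the work is
    proportional to the number of reachable edges, and cycles terminate
    without any explicit cycle check.
    """
    seen_vertices = {start}
    seen_edges = set()
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for edge_id, w in FLOWS_INTO.get(v, []):
            seen_edges.add(edge_id)
            if w not in seen_vertices:
                seen_vertices.add(w)
                queue.append(w)
    counts = Counter(VERTEX_LABELS[v] for v in seen_vertices)
    counts["flows_into"] += len(seen_edges)
    return dict(counts)


print(impact_label_counts("src"))
```

One difference from the Gremlin query: if the start vertex has no outgoing flows_into edge, this sketch still counts the start vertex. The query instead returns an empty map, because repeat() runs its body before it evaluates until().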

Here is our configuration/environment:

JanusGraph Server 0.6.2

RAM: 32 GB
CPU: sufficient
HBase 2.1.4

13 nodes
750 GB RAM on each node
The graph has around 4 million vertices and 5 million edges.

Is this speed normal? Is there a way to make the query run faster? Would Cassandra be better for our use case?

Here are all the details: the explain() output and our configuration.

Query explain output:

Original Traversal                                     [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[VertexStep(OUT,[flows_into],vertex), CountGlobalStep, IsStep(eq(0))], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]

ConnectiveStrategy                               [D]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[VertexStep(OUT,[flows_into],vertex), CountGlobalStep, IsStep(eq(0))], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
IdentityRemovalStrategy                          [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[VertexStep(OUT,[flows_into],vertex), CountGlobalStep, IsStep(eq(0))], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
MatchPredicateStrategy                           [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[VertexStep(OUT,[flows_into],vertex), CountGlobalStep, IsStep(eq(0))], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
FilterRankingStrategy                            [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[VertexStep(OUT,[flows_into],vertex), CountGlobalStep, IsStep(eq(0))], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
InlineFilterStrategy                             [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[VertexStep(OUT,[flows_into],vertex), CountGlobalStep, IsStep(eq(0))], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
RepeatUnrollStrategy                             [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[VertexStep(OUT,[flows_into],vertex), CountGlobalStep, IsStep(eq(0))], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
IncidentToAdjacentStrategy                       [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[VertexStep(OUT,[flows_into],vertex), CountGlobalStep, IsStep(eq(0))], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
CountStrategy                                    [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],vertex)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
PathRetractionStrategy                           [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],vertex)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
AdjacentToIncidentStrategy                       [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
EarlyLimitStrategy                               [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
LazyBarrierStrategy                              [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
ByModulatorOptimizationStrategy                  [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
ProductiveByStrategy                             [O]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
AdjacentVertexHasIdOptimizerStrategy             [P]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
AdjacentVertexIsOptimizerStrategy                [P]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
AdjacentVertexHasUniquePropertyOptimizerStrategy [P]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
AdjacentVertexFilterOptimizerStrategy            [P]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([VertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
JanusGraphLocalQueryOptimizerStrategy            [P]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([JanusGraphVertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
JanusGraphIoRegistrationStrategy                 [P]   [GraphStep(vertex,[]), HasStep([fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([JanusGraphVertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
JanusGraphStepStrategy                           [P]   [JanusGraphStep([],[fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([JanusGraphVertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
JanusGraphMultiQueryStrategy                     [P]   [JanusGraphStep([],[fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([JanusGraphVertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
JanusGraphMixedIndexCountStrategy                [P]   [JanusGraphStep([],[fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([JanusGraphVertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
ProfileStrategy                                  [F]   [JanusGraphStep([],[fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([JanusGraphVertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]
StandardVerificationStrategy                     [V]   [JanusGraphStep([],[fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([JanusGraphVertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]

Final Traversal                                        [JanusGraphStep([],[fqn.eq(data_stage - DW_DI_CK_MS - User_Date_9)]), RepeatStep([JanusGraphVertexStep(OUT,[flows_into],edge), DedupGlobalStep(null,null), EdgeVertexStep(IN), RepeatEndStep],until([OrStep([[NotStep([VertexStep(OUT,[flows_into],edge)])], [PathFilterStep(cyclic,null,null)]])]),emit(false)), PathStep, UnfoldStep, DedupGlobalStep(null,null), GroupStep(label,[CountGlobalStep])]

gremlin-server.yaml:

evaluationTimeout: 3000000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  graph: conf/janusgraph-hbase.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}

processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}

metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536

janusgraph-hbase.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.hostname=...
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
storage.batch-loading=true

Traversal Metrics:

Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[fqn.eq(data_stage - DW_DI_TR...                     1           1           2.700     0.01
  constructGraphCentricQuery                                                                   0.172
  GraphCentricQuery                                                                        46369.068
    \_condition=(fqn = data_stage - DW_DI_TR_Posted - Tran_Amt)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]
    \_index=byFqn
    backend-query                                                      1                       2.309
    \_query=byFqn:multiKSQ[1]
RepeatStep([JanusGraphVertexStep(OUT,[flows_int...                 23219       23219       24760.566    53.40
  OrStep([[NotStep([VertexStep(OUT,[flows_into]...                                         23056.866
    NotStep([VertexStep(OUT,[flows_into],edge)])                                           22600.990
      VertexStep(OUT,[flows_into],edge)                                                    22365.036
    PathFilterStep(cyclic,null,null)                                                         110.269
  JanusGraphVertexStep(OUT,[flows_into],edge)                     111731      111731        1103.404
    \_condition=type[flows_into]
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=flows_into:SliceQuery[0x70C0,0x70C1)
    \_vertices=1
    optimization                                                                               0.016
    backend-query                                                      5                       1.155
    \_query=flows_into:SliceQuery[0x70C0,0x70C1)
    optimization                                                                               0.002
    optimization                                                                               0.006
    optimization                                                                               0.005
    optimization                                                                               0.014
    optimization                                                                               0.007
    ... this repeats many times
    optimization                                                                               0.003
    optimization                                                                               0.006
    optimization                                                                               0.008
    optimization                                                                               0.008
  DedupGlobalStep(null,null)                                       44683       44683         128.983
  EdgeVertexStep(IN)                                               44683       44683          70.857
  RepeatEndStep                                                    23219       23219       23426.519
PathStep                                                           23219       23219          30.952     0.07
UnfoldStep                                                        438411      438411         235.873     0.51
DedupGlobalStep(null,null)                                         71268       71268         298.530     0.64
GroupStep(label,[CountGlobalStep])                                     1           1       21041.233    45.38
  CountGlobalStep                                                      2           2         130.478
                                            >TOTAL                     -           -       46369.856        -
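One way to read these metrics is to express the main step timings as shares of the total. The numbers below are copied from the profile() output above. Nested timings are already included in their parent step's time, so the shares are not additive. The edge rate is a rough figure: edge traversers emitted by the repeat body divided by total query time. It is not necessarily the same metric as the 620 ops/s quoted above.

```python
# Timings in ms, copied from the profile() output above.
TOTAL_MS = 46369.856
STEP_MS = {
    "RepeatStep (whole loop)": 24760.566,
    "until(): OrStep leaf/cycle check": 23056.866,
    "repeat body: JanusGraphVertexStep(OUT,[flows_into])": 1103.404,
    "GroupStep(label,[CountGlobalStep])": 21041.233,
}
EDGE_TRAVERSERS = 111731  # emitted by JanusGraphVertexStep inside repeat()


def share_pct(ms, total_ms=TOTAL_MS):
    """Percentage of the total query time spent in a step."""
    return 100.0 * ms / total_ms


shares = {name: share_pct(ms) for name, ms in STEP_MS.items()}
edge_rate_per_s = EDGE_TRAVERSERS / (TOTAL_MS / 1000.0)

for name, pct in shares.items():
    print(f"{name:<52} {STEP_MS[name]:>10.1f} ms {pct:5.1f}%")
print(f"~{edge_rate_per_s:.0f} edge traversers per second")
```

Read this way, roughly half of the time goes to the until() leaf check, which appears in the profile as a plain VertexStep evaluated per traverser. About 45% goes to the final GroupStep. The edge expansion inside repeat() itself accounts for only about 1.1 s.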

Answered by vtslab

Nov 25, 2022

Just to make sure that "storage.batch-loading=true" is not interfering with the query.batch setting: after the initial loading of the graph, the proper setting is "storage.batch-loading=false".
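Following this answer, here is a small Python sketch that rewrites a properties file like the janusgraph-hbase.properties in the question for the query-serving phase. It sets storage.batch-loading=false and, as an assumption, also enables query.batch, the batched-read option discussed in this thread. The parser is deliberately simplified: plain key=value lines, # or ! comments, no line continuations or ':' separators. Treat it as an illustration rather than a complete .properties implementation.

```python
DEFAULT_SERVING_OVERRIDES = {
    "storage.batch-loading": "false",  # batch loading is meant for the initial bulk import
    "query.batch": "true",             # assumption: enable batched backend reads
}


def serving_properties(text, overrides=None):
    """Return `text` with each overridden key set to its new value.

    Keys that already appear are rewritten in place (every occurrence);
    keys that do not appear are appended at the end.
    """
    if overrides is None:
        overrides = DEFAULT_SERVING_OVERRIDES
    applied = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in overrides:
                out.append(f"{key}={overrides[key]}")
                applied.add(key)
                continue
        out.append(line)
    out.extend(f"{k}={v}" for k, v in overrides.items() if k not in applied)
    return "\n".join(out) + "\n"


original = """gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
cache.db-cache = true
storage.batch-loading=true
"""
print(serving_properties(original))
```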


li-boxuan · 2022-11-22T15:33:59Z

Maintainer

Can you show the profile result by adding .profile() at the end of your query?

You mentioned 13 nodes: how many of them run JanusGraph Server, and how many run HBase?

22 replies

Kukant Nov 25, 2022
Author

I also tried .profile() again, but as soon as I run the query with profile, it drops back to the old numbers: about 620 ops/s. The same query without .profile() is much faster. Also, I still cannot see \_multi=true under the JanusGraphVertexStep.

porunov Nov 25, 2022
Maintainer

Sorry, I didn't follow the discussion above, so my comment may be out of scope, but I wanted to point out that multi-query doesn't work for has and valueMap steps. There are open tasks to resolve this, but they are not trivial to implement: MultiQuery optimization revamp
In other words, even if you have an index covering part of the filters, the other part of the has filters will be applied in memory with quite a big time overhead, because the checks are sequential. The current workaround is to rewrite some queries using where steps instead of has steps, as suggested by @li-boxuan here: #3244
Again, this comment may be out of context.
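A back-of-the-envelope Python model of storage round trips may help show why sequential per-element checks hurt. It is a hypothetical cost model, not JanusGraph's implementation. It assumes each unbatched lookup costs one round trip, and that a batched multi-query fetch of up to batch_size vertices also costs roughly one round trip, which is optimistic. The batch size and latency values are illustrative.

```python
import math


def round_trips(num_vertices, batch_size=None):
    """Storage round trips needed to look up `num_vertices` vertices.

    batch_size=None models sequential, one-at-a-time lookups;
    an integer models multi-query batches of up to that many vertices.
    """
    if num_vertices <= 0:
        return 0
    if batch_size is None:
        return num_vertices
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_vertices / batch_size)


def estimated_ms(num_vertices, rtt_ms, batch_size=None):
    """Latency estimate that only counts round trips (ignores payload size)."""
    return round_trips(num_vertices, batch_size) * rtt_ms


# Illustrative numbers: 23,219 lookups (the RepeatStep traverser count in
# the profile above) and an assumed 1 ms round trip.
print(estimated_ms(23219, rtt_ms=1.0))                   # one lookup per vertex
print(estimated_ms(23219, rtt_ms=1.0, batch_size=1000))  # batched
```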

li-boxuan Nov 25, 2022
Maintainer

storage.batch-loading shouldn't interfere with query.batch, since storage.batch-loading only affects the write path, not the read path. So what you describe looks very weird to me...

> In many cases I am getting over 30k ops/s! However, it is not very consistent: in other cases, I am getting closer to 800 ops/s.

Are you running the same (type of) queries?

li-boxuan Nov 25, 2022
Maintainer

Regarding parallelism, I don't think this is an HBase issue. @Kukant, if you try removing parts of your query one by one (e.g. remove everything after the until() step and see whether it makes a difference), you may be able to find out why parallelism is not enabled. I just tried graph.traversal().V(v0).repeat(__.outE("out").dedup().inV()).profile() locally and saw that parallelism was enabled.

vtslab Nov 26, 2022

To simplify the query to a group count of vertex labels, this should be enough:
g.V().has('name', 'xxx').
repeat(
 out('flows_into')
).
until(cyclicPath()).
emit().
dedup().
groupCount().by(label)
In particular, the dedup() step inside the repeat loop may cause interference and is not required because of TinkerPop's bulking mechanism. In the profile() output the group count also takes a long time, which I do not understand; maybe it is a memory/GC issue. Emitting the vertices as they arrive may reduce memory usage. Note that this query does not count the unique edges by label, though.
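vtslab's suggestion to avoid materializing whole paths can be illustrated with a toy model in Python. It uses a hypothetical lineage DAG in which branches rejoin downstream (a chain of diamonds). It compares the number of elements path().unfold() would produce if every maximal path were enumerated against the number of distinct reachable elements. It ignores the dedup() inside repeat() and the cyclicPath() cutoff, so it only sketches the growth pattern. The profile above shows the same kind of redundancy: 438,411 unfolded elements reduced to 71,268 by dedup().

```python
def diamond_chain(k):
    """Hypothetical lineage DAG of k diamonds in a row.

    n0 -> {l0, r0} -> n1 -> {l1, r1} -> n2 ... -> nk, giving 2**k source-to-sink paths.
    """
    adj = {}
    for i in range(k):
        head, left, right, tail = f"n{i}", f"l{i}", f"r{i}", f"n{i + 1}"
        adj.setdefault(head, []).extend([left, right])
        adj[left] = [tail]
        adj[right] = [tail]
    adj.setdefault(f"n{k}", [])
    return adj


def unfolded_path_elements(adj, start):
    """Vertices plus edges handed to unfold() if every maximal path is materialized.

    Assumes an acyclic graph: no cyclicPath() cutoff is modeled.
    """
    total = 0
    stack = [[start]]
    while stack:
        path = stack.pop()
        successors = adj[path[-1]]
        if not successors:  # leaf: the path is complete
            total += 2 * len(path) - 1  # len(path) vertices, len(path) - 1 edges
            continue
        for nxt in successors:
            stack.append(path + [nxt])
    return total


def distinct_elements(adj, start):
    """Vertices plus edges reachable from start, each counted once (visited set)."""
    seen_vertices, seen_edges = {start}, set()
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            seen_edges.add((v, w))
            if w not in seen_vertices:
                seen_vertices.add(w)
                stack.append(w)
    return len(seen_vertices) + len(seen_edges)


for k in (1, 5, 10):
    g = diamond_chain(k)
    print(k, unfolded_path_elements(g, "n0"), distinct_elements(g, "n0"))
```

The path-based count grows with the number of paths (2**k here), while the distinct count grows only with the size of the reachable subgraph.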

vtslab · 2022-11-25T06:45:50Z


Just to make sure that "storage.batch-loading=true" is not interfering with the query.batch setting: after the initial loading of the graph, the proper setting is "storage.batch-loading=false".

1 reply

Kukant Jan 25, 2023
Author

Weirdly, this is probably the answer.
