Conversation

@gkesler (Contributor) commented Jan 29, 2019

To reduce the complexity of ExecutionStrategy, especially the batched one, it makes sense to re-implement it on top of a formal DependencyGraph.

The GraphQL spec essentially defines an algorithm for executing a request. Unfortunately, implemented literally, it leads to the famous N+1 problem.

DependencyGraph separates concerns and lets us focus strictly on the topological sorting of the field resolution instructions.
Additionally, looking forward, DependencyGraph-based execution will drastically simplify:

  1. resolving dependencies between multiple operations within the same request
  2. evaluating search criteria
  3. cascading mutations
  4. cost analysis

@bbakerman @andimarek
This is a POC showing that DependencyGraph-based execution works.
I would be happy to work with you on making it production-ready.

- Added DependencyGraph basic tests
- Added "start" and "finish" methods to TraverserVisitor and modified Traverser to fire "start" and "finish" events
- got rid of TraverserVisitor "start" and "finish" notifications to keep it simple
- added default, size and copy constructors to DependencyGraph
…rsively parent contexts or provide default values

- simplified field collection in ExecutionPlanBuilder to recursively search for vars/results using TraverserContext.computeXXXIfAbsent methods
…ved ExecutionPlanBuilder inside it.

- Created ExecutionStrategy interface
- Created Execution class
…ertices when [automatically] resolved

- updated ExecutionPlan to replace FieldVertices dependency on OperationVertex with dependency on DocumentVertex
- added DependencyGraphContext signature interface
- propagated DependencyGraphContext into topological sorting process
@andimarek (Member) commented Jan 29, 2019

Thanks a lot @gkesler for this effort! Looking forward to looking into it more, but give us some time :-)

A first glimpse showed a couple of changed files which contain no real changes: could you make sure that you are using the graphql-java code style: https://github.com/graphql-java/graphql-java/blob/master/graphql-java-code-style.xml. Thanks!

@gkesler (Contributor, Author) commented Jan 30, 2019

@andimarek I just merged with the latest and greatest from master.
Please take a look.

@gkesler (Contributor, Author) commented Jan 30, 2019

Unfortunately, I am a dinosaur using NetBeans.
Will try to get better at not touching files next time :)

…olved nodes from traversing

- reverted a fragment in Traverser to add to visited *after* visiting a node
@andimarek added the "Not to be merged" label (spikes or other stuff that should never or not yet be merged) on Feb 6, 2019
@andimarek (Member) commented Apr 5, 2019

Hi @gkesler,

sorry for the long delay.

One thing I would be interested in is how Interfaces/Unions will work with this:

As far as I understand the ExecutionPlan will be created and then executed in the ExecutionStrategy.

But Interfaces/Unions can only be evaluated while executing, so I would expect that some kind of feedback loop into the ExecutionPlan must happen during execution. But this would somehow contradict the idea of a prepared ExecutionPlan. What do you think?

Thanks again for this PR.

P.S.: My IDE complains a couple of times about "Cannot infer functional interface type". @exbe do you have similar problems?

@gkesler (Contributor, Author) commented Apr 8, 2019

@andimarek
Thank you for the question.
It is my understanding that strong typing in GraphQL helps to predict the execution upfront.
This helps to build an execution plan that contains a vertex for each field.
Execution of the query is then a simple topological sort over the vertices in the execution plan. The topological sorting operation is implemented in https://github.com/graphql-java/graphql-java/pull/1409/files#diff-863135c7b81c8c2aef712d485b3d6f5a L.157.
Before a vertex is placed into a "closure" set of vertices to be resolved externally, it is checked whether it can be "autoClosed". That gives the vertex and the client code an opportunity to inject auto-closing behavior.

As I understand the problem statement, during execution of a query that references interface/union fields, some of the execution paths may not be executed due to runtime type resolution of fragments.
If there were no way to alter the topological sorting, we would never complete the sequence, due to unsatisfied dependencies caused by the runtime selection of fragments to execute.

This could easily be implemented in the ExecutionPlan by adding a new type of vertex, call it ConditionalVertex, that would be recorded into the execution plan for every inline fragment or fragment spread. Its auto-close method (delegated to ExecutionPlanContext) would be implemented, as it is for FieldVertex, not to resolve further in case its source is empty or null.

Does this contradict the idea of a prepared execution plan? I think it fits perfectly. The plan represents a query-execution state machine, i.e. it has all possible states and state transitions needed to guarantee execution of the query. State transitions can have guard conditions that trigger them only under some circumstances. The state machine finishes when it reaches the final state. During execution, not every state might be entered; that depends on the particular sequence of triggering events and accumulated data.
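To make the state-machine analogy concrete, here is a toy sketch (all names are illustrative, not the PR's actual API): transitions carry guard predicates, and states behind guards that never hold, such as a fragment whose type condition does not match at runtime, are simply never entered while the machine still terminates.

```java
import java.util.*;
import java.util.function.Predicate;

// Toy guarded-transition machine: from each visited state, follow every
// transition whose guard holds for the runtime context. States behind
// failing guards are skipped, mirroring fragments whose type condition
// does not match at runtime.
class GuardedMachine {
    record Transition<C>(String from, String to, Predicate<C> guard) {}

    static <C> Set<String> reachable(String start, List<Transition<C>> transitions, C ctx) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(start));
        while (!work.isEmpty()) {
            String state = work.pop();
            if (!visited.add(state)) continue;
            for (Transition<C> t : transitions) {
                if (t.from().equals(state) && t.guard().test(ctx)) {
                    work.push(t.to());
                }
            }
        }
        return visited;
    }
}
```

With a runtime type of Dog, a guard like `c -> c.equals("Cat")` never fires, so Cat-only states are skipped, much as a ConditionalVertex would skip its dependents.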

@andimarek (Member) commented Apr 13, 2019

thanks @gkesler for the explanation.

I want to go through some steps to document the idea here a bit and to understand the whole approach better.

First thing:

Taking the example test with the following query:

{
  foo {
    id
    bar {
      id
      name
    }
  }
}

we get the following execution plan:

[Screenshot: execution plan diagram for the query above]

and the direction of the edge is the dependency: so foo/id depends on foo being resolved first.

Is that correct?

@gkesler (Contributor, Author) commented Apr 16, 2019

That is an accurate picture, @andimarek.

The dependency arrow points at the vertex resolved first. So in the foo/id --> foo dependency, vertex foo is resolved first and then foo/id.

The execution of the query is based on a topological sort of the graph vertices.
Vertices with the fewest unresolved dependencies are resolved before their dependent vertices.
So in this example the dependency resolution order is:

  1. "document" --> has 0 dependencies (outbound edges).
  2. "foo" --> depends only on "document"
  3. "foo/id", "bar" --> depend on "foo"
  4. "bar/id", "name" --> depend on "bar"
  5. "operation" --> depends on all of them.

Please note, I have introduced the dependencies "foo" --> "document", "operation" --> "document" and "operation" --> all nodes for convenience. Each dependency edge is associated with an action that gets executed when the source vertex is resolved (either via auto resolution or via external field resolution). This way, when the "document" vertex is auto-resolved, its dependent "operation" vertex is notified, and at that point we can inject some bootstrapping code to prepare the operation execution, for instance initializing root and context values for the operation.
Similarly, when the "document" vertex is resolved, its dependent "foo" vertex is notified, and at that point code runs that obtains the type of the "foo" field definition from the root query type and prepares the source object for the "foo" data fetcher.
And so on: each time a field vertex is resolved, a corresponding action is executed to set up the correct context for the dependent field vertices.

The "operation" vertex depends on every "field" vertex mainly for two reasons:

  1. the "operation" vertex can oversee the whole result of the operation, and this way it can verify the validity of the output and join results.
  2. it is very convenient to have the "operation" vertex depend on all fields to ensure it is resolved after all field vertices, so the resolution code can finalize results and perform cleanup if necessary.
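The layered resolution order in the numbered list above is essentially a Kahn-style topological sort: at each step, every vertex whose dependencies are all resolved forms the next "closure". A minimal sketch (hypothetical names, not the PR's API):

```java
import java.util.*;

// Hypothetical sketch of the layered topological ordering described
// above. deps maps each vertex to the set of vertices it depends on;
// each returned layer is a "closure" of vertices ready to resolve.
class ClosureSort {
    static List<Set<String>> closures(Map<String, Set<String>> deps) {
        Map<String, Set<String>> remaining = new HashMap<>();
        deps.forEach((v, d) -> remaining.put(v, new HashSet<>(d)));
        List<Set<String>> layers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Set<String> ready = new TreeSet<>();
            remaining.forEach((v, d) -> { if (d.isEmpty()) ready.add(v); });
            if (ready.isEmpty()) throw new IllegalStateException("dependency cycle");
            layers.add(ready);
            remaining.keySet().removeAll(ready);
            for (Set<String> d : remaining.values()) d.removeAll(ready);
        }
        return layers;
    }
}
```

Feeding it the example graph (foo depends on document, foo/id and bar on foo, bar/id and name on bar, operation on everything) reproduces the five layers listed above, in the same order.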

@amatiushkin (Contributor) commented Apr 17, 2019

@gkesler I have some trouble understanding the "foo" -> "document" dependency.
A GraphQL document is a file or request string and might contain executable and type-system definitions. Namely, it might have several operations, some type definitions and extensions.
What is an example of a resolved "document" vertex? How would the dependency graph look for a document which defines a Foo type and has two operations?

PS
I assume documentVertex represents a document as in the spec.

@gkesler (Contributor, Author) commented Apr 17, 2019

@amatiushkin
"document" here is neither a file nor a request string.
The "document" vertex wraps the "document" AST node, which is the root of the AST.
I am using this node to bootstrap the execution, for instance to initialize the "foo" node's GraphQL type, set "source" and "root" values, etc., without making a special case, but instead using the generic dependency-graph topological sorting algorithm.

It does not have to be this way, but it is quite convenient.

When a vertex is resolved, the algorithm executes the actions associated with all of that vertex's outbound edges. Those actions prepare contexts for the next vertex resolutions. Actions associated with edges that point to the "operation" vertex also perform join functions.

The "document" vertex does not require external resolution, i.e. data fetching. Therefore it is resolved automatically. Consider it the state machine's "initial" state. When resolved (state entered), it fires the action implemented in https://github.com/graphql-java/graphql-java/pull/1409/files#diff-42fc714ebd9e1cc4e5a3edd8801738d1 L.102 that sets up the source for the very first vertex. Please note that this action is different from the subsequent similar actions between field vertices.

@gkesler (Contributor, Author) commented Apr 17, 2019

Ah, one more thing. In case there are 2 operations, the graph would look like this:

[Diagram: dependency graph for a document with two operations]

And the topological sorting would produce this sequence:

  1. "document"
  2. "foo" (1), "foo" (2)
  3. "foo/id" (1), "bar" (1), "foo/id" (2), "bar" (2)
  4. "bar/id" (1), "name" (1), "bar/id" (2), "name" (2)
  5. "operation1", "operation2"

@andimarek (Member) commented:
Thanks @gkesler @amatiushkin.

I would like to see how we can model the following query:

# Write your query or mutation here
{
  nodes {
    id
    ... on Cat {
      eatsTuna
      friends {
        ... on Human {
          name
        }
      }
    }
    ... on Dog {
      breed
      friends {
        ... on Cat {
          name
        }
      }
    }
  }
}

with the following schema:

type Query {
  nodes: [Node]
}
interface Node {
  id: ID
}
type Human implements Node {
  id: ID
  name: String
}
type Dog implements Node {
  id: ID
  breed: String
  friends: [Node]
}

type Cat implements Node {
  id: ID
  eatsTuna: Bool
  friends: [Node]
}

I see the challenge here in the combination of fragments and lists: nodes and friends return a list of Node, and for each returned value in the list it needs to be determined which type it is.

@gkesler (Contributor, Author) commented Apr 22, 2019

@andimarek, thanks a lot for the example.
Here is a dependency graph I have created out of it.

[Diagram: dependency graph for the Node/Cat/Dog query]

Please note, I have added one more vertex type (yellow on the diagram) associated with fragment conditions. These nodes don't need external resolution via outgoing requests; instead they resolve automatically to the list of source objects if the type condition matches, or to an empty list if it does not, so the dependent vertices won't need resolution.

The topological order is:

  1. "document"
  2. "nodes"
  3. "id", "fragment Cat", "fragment Dog"
  4. "eatsTuna", "friends" (Cat), "breed", "friends" (Dog)
  5. "fragment Human", "fragment Cat"
  6. "name" (Human), "name" (Cat)
  7. "operation"

So everything except the "fragment" vertices is straightforward and similar to what is in the PR. Each dependent vertex receives a list of resolved sources from its dependency.
Let's consider how a "fragment" vertex works, taking "fragment Cat" in step 3:

  • the "fragment Cat" vertex depends on the "nodes" vertex and therefore will be topologically ordered after the "nodes" vertex/field is resolved.
  • after the "nodes" vertex is resolved, the "fragment Cat" vertex will receive the list of Node objects obtained in the previous step via a call to ExecutionPlanContext.prepareResolve, where it can examine the content of the list and copy the instances of the Cat type into the list of sources for this vertex. If there are no Cat instances, the list of sources is empty.
  • the "fragment Cat" vertex is then checked to see whether it can be auto-resolved via the call to ExecutionPlanContext.resolve. For a FragmentVertex that method can simply copy the vertex's sources into its results and mark the vertex as auto-resolved, as fragments don't have their own DataFetchers.
  • after the "fragment Cat" vertex is resolved, the dependent vertices "eatsTuna" and "friends" will receive their lists of sources via an ExecutionPlanContext.prepareResolve call.
  • topological ordering will then call ExecutionPlanContext.resolve to see if the vertices "eatsTuna" and "friends" can be auto-resolved. If their list of sources is empty, there is nothing to resolve and these vertices will report that they resolved automatically. Otherwise the topological ordering algorithm will select these vertices into the next closure set and resolve them externally via their DataFetchers.
... and so on.

Here is a sequence diagram that demonstrates this mechanism:

[Diagram: sequence of prepareResolve/resolve calls]

Topological ordering is essentially a BFS traversal that keeps following a particular path until it sees a vertex that cannot be automatically resolved. So, in the case of the new FragmentVertices, if a fragment should not be activated, all its dependents are resolved automatically and then added to the resolved set in one shot when calculating the DependencyIterator.next() closure.
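A minimal sketch of the FragmentVertex behaviour described above (class and method names are illustrative, not the PR's actual ExecutionPlanContext API): sources matching the type condition pass through; an empty result means every dependent vertex can auto-resolve without invoking a DataFetcher.

```java
import java.util.*;
import java.util.function.Predicate;

// Illustrative sketch: a fragment vertex receives the sources resolved
// by its dependency and keeps only those matching its type condition.
// An empty result means the dependents need no external resolution.
class FragmentVertexSketch {
    static <T> List<T> prepareSources(List<T> parentResults, Predicate<T> typeCondition) {
        List<T> matched = new ArrayList<>();
        for (T source : parentResults) {
            if (typeCondition.test(source)) {
                matched.add(source);
            }
        }
        return matched;
    }

    static <T> boolean canAutoResolveDependents(List<T> sources) {
        return sources.isEmpty(); // nothing to fetch, so resolve automatically
    }
}
```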

@andimarek (Member) commented Apr 26, 2019

thanks @gkesler. I see how this could work for the described query.

Let me now add the final complexity: merged fields. I think that in order to build the full dependency graph for a general query we need to analyze it more than we do so far: just traversing is not enough.

We actually need to operate on merged fields, without actually executing the query.

For example:

query myQyery($myVar: Boolean) {
  nodes {
    id
    ... on Cat {
      eatsTuna
      friends {
        ... on Human {
          name
        }
        ... @skip(if: $myVar) {
          ... on Cat {
            ...CatF1
            friends {
              ... on Human {
                name @skip(if: $myVar)
              }
            }
          }
        }
      }
      ... {
        friends {
          id
        }
      }
      friends {
        id
        ... on Cat {
          id
        }
      }
    }
    ... on Dog {
      breed
      friends {
        ...CatF1
        ...CatF2
      }
    }
  }
}

fragment CatF1 on Cat {
  eatsTuna
  friends {
    ...CatF2
  }
}

fragment CatF2 on Cat {
  eatsTuna
}

Sorry for this rather complex query.

The point is: fields must be merged together and Fragments need to be inlined in order to build the correct dependency graph.

A simpler example is:

query myQyery($myVar: Boolean) {
  nodes {
    id
    ... {
      id
    }
    id @skip(if: $myVar)
    ... on Human {
      id
    }
    ... on Cat {
      id 
    }
  }
}

Here id is declared 5 times, but it represents only one field id, for different objects at runtime (for Cat and Human this is explicit; for Dog it is implicit, based on the interface).

I think building a traverser which gives us a merged-fields dependency tree is a precondition for this kind of execution strategy. I always wanted to build such an analyzer, because it is also useful for other use cases, but we haven't done it so far.

@gkesler (Contributor, Author) commented Apr 26, 2019

@andimarek thanks for the new challenging question.

The major benefit of using a dependency graph is that once the graph is populated, it is always executed using the very same algorithm, based on topological sorting. IMHO this leads to an interesting consequence: the linked graph could be transformed into a matrix, which can drastically reduce execution overhead.

The ability to merge fields is yet another great benefit of the graph.
The graph is a set of vertices that can be looked up by their keys, and I believe this can help us achieve what you are asking about.

Let's assume we have created a FieldVertex for the field nodes. The graph-builder traverser could then do the following:

  1. visit the nodes.id field and create FieldVertex("nodes.id") that depends on FieldVertex("nodes").
  2. visit the inline fragment ..., detect that it does not have a type condition and therefore can simply inline its fields. The builder will inline the id field into nodes.id. Since FieldVertex("nodes.id") already exists, the builder does nothing. PS. It is also possible to record this fragment exactly like the other ones in this example (steps 4 and 5), so its id field is treated differently from nodes.id. A fragment without a type condition simply has an implicit predicate that always returns true, i.e. such a fragment is always activated (included).
  3. visit the next field, which is another nodes.id. FieldVertex("nodes.id") already exists. We can evaluate the @skip directive against the current variables and mark FieldVertex("nodes.id") as already resolved so it won't be evaluated. PS. In case we precompile the query into a matrix and reuse it across multiple calls, this directive, as well as @include, would turn into a FieldVertex predicate that is evaluated during topological sorting rather than while building the graph.
  4. visit the fragment ... on Human and create FragmentVertex("... on Human"). All fragment fields will be converted into FieldVertices that depend on the conditional resolution of that FragmentVertex. As I understand it, the @skip directive does not affect the id field specified in this fragment.
  5. visit the fragment ... on Cat and create FragmentVertex("... on Cat"). The builder will repeat the same steps as in step 4.
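Steps 1-3 rely on vertices being keyed so that repeated selections of the same field merge into one vertex. A tiny sketch of that merge-on-key idea (hypothetical names, not the PR's builder):

```java
import java.util.*;

// Illustrative sketch: keying vertices by result path means a field
// selected several times (directly, via an untyped inline fragment,
// via @skip-annotated repetition) maps onto a single FieldVertex
// instead of one vertex per selection.
class MergedFieldGraphSketch {
    static final class FieldVertex {
        final String path;
        FieldVertex(String path) { this.path = path; }
    }

    private final Map<String, FieldVertex> byPath = new LinkedHashMap<>();

    FieldVertex fieldVertex(String path) {
        // computeIfAbsent gives us merge-on-key for free
        return byPath.computeIfAbsent(path, FieldVertex::new);
    }

    int size() { return byPath.size(); }
}
```

Repeated calls with the same path return the same vertex instance, so an untyped inline fragment that re-selects nodes.id adds nothing to the graph.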

@andimarek (Member) commented:
@gkesler yes, I agree that the ExecutionPlan is probably the right place to create exactly this kind of MergedField-based dependency graph.
Maybe we could separate this into an extra PR which just deals with the dependency graph? I think this is a big enough task on its own.

@gkesler (Contributor, Author) commented Apr 29, 2019

absolutely @andimarek.

Let me create a PR that just adds DependencyGraph.
Then we'll work on properly populating it in the ExecutionStrategy.

thanks!

@andimarek (Member) commented:
@gkesler this is long overdue, but we introduced and have successfully used a new structure called ExecutableNormalizedOperation, together with ExecutableNormalizedField.

I believe this is the right structure to use as input for creating an execution plan.

@dondonz closed this May 17, 2022