Conversation

@gkesler (Contributor) commented Jan 29, 2019

To reduce the complexity of ExecutionStrategy, especially the batched one, it makes sense to re-implement it on top of a formal DependencyGraph.

The GraphQL spec essentially defines an algorithm for executing a request. Unfortunately, implemented literally, it leads to the famous N+1 problem.

DependencyGraph separates concerns and lets us focus strictly on the topological sorting of the field resolution instructions.
Additionally, looking forward, DependencyGraph-based execution will drastically simplify:

  1. resolving dependencies between multiple operations within the same request
  2. evaluating search criteria
  3. cascading mutations
  4. cost analysis

@bbakerman @andimarek
This is a POC showing that DependencyGraph-based execution works.
I would be happy to work with you on making it production-ready.

- Added DependencyGraph basic tests
- Added "start" and "finish" methods to TraverserVisitor and modified Traverser to fire "start" and "finish" events
- got rid of TraverserVisitor "start" and "finish" notifications to keep it simple
- added default, size and copy constructors to DependencyGraph
…rsively parent contexts or provide default values

- simplified field collection in ExecutionPlanBuilder to recursively search for vars/results using TraverserContext.computeXXXIfAbsent methods
…ved ExecutionPlanBuilder inside it.

- Created ExecutionStrategy interface
- Created Execution class
…ertices when [automatically] resolved

- updated ExecutionPlan to replace FieldVertices dependency on OperationVertex with dependency on DocumentVertex
- added DependencyGraphContext signature interface
- propagated DependencyGraphContext into topological sorting process
@andimarek (Member) commented Jan 29, 2019

Thanks a lot @gkesler for this effort! Looking forward to looking into it more, but give us some time :-)

A first glimpse showed a couple of changed files which contain no real changes: could you make sure that you are using the graphql-java code style: https://github.com/graphql-java/graphql-java/blob/master/graphql-java-code-style.xml. Thanks!

@gkesler (Contributor, Author) commented Jan 30, 2019

@andimarek I just merged with the latest and greatest from master.
Please take a look.

@gkesler (Contributor, Author) commented Jan 30, 2019

Unfortunately, I am a dinosaur using NetBeans.
Will try to get better at not touching files next time :)

…olved nodes from traversing

- reverted a fragment in Traverser to add to visited *after* visiting a node
@andimarek added the "Not to be merged" label (spikes or other stuff that should never or not yet be merged) on Feb 6, 2019
@andimarek (Member) commented Apr 5, 2019

Hi @gkesler,

sorry for the long delay.

One thing I would be interested in is how Interfaces/Unions will work with this:

As far as I understand the ExecutionPlan will be created and then executed in the ExecutionStrategy.

But Interfaces/Unions can only be evaluated while executing, so I would expect that some kind of feedback loop into the ExecutionPlan must happen during execution. But this would somehow contradict the idea of a prepared ExecutionPlan. What do you think?

Thanks again for this PR.

P.S.: My IDE complains a couple of times about "Cannot infer functional interface type". @exbe do you have similar problems?

@gkesler (Contributor, Author) commented Apr 8, 2019

@andimarek
Thank you for the question.
It is my understanding that strong typing in GraphQL helps to predict the execution upfront.
This helps to build an execution plan that contains a vertex for each field.
Execution of the query is then a simple topological sort over the vertices in the execution plan. The topological sorting operation is implemented in https://github.com/graphql-java/graphql-java/pull/1409/files#diff-863135c7b81c8c2aef712d485b3d6f5a L.157.
Before a vertex is placed into a "closure" set of vertices to be resolved externally, it is checked whether it can be "autoClosed". That gives the vertex and the client code an opportunity to inject auto-closing behavior.

As I understand the problem statement, during execution of a query that references interface/union fields, some of the execution paths may not be executed due to runtime type resolution of fragments.
If there were no way to alter the topological sorting, we would never complete the sequence, due to unsatisfied dependencies caused by the runtime selection of fragments to execute.

This could easily be implemented in the ExecutionPlan by adding a new type of vertex, call it ConditionalVertex, that would be recorded into the execution plan for every inline fragment or fragment spread. Its auto-close method (delegated to ExecutionPlanContext) would be implemented, as it is for FieldVertex, not to resolve further in case its source is empty or null.

Does this contradict the idea of a prepared execution plan? I think it fits perfectly. The plan represents a query-execution state machine, i.e. it has all possible states and state transitions needed to guarantee execution of the query. State transitions can have guard conditions that trigger them only under some circumstances. The state machine finishes when it reaches the final state. During execution, not every state might be entered; that depends on the particular sequence of triggering events and accumulated data.
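To make the state-machine analogy concrete, here is a toy sketch (all names are illustrative, not the PR's actual API): transitions carry guard predicates, and states behind guards that never hold, such as a fragment whose type condition does not match at runtime, are simply never entered while the machine still terminates.

```java
import java.util.*;
import java.util.function.Predicate;

// Toy guarded-transition machine: from each visited state, follow every
// transition whose guard holds for the runtime context. States behind
// failing guards are skipped, mirroring fragments whose type condition
// does not match at runtime.
class GuardedMachine {
    record Transition<C>(String from, String to, Predicate<C> guard) {}

    static <C> Set<String> reachable(String start, List<Transition<C>> transitions, C ctx) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(start));
        while (!work.isEmpty()) {
            String state = work.pop();
            if (!visited.add(state)) continue;
            for (Transition<C> t : transitions) {
                if (t.from().equals(state) && t.guard().test(ctx)) {
                    work.push(t.to());
                }
            }
        }
        return visited;
    }
}
```

With a runtime type of Dog, a guard like `c -> c.equals("Cat")` never fires, so Cat-only states are skipped, much as a ConditionalVertex would skip its dependents.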

@andimarek (Member) commented Apr 13, 2019

thanks @gkesler for the explanation.

I want to go through some steps to document the idea here a bit and to understand the whole approach better.

First thing:

Taking the example test with the following query:

{
  foo {
    id
    bar {
      id
      name
    }
  }
}

we get the following execution plan:

[Screenshot: execution plan diagram for the query above]

and the direction of the edge is the dependency: so foo/id depends on foo being resolved first.

Is that correct?

@gkesler (Contributor, Author) commented Apr 16, 2019

That is an accurate picture, @andimarek.

The dependency arrow points at the vertex resolved first. So in the foo/id --> foo dependency, vertex foo is resolved first and then foo/id.

The execution of the query is based on a topological sort of the graph vertices.
Vertices with the fewest unresolved dependencies are resolved before their dependent vertices.
So in this example the dependency resolution order is:

  1. "document" --> has 0 dependencies (outbound edges).
  2. "foo" --> depends only on "document"
  3. "foo/id", "bar" --> depend on "foo"
  4. "bar/id", "name" --> depend on "bar"
  5. "operation" --> depends on all of them.

Please note, I have introduced the dependencies "foo" --> "document", "operation" --> "document" and "operation" --> all nodes for convenience. Each dependency edge is associated with an action that gets executed when the source vertex is resolved (either via auto resolution or via external field resolution). This way, when the "document" vertex is auto-resolved, its dependent "operation" vertex is notified, and at that point we can inject some bootstrapping code to prepare the operation execution, for instance initializing root and context values for the operation.
Similarly, when the "document" vertex is resolved, its dependent "foo" vertex is notified, and at that point code runs that obtains the type of the "foo" field definition from the root query type and prepares the source object for the "foo" data fetcher.
And so on: each time a field vertex is resolved, a corresponding action is executed to set up the correct context for the dependent field vertices.

The "operation" vertex depends on every "field" vertex mainly for two reasons:

  1. the "operation" vertex can oversee the whole result of the operation, and this way it can verify the validity of the output and join results.
  2. it is very convenient to have the "operation" vertex depend on all fields to ensure it is resolved after all field vertices, so the resolution code can finalize results and perform cleanup if necessary.
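The layered resolution order in the numbered list above is essentially a Kahn-style topological sort: at each step, every vertex whose dependencies are all resolved forms the next "closure". A minimal sketch (hypothetical names, not the PR's API):

```java
import java.util.*;

// Hypothetical sketch of the layered topological ordering described
// above. deps maps each vertex to the set of vertices it depends on;
// each returned layer is a "closure" of vertices ready to resolve.
class ClosureSort {
    static List<Set<String>> closures(Map<String, Set<String>> deps) {
        Map<String, Set<String>> remaining = new HashMap<>();
        deps.forEach((v, d) -> remaining.put(v, new HashSet<>(d)));
        List<Set<String>> layers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Set<String> ready = new TreeSet<>();
            remaining.forEach((v, d) -> { if (d.isEmpty()) ready.add(v); });
            if (ready.isEmpty()) throw new IllegalStateException("dependency cycle");
            layers.add(ready);
            remaining.keySet().removeAll(ready);
            for (Set<String> d : remaining.values()) d.removeAll(ready);
        }
        return layers;
    }
}
```

Feeding it the example graph (foo depends on document, foo/id and bar on foo, bar/id and name on bar, operation on everything) reproduces the five layers listed above, in the same order.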

@amatiushkin (Contributor) commented Apr 17, 2019

@gkesler I have some trouble understanding the "foo" -> "document" dependency.
A GraphQL document is a file or request string and might contain executable and type-system definitions. Namely, it might have several operations, some type definitions and extensions.
What is an example of a resolved "document" vertex? How would the dependency graph look for a document which defines a Foo type and has two operations?

PS
I assume documentVertex represents a document as in the spec.

@gkesler (Contributor, Author) commented Apr 17, 2019

@amatiushkin
"document" here is neither a file nor a request string.
The "document" vertex wraps the "document" AST node, which is the root of the AST.
I am using this node to bootstrap the execution, for instance to initialize the "foo" node's GraphQL type, set "source" and "root" values, etc., without making a special case, but instead using the generic dependency-graph topological sorting algorithm.

It does not have to be this way, but it is quite convenient.

When a vertex is resolved, the algorithm executes the actions associated with all of that vertex's outbound edges. Those actions prepare contexts for the next vertex resolutions. Actions associated with edges that point to the "operation" vertex also perform join functions.

The "document" vertex does not require external resolution, i.e. data fetching. Therefore it is resolved automatically. Consider it the state machine's "initial" state. When resolved (state entered), it fires the action implemented in https://github.com/graphql-java/graphql-java/pull/1409/files#diff-42fc714ebd9e1cc4e5a3edd8801738d1 L.102 that sets up the source for the very first vertex. Please note that this action is different from the subsequent similar actions between field vertices.

@gkesler (Contributor, Author) commented Apr 17, 2019

Ah, one more thing. In case there are 2 operations, the graph would look like this:

[Diagram: dependency graph for a document with two operations]

And the topological sorting would produce this sequence:

  1. "document"
  2. "foo" (1), "foo" (2)
  3. "foo/id" (1), "bar" (1), "foo/id" (2), "bar" (2)
  4. "bar/id" (1), "name" (1), "bar/id" (2), "name" (2)
  5. "operation1", "operation2"

@andimarek (Member) commented:
Thanks @gkesler @amatiushkin.

I would like to see how we can model the following query:

# Write your query or mutation here
{
  nodes {
    id
    ... on Cat {
      eatsTuna
      friends {
        ... on Human {
          name
        }
      }
    }
    ... on Dog {
      breed
      friends {
        ... on Cat {
          name
        }
      }
    }
  }
}

with the following schema:

type Query {
  nodes: [Node]
}
interface Node {
  id: ID
}
type Human implements Node {
  id: ID
  name: String
}
type Dog implements Node {
  id: ID
  breed: String
  friends: [Node]
}

type Cat implements Node {
  id: ID
  eatsTuna: Bool
  friends: [Node]
}

I see the challenge here in the combination of fragments and lists: nodes and friends return a list of Node, and for each returned value in the list it needs to be determined which type it is.

@gkesler (Contributor, Author) commented Apr 22, 2019

@andimarek, thanks a lot for the example.
Here is a dependency graph I have created out of it.

[Diagram: dependency graph for the Node/Cat/Dog query]

Please note, I have added one more vertex type (yellow on the diagram) associated with fragment conditions. These nodes don't need external resolution via outgoing requests; instead they resolve automatically to the list of source objects if the type condition matches, or to an empty list if it does not, so the dependent vertices won't need resolution.

The topological order is:

  1. "document"
  2. "nodes"
  3. "id", "fragment Cat", "fragment Dog"
  4. "eatsTuna", "friends" (Cat), "breed", "friends" (Dog)
  5. "fragment Human", "fragment Cat"
  6. "name" (Human), "name" (Cat)
  7. "operation"

So everything except the "fragment" vertices is straightforward and similar to what is in the PR. Each dependent vertex receives a list of resolved sources from its dependency.
Let's consider how a "fragment" vertex works, taking "fragment Cat" in step 3:

  • the "fragment Cat" vertex depends on the "nodes" vertex and therefore will be topologically ordered after the "nodes" vertex/field is resolved.
  • after the "nodes" vertex is resolved, the "fragment Cat" vertex will receive the list of Node objects obtained in the previous step via a call to ExecutionPlanContext.prepareResolve, where it can examine the content of the list and copy the instances of the Cat type into the list of sources for this vertex. If there are no Cat instances, the list of sources is empty.
  • the "fragment Cat" vertex is then checked to see whether it can be auto-resolved via the call to ExecutionPlanContext.resolve. For a FragmentVertex that method can simply copy the vertex's sources into its results and mark the vertex as auto-resolved, as fragments don't have their own DataFetchers.
  • after the "fragment Cat" vertex is resolved, the dependent vertices "eatsTuna" and "friends" will receive their lists of sources via an ExecutionPlanContext.prepareResolve call.
  • topological ordering will then call ExecutionPlanContext.resolve to see if the vertices "eatsTuna" and "friends" can be auto-resolved. If their list of sources is empty, there is nothing to resolve and these vertices will report that they resolved automatically. Otherwise the topological ordering algorithm will select these vertices into the next closure set and resolve them externally via their DataFetchers.
... and so on.

Here is a sequence diagram that demonstrates this mechanism:

[Diagram: sequence of prepareResolve/resolve calls]

Topological ordering is essentially a BFS traversal that keeps following a particular path until it sees a vertex that cannot be automatically resolved. So, in the case of the new FragmentVertices, if a fragment should not be activated, all its dependents are resolved automatically and then added to the resolved set in one shot when calculating the DependencyIterator.next() closure.
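A minimal sketch of the FragmentVertex behaviour described above (class and method names are illustrative, not the PR's actual ExecutionPlanContext API): sources matching the type condition pass through; an empty result means every dependent vertex can auto-resolve without invoking a DataFetcher.

```java
import java.util.*;
import java.util.function.Predicate;

// Illustrative sketch: a fragment vertex receives the sources resolved
// by its dependency and keeps only those matching its type condition.
// An empty result means the dependents need no external resolution.
class FragmentVertexSketch {
    static <T> List<T> prepareSources(List<T> parentResults, Predicate<T> typeCondition) {
        List<T> matched = new ArrayList<>();
        for (T source : parentResults) {
            if (typeCondition.test(source)) {
                matched.add(source);
            }
        }
        return matched;
    }

    static <T> boolean canAutoResolveDependents(List<T> sources) {
        return sources.isEmpty(); // nothing to fetch, so resolve automatically
    }
}
```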

@andimarek (Member) commented Apr 26, 2019

thanks @gkesler. I see how this could work for the described query.

Let me now add the final complexity: merged fields. I think that in order to build the full dependency graph for a general query we need to analyze it more than we do so far: just traversing is not enough.

We actually need to operate on merged fields, without actually executing the query.

For example:

query myQyery($myVar: Boolean) {
  nodes {
    id
    ... on Cat {
      eatsTuna
      friends {
        ... on Human {
          name
        }
        ... @skip(if: $myVar) {
          ... on Cat {
            ...CatF1
            friends {
              ... on Human {
                name @skip(if: $myVar)
              }
            }
          }
        }
      }
      ... {
        friends {
          id
        }
      }
      friends {
        id
        ... on Cat {
          id
        }
      }
    }
    ... on Dog {
      breed
      friends {
        ...CatF1
        ...CatF2
      }
    }
  }
}

fragment CatF1 on Cat {
  eatsTuna
  friends {
    ...CatF2
  }
}

fragment CatF2 on Cat {
  eatsTuna
}

Sorry for this rather complex query.

The point is: fields must be merged together and Fragments need to be inlined in order to build the correct dependency graph.

A simpler example is:

query myQyery($myVar: Boolean) {
  nodes {
    id
    ... {
      id
    }
    id @skip(if: $myVar)
    ... on Human {
      id
    }
    ... on Cat {
      id 
    }
  }
}

Here id is declared 5 times, but it represents only one field id, for different objects at runtime (for Cat and Human this is explicit; for Dog it is implicit, based on the interface).

I think building a traverser which gives us a merged-fields dependency tree is a precondition for this kind of execution strategy. I always wanted to build such an analyzer, because it is also useful for other use cases, but we haven't done it so far.

@gkesler (Contributor, Author) commented Apr 26, 2019

@andimarek thanks for the new challenging question.

The major benefit of using a dependency graph is that once the graph is populated, it is always executed using the very same algorithm, based on topological sorting. IMHO this leads to an interesting consequence: the linked graph could be transformed into a matrix, which can drastically reduce execution overhead.

The ability to merge fields is yet another great benefit of the graph.
The graph is a set of vertices that can be looked up by their keys, and I believe this can help us achieve what you are asking about.

Let's assume we have created a FieldVertex for the field nodes. The graph-builder traverser could then do the following:

  1. visit the nodes.id field and create FieldVertex("nodes.id") that depends on FieldVertex("nodes").
  2. visit the inline fragment ..., detect that it does not have a type condition and therefore can simply inline its fields. The builder will inline the id field into nodes.id. Since FieldVertex("nodes.id") already exists, the builder does nothing. PS. It is also possible to record this fragment exactly like the other ones in this example (steps 4 and 5), so its id field is treated differently from nodes.id. A fragment without a type condition simply has an implicit predicate that always returns true, i.e. such a fragment is always activated (included).
  3. visit the next field, which is another nodes.id. FieldVertex("nodes.id") already exists. We can evaluate the @skip directive against the current variables and mark FieldVertex("nodes.id") as already resolved so it won't be evaluated. PS. In case we precompile the query into a matrix and reuse it across multiple calls, this directive, as well as @include, would turn into a FieldVertex predicate that is evaluated during topological sorting rather than while building the graph.
  4. visit the fragment ... on Human and create FragmentVertex("... on Human"). All fragment fields will be converted into FieldVertices that depend on the conditional resolution of that FragmentVertex. As I understand it, the @skip directive does not affect the id field specified in this fragment.
  5. visit the fragment ... on Cat and create FragmentVertex("... on Cat"). The builder will repeat the same steps as in step 4.
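Steps 1-3 rely on vertices being keyed so that repeated selections of the same field merge into one vertex. A tiny sketch of that merge-on-key idea (hypothetical names, not the PR's builder):

```java
import java.util.*;

// Illustrative sketch: keying vertices by result path means a field
// selected several times (directly, via an untyped inline fragment,
// via @skip-annotated repetition) maps onto a single FieldVertex
// instead of one vertex per selection.
class MergedFieldGraphSketch {
    static final class FieldVertex {
        final String path;
        FieldVertex(String path) { this.path = path; }
    }

    private final Map<String, FieldVertex> byPath = new LinkedHashMap<>();

    FieldVertex fieldVertex(String path) {
        // computeIfAbsent gives us merge-on-key for free
        return byPath.computeIfAbsent(path, FieldVertex::new);
    }

    int size() { return byPath.size(); }
}
```

Repeated calls with the same path return the same vertex instance, so an untyped inline fragment that re-selects nodes.id adds nothing to the graph.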

@andimarek (Member) commented:
@gkesler yes, I agree that the ExecutionPlan is probably the right place to create exactly this kind of MergedField-based dependency graph.
Maybe we could separate this into an extra PR which just deals with the dependency graph? I think this is a big enough task on its own.

@gkesler (Contributor, Author) commented Apr 29, 2019

absolutely @andimarek.

Let me create a PR that just adds DependencyGraph.
Then we'll work on properly populating it in the ExecutionStrategy.

thanks!

@andimarek (Member) commented:
@gkesler this is long overdue, but we introduced and have successfully used a new structure called ExecutableNormalizedOperation, together with ExecutableNormalizedField.

I believe this is the right structure to use as input for creating an execution plan.

@dondonz closed this May 17, 2022