Aug 20, 2021
NebulaGraph Source Code Explained: Validator
kyle
Architecture
NebulaGraph Query Engine consists of four major modules: Parser, Validator, Optimizer, and Executor.

Parser performs lexical and syntax analysis on a statement and generates an AST (Abstract Syntax Tree). Validator converts the AST into an execution plan. Optimizer optimizes the execution plan, and Executor performs the actual computation on the data.
In this article, we will introduce how Validator is implemented.
Structure of Source Files
The source code of Validator is located in the src/validator and src/planner directories.
The src/validator directory contains the code for implementing the validators for the clauses, such as OrderByValidator, LimitValidator, and GoValidator.
The src/planner/plan directory defines the data structures of all the PlanNodes, which are used to build execution plans. For example, if a query statement contains aggregate functions, an Aggregate node is generated in the execution plan. The Aggregate class, defined in Query.h, specifies all the information necessary to compute the aggregate functions, including grouping keys and aggregate expressions. NebulaGraph defines more than 100 PlanNodes. For more information, see PlanNode::Kind in PlanNode.h.
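To make the idea of a PlanNode concrete, here is a minimal sketch of a node hierarchy with an Aggregate node. It is not the code from PlanNode.h or Query.h: the three-entry Kind enum, the field names, and the use of plain strings in place of expression objects are all simplifying assumptions made for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily reduced stand-in for the kinds of plan nodes.
enum class Kind { kStart, kAggregate, kSort };

// Base class: every node knows its kind and the nodes it depends on.
struct PlanNode {
    explicit PlanNode(Kind k) : kind(k) {}
    virtual ~PlanNode() = default;

    Kind kind;
    std::vector<const PlanNode*> deps;  // nodes whose output this node consumes
};

// An Aggregate node carries what the executor needs to compute aggregates.
// Expressions are kept as plain text here (assumption for brevity).
struct Aggregate final : PlanNode {
    Aggregate(std::vector<std::string> keys, std::vector<std::string> items)
        : PlanNode(Kind::kAggregate),
          groupKeys(std::move(keys)),
          groupItems(std::move(items)) {}

    std::vector<std::string> groupKeys;   // grouping keys
    std::vector<std::string> groupItems;  // aggregate expressions
};

// Human-readable name of a kind, e.g. for printing a plan.
std::string kindName(Kind k) {
    switch (k) {
        case Kind::kStart:     return "Start";
        case Kind::kAggregate: return "Aggregate";
        case Kind::kSort:      return "Sort";
    }
    return "Unknown";
}
```

The point of the sketch is that a PlanNode is pure data: it describes what to compute and what it depends on, and leaves the computation itself to the executor.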
Additionally, the src/planner directory contains the code for implementing the planners of nGQL and MATCH statements. These planners are used to generate the execution plans of nGQL and MATCH statements.
Explaining Source Code
The entry function of Validator is Validator::validate(Sentence*, QueryContext*). It is responsible for converting the AST, generated by Parser, into an execution plan. QueryContext stores the root node of the execution plan. Here is the source code of this function.
This function retrieves the graph space information of the current session, stores the information in ValidateContext, and then calls Validator::makeValidator() and Validator::validate().
Validator::makeValidator() generates the validators for clauses. It first generates SequentialValidator, the entry point of Validator. For each statement, SequentialValidator is generated first, and the validators for the clauses it contains are then generated recursively.
SequentialValidator::validateImpl() will call Validator::makeValidator() to generate an appropriate validator for each clause. Here is the source code of this function.
Similarly, PipeValidator, AssignmentValidator, and SetValidator will generate appropriate validators for the corresponding clauses.
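The recursive generation of validators can be sketched as a small factory. This is an illustration under stated assumptions, not the NebulaGraph implementation: the Sentence struct, the SentenceKind enum, the name() method, and the collectValidators() helper are hypothetical, and only four validator types are modeled.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical AST node: a kind plus nested sentences.
enum class SentenceKind { kSequential, kPipe, kGo, kOrderBy };

struct Sentence {
    SentenceKind kind;
    std::vector<Sentence> children;
};

struct Validator {
    virtual ~Validator() = default;
    virtual std::string name() const = 0;
};

struct SequentialValidator : Validator {
    std::string name() const override { return "SequentialValidator"; }
};
struct PipeValidator : Validator {
    std::string name() const override { return "PipeValidator"; }
};
struct GoValidator : Validator {
    std::string name() const override { return "GoValidator"; }
};
struct OrderByValidator : Validator {
    std::string name() const override { return "OrderByValidator"; }
};

// Factory: choose the validator type from the kind of the sentence.
std::unique_ptr<Validator> makeValidator(const Sentence& s) {
    switch (s.kind) {
        case SentenceKind::kSequential: return std::make_unique<SequentialValidator>();
        case SentenceKind::kPipe:       return std::make_unique<PipeValidator>();
        case SentenceKind::kGo:         return std::make_unique<GoValidator>();
        case SentenceKind::kOrderBy:    return std::make_unique<OrderByValidator>();
    }
    throw std::invalid_argument("unknown sentence kind");
}

// Create a validator for a sentence, then recurse into its nested sentences,
// recording the validator names in creation order.
void collectValidators(const Sentence& s, std::vector<std::string>& out) {
    out.push_back(makeValidator(s)->name());
    for (const auto& child : s.children) {
        collectValidators(child, out);
    }
}
```

For a sequential sentence that wraps a pipe of a GO clause and an ORDER BY clause, collectValidators() records SequentialValidator, PipeValidator, GoValidator, and OrderByValidator, in that order.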
Validator::validate() is responsible for generating an execution plan. The source code of this function is as follows.
This function first checks the graph space information and authentication, and then calls Validator::validateImpl() to validate the clauses. validateImpl() is a pure virtual function of the Validator class, so polymorphism dispatches the call to the validateImpl() implementation of the validator for each clause. Finally, Validator::toPlan() is called to generate an execution plan. toPlan() generates subplans for the clauses, and these subplans are connected to form a complete execution plan for the statement. For example, for a MATCH statement, MatchPlanner::connectSegments() is called to connect all the subplans, while for an nGQL statement, Validator::appendPlan() is called.
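The control flow described above follows the template method pattern: a fixed driver in the base class and per-clause hooks in the subclasses. The sketch below only illustrates that pattern. The boolean parameters, the trace_ log, and the simplified signatures are assumptions and do not match the real Validator class.

```cpp
#include <string>
#include <vector>

class Validator {
public:
    virtual ~Validator() = default;

    // Fixed driver: pre-checks first, then clause validation, then planning.
    bool validate(bool spaceChosen, bool authPassed) {
        if (!spaceChosen || !authPassed) {
            trace_.push_back("rejected");
            return false;
        }
        if (!validateImpl()) {
            return false;
        }
        return toPlan();
    }

    // Record of the phases that ran (illustration only).
    const std::vector<std::string>& trace() const { return trace_; }

protected:
    virtual bool validateImpl() = 0;  // clause-specific semantic checks
    virtual bool toPlan() = 0;        // clause-specific subplan generation
    std::vector<std::string> trace_;
};

class OrderByValidator final : public Validator {
    bool validateImpl() override {
        trace_.push_back("validate ORDER BY");
        return true;
    }
    bool toPlan() override {
        trace_.push_back("plan Sort");
        return true;
    }
};
```

Because validate() is non-virtual, every clause goes through the same checks in the same order, and a subclass only supplies what is specific to its clause.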
An Example
In this section, we will take an nGQL statement as an example to show the procedure.
Here is the example statement.
In the Validator phase of this example statement, all these three steps are necessary:
Generating validators for the clauses
Firstly, Validator::makeValidator() is called to generate SequentialValidator. In the SequentialValidator::validateImpl() function, PipeValidator will be generated, which produces validators for the clauses separated by the pipe symbol (|), that is, GoValidator and OrderByValidator.
Validating the clauses
In this step, the GO and ORDER BY clauses are validated separately.
Let's take the GO clause as an example. First, it is checked for semantic errors, such as invalid aggregate functions and type mismatches in expressions. Second, the subclauses inside it are validated one by one. During validation, all the intermediate results are stored in GoContext, which GoPlanner later uses as the basis for generating its subplan. For example, validateWhere() stores the filter condition expression, which is used to generate the Filter node.
Generating the execution plan
The subplan of the GO statement is generated by GoPlanner::transform(AstContext*). Its source code is as follows.
This function first calls QueryUtil::buildStart() to construct the Start node. It then picks one of four methods to generate the plan, depending on the steps specified in the statement. For this example statement, the nStepsPlan strategy is used.
Here is the source code of GoPlanner::nStepsPlan().
Here is the subplan of the GO statement.
A GO statement expands a query across the graph. GetNeighbors is the most important node in the execution plan: during execution, the GetNeighbors operator accesses the storage service to retrieve the VIDs of the destination vertices for the given source vertices and edge type. An expansion across multiple steps is implemented through the Loop node. The nodes between the Start node and the Loop node form the body of Loop, and they are executed repeatedly while the loop condition is satisfied. The last step of the expansion is executed outside Loop. The Project node retrieves the VIDs of the destination vertices of an expansion. The Dedup node deduplicates these VIDs, which are then used as the starting vertices of the next expansion. GetVertices retrieves the tag properties of the destination vertices. The Filter node filters vertices based on the conditions. The LeftJoin node joins the results of GetNeighbors and GetVertices.
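The interplay of Loop, GetNeighbors, Project, and Dedup can be mimicked on an in-memory graph. The sketch below is a toy model, not executor code: the Graph alias, expandOnce(), and goNSteps() are hypothetical names, a single edge type is assumed, and steps is assumed to be at least 1. Property fetching, filtering, and joining are left out.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Vid = std::string;
// Adjacency for one edge type: source VID -> destination VIDs.
using Graph = std::map<Vid, std::vector<Vid>>;

// One expansion step: look up the neighbors of every start vertex
// (GetNeighbors), keep only the destination VIDs (Project), and drop
// duplicates (Dedup, done here by std::set).
std::set<Vid> expandOnce(const Graph& g, const std::set<Vid>& starts) {
    std::set<Vid> dsts;
    for (const auto& src : starts) {
        auto it = g.find(src);
        if (it == g.end()) {
            continue;  // vertex has no outgoing edges
        }
        dsts.insert(it->second.begin(), it->second.end());
    }
    return dsts;
}

// N-step expansion: the first N-1 steps run inside the loop, and each
// result becomes the start set of the next step. The last step runs
// outside the loop, as in the plan described above.
std::set<Vid> goNSteps(const Graph& g, std::set<Vid> frontier, int steps) {
    for (int i = 0; i < steps - 1; ++i) {
        frontier = expandOnce(g, frontier);
    }
    return expandOnce(g, frontier);
}
```

With edges a->b, a->c, b->d, and c->d, a two-step expansion from a reaches d through two paths, but deduplication keeps it only once.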
The ORDER BY clause sorts data, so its subplan consists of the Sort node. After the plans for the clauses on the left and right sides of the pipe symbol (|) are generated, PipeValidator::toPlan() calls Validator::appendPlan() to connect these subplans into the complete execution plan, which is shown as follows:
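Independently of the full plan, the connection step itself can be sketched in a few lines. This is a toy model rather than the real Validator::appendPlan(): the SubPlan struct with root and tail pointers, the single dep pointer per node, and the describe() helper are assumptions, and the GO subplan is reduced to two nodes.

```cpp
#include <string>
#include <vector>

struct PlanNode {
    std::string name;
    const PlanNode* dep = nullptr;  // node whose output this node consumes
};

struct SubPlan {
    PlanNode* root;  // last node to execute; yields the subplan's result
    PlanNode* tail;  // first node to execute
};

// Connect `right` after `left`: the first node of `right` now consumes
// the result of `left`. The combined plan ends at right's root and
// starts at left's tail.
SubPlan appendPlan(const SubPlan& left, const SubPlan& right) {
    right.tail->dep = left.root;
    return SubPlan{right.root, left.tail};
}

// List node names from the root down the dependency chain, that is,
// in reverse execution order.
std::vector<std::string> describe(const SubPlan& plan) {
    std::vector<std::string> names;
    for (const PlanNode* n = plan.root; n != nullptr; n = n->dep) {
        names.push_back(n->name);
    }
    return names;
}
```

Appending a one-node Sort subplan to a Start -> Project subplan yields a chain that describe() lists as Sort, Project, Start.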
Relevant Questions
Q: How can I get the parser/GraphParser.hpp file?
A: The file is generated during compilation, so it is available after you build the project.
Stay tuned for the next piece of the source code reading series.
Join our Slack channel if you want to discuss with the rest of the NebulaGraph community!
