Pick of the Week at NebulaGraph - Schema design in NebulaGraph
Normally the weekly issue covers NebulaGraph Updates and Community Q&As. If something major happens, it will also be covered in the additional Events of the Week section.
Events of the Week
- NebulaGraph v2.0.0-beta release note
NebulaGraph v2.0.0-beta has been released. This release supports full-text indexing, data statistics, and other new features. NebulaGraph Studio and Go Importer now also support NebulaGraph 2.x versions.
To get more information, please read the release note.
- DB-Engines Ranking Has Been Updated
NebulaGraph has risen 3 places and jumped to #15 in the latest DB-Engines ranking.
NebulaGraph Updates
Updates to NebulaGraph in the last week:
- Supports the DeleteRange operation in RocksDB to greatly improve the efficiency of edge deletion. Tags: Version 1.x, Optimization. For more information, see PR#2404.
- Fixed the issue where using FETCH PROP ON on timestamp properties outputs int64 results. Tags: Version 1.x, bug fix. For more information, see PR#2389.
Community Q&A
This week's topic is from community user @panda about schema design in NebulaGraph.
@panda: I am not familiar with graph databases. I used to use MySQL and MongoDB. Now I am having some trouble designing schemas in NebulaGraph. Suppose we have the following schema (written in GraphQL for convenience):
# User list
type User {
  name: String,
  followings: [User],
  followers: [User],
  posts: [Post],
  topics: [Topic]
}

# Topic list
type Topic {
  name: String,
  description: String,
  user: User,
  members: [Member],
  posts: [User]
}

# Post list
type Post {
  text: String,
  member: Member,
  topic: Topic
}

# Topic member list
type Member {
  user: User,
  topic: Topic,
  name: String,
  level: Int,
  join_date: DateTime,
  posts: [Post]
}
If we design a schema in NebulaGraph based on the preceding data, since there is no concept of table association and we cannot set the associated fields, what is the right way to do this?
Can we retrieve the required scalar and associated fields all at once? Please advise.
Now we are using NebulaGraph v2.x.
NebulaGraph: You can model associated fields as edges in NebulaGraph. For example, the followings field indicates that a user follows another user, i.e., User -following-> User. So you can define following as an edge type, and insert edges (relationships) of this type to connect the vertices representing the users.
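As a minimal sketch of this idea in nGQL 2.x, the statements below define an edge type and connect two users with it. The tag name person (used instead of user in case that word is treated as a keyword), the edge type following, and the vertex IDs "u1" and "u2" are assumptions made up for illustration, and the exact syntax may differ between NebulaGraph versions.

```ngql
# Hypothetical schema: one tag for users, one edge type for "follows".
CREATE TAG IF NOT EXISTS person(name string);
CREATE EDGE IF NOT EXISTS following();

# Schema changes may take a few seconds to take effect before inserts succeed.

# Two vertices representing users; "u1" and "u2" are made-up vertex IDs.
INSERT VERTEX person(name) VALUES "u1":("Alice"), "u2":("Bob");

# u1 follows u2: an edge of type following from u1 to u2, with no properties.
INSERT EDGE following() VALUES "u1"->"u2":();

# List the users that u1 follows by traversing outgoing following edges.
GO FROM "u1" OVER following YIELD following._dst AS followed_user;
```

Here the followings list from the GraphQL schema is not stored as a field on the user. It is the set of following edges that start from that user's vertex.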
@panda: I know these concepts, but this is not what I'm asking. Here is what really puzzles me:
- Tags and edges are separate and have no strong correlation, so how should we query them? Taking the preceding data as an example, if we want to find all the data in the post list, do we search for the tags first and then search for the edges one by one? Do we need to do this for every query? Is it the same for writing data? It feels very troublesome.
- Some edges are shared by multiple tags. For example, the user: User edge mentioned earlier is used by both Topic and Member. Wouldn't this mess up the data? How do we distinguish them? We don't want to create new edges under other names.
- Associated fields can be one-to-one, such as the user: User edge mentioned earlier, one-to-many, such as the followings: [User] edge, or many-to-many. How are these associations represented in NebulaGraph?
- For associated lists, we usually need to count the items, normally by aggregating in the query or by keeping a separate count field. Is there a proper way to handle this in NebulaGraph? Do we set a count property on edges, for example CREATE EDGE followings(count int default 0);?
The official documents are too term-based. They are not based on actual scenes and cases, and many things are not explained in detail, which makes them difficult to understand.
If possible, please make a best practice document based on the preceding schema and introduce how to query and write data in such a case.
NebulaGraph: First of all, thank you for the scenario you provided. You can request the document from our technical writers.
Let's summarize your scenario as follows:
Tags and edges are separate and have no strong correlation, so how should we query them? Taking the preceding data as an example, if we want to find all the data in the post list, do we search for the tags first and then search for the edges one by one? Do we need to do this for every query? Is it the same for writing data? It feels very troublesome.
For now, scanning by tag is not supported, but it is planned as part of MATCH, for example MATCH (p:post) RETURN p. Scanning by tag relies on indexes and will be quite memory-consuming. If the data volume is large, out-of-memory (OOM) errors may occur frequently.
To write data, simply set the properties according to the post, and no other operation is needed.
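A write for one post might look like the sketch below. It assumes nGQL 2.x syntax, a tag named post, and a made-up vertex ID "post1". The property is called content here rather than text, in case text is treated as a keyword in your version. None of these names come from an official schema.

```ngql
# Hypothetical tag for posts; content stands in for the text field of the Post type.
CREATE TAG IF NOT EXISTS post(content string);

# Writing a post is a single vertex insert that sets its properties.
INSERT VERTEX post(content) VALUES "post1":("Hello NebulaGraph");

# Read the post's properties back by vertex ID.
FETCH PROP ON post "post1";
```

Relationships such as the post's topic or author would be written as separate edges only if you need to traverse them.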
Some edges are shared by multiple tags. For example, the user: User edge mentioned earlier is used by both Topic and Member. Wouldn't this mess up the data? And how to distinguish them? We don't want to use other names to create new edges.
You might have confused tags with vertices. I suggest that you read Nebula Concepts first. In fact, there is no such thing as an edge shared by different vertices. An edge is identified by its source vertex ID, destination vertex ID, edge type, and rank. If any of these differs, it is a different edge. Different edges may have the same edge type, but their vertices are usually different. Even when both vertices and the edge type are the same, edges with different ranks are different edges.
For example, the edge type between user and topic may be focus, and that between user and member may be is.
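To make the identity rule concrete, here is a small sketch in nGQL 2.x. The edge types has_user and following and all vertex IDs are hypothetical, and the syntax should be checked against your NebulaGraph version.

```ngql
# Hypothetical edge types for illustration.
CREATE EDGE IF NOT EXISTS has_user();
CREATE EDGE IF NOT EXISTS following();

# Same edge type, different source vertices: two distinct edges.
# One links a topic to its user, the other links a member to its user.
INSERT EDGE has_user() VALUES "topic1"->"u1":();
INSERT EDGE has_user() VALUES "member1"->"u1":();

# Same source, destination, and edge type, but different ranks (@0 and @1):
# still two distinct edges.
INSERT EDGE following() VALUES "u1"->"u2"@0:();
INSERT EDGE following() VALUES "u1"->"u2"@1:();
```

Reusing one edge type for both Topic and Member does not mix up the data, because each edge is still anchored to its own pair of vertices.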
Associated fields usually have a one-to-one association, such as the user: User edge mentioned earlier, one-to-many association, such as the followings: [User] edge, and many-to-many association. How are these associations represented in NebulaGraph?
Please refer to the preceding reply. In NebulaGraph, tags and edges don't have direct relationships, but vertices and edges do. Tags are attached to vertices, and vertices are connected by edges. So whether an association is one-to-one or many-to-many, it is represented by edges.
For associated lists, we usually need to count the quantity, normally by aggregating the queries or setting a count field alone to hold the quantity. In NebulaGraph, is there a proper way to handle this? Do we set a count property on edges? For example, CREATE EDGE followings(count int default 0);?
nGQL supports aggregate functions such as COUNT, so you don't need to set a count property for counting.
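For instance, counting could be done at query time roughly as follows. This is a sketch assuming nGQL 2.x, an edge type named following, and a made-up vertex ID "u1". The piped YIELD form may vary by version.

```ngql
# Hypothetical edge type; no count property is needed on it.
CREATE EDGE IF NOT EXISTS following();

# Number of users that u1 follows: count the outgoing following edges.
GO FROM "u1" OVER following YIELD following._dst AS other
| YIELD COUNT(*) AS following_count;

# Number of followers of u1: traverse the same edge type in reverse.
GO FROM "u1" OVER following REVERSELY YIELD following._dst AS other
| YIELD COUNT(*) AS follower_count;
```

Counting at query time avoids having to keep a stored counter in sync with every edge insert and delete.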