Community Spotlights

Jul 1, 2021

Validating Import Performance of Nebula Importer

duspring

Machine Specifications for Testing

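| Host Name | OS | CPU Architecture | CPU Cores | Memory | Disk |
|-----------|------------|------------------|-----------|--------|--------|
| hadoop10 | CentOS 7.6 | x86_64 | 32 | 128 GB | 1.8 TB |
| hadoop11 | CentOS 7.6 | x86_64 | 32 | 64 GB | 1 TB |
| hadoop12 | CentOS 7.6 | x86_64 | 16 | 64 GB | 1 TB |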

Environment of NebulaGraph Cluster

  • Operating system: CentOS 7.5+

  • Software required by the NebulaGraph cluster: gcc 7.1.0+, cmake 3.5.0, glibc 2.12+, and other necessary dependencies

  • NebulaGraph version: v2.0.0

  • Back-end storage: three nodes, RocksDB

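| Process \ Host Name | hadoop10 | hadoop11 | hadoop12 |
|-------------------------|----------|----------|----------|
| # of metad processes | 1 | 1 | 1 |
| # of storaged processes | 1 | 1 | 1 |
| # of graphd processes | 1 | 1 | 1 |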

Preparing Data and Introducing Data Format

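| # of Vertices / File Size | # of Edges / File Size | # of Vertices and Edges / File Size |
|---------------------------|------------------------|-------------------------------------|
| 74,314,635 / 4.6 GB | 139,951,301 / 6.6 GB | 214,265,936 / 11.2 GB |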

More details about the data:

  • edge.csv: 139,951,301 records in total, 6.6 GB

  • vertex.csv: 74,314,635 records in total, 4.6 GB

  • 214,265,936 vertices and edges in total, 11.2 GB
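
The totals above can be cross-checked with a few lines of Python. This is only a sketch: the counts and sizes are the ones reported in this article, while the helper names and the choice of decimal gigabytes for the per-record estimate are assumptions made for illustration.

```python
# Record counts and file sizes as reported in this article.
VERTEX_RECORDS = 74_314_635
EDGE_RECORDS = 139_951_301
VERTEX_GB = 4.6
EDGE_GB = 6.6


def dataset_totals(vertex_records, edge_records, vertex_gb, edge_gb):
    """Return (total_records, total_gb) for the combined data set."""
    return vertex_records + edge_records, round(vertex_gb + edge_gb, 1)


def avg_bytes_per_record(size_gb, records, bytes_per_gb=10**9):
    """Rough average record size in bytes (assumes decimal gigabytes)."""
    return size_gb * bytes_per_gb / records


total_records, total_gb = dataset_totals(
    VERTEX_RECORDS, EDGE_RECORDS, VERTEX_GB, EDGE_GB
)

if __name__ == "__main__":
    print(f"total records: {total_records:,}")
    print(f"total size:    {total_gb} GB")
    print(f"avg vertex record: {avg_bytes_per_record(VERTEX_GB, VERTEX_RECORDS):.0f} bytes")
    print(f"avg edge record:   {avg_bytes_per_record(EDGE_GB, EDGE_RECORDS):.0f} bytes")
```

The per-record averages are rough estimates only, because the reported file sizes are rounded to one decimal place.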

Data size and sample vertex records:
[root@hadoop10 datas]# wc -l edge.csv 
139951301 edge.csv
[root@hadoop10 datas]# head -10 vertex.csv 
-201035082963479683,实体
-1779678833482502384,值
4646408208538057683,胶饴
-1861609733419239066,别名: 饴糖、畅糖、畅、软糖。
-2047289935702608120,词条
5842706712819643509,词条(拼音:cí tiáo)也叫词目,是辞书学用语,指收列的词语及其释文。
-3063129772935425027,文化
-2484942249444426630,红色食品
-3877061284769534378,红色食品是指食品为红色、橙红色或棕红色的食品。
-3402450096279275143,否
[root@hadoop10 datas]# wc -l vertex.csv 
74314635 vertex.csv
[root@hadoop10 datas]

Validating Solution

Solution: Use Nebula Importer to import the data in batches.

Edit a YAML configuration file for the import.
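
The configuration file used in this test is not reproduced here. As a rough sketch only, the Python snippet below renders a configuration in the Nebula Importer v2 YAML layout for the `entity` tag and `relation` edge type used in this article. The connection address, credentials, log and error paths, batch size, concurrency, and especially the column order of edge.csv (source VID, destination VID, name) are assumptions for illustration; check the field names against the Nebula Importer documentation for your version.

```python
from string import Template

# NOTE: every concrete value here (paths, batch size, concurrency, column
# indexes) is an illustrative assumption, not the setting used in the test.
CONFIG_TEMPLATE = Template("""\
version: v2
description: import vertex.csv and edge.csv into $space
removeTempFiles: false
clientSettings:
  retry: 3
  concurrency: 10
  channelBufferSize: 128
  space: $space
  connection:
    user: $user
    password: $password
    address: $address
logPath: ./err/import.log
files:
  - path: $vertex_path
    failDataPath: ./err/vertex.csv
    batchSize: 128
    type: csv
    csv:
      withHeader: false
      withLabel: false
      delimiter: ","
    schema:
      type: vertex
      vertex:
        vid:
          index: 0          # column 0 of vertex.csv holds the VID
        tags:
          - name: entity
            props:
              - name: name
                type: string
                index: 1    # column 1 holds the name property
  - path: $edge_path
    failDataPath: ./err/edge.csv
    batchSize: 128
    type: csv
    csv:
      withHeader: false
      withLabel: false
      delimiter: ","
    schema:
      type: edge
      edge:
        name: relation
        withRanking: false
        srcVID:
          index: 0          # assumed column order: src, dst, name
        dstVID:
          index: 1
        props:
          - name: name
            type: string
            index: 2
""")


def render_importer_config(space, address, user, password, vertex_path, edge_path):
    """Fill the template; raises KeyError if a placeholder is left unset."""
    return CONFIG_TEMPLATE.substitute(
        space=space, address=address, user=user, password=password,
        vertex_path=vertex_path, edge_path=edge_path,
    )


config_text = render_importer_config(
    space="test2",
    address="127.0.0.1:9669",   # assumed graphd address
    user="admin",
    password="CHANGE_ME",       # placeholder, not a real credential
    vertex_path="/opt/software/datas/vertex.csv",
    edge_path="/opt/software/datas/edge.csv",
)

if __name__ == "__main__":
    print(config_text)
```

Writing `config_text` to a file gives a starting point that can then be adjusted by hand.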


Create schema

On Nebula Console, create a graph space, and then tags and edge types in the graph space.

# 1. Create a graph space.
(admin@nebula) [(none)]> create space test2(vid_type = FIXED_STRING(64));
# 2. Switch to the specified graph space.
(admin@nebula) [(none)]> use test2;
# 3. Create a tag.
(admin@nebula) [test2]> create tag entity(name string);
# 4. Create an edge type.
(admin@nebula) [test2]> create edge relation(name string);
# 5. View the definition of the tag.
(admin@nebula) [test2]> describe tag entity;
+--------+----------+-------+---------+
| Field  | Type     | Null  | Default |
+--------+----------+-------+---------+
| "name" | "string" | "YES" |         |
+--------+----------+-------+---------+
Got 1 rows (time spent 703/1002 us)
# 6. View the definition of the edge type.
(admin@nebula) [test2]

Compile

Compile Nebula Importer and run shell commands.
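
The exact shell commands are not reproduced here. As a hedged sketch, the snippet below wraps the import step in Python, assuming the importer binary has already been built (for example with the `make build` target of the nebula-importer repository) and is started with its `--config` option. The binary location and the configuration file name are assumptions.

```python
import subprocess


def importer_command(config_path, binary="./nebula-importer"):
    """Build the argument list for starting Nebula Importer with a config file."""
    return [binary, "--config", config_path]


def run_import(config_path, binary="./nebula-importer"):
    """Run the importer and wait for it to finish.

    check=True raises subprocess.CalledProcessError if the importer
    exits with a non-zero status.
    """
    return subprocess.run(importer_command(config_path, binary), check=True)


if __name__ == "__main__":
    # Hypothetical file name; point this at your own YAML configuration.
    print(" ".join(importer_command("import_test2.yaml")))
```

Calling `run_import("import_test2.yaml")` would start the actual import; the sketch only prints the command so that it can be reviewed first.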


View the output

Import results
# View part of logs.
2021/04/19 19:05:55 [INFO] statsmgr.go:61: Tick: Time(2400.00s), Finished(210207018), Failed(0), Latency AVG(32441us), Batches Req AVG(33824us), Rows AVG(87586.25/s)
2021/04/19 19:06:00 [INFO] statsmgr.go:61: Tick: Time(2405.00s), Finished(210541418), Failed(0), Latency AVG(32461us), Batches Req AVG(33844us), Rows AVG(87543.20/s)
2021/04/19 19:06:05 [INFO] statsmgr.go:61: Tick: Time(2410.00s), Finished(210901218), Failed(0), Latency AVG(32475us), Batches Req AVG(33857us), Rows AVG(87510.88/s)
2021/04/19 19:06:10 [INFO] statsmgr.go:61: Tick: Time(2415.00s), Finished(211270318), Failed(0), Latency AVG(32486us), Batches Req AVG(33869us), Rows AVG(87482.50/s)
2021/04/19 19:06:15 [INFO] statsmgr.go:61: Tick: Time(2420.00s), Finished(211685318), Failed(0), Latency AVG(32490us), Batches Req AVG(33873us), Rows AVG(87473.27/s)
2021/04/19 19:06:20 [INFO] statsmgr.go:61: Tick: Time(2425.00s), Finished(211959718), Failed(0), Latency AVG(32517us), Batches Req AVG(33900us), Rows AVG(87406.07/s)
2021/04/19 19:06:25 [INFO] statsmgr.go:61: Tick: Time(2430.00s), Finished(212220818), Failed(0), Latency AVG(32545us), Batches Req AVG(33928us), Rows AVG(87333.67/s)
2021/04/19 19:06:30 [INFO] statsmgr.go:61: Tick: Time(2435.00s), Finished(212433518), Failed(0), Latency AVG(32579us), Batches Req AVG(33963us), Rows AVG(87241.69/s)
2021/04/19 19:06:35 [INFO] statsmgr.go:61: Tick: Time(2440.00s), Finished(212780818), Failed(0), Latency AVG(32593us), Batches Req AVG(33977us), Rows AVG(87205.25/s)
2021/04/19 19:06:40 [INFO] statsmgr.go:61: Tick: Time(2445.01s), Finished(213240518), Failed(0), Latency AVG(32589us), Batches Req AVG(33973us), Rows AVG(87214.69/s)
2021/04/19 19:06:40 [INFO] reader.go:180: Total lines of file(/opt/software/datas/edge.csv) is: 139951301, error lines: 0
2021/04/19 19:06:42 [INFO]

Pay special attention to the statistics in the results.
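
To make the statistics easier to read, a `Tick` line from the log above can be parsed and its throughput recomputed. This is a minimal sketch: the regular expression is written against the log format shown in this article, and the function and field names are chosen here for illustration.

```python
import re

# Matches the "Tick" statistics lines shown in the importer log above.
TICK_PATTERN = re.compile(
    r"Tick: Time\((?P<time>[\d.]+)s\), Finished\((?P<finished>\d+)\), "
    r"Failed\((?P<failed>\d+)\), Latency AVG\((?P<latency_us>\d+)us\), "
    r"Batches Req AVG\((?P<req_us>\d+)us\), Rows AVG\((?P<rows_per_s>[\d.]+)/s\)"
)


def parse_tick(line):
    """Return the statistics of one Tick line as a dict, or None if no match."""
    m = TICK_PATTERN.search(line)
    if m is None:
        return None
    return {
        "time_s": float(m["time"]),
        "finished": int(m["finished"]),
        "failed": int(m["failed"]),
        "latency_us": int(m["latency_us"]),
        "req_us": int(m["req_us"]),
        "rows_per_s": float(m["rows_per_s"]),
    }


# Last Tick line from the log excerpt in this article.
LAST_TICK = (
    "2021/04/19 19:06:40 [INFO] statsmgr.go:61: Tick: Time(2445.01s), "
    "Finished(213240518), Failed(0), Latency AVG(32589us), "
    "Batches Req AVG(33973us), Rows AVG(87214.69/s)"
)

stats = parse_tick(LAST_TICK)
# Rows per second recomputed from the raw counters.
throughput = stats["finished"] / stats["time_s"]

if __name__ == "__main__":
    print(f"finished rows: {stats['finished']:,}, failed: {stats['failed']}")
    print(f"recomputed throughput: {throughput:.2f} rows/s")
    print(f"reported Rows AVG:     {stats['rows_per_s']:.2f} rows/s")
```

The recomputed value agrees with the reported `Rows AVG` to within rounding of the elapsed time, which suggests that `Rows AVG` is the cumulative average since the start of the import rather than a per-interval rate.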


Resource Requirements

Importing data at this scale places high demands on the machine specifications, including the number of CPU cores, memory size, and disk size.

[Figures: resource usage on hadoop10, hadoop11, and hadoop12]

Recommendations on the machine specifications:

  1. Comparing memory consumption across the three machines shows that memory usage is high when more than 200 million records are imported, so we recommend provisioning as much memory as possible.

  2. For CPU core and disk size requirements, see the documentation: https://docs.nebula-graph.io.

nGQL Statements Test

The native graph query language of NebulaGraph is nGQL. It is compatible with OpenCypher. For now, nGQL does not support traversing all vertices and edges; for example, MATCH (v) RETURN v is not supported yet. Make sure that at least one index is available to each MATCH statement. If you create an index after the related vertices, edges, or properties already exist, rebuild the index so that it takes effect.
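
As an illustration of the index requirement, the sketch below generates the nGQL statements for creating and rebuilding an index on the `name` property of the `entity` tag, plus a MATCH query that can use it. The index name, the indexed prefix length of 64, and the helper functions are assumptions made for this example; the generated statements still have to be run in Nebula Console.

```python
def index_statements(tag, prop, index_name, prefix_len=64):
    """Build nGQL statements that create and then rebuild a tag index.

    String properties need an explicit indexed length, given as prefix_len.
    """
    return [
        f"CREATE TAG INDEX {index_name} ON {tag}({prop}({prefix_len}));",
        f"REBUILD TAG INDEX {index_name};",
    ]


def match_by_property(tag, prop, value):
    """Build a MATCH statement that filters on an indexed string property."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'MATCH (v:{tag}{{{prop}: "{escaped}"}}) RETURN v;'


statements = index_statements("entity", "name", "entity_name_index")
query = match_by_property("entity", "name", "文化")

if __name__ == "__main__":
    for statement in statements + [query]:
        print(statement)
```

Because the data in this test was imported before any index existed, the REBUILD step is the one that makes the index usable for the existing vertices.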

We tested whether nGQL is compatible with OpenCypher:

[Figure: results of the nGQL compatibility test]

Conclusion

This test validated the performance of importing a large amount of data into a three-node NebulaGraph cluster. The batch write performance of Nebula Importer meets the requirements of our production scenario. However, when the data is imported from CSV files, the files must be stored in HDFS, and a YAML configuration file is needed to specify the tags and edge types that the tool should process.

Want to know more about NebulaGraph? Join the Slack channel!

