Pick of the Week at NebulaGraph - Running Configuration Explained in Detail

Pick of the Week

Normally the weekly issue covers Feature Explanation and Community Q&As. If something major happens, it will also be covered in the additional Events of the Week section.

Events of the Week

NebulaGraph has released the bugfix version of v1.0.0 on Jul.9, 2020. See the details in NebulaGraph v1.0.1 Release Note.

Feature Explanation

This week let's talk about the TTL (Time-to-Live) feature.

In NebulaGraph, you can add a TTL attribute to any tag or edge type so that expired data can be automatically deleted.

To use the TTL feature, you need to specify the values of start time (ttl_col) and lifespan (ttl_duration). The start time (ttl_col) uses the UNIX timestamp format and supports calling the now function to acquire the current time, and the data type is int. If the current time is greater than the start time plus the lifespan, the Tag or Edge will be discarded.

Pro Tip: The TTL function and the Index function cannot be used for a single Tag or Edge at the same time.

Below are several common use cases where TTL is used.

Scenario #1: When a vertex has multiple tags. When a vertex has multiple tags, only the one with the TTL property is discarded. The other preperties of this vertex can be accessed. See the below example:

CREATE TAG tag1(a timestamp) ttl_col = "a", ttl_duration = 5;
CREATE TAG tag2(a string);
INSERT VERTEX tag1(a),tag2(a) values 200:(now(), "hello");
fetch prop on * 200;

You can see there are two tags attached to the Vertex 200 and the lifespan for property data on tag1 is five seconds:

Vertex with Multi-Tag - Before Data Expires

Fetch the properties on Vertex 200 after five seconds, only tag2 is returned:

Vertex with Multi-Tag - After Data Expires

Scenarios #2: When a vertex has only one Tag with a TTL property.

Fetch properties on Vertex 101 and the tag1 is returned:

fetch prop on * 101;

Vertex with One Tag - Before Data Expires

Query properties on Vertex 101 again after five seconds, no result is returned because the data is expired.

Vertex with One Tag - After Data Expires

You can delete the TTL property of a Tag or Edge by setting ttl_col to null or deleting this field. Setting ttl_col to null means the data will not expire. See the example code below:

ALTER TAG tag1 ttl_col = "";
INSERT VERTEX tag1(a),tag2(a) values 202:(now(), "hello");
fetch prop on * 202;

Alter TTL

Delete the ttl_col field, the data does not expire.

ALTER TAG tag1 DROP (a);
INSERT VERTEX tag1(a),tag2(a) values 203:(now(), "hello");
1. fetch prop on * 203;

Drop TTL

When ttl_duration is set to 0, the TTL function exists, but all data does not expire.

ALTER TAG tag1 ttl_duration = 0;
INSERT VERTEX tag1(a),tag2(a) values 204:(now(), "hello");
fetch prop on * 204;

Set TTL to zero

Community Q&A

Q: How should I configure NebulaGraph?

A: Let's talk about the problem from three perspectives:

Recommendation for Production Environment

Deployment requirements:

Three metad processes
At least three storaged processes
At least three graphd processes

None of the above processes needs to occupy a machine exclusively. For example, a cluster of 5 machines: A, B, C, D, E, can be deployed as follows:

A：metad, storaged, graphd
B：metad, storaged, graphd
C：metad, storaged, graphd
D：storaged, graphd
E：storaged, graphd

Pro Tip: Do not deploy the same cluster across available zones. Each metad process creates a replica of the meta data, so usually only three processes are required. The number of storaged processes does not affect the number of replicas of data in the graph space.

Server configuration requirements (standard configuration):

Take AWS EC2 c5d.12xlarge as an example:

Processor: 48 core
Memory: 96 GB
Storage: 2 * 900 GB, NVMe SSD
Linux kernel: 3.9 or higher, view through the command uname -r
glibc: 2.12 or higher, view through the command ldd --version

See here for Operation System configuration.

Recommendation for Test Environment

Deployment requirements:

One metad process
At least one storaged process
At least one graphd process

For example, a cluster with 3 machines: A, B, and C can be deployed as follows:

A：metad, storaged, graphd
B：storaged, graphd
C：storaged, graphd

Server configuration requirements (minimum configuration):

Take AWS EC2 c5d.xlarge as an example:

Processor: 4 core
Memory: 8 GB
Storage: 100 GB, SSD

Resource Estimation (three-replica standard configuration)

Storage space (full cluster): number of vertices/edges * _number of bytes of average attributes *_ 6
Memory (full cluster): number of vertices/edges *_ 5 bytes + number of RocksDB instances _ (write_buffer_size * max_write_buffer_number + rocksdb_block_cache), where each directory under the --data_path entry in the etc/nebula-storaged.conf file corresponds to a RocksDB instance.
Number of partitions in the graph space: the number of hard drives in the entire cluster * (2 to 10--the better the hard drive, the greater the value).
Reserve a 20% buffer for memory and hard disk.

Pick of the Week at NebulaGraph - Running Configuration Explained in Detail

Events of the Week

Feature Explanation

Community Q&A

You Might Also Like