Pick of the Week at NebulaGraph - vesoft Inc. Named one of the 20 Fastest-Growing OSS Startups
Normally the weekly issue covers NebulaGraph Updates and Community Q&As. If something major happens, it will also be covered in the additional Events of the Week section.
Events of the Week
- Runa Capital, an early-stage investor of Nginx and MariaDB, released its list of the Top 20 fastest-growing open-source startups in Q3 2020
This list measures the GitHub star growth of each open-source project in Q3 2020. vesoft Inc., the creator of the NebulaGraph database, ranked eighth with 4.5K stars (total stars of the NebulaGraph repository on GitHub as of October 22, 2020) and a 351% AGR (annualized growth rate in Q3 2020). For more information, please read the Medium report: https://medium.com/runacapital/open-source-growth-benchmarks-extention-the-ross-index-and-the-fastest-growing-startups-in-q3-2020-7aee7fa7eed7
- October: DB-Engines Ranking for Graph DBMS
Over the last month, the Top 10 of the DB-Engines ranking for Graph DBMS saw no big changes. NebulaGraph, however, is still advancing and now ranks 18th.
NebulaGraph Studio v1.2.0-beta was released
In this update, NebulaGraph Studio gains a new feature: visualized schema operations. From now on, you can create graph schemas for NebulaGraph without memorizing nGQL syntax, which is more convenient and less stressful. NebulaGraph Studio is available here: https://github.com/vesoft-inc/nebula-web-docker
This week's topic is how to restore cluster data from a snapshot in NebulaGraph.
Q: How do I restore data from a snapshot on a new cluster? Which directory in the cluster should I put the snapshot files in?
NebulaGraph: After a snapshot is created, the file structure in the snapshot is the same as that in the original data directory, so you can treat the snapshot as a backup of the original data files. To restore data on a NebulaGraph cluster from the checkpoint you created, you can write a shell script that replaces the original data with the checkpoint data. If you need to reproduce the data corruption problem on the cluster later, keep the original data.
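The replacement step described in this answer could be sketched roughly as follows. This is a minimal illustration in Python rather than a shell script, and it is not an official NebulaGraph tool. The function name `restore_from_checkpoint` and both directory arguments are hypothetical, and the sketch assumes that the storage service on the machine is stopped before it runs and that the data directory and its parent are on the same file system.

```python
import shutil
import time
from pathlib import Path


def restore_from_checkpoint(data_dir, checkpoint_dir, keep_original=True):
    """Replace data_dir with a copy of checkpoint_dir (hypothetical helper).

    Returns the path where the original data was preserved, or None if
    there was no original data or keep_original is False.
    """
    data_dir = Path(data_dir)
    checkpoint_dir = Path(checkpoint_dir)
    if not checkpoint_dir.is_dir():
        raise FileNotFoundError(f"checkpoint not found: {checkpoint_dir}")

    stamp = time.strftime("%Y%m%d%H%M%S")
    staging = data_dir.with_name(f"{data_dir.name}.restore-{stamp}")
    backup = data_dir.with_name(f"{data_dir.name}.orig-{stamp}")

    # Copy the checkpoint to a staging directory first. This also works
    # when the checkpoint lives inside the data directory itself.
    shutil.copytree(checkpoint_dir, staging)

    preserved = None
    if data_dir.exists():
        # Move the original data aside instead of deleting it right away.
        data_dir.rename(backup)
        preserved = backup

    # Put the checkpoint copy in place of the original data.
    staging.rename(data_dir)

    if preserved is not None and not keep_original:
        shutil.rmtree(preserved)
        preserved = None
    return preserved
```

Leaving `keep_original=True` keeps the replaced data next to the restored directory, which matches the advice to keep the original data if the corruption needs to be reproduced later.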
Follow-up: Thank you very much. I have another question. Suppose I have an active/standby setup in which the structure of the standby cluster differs from that of the active cluster. For example, the active cluster has five machines for the storage service, and the standby cluster has only four. In this case, can I use the data snapshot of the active cluster to replicate data in the standby cluster?
NebulaGraph: Generally speaking, this is not possible. The structure of the standby cluster must be the same as that of the active cluster, because inconsistent structures may cause data loss. For example, suppose a graph space in the active cluster is divided into five partitions with one replica. If the standby cluster has only four machines, the data of one partition will be lost, and other adaptation problems may occur. Therefore, we recommend keeping the structures of the active and standby clusters consistent during the recovery process.
Follow-up: OK, thank you. We originally wanted to build a graph offline, create a snapshot of it, and then copy the snapshot data to the online machines. We didn't want the offline machines to be completely consistent with the online ones. However, according to your replies, the offline cluster structure must be the same as the online one.
NebulaGraph: Keeping the structures consistent is recommended. I assume you want to do this to load data into the online cluster efficiently, right? If the standby cluster has fewer nodes than the active cluster, then even if there is no adaptation problem, a partition balance operation will be performed between storage nodes after the snapshot is switched, which also consumes system resources and reduces the efficiency of the entire checkpoint handover. In my opinion, the gain is not worth the cost.
Follow-up: One more question. Instead of using the snapshot to restore the data, can I migrate the data by simply copying all the files under the /data directory of each machine in the active cluster to the corresponding directories in the standby cluster?
NebulaGraph: Yes, you can. In fact, a checkpoint is implemented by creating hard links to the data files. The advantage of a hard link is that, within the same file system, it takes up very little additional storage space. Therefore, a checkpoint created with the hard link mechanism on the local system has great advantages in both storage space and backup performance. In your case, the checkpoint is actually copied to another system, which is not much different from copying the data files directly. In addition, if you copy the data files directly, you must avoid write operations in the source cluster during the copy process.
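To make the hard link idea more concrete, here is a small Python sketch of mirroring a directory with hard links instead of copies. It only illustrates the general mechanism and is not NebulaGraph's actual checkpoint implementation. The function name `hard_link_checkpoint` and the directory arguments are hypothetical, and the sketch assumes that both directories are on the same file system and that the checkpoint directory is not located inside the data directory.

```python
import os
from pathlib import Path


def hard_link_checkpoint(data_dir, checkpoint_dir):
    """Mirror data_dir into checkpoint_dir using hard links (hypothetical helper).

    Returns the list of link paths that were created.
    """
    data_dir = Path(data_dir)
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)

    linked = []
    # sorted() materializes the listing before any new entries are created.
    for src in sorted(data_dir.rglob("*")):
        dst = checkpoint_dir / src.relative_to(data_dir)
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            # A hard link is a second name for the same file contents,
            # so the file data is not duplicated on disk.
            os.link(src, dst)
            linked.append(dst)
    return linked
```

After the call, each linked file and its source report the same inode number through `os.stat`, which is why such a checkpoint is cheap in both space and time on the local file system. Once the files are copied to a different machine, that advantage no longer applies, as the answer above points out.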
Recommended for You
Benchmarking the Mainstream Open Source Distributed Graph Databases at Meituan: NebulaGraph vs Dgraph vs JanusGraph
Selecting a graph database solution that meets Meituan's business requirements is the basis for building its graph storage and graph learning platform. In this article, the NLP team at Meituan benchmarked the mainstream open-source distributed graph databases on data import, data write, and data query performance.