naskid.blogg.se - Nebula 3 vs nebula 4

This article was written by Gao Chen and Zhao Dengchang from the NLP team at Meituan and was originally published on the Nebula Graph forum.

Deep learning and knowledge graph technologies have developed rapidly in recent years. Compared with the "black box" of deep learning, knowledge graphs are highly interpretable, so they are widely adopted in scenarios such as search recommendations, intelligent customer support, and financial risk management. Over the past few years, Meituan has dug deeply into the connections buried in its huge volume of business data and has gradually built knowledge graphs in nearly ten areas, including cuisine graphs, tourism graphs, and commodity graphs. The ultimate goal is to enhance smart local life.

Compared with a traditional RDBMS, a graph database can store and query knowledge graphs more efficiently, and using a graph database as the storage engine gives an obvious performance advantage in multi-hop queries. There are currently dozens of graph database solutions on the market. The Meituan team therefore had to select one that meets its business requirements and use it as the basis of Meituan's graph storage and graph learning platform. Based on the current state of the business, the team outlined the following basic requirements:

It should be an open-source project with a business-friendly license. By having control over the source code, the Meituan team can ensure data security and service availability.

It should support clustering and be able to scale horizontally in both storage and computation. The knowledge graph data at Meituan can reach hundreds of billions of vertices and edges in total, and the throughput can reach tens of thousands of QPS. A single-node deployment therefore cannot meet Meituan's storage requirements.

It should work in OLTP scenarios and support multi-hop queries with millisecond-level latency. To ensure the best search experience for Meituan users, the team strictly limits the timeout along every call chain, so responding to a query in seconds is unacceptable.

It should be able to import data in batch. The knowledge graph data is usually stored in data warehouses such as Hive, so the graph database must be able to import data from these warehouses into graph storage quickly to keep the service effective.

The Meituan team tried the top 30 graph databases on DB-Engines and found that the open-source editions of most well-known graph databases, for example Neo4j, ArangoDB, Virtuoso, TigerGraph, and RedisGraph, support only single-node deployment. This means their storage cannot scale horizontally and cannot meet the requirement of storing large-scale knowledge graph data. After thorough research and comparison, the team selected the following graph databases for the final round: Nebula Graph (developed by a startup team originally from Alibaba), Dgraph (developed by a startup team originally from Google), and HugeGraph (developed by Baidu).

A Summary of The Testing Process

Hardware Configuration

Database instances: Docker containers running on different machines.

Single instance resources: 32 cores, 64 GB memory, 1 TB SSD (Intel(R) Xeon(R) Gold 5218 CPU, 2.30 GHz).

Nebula Graph: the Meta Service manages cluster metadata, the Query Service executes queries, and the Storage Service stores sharded data.

Dgraph: Zero manages cluster metadata, and Alpha handles both query execution and data storage. The storage backend is developed by Dgraph.

HugeGraph: HugeServer manages cluster metadata and executes queries. Although HugeGraph supports RocksDB as a storage backend, it does not support RocksDB as the backend for a cluster, so the team chose HBase as the storage backend instead.

The Dataset Used for the Benchmarking Test

The team uses the LDBC dataset for the benchmarking test. Below is a brief introduction to the data in the dataset:

Data generation parameters: branch=stable, version=0.3.3, scale=1000.

Entities: 2.6 billion entities of four types.

Relationships: 17.7 billion relationships of 19 types.

The benchmarking test was conducted from three perspectives: data import in batch, real-time data write, and data query.

The steps of data import in batch are as follows:

Middle files supported by graph databases.

The data import process in each graph database is described below:

Nebula Graph: execute the Spark task in Hive to generate RocksDB SST files, and then ingest the SST files into Nebula Graph.
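For Nebula Graph, the batch import described above runs a Spark task that writes RocksDB SST files and then ingests them. An SST file stores its keys in sorted order, so the core of that step is grouping rows by storage partition and sorting each group by key. The Python sketch below illustrates only that idea under stated assumptions. The modulo partitioning rule, the `(src, edge_type, dst)` key layout, and the function names `partition_of` and `build_sorted_runs` are hypothetical. They are not Nebula Graph's actual key encoding or the API of any import tool.

```python
from collections import defaultdict


def partition_of(vid: int, num_parts: int) -> int:
    # Hypothetical rule: modulo on an integer vertex ID, partitions numbered from 1.
    return vid % num_parts + 1


def build_sorted_runs(edges, num_parts):
    """Group (src, dst, edge_type) rows by the source vertex's partition and
    sort each group by key, mimicking the sorted-key order an SST file needs."""
    runs = defaultdict(list)
    for src, dst, edge_type in edges:
        # Hypothetical key layout: a tuple instead of an engine-specific byte encoding.
        key = (src, edge_type, dst)
        runs[partition_of(src, num_parts)].append(key)
    # Each sorted list stands in for the contents of one SST file per partition.
    return {part: sorted(keys) for part, keys in runs.items()}


if __name__ == "__main__":
    edges = [(7, 3, 1), (2, 5, 1), (4, 9, 2), (7, 1, 1)]
    for part, keys in sorted(build_sorted_runs(edges, num_parts=3).items()):
        print(part, keys)
```

Because the files are already partitioned and sorted when they reach the database, most of the import work happens in the Spark job rather than in the database's normal write path. In a real pipeline, the grouping and sorting would be distributed across Spark executors instead of running in one process.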