Databend Performance
The ClickBench ranking changes over time; the results reported here are as of March 3rd, 2023.
To evaluate Databend's performance against other OLAP databases, we use ClickBench, a standardized benchmark that measures database performance on the hits dataset, which comes from a production environment. The benchmark includes results from 50 databases.
It measures data load time and runs 43 queries; a database's overall performance across all of these tests determines its ranking.
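As an illustration of how per-query timings can be folded into a single ranking, here is a minimal sketch. It assumes the overall score is a geometric mean of each query's time relative to the best time for that query, with a small constant added to damp noise on very fast queries. The function name, the constant, and the timings are hypothetical, and the exact formula ClickBench applies is not stated here.

```python
import math

def relative_score(times, best_times, shift=0.01):
    """Geometric mean of per-query ratios to the best known time.

    `shift` (seconds) is an assumed constant that keeps tiny timings
    from dominating the ratio.
    """
    ratios = [(shift + t) / (shift + b) for t, b in zip(times, best_times)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical timings in seconds for three queries on two systems.
timings = {
    "system_a": [0.10, 2.00, 0.50],
    "system_b": [0.20, 1.00, 0.50],
}

# Best time per query across all systems.
best = [min(column) for column in zip(*timings.values())]

# Lower score is better; a score of 1.0 means "fastest on every query".
scores = {name: relative_score(t, best) for name, t in timings.items()}
ranking = sorted(scores, key=scores.get)
```

A geometric mean is a natural choice for this kind of summary because a single very slow query cannot swamp the result the way it would in an arithmetic mean.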
To demonstrate the real performance of Databend, we did not make any special optimizations for the testing scenarios:
- We used all default configurations without any parameter tuning
- We did not partition the data on any specific column during table creation or import
- We did not cache the original data or query results, only caching metadata and indexes
We submitted test results for the three most common types of Amazon EC2 instances:
- c6a.metal, 500GB gp2 (192 cores)
- c6a.4xlarge, 500GB gp2 (16 cores)
- c5.4xlarge, 500GB gp2 (16 cores)
Data Load Performance (March 3rd, 2023)
Databend ranked first in data load performance on all three instance types.
Query Performance (March 3rd, 2023)
In hot run queries, Databend performed exceptionally well on c6a.4xlarge, ranking first. It had a slight disadvantage on c5.4xlarge, ranking second, and was in third place on c6a.metal.
Thanks to the newly designed expression system in Databend, all operators are vectorized and support domain-based value inference. On top of this, a constant folding framework performs multi-level data pruning and skips as many unnecessary data blocks as possible.
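The idea of folding a predicate over a value domain to skip blocks can be sketched as follows. This is a simplified illustration, not Databend's implementation: it assumes each block records the minimum and maximum of one integer column, and the `Block` class and both function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    min_value: int  # smallest value of the column stored in this block
    max_value: int  # largest value of the column stored in this block

def fold_greater_than(block, threshold):
    """Fold `column > threshold` over the block's [min, max] domain.

    Returns True or False when the domain alone decides the predicate
    for every row, or None when the rows must be read to decide.
    """
    if block.min_value > threshold:
        return True
    if block.max_value <= threshold:
        return False
    return None

def prune(blocks, threshold):
    """Keep only the blocks that can still contain a matching row."""
    return [b for b in blocks if fold_greater_than(b, threshold) is not False]

blocks = [Block("b0", 0, 99), Block("b1", 100, 199), Block("b2", 200, 299)]
kept = [b.name for b in prune(blocks, 150)]
```

In this sketch, `b0` is skipped without being read because the predicate folds to a constant false over its domain; `b2` folds to a constant true, so a real engine could also drop the per-row filter for that block.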
In addition, the scheduling ability of the pipeline and the functionality of the aggregation operator have been further strengthened, allowing for efficient scheduling of CPU and IO resources, thereby achieving optimal performance.
Conclusion
Databend is a new-generation cloud-native data warehouse designed for cloud object storage, and it has not yet been heavily optimized for local file system scenarios.
The gap between the top three entries in the ClickBench list is small. So even in a scenario that does not play to its strengths, Databend can still compete well thanks to its high-performance compute engine. Because Databend ran with default configurations and with DataCache disabled, the comparison has little significance in cold run scenarios. We could also optimize the table creation statement (for example, partitioning by UserID at import time to speed up Q17 and other queries that aggregate by UserID) or tune some parameters to further improve performance.
Later on, we will add test results based on cloud-based object storage to ClickBench.
However, benchmark tuning is not the main purpose of this test. Our ultimate goal is to give users excellent performance and an easy-to-use product experience in common scenarios. Benchmarking mainly gives us a way to measure performance and improve product quality.
ClickBench is highly representative, so we have integrated it into the performance-testing CI for releases and PRs, making it easy for developers to spot performance regressions and improvements and to guide product development.
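A CI check of this kind can be as simple as comparing per-query timings from a PR build against a baseline. The sketch below is a hypothetical illustration, not Databend's actual CI code; the function name, the 10% tolerance, and the timings are all assumptions.

```python
def compare_runs(baseline, candidate, tolerance=0.10):
    """Classify each query as 'regressed', 'improved', or 'unchanged'.

    `baseline` and `candidate` map query names to run times in seconds.
    A query counts as changed when its time moves by more than
    `tolerance` (as a fraction of the baseline time).
    """
    report = {}
    for query, base_time in baseline.items():
        change = (candidate[query] - base_time) / base_time
        if change > tolerance:
            report[query] = "regressed"
        elif change < -tolerance:
            report[query] = "improved"
        else:
            report[query] = "unchanged"
    return report

# Hypothetical timings for three queries.
baseline = {"Q1": 0.50, "Q2": 1.00, "Q3": 2.00}
candidate = {"Q1": 0.52, "Q2": 1.30, "Q3": 1.50}
report = compare_runs(baseline, candidate)
```

The tolerance band keeps ordinary run-to-run noise from being reported as a regression; in practice it would need to be tuned to the variance of the test machines.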