Low import speed using TiSpark. How to optimize cluster performance and increase speed? #66452
Original author: TiDB-AI-Assistant
Based on the information provided, it seems that the cluster configuration is not the bottleneck for the slow import speed. The hardware resources allocated to the cluster are more than sufficient for the import task.
One possible reason for the slow import speed could be the default configuration of TiSpark. By default, TiSpark uses a small number of Spark partitions, which may not be enough to fully utilize the available cluster resources. To increase the import speed, you can try adjusting the following TiSpark configuration parameters:
You can set these parameters in the
Additionally, you can try using TiDB Lightning for data import instead of TiSpark. TiDB Lightning is a standalone tool that can import data into TiDB at a much faster speed than TiSpark. You can refer to the documentation for more information on how to use TiDB Lightning.
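To make the point about partition count concrete, here is a minimal sketch of a rule of thumb for sizing the number of Spark partitions before a write. It is not taken from the answer above and does not reproduce the TiSpark parameters it refers to. The helper name `suggest_partitions`, the figure of 8 cores per executor, and the default of 3 tasks per core are assumptions for illustration; the Spark tuning guide suggests roughly 2 to 3 tasks per CPU core.

```python
def suggest_partitions(num_executors: int, cores_per_executor: int,
                       tasks_per_core: int = 3) -> int:
    """Hypothetical helper: rule-of-thumb partition count.

    Aims for a few tasks per available core so that every executor
    core stays busy during the write.
    """
    if num_executors < 1 or cores_per_executor < 1 or tasks_per_core < 1:
        raise ValueError("all arguments must be positive integers")
    return num_executors * cores_per_executor * tasks_per_core


# 3 executors as in the question; 8 cores per executor is an assumed value.
target = suggest_partitions(num_executors=3, cores_per_executor=8)
print(target)  # prints 72
```

The result could then be passed to Spark's `DataFrame.repartition(target)` before writing, so the write is spread over more parallel tasks. Whether this helps depends on where the actual bottleneck is.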
Original author: aaarnell
Original time: 2023-06-22 07:59:25.889351
Original ID/Slug: 605 / low-import-speed-using-tispark-how-to-optimize-cluster-performance-and-increase-speed
TiDB version:
v7.1.0
Problem:
Data import using TiSpark is slow, possibly because of a suboptimal cluster configuration.
Resource allocation:
4 hosts. Characteristics of each:
Cluster Configuration:
Host 1: 3 PD
Host 2: 1 TiDB, 5 TiKV, 1 TiFlash
Host 3: 1 TiDB, 4 TiKV, 1 TiFlash
Host 4: 1 TiDB, 4 TiKV, 1 TiFlash
The cluster_template.yaml file contains the topology of the deployed TiDB cluster: cluster_template.yaml (8.8 KB)
Detailed description of the problem:
We are trying to import data from another system using TiSpark, following the instructions, but the import is slow.
For example, the customer table from the TPC-DS test suite occupies about 30 GB in text format and has 65,000,000 rows. With the current TiDB cluster configuration and 3 Spark executors configured, importing the table takes about 30 minutes, which works out to about 17 MB/s and 36,000 rows/s.
How can I increase this speed? Perhaps it is worth optimizing the cluster configuration?
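For reference, the throughput figures quoted above can be reproduced with a short calculation. This is only a sketch: the function name `throughput` is made up for illustration, and it assumes 1 GB = 1024 MB and that the whole 30 minutes is spent importing.

```python
def throughput(size_gb: float, rows: int, minutes: float) -> tuple:
    """Return (MB per second, rows per second) for an import run."""
    seconds = minutes * 60
    mb_per_s = size_gb * 1024 / seconds  # assumes 1 GB = 1024 MB
    rows_per_s = rows / seconds
    return mb_per_s, rows_per_s


# Figures from the post: about 30 GB, 65,000,000 rows, about 30 minutes.
mb_s, rows_s = throughput(size_gb=30, rows=65_000_000, minutes=30)
print(f"{mb_s:.1f} MB/s, {rows_s:.0f} rows/s")  # prints "17.1 MB/s, 36111 rows/s"
```

Re-running the same calculation after each configuration change gives a consistent way to compare import runs.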