- Docs Home
- About TiDB Cloud
- Get Started
- Develop Applications
- Overview
- Quick Start
- Build a TiDB Developer Cluster
- CRUD SQL in TiDB
- Build a Simple CRUD App with TiDB
- Example Applications
- Connect to TiDB
- Design Database Schema
- Write Data
- Read Data
- Transaction
- Optimize
- Troubleshoot
- Reference
- Cloud Native Development Environment
- Manage Cluster
- Plan Your Cluster
- Create a TiDB Cluster
- Connect to Your TiDB Cluster
- Set Up VPC Peering Connections
- Use an HTAP Cluster with TiFlash
- Scale a TiDB Cluster
- Upgrade a TiDB Cluster
- Delete a TiDB Cluster
- Use TiDB Cloud API (Beta)
- Migrate Data
- Import Sample Data
- Migrate Data into TiDB
- Configure Amazon S3 Access and GCS Access
- Migrate from MySQL-Compatible Databases
- Migrate Incremental Data from MySQL-Compatible Databases
- Migrate from Amazon Aurora MySQL in Bulk
- Import or Migrate from Amazon S3 or GCS to TiDB Cloud
- Import CSV Files from Amazon S3 or GCS into TiDB Cloud
- Import Apache Parquet Files from Amazon S3 or GCS into TiDB Cloud
- Troubleshoot Access Denied Errors during Data Import from Amazon S3
- Export Data from TiDB
- Back Up and Restore
- Monitor and Alert
- Overview
- Built-in Monitoring
- Built-in Alerting
- Third-Party Monitoring Integrations
- Tune Performance
- Overview
- Analyze Performance
- SQL Tuning
- Overview
- Understanding the Query Execution Plan
- SQL Optimization Process
- Overview
- Logic Optimization
- Physical Optimization
- Prepare Execution Plan Cache
- Control Execution Plans
- TiKV Follower Read
- Coprocessor Cache
- Garbage Collection (GC)
- Tune TiFlash performance
- Manage User Access
- Billing
- Reference
- TiDB Cluster Architecture
- TiDB Cloud Cluster Limits and Quotas
- TiDB Limitations
- SQL
- Explore SQL with TiDB
- SQL Language Structure and Syntax
- SQL Statements
ADD COLUMN
ADD INDEX
ADMIN
ADMIN CANCEL DDL
ADMIN CHECKSUM TABLE
ADMIN CHECK [TABLE|INDEX]
ADMIN SHOW DDL [JOBS|QUERIES]
ALTER DATABASE
ALTER INDEX
ALTER TABLE
ALTER TABLE COMPACT
ALTER USER
ANALYZE TABLE
BATCH
BEGIN
CHANGE COLUMN
COMMIT
CHANGE DRAINER
CHANGE PUMP
CREATE [GLOBAL|SESSION] BINDING
CREATE DATABASE
CREATE INDEX
CREATE ROLE
CREATE SEQUENCE
CREATE TABLE LIKE
CREATE TABLE
CREATE USER
CREATE VIEW
DEALLOCATE
DELETE
DESC
DESCRIBE
DO
DROP [GLOBAL|SESSION] BINDING
DROP COLUMN
DROP DATABASE
DROP INDEX
DROP ROLE
DROP SEQUENCE
DROP STATS
DROP TABLE
DROP USER
DROP VIEW
EXECUTE
EXPLAIN ANALYZE
EXPLAIN
FLASHBACK TABLE
FLUSH PRIVILEGES
FLUSH STATUS
FLUSH TABLES
GRANT <privileges>
GRANT <role>
INSERT
KILL [TIDB]
MODIFY COLUMN
PREPARE
RECOVER TABLE
RENAME INDEX
RENAME TABLE
REPLACE
REVOKE <privileges>
REVOKE <role>
ROLLBACK
SELECT
SET DEFAULT ROLE
SET [NAMES|CHARACTER SET]
SET PASSWORD
SET ROLE
SET TRANSACTION
SET [GLOBAL|SESSION] <variable>
SHOW ANALYZE STATUS
SHOW [GLOBAL|SESSION] BINDINGS
SHOW BUILTINS
SHOW CHARACTER SET
SHOW COLLATION
SHOW [FULL] COLUMNS FROM
SHOW CREATE SEQUENCE
SHOW CREATE TABLE
SHOW CREATE USER
SHOW DATABASES
SHOW DRAINER STATUS
SHOW ENGINES
SHOW ERRORS
SHOW [FULL] FIELDS FROM
SHOW GRANTS
SHOW INDEX [FROM|IN]
SHOW INDEXES [FROM|IN]
SHOW KEYS [FROM|IN]
SHOW MASTER STATUS
SHOW PLUGINS
SHOW PRIVILEGES
SHOW [FULL] PROCESSSLIST
SHOW PROFILES
SHOW PUMP STATUS
SHOW SCHEMAS
SHOW STATS_HEALTHY
SHOW STATS_HISTOGRAMS
SHOW STATS_META
SHOW STATUS
SHOW TABLE NEXT_ROW_ID
SHOW TABLE REGIONS
SHOW TABLE STATUS
SHOW [FULL] TABLES
SHOW [GLOBAL|SESSION] VARIABLES
SHOW WARNINGS
SHUTDOWN
SPLIT REGION
START TRANSACTION
TABLE
TRACE
TRUNCATE
UPDATE
USE
WITH
- Data Types
- Functions and Operators
- Overview
- Type Conversion in Expression Evaluation
- Operators
- Control Flow Functions
- String Functions
- Numeric Functions and Operators
- Date and Time Functions
- Bit Functions and Operators
- Cast Functions and Operators
- Encryption and Compression Functions
- Locking Functions
- Information Functions
- JSON Functions
- Aggregate (GROUP BY) Functions
- Window Functions
- Miscellaneous Functions
- Precision Math
- Set Operations
- List of Expressions for Pushdown
- TiDB Specific Functions
- Clustered Indexes
- Constraints
- Generated Columns
- SQL Mode
- Table Attributes
- Transactions
- Views
- Partitioning
- Temporary Tables
- Cached Tables
- Character Set and Collation
- Read Historical Data
- System Tables
mysql
- INFORMATION_SCHEMA
- Overview
ANALYZE_STATUS
CLIENT_ERRORS_SUMMARY_BY_HOST
CLIENT_ERRORS_SUMMARY_BY_USER
CLIENT_ERRORS_SUMMARY_GLOBAL
CHARACTER_SETS
CLUSTER_INFO
COLLATIONS
COLLATION_CHARACTER_SET_APPLICABILITY
COLUMNS
DATA_LOCK_WAITS
DDL_JOBS
DEADLOCKS
ENGINES
KEY_COLUMN_USAGE
PARTITIONS
PROCESSLIST
REFERENTIAL_CONSTRAINTS
SCHEMATA
SEQUENCES
SESSION_VARIABLES
SLOW_QUERY
STATISTICS
TABLES
TABLE_CONSTRAINTS
TABLE_STORAGE_STATS
TIDB_HOT_REGIONS_HISTORY
TIDB_INDEXES
TIDB_SERVERS_INFO
TIDB_TRX
TIFLASH_REPLICA
TIKV_REGION_PEERS
TIKV_REGION_STATUS
TIKV_STORE_STATUS
USER_PRIVILEGES
VIEWS
- System Variables
- API Reference
- Storage Engines
- Dumpling
- Table Filter
- Troubleshoot Inconsistency Between Data and Indexes
- FAQs
- Release Notes
- Support
- Glossary
Connection Pools and Connection Parameters
This document describes how to configure connection pools and connection parameters when you use a driver or ORM framework to connect to TiDB.
If you are interested in more tips about Java application development, see Best Practices for Developing Java Applications with TiDB
If you are interested in more tips about Java application development, see Best Practices for Developing Java Applications with TiDB
Connection pool
Building TiDB (MySQL) connections is relatively expensive (for OLTP scenarios at least). Because in addition to building a TCP connection, connection authentication is also required. Therefore, the client usually saves the TiDB (MySQL) connections to the connection pool for reuse.
Java has many connection pool implementations such as HikariCP, tomcat-jdbc, druid, c3p0, and dbcp. TiDB does not limit which connection pool you use, so you can choose whichever you like for your application.
Configure the number of connections
It is a common practice that the connection pool size is well adjusted according to the application's own needs. Take HikariCP as an example:
- maximumPoolSize: The maximum number of connections in the connection pool. If this value is too large, TiDB consumes resources to maintain useless connections. If this value is too small, the application gets slow connections. Therefore, you need to configure this value according to your application characteristics. For details, see About Pool Sizing.
- minimumIdle: The minimum number of idle connections in the connection pool. It is mainly used to reserve some connections to respond to sudden requests when the application is idle. You also need to configure it according to your application characteristics.
The application needs to return the connection after finishing using it. It is recommended that the application uses the corresponding connection pool monitoring (such as metricRegistry) to locate connection pool issues in time.
Probe configuration
The connection pool maintains persistent connections to TiDB. TiDB does not proactively close client connections by default (unless an error is reported), but generally, there are also network proxies such as LVS or HAProxy between the client and TiDB. Usually, these proxies proactively clean up connections that are idle for a certain period. In addition to paying attention to the idle configuration of the proxies, the connection pool also needs to keep alive or probe connections.
If you often see the following error in your Java application:
The last packet sent successfully to the server was 3600000 milliseconds ago. The driver has not received any packets from the server. com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
If n
in n milliseconds ago
is 0
or a very small value, it is usually because the executed SQL operation causes TiDB to exit abnormally. To find the cause, it is recommended to check the TiDB stderr log.
If n
is a very large value (such as 3600000
in the above example), it is likely that this connection was idle for a long time and then closed by the proxy. The usual solution is to increase the value of the proxy's idle configuration and allow the connection pool to:
- Check whether the connection is available before using the connection every time.
- Regularly check whether the connection is available using a separate thread.
- Send a test query regularly to keep alive connections.
Different connection pool implementations might support one or more of the above methods. You can check your connection pool documentation to find the corresponding configuration.
Formulas based on experience
According to the About Pool Sizing article of HikariCP, if you have no idea about how to set a proper size for the database connection pool, you can get started with a formula based on experience. Then, based on the performance result of the pool size calculated from the formula, you can further adjust the size to achieve the best performance.
The formula based on experience is as follows:
connections = ((core_count * 2) + effective_spindle_count)
The description of each parameter in the formula is as follows:
- connections: the size of connections obtained.
- core_count: the number of CPU cores.
- effective_spindle_count: the number of hard drives (not SSD). Because each spinning hard disk can be called a spindle. For example, if you are using a server with a RAID of 16 disks, the effective_spindle_count should be 16. Because HDD usually can handle only one request at a time, the formula here is actually measuring how many concurrent I/O requests your server can manage.
In particular, pay attention to the following note below the formula.
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count). Core count should not include HT threads, even if hyperthreading is enabled. Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls. ... There hasn't been any analysis so far regarding how well the formula works with SSDs.
This note indicates that:
- core_count is the number of physical cores, regardless of whether you enable Hyper-Threading or not.
- When data is fully cached, you need to set effective_spindle_count to
0
. As the hit rate of cache decreases, the count is closer to the actual number ofHDD
. - Whether the formula works for SSD is not tested and unknown.
When using SSDs, it is recommended that you use the following formula based on experience instead:
connections = (number of cores * 4)
Therefore, you can set the maximum connection size of the initial connection pool to cores * 4
in the case of SSDs and further adjust the size to tune the performance.
Tuning direction
As you can see, a size calculated from the formulas based on experience is just a recommended base value. To get the optimal size on a specific machine, you need to try other values around the base value and test the performance.
Here are some basic rules to help you get the optimal size:
- If your network or storage latency is high, increase your maximum number of connections to reduce the latency waiting time. Once a thread is blocked by latency, other threads can take over and continue processing.
- If you have multiple services deployed on your server and each service has a separate connection pool, consider the sum of the maximum number of connections to all connection pools.
Connection parameters
Java applications can be encapsulated with various frameworks. In most of the frameworks, JDBC API is called on the bottommost level to interact with the database server. For JDBC, it is recommended that you focus on the following things:
- JDBC API usage choice
- API Implementer's parameter configuration
JDBC API
For JDBC API usage, see JDBC official tutorial. This section covers the usage of several important APIs.
Use Prepare API
For OLTP (Online Transactional Processing) scenarios, the SQL statements sent by the program to the database are several types that can be exhausted after removing parameter changes. Therefore, it is recommended to use Prepared Statements instead of regular execution from a text file and reuse Prepared Statements to execute directly. This avoids the overhead of repeatedly parsing and generating SQL execution plans in TiDB.
At present, most upper-level frameworks call the Prepare API for SQL execution. If you use the JDBC API directly for development, pay attention to choosing the Prepare API.
In addition, with the default implementation of MySQL Connector/J, only client-side statements are preprocessed, and the statements are sent to the server in a text file after ?
is replaced on the client. Therefore, in addition to using the Prepare API, you also need to configure useServerPrepStmts = true
in JDBC connection parameters before you perform statement preprocessing on the TiDB server. For detailed parameter configuration, see MySQL JDBC parameters.
Use Batch API
For batch inserts, you can use the addBatch
/executeBatch
API. The addBatch()
method is used to cache multiple SQL statements first on the client, and then send them to the database server together when calling the executeBatch
method.
In the default MySQL Connector/J implementation, the sending time of the SQL statements that are added to batch with addBatch()
is delayed to the time when executeBatch()
is called, but the statements will still be sent one by one during the actual network transfer. Therefore, this method usually does not reduce the amount of communication overhead.
If you want to batch network transfer, you need to configure rewriteBatchedStatements = true
in the JDBC connection parameters. For the detailed parameter configuration, see Batch-related parameters.
Use StreamingResult
to get the execution result
In most scenarios, to improve execution efficiency, JDBC obtains query results in advance and saves them in client memory by default. But when the query returns a super large result set, the client often wants the database server to reduce the number of records returned at a time and waits until the client's memory is ready and it requests for the next batch.
Usually, there are two kinds of processing methods in JDBC:
Set FetchSize to
Integer.MIN_VALUE
to ensure that the client does not cache. The client will read the execution result from the network connection throughStreamingResult
.When the client uses the streaming read method, it needs to finish reading or close
resultset
before continuing to use the statement to make a query. Otherwise, the errorNo statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
is returned.To avoid such an error in queries before the client finishes reading or closes
resultset
, you can add theclobberStreamingResults=true
parameter in the URL. Then,resultset
is automatically closed but the result set to be read in the previous streaming query is lost.To use Cursor Fetch, first set
FetchSize
as a positive integer and configureuseCursorFetch=true
in the JDBC URL.
TiDB supports both methods, but it is preferred that you use the first method, because it is a simpler implementation and has a better execution efficiency.
MySQL JDBC parameters
JDBC usually provides implementation-related configurations in the form of JDBC URL parameters. This section introduces MySQL Connector/J's parameter configurations (If you use MariaDB, see MariaDB's parameter configurations). Because this document cannot cover all configuration items, it mainly focuses on several parameters that might affect performance.
Prepare-related parameters
This section introduces parameters related to Prepare
.
useServerPrepStmts
useServerPrepStmts is set to
false
by default, that is, even if you use the Prepare API, the "prepare" operation will be done only on the client. To avoid the parsing overhead of the server, if the same SQL statement uses the Prepare API multiple times, it is recommended to set this configuration totrue
.To verify that this setting already takes effect, you can do:
- Go to TiDB monitoring dashboard and view the request command type through Query Summary > QPS By Instance.
- If
COM_QUERY
is replaced byCOM_STMT_EXECUTE
orCOM_STMT_PREPARE
in the request, it means this setting already takes effect.
cachePrepStmts
Although
useServerPrepStmts=true
allows the server to execute Prepared Statements, by default, the client closes the Prepared Statements after each execution and does not reuse them. This means that the "prepare" operation is not even as efficient as text file execution. To solve this, it is recommended that after settinguseServerPrepStmts=true
, you should also configurecachePrepStmts=true
. This allows the client to cache Prepared Statements.To verify that this setting already takes effect, you can do:
Go to TiDB monitoring dashboard and view the request command type through Query Summary > QPS By Instance.
If the number of
COM_STMT_EXECUTE
in the request is far more than the number ofCOM_STMT_PREPARE
, it means this setting already takes effect.In addition, configuring
useConfigs=maxPerformance
will configure multiple parameters at the same time, includingcachePrepStmts=true
.
prepStmtCacheSqlLimit
After configuring
cachePrepStmts
, also pay attention to theprepStmtCacheSqlLimit
configuration (the default value is256
). This configuration controls the maximum length of the Prepared Statements cached on the client.The Prepared Statements that exceed this maximum length will not be cached, so they cannot be reused. In this case, you may consider increasing the value of this configuration depending on the actual SQL length of the application.
You need to check whether this setting is too small if you:
- Go to TiDB monitoring dashboard and view the request command type through Query Summary > QPS By Instance.
- And find that
cachePrepStmts=true
has been configured, butCOM_STMT_PREPARE
is still mostly equal toCOM_STMT_EXECUTE
andCOM_STMT_CLOSE
exists.
prepStmtCacheSize
prepStmtCacheSize controls the number of cached Prepared Statements (the default value is
25
). If your application requires "preparing" many types of SQL statements and wants to reuse Prepared Statements, you can increase this value.To verify that this setting already takes effect, you can do:
- Go to TiDB monitoring dashboard and view the request command type through Query Summary > QPS By Instance.
- If the number of
COM_STMT_EXECUTE
in the request is far more than the number ofCOM_STMT_PREPARE
, it means this setting already takes effect.
Batch-related parameters
While processing batch writes, it is recommended to configure rewriteBatchedStatements=true
. After using addBatch()
or executeBatch()
, JDBC still sends SQL one by one by default, for example:
pstmt = prepare("INSERT INTO `t` (a) values(?)");
pstmt.setInt(1, 10);
pstmt.addBatch();
pstmt.setInt(1, 11);
pstmt.addBatch();
pstmt.setInt(1, 12);
pstmt.executeBatch();
Although Batch
methods are used, the SQL statements sent to TiDB are still individual INSERT
statements:
INSERT INTO `t` (`a`) VALUES(10);
INSERT INTO `t` (`a`) VALUES(11);
INSERT INTO `t` (`a`) VALUES(12);
But if you set rewriteBatchedStatements=true
, the SQL statements sent to TiDB will be a single INSERT
statement:
INSERT INTO `t` (`a`) values(10),(11),(12);
Note that the rewrite of the INSERT
statements is to concatenate the values after multiple "values" keywords into a whole SQL statement. If the INSERT
statements have other differences, they cannot be rewritten, for example:
INSERT INTO `t` (`a`) VALUES (10) ON DUPLICATE KEY UPDATE `a` = 10;
INSERT INTO `t` (`a`) VALUES (11) ON DUPLICATE KEY UPDATE `a` = 11;
INSERT INTO `t` (`a`) VALUES (12) ON DUPLICATE KEY UPDATE `a` = 12;
The above INSERT
statements cannot be rewritten into one statement. But if you change the three statements into the following ones:
INSERT INTO `t` (`a`) VALUES (10) ON DUPLICATE KEY UPDATE `a` = VALUES(`a`);
INSERT INTO `t` (`a`) VALUES (11) ON DUPLICATE KEY UPDATE `a` = VALUES(`a`);
INSERT INTO `t` (`a`) VALUES (12) ON DUPLICATE KEY UPDATE `a` = VALUES(`a`);
Then they meet the rewrite requirement. The above INSERT
statements will be rewritten into the following one statement:
INSERT INTO `t` (`a`) VALUES (10), (11), (12) ON DUPLICATE KEY UPDATE a = VALUES(`a`);
If there are three or more updates during the batch update, the SQL statements will be rewritten and sent as multiple queries. This effectively reduces the client-to-server request overhead, but the side effect is that a larger SQL statement is generated. For example:
UPDATE `t` SET `a` = 10 WHERE `id` = 1; UPDATE `t` SET `a` = 11 WHERE `id` = 2; UPDATE `t` SET `a` = 12 WHERE `id` = 3;
In addition, because of a client bug, if you want to configure rewriteBatchedStatements=true
and useServerPrepStmts=true
during batch update, it is recommended that you also configure the allowMultiQueries=true
parameter to avoid this bug.
Integrate parameters
Through monitoring, you might notice that although the application only performs INSERT
operations to the TiDB cluster, there are a lot of redundant SELECT
statements. Usually this happens because JDBC sends some SQL statements to query the settings, for example, select @@session.transaction_read_only
. These SQL statements are useless for TiDB, so it is recommended that you configure useConfigs=maxPerformance
to avoid extra overhead.
useConfigs=maxPerformance
configuration includes a group of configurations:
cacheServerConfiguration=true
useLocalSessionState=true
elideSetAutoCommits=true
alwaysSendSetIsolation=false
enableQueryTimeouts=false
After it is configured, you can check the monitoring to see a decreased number of SELECT
statements.
Timeout-related parameters
TiDB provides two MySQL-compatible parameters to control the timeout: wait_timeout
and max_execution_time
. These two parameters respectively control the connection idle timeout with the Java application and the timeout of the SQL execution in the connection; that is to say, these parameters control the longest idle time and the longest busy time for the connection between TiDB and the Java application. Since TiDB v5.4, the default value of wait_timeout
is 28800
seconds, which is 8 hours. For TiDB versions earlier than v5.4, the default value is 0
, which means the timeout is unlimited. The default value of max_execution_time
is 0
, which means the maximum execution time of a SQL statement is unlimited.
However, in an actual production environment, idle connections and SQL statements with excessively long execution time negatively affect databases and applications. To avoid idle connections and SQL statements that are executed for too long, you can configure these two parameters in your application's connection string. For example, set sessionVariables=wait_timeout=3600
(1 hour) and sessionVariables=max_execution_time=300000
(5 minutes).