Subquery

This document introduces subquery statements and categories in TiDB.

Overview

A subquery is a query nested within another SQL query. With a subquery, the result of one query can be used in another query.

The following takes the Bookshop application as an example to introduce subqueries.

Subquery statement

In most cases, there are five types of subqueries:

- Scalar Subquery, such as SELECT (SELECT s1 FROM t2) FROM t1.
- Derived Tables, such as SELECT t1.s1 FROM (SELECT s1 FROM t2) t1.
- Existential Test, such as WHERE NOT EXISTS(SELECT ... FROM t2) or WHERE t1.a IN (SELECT ... FROM t2).
- Quantified Comparison, such as WHERE t1.a = ANY(SELECT ... FROM t2) or WHERE t1.a > ALL(SELECT ... FROM t2).
- Subquery as a comparison operator operand, such as WHERE t1.a > (SELECT ... FROM t2).
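
As a rough illustration, the five forms can be written against the authors table of the Bookshop application. The column names are taken from the query results shown on this page; the specific queries are made up for illustration and are not part of the Bookshop documentation.

```sql
-- Scalar subquery: the subquery returns a single value used in the select list.
SELECT a1.name, (SELECT MAX(a2.birth_year) FROM authors a2) AS latest_birth_year
FROM authors a1;

-- Derived table: the subquery in the FROM clause acts as a temporary table.
SELECT t.gender, t.author_count
FROM (SELECT gender, COUNT(*) AS author_count FROM authors GROUP BY gender) t;

-- Existential test: keep rows for which the subquery returns at least one row.
SELECT a1.* FROM authors a1
WHERE EXISTS (SELECT 1 FROM authors a2 WHERE a2.death_year = a1.birth_year);

-- Quantified comparison: compare a value with ALL (or ANY) rows of the subquery.
SELECT a1.* FROM authors a1
WHERE a1.birth_year >= ALL (SELECT a2.birth_year FROM authors a2);

-- Subquery as a comparison operator operand.
SELECT a1.* FROM authors a1
WHERE a1.birth_year > (SELECT MIN(a2.birth_year) FROM authors a2);
```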

Category of subquery

Subqueries can be categorized as correlated subqueries and self-contained subqueries. TiDB treats these two types differently.

Whether a subquery is correlated depends on whether it references columns from its outer query.

Self-contained subquery

For a self-contained subquery used as an operand of a comparison operator (>, >=, <, <=, =, or !=), the inner subquery is executed only once, and TiDB rewrites it as a constant during the execution plan phase.

For example, to query authors in the authors table whose age is greater than the average age, you can use a subquery as a comparison operator operand.

SELECT * FROM authors a1 WHERE (IFNULL(a1.death_year, YEAR(NOW())) - a1.birth_year) > (
    SELECT
        AVG(IFNULL(a2.death_year, YEAR(NOW())) - a2.birth_year) AS average_age
    FROM
        authors a2
);

The inner subquery is executed before TiDB executes the above query:

SELECT AVG(IFNULL(a2.death_year, YEAR(NOW())) - a2.birth_year) AS average_age FROM authors a2;

Suppose the result of the query is 34, that is, the average age is 34. TiDB then uses 34 as a constant to replace the original subquery:

SELECT * FROM authors a1
WHERE (IFNULL(a1.death_year, YEAR(NOW())) - a1.birth_year) > 34;

The result is as follows:

+--------+-------------------+--------+------------+------------+
| id     | name              | gender | birth_year | death_year |
+--------+-------------------+--------+------------+------------+
| 13514  | Kennith Kautzer   | 1      | 1956       | 2018       |
| 13748  | Dillon Langosh    | 1      | 1985       | NULL       |
| 99184  | Giovanny Emmerich | 1      | 1954       | 2012       |
| 180191 | Myrtie Robel      | 1      | 1958       | 2009       |
| 200969 | Iva Renner        | 0      | 1977       | NULL       |
| 209671 | Abraham Ortiz     | 0      | 1943       | 2016       |
| 229908 | Wellington Wiza   | 1      | 1932       | 1969       |
| 306642 | Markus Crona      | 0      | 1969       | NULL       |
| 317018 | Ellis McCullough  | 0      | 1969       | 2014       |
| 322369 | Mozelle Hand      | 0      | 1942       | 1977       |
| 325946 | Elta Flatley      | 0      | 1933       | 1986       |
| 361692 | Otho Langosh      | 1      | 1931       | 1997       |
| 421294 | Karelle VonRueden | 0      | 1977       | NULL       |
...

For self-contained subqueries such as Existential Test and Quantified Comparison, TiDB rewrites and replaces them with equivalent queries for better performance. For more information, see Subquery Related Optimizations.
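
To get a feel for such a rewrite, consider a hypothetical IN query on the authors table and a hand-written join that returns the same rows. This is only a sketch of the idea: the query is made up for illustration, and the plan TiDB actually chooses may differ. EXPLAIN shows which plan was picked.

```sql
-- Hypothetical example: authors born in a year in which some author died.
SELECT a1.*
FROM authors a1
WHERE a1.birth_year IN (SELECT a2.death_year FROM authors a2);

-- A hand-written equivalent: deduplicate the subquery result, then join.
-- DISTINCT keeps each author from being returned more than once.
SELECT a1.*
FROM authors a1
JOIN (SELECT DISTINCT death_year FROM authors) d
    ON a1.birth_year = d.death_year;

-- Inspect the plan that TiDB generates for the original statement.
EXPLAIN
SELECT a1.*
FROM authors a1
WHERE a1.birth_year IN (SELECT a2.death_year FROM authors a2);
```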

Correlated subquery

For a correlated subquery, because the inner subquery references columns from the outer query, the subquery is executed once for each row of the outer query. For example, if the outer query returns 10 million rows, the subquery is also executed 10 million times, which consumes more time and resources.

Therefore, TiDB tries to decorrelate correlated subqueries during processing to improve query efficiency at the execution plan level.

The following statement queries authors who are older than the average age of authors of the same gender.

SELECT * FROM authors a1 WHERE (IFNULL(a1.death_year, YEAR(NOW())) - a1.birth_year) > (
    SELECT
        AVG(
            IFNULL(a2.death_year, YEAR(NOW())) - IFNULL(a2.birth_year, YEAR(NOW()))
        ) AS average_age
    FROM
        authors a2
    WHERE a1.gender = a2.gender
);

TiDB rewrites it to an equivalent join query:

SELECT *
FROM
    authors a1,
    (
        SELECT
            gender, AVG(
                IFNULL(a2.death_year, YEAR(NOW())) - IFNULL(a2.birth_year, YEAR(NOW()))
            ) AS average_age
        FROM
            authors a2
        GROUP BY gender
    ) a2
WHERE
    a1.gender = a2.gender
    AND (IFNULL(a1.death_year, YEAR(NOW())) - a1.birth_year) > a2.average_age;
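
One way to check whether decorrelation took place is to look at the execution plan. The sketch below runs EXPLAIN on the correlated statement above. In TiDB, a plan that still contains an Apply operator generally suggests that the subquery was not decorrelated and is evaluated per outer row; the exact operator names and plan shape depend on the TiDB version and table statistics, so treat this as a starting point rather than a definitive test.

```sql
-- Show the execution plan for the correlated subquery instead of running it.
EXPLAIN
SELECT * FROM authors a1 WHERE (IFNULL(a1.death_year, YEAR(NOW())) - a1.birth_year) > (
    SELECT
        AVG(
            IFNULL(a2.death_year, YEAR(NOW())) - IFNULL(a2.birth_year, YEAR(NOW()))
        ) AS average_age
    FROM
        authors a2
    WHERE a1.gender = a2.gender
);
```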

As a best practice, avoid correlated subqueries in actual development if you can write an equivalent query that performs better.
