Using the Vectorized Executor for Tuning
The openGauss database supports the row executor and vectorized executor for processing row-store tables and column-store tables, respectively.
- More data is read in one batch at a time, saving I/O resources.
- There are a large number of records in a batch, and the CPU cache hit rate increases.
- In pipeline mode, the number of function calls is small.
- A batch of data is processed at a time, which is efficient.
Therefore, the openGauss database can achieve better query performance for complex analytical queries. However, column-store tables do not perform well in data insertion and update. Therefore, column-store tables cannot be used for services with frequent data insertion and update.
To improve the query performance of row-store tables in complex analytical queries, the openGauss database provides the vectorized executor for processing row-store tables. You can set try_vector_engine_strategy to convert query statements containing row-store tables into vectorized execution plans for execution.
This conversion is not applicable to all query scenarios. If a query statement contains operations such as expression calculation, multi-table join, and aggregation, the performance can be improved by converting the statement to a vectorized execution plan. Theoretically, converting a row-store table to a vectorized execution plan causes conversion overheads and performance deterioration. After the foregoing expression calculation, join operation, and aggregation operations are converted into vectorized execution plans, performance can be improved. The performance improvement must be higher than the overheads generated by the conversion. This determines whether the conversion is required.
Take TPCH Q1 as an example. When the row executor is used, the execution time of the scan operator is 405210 ms, and the execution time of the aggregation operation is 2618964 ms. After the vectorized executor is used, the execution time of the scan operator (SeqScan and VectorAdapter) is 470840 ms, and the execution time of the aggregation operation is 212384 ms. So the query performance can be improved.
Execution plan of the TPCH Q1 row executor:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=43539570.49..43539570.50 rows=6 width=260) (actual time=3024174.439..3024174.439 rows=4 loops=1)
Sort Key: l_returnflag, l_linestatus
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=43539570.30..43539570.41 rows=6 width=260) (actual time=3024174.396..3024174.403 rows=4 loops=1)
Group By Key: l_returnflag, l_linestatus
-> Seq Scan on lineitem (cost=0.00..19904554.46 rows=590875396 width=28) (actual time=0.016..405210.038 rows=596140342 loops=1)
Filter: (l_shipdate <= '1998-10-01 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 3897560
Total runtime: 3024174.578 ms
(9 rows)
Execution plan of the TPCH Q1 vectorized executor:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Row Adapter (cost=43825808.18..43825808.18 rows=6 width=298) (actual time=683224.925..683224.927 rows=4 loops=1)
-> Vector Sort (cost=43825808.16..43825808.18 rows=6 width=298) (actual time=683224.919..683224.919 rows=4 loops=1)
Sort Key: l_returnflag, l_linestatus
Sort Method: quicksort Memory: 3kB
-> Vector Sonic Hash Aggregate (cost=43825807.98..43825808.08 rows=6 width=298) (actual time=683224.837..683224.837 rows=4 loops=1)
Group By Key: l_returnflag, l_linestatus
-> Vector Adapter(type: BATCH MODE) (cost=19966853.54..19966853.54 rows=596473861 width=66) (actual time=0.982..470840.274 rows=596140342 loops=1)
Filter: (l_shipdate <= '1998-10-01 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 3897560
-> Seq Scan on lineitem (cost=0.00..19966853.54 rows=596473861 width=66) (actual time=0.364..199301.737 rows=600037902 loops=1)
Total runtime: 683225.564 ms
(11 rows)