Maintainability

Workload Diagnosis Report

The workload diagnosis report (WDR) generates a performance report between two different time points based on the system performance snapshot data at two different time points. The report is used to diagnose database kernel performance faults.

The WDR performance snapshot data is stored in the snapshot schema of the postgres database. The default collection and retention policies are as follows:

  • A snapshot is collected every hour (wdr_snapshot_interval=1h).
  • It is retained for 8 days (wdr_snapshot_retention_days=8).

WDR depends on the following two components:

  • SNAPSHOT: The performance snapshot can be configured to collect a certain amount of performance data from the kernel at a specified interval and store the data in the user tablespace. Any snapshot can be used as a performance baseline for comparison with other snapshots.
  • WDR Reporter: This tool analyzes the overall system performance based on two snapshots, calculates the changes of more specific performance indicators between the two time periods, and generates summarized and detailed performance data. For details, see Table 1 and Table 2.

Table 1 Summarized diagnosis report

Diagnosis Type

Description

Database Stat

Evaluates the load and I/O status of the current database. Load and I/O are the most important indicators of the TP system.

The statistics include the number of sessions connected to the database, number of committed and rolled back transactions, number of read disk blocks, number of disk blocks found in the cache, number of rows returned, captured, inserted, updated, and deleted through database query, number of conflicts and deadlocks, usage of temporary files, and I/O read/write time.

Load Profile

Evaluates the current system load from the time, I/O, transaction, and SQL dimensions.

The statistics include the job running elapse time, CPU time, daily transaction quality, logical and physical read volume, read and write I/O times and size, login and logout times, SQL, transaction execution volume, and SQL P85 and P90 response time.

Instance Efficiency Percentages

Evaluates the cache efficiency of the current system.

The statistics include the database cache hit ratio.

Events

Evaluates the performance of key system kernel resources and key events.

The statistics include the number of times that the key time of the database kernel occurs and the waiting time.

Wait Classes

Evaluates the performance of key events in the system.

The statistics include the release of the data kernel in the main types of waiting events, such as STATUS, LWLOCK_EVENT, LOCK_EVENT, and IO_EVENT.

CPU

Includes time release of the CPU in user mode, kernel mode, wait I/O, and idle mode.

IO Profile

Includes the number of database I/O times, database I/O data volume, number of redo I/O times, and redo I/O volume.

Memory Statistics

Includes maximum process memory, used process memory, maximum shared memory, and used shared memory.

Table 2 Detailed diagnosis report

Diagnosis Type

Description

Time Model

Evaluates the performance of the current system in the time dimension.

The statistics include time consumed by the system in each phase, including the kernel time, CPU time, execution time, parsing time, compilation time, query rewriting time, plan generation time, network time, and I/O time.

SQL Statistics

Diagnoses SQL statement performance problems.

The statistics include normalized SQL performance indicators in multiple dimensions: elapsed time, CPU time, rows returned, tuple reads, executions, physical reads, and logical reads. The indicators can be classified into execution time, number of execution times, row activity, and cache I/O.

Wait Events

Diagnoses performance of key system resources and key time in detail.

The statistics include the performance of all key events in a period of time, including the number of events and the time consumed.

Cache IO Stats

Diagnoses the performance of user tables and indexes.

The statistics include read and write operations on all user tables and indexes, and the cache hit ratio.

Utility status

Diagnoses the performance of backend jobs.

The statistics include the performance of backend operations such as page operation and replication.

Object stats

Diagnoses the performance of database objects.

The statistics include user tables, tables on indexes, index scan activities, insert, update, and delete activities, number of valid rows, and table maintenance status.

Configuration settings

Determines whether the configuration is changed.

It is a snapshot that contains all current configuration parameters.

Benefits:

  • WDR is the main method for diagnosing long-term performance problems. Based on the performance baseline of a snapshot, performance analysis is performed from multiple dimensions, helping DBAs understand the system load, performance of each component, and performance bottlenecks.
  • Snapshots are also an important data source for subsequent performance problem self-diagnosis and self-optimization suggestions.

Slow SQL Diagnosis

Slow SQL records information about all jobs whose execution time exceeds the threshold.

Historical slow SQL provides table-based and function-based query interfaces. You can query the execution plan, start time, end time, query statement, row activity, kernel time, CPU time, execution time, parsing time, compilation time, query rewriting time, plan generation time, network time, I/O time, network overhead, and lock overhead. All information is anonymized.

Slow SQL provides detailed information required for slow SQL diagnosis. You can diagnose performance problems of specific slow SQL statements offline without reproducing the problem. The table-based and function-based APIs help users collect statistics on slow SQL indicators and connect to third-party platforms.

Feedback
编组 3备份
    openGauss 2024-07-26 00:59:28
    cancel