MOT Configuration Settings
MOT is provided preconfigured to creating working MOT Tables. For best results, it is recommended to customize the MOT configuration (defined in the file named mot.conf) according to your application's specific requirements and your preferences.
This file is read-only upon server startup. If you edit this file while the system is running, then the server must be reloaded in order for the changes to take effect.
The mot.conf file is located in the same folder as the postgres.conf configuration file.
In primary/standby deployment mode, mot.conf file should be exactly same in both primary and standby nodes. Otherwise, the system behavior is undefined.
Read the General Guidelines section and then review and configure the following sections of the mot.conf file, as needed.
NOTE: The topics listed above describe each of the setting sections in the mot.conf file. In addition to the above topics, for an overview of all the aspects of a specific MOT feature (such as Recovery), you may refer to the relevant topic of this user manual. For example, the mot.conf file has a Recovery section that contains settings that affect MOT recovery and this is described in the MOT Recovery section that is listed above. In addition, for a full description of all aspects of Recovery, you may refer to the MOT Recovery section of the Administration chapter of this user manual. Reference links are also provided in each relevant section of the descriptions below.
The following topics describe each section in the mot.conf file and the settings that it contains, as well as the default value of each.
General Guidelines
The following are general guidelines for editing the mot.conf file.
Each setting appears with its default value as follows –
# name = value
Blank/white space is acceptable.
Comments are indicated by placing a number sign (#) anywhere on a line.
The default values of each setting appear as a comment throughout this file.
In case a parameter is uncommented and a new value is placed, the new setting is defined.
Changes to the mot.conf file are applied only at the start or reload of the database server.
Memory Units are represented as follows –
- KB – Kilobytes
- MB – Megabytes
- GB – Gigabytes
- TB – Terabytes
If no memory units are specified, then bytes are assumed.
Some memory units are represented as a percentage of the max_process_memory setting that is configured in postgresql.conf. For example – 20%.
Time units are represented as follows –
- us – microseconds (or micros)
- ms – milliseconds (or millis)
- s – seconds (or secs)
- min – minutes (or mins)
- h – hours
- d – days
If no time units are specified, then microseconds are assumed.
REDO LOG (MOT)
enable_group_commit = false
Specifies whether to use group commit.
This option is only relevant when openGauss is configured to use synchronous commit, meaning only when the synchronous_commit setting in postgresql.conf is configured to any value other than off.
group_commit_size = 16
group_commit_timeout = 10 ms
This option is only relevant when the MOT engine has been configured to Synchronous Group Commit logging. This means that the synchronous_commit setting in postgresql.conf is configured to True and the enable_group_commit parameter in the mot.conf configuration file is configured to True.
Defines which of the following determines when a group of transactions is recorded in the WAL Redo Log –
group_commit_size – The quantity of committed transactions in a group. For example, 16 means that when 16 transactions in the same group have been committed by their client application, then an entry is written to disk in the WAL Redo Log for each of the 16 transactions.
group_commit_timeout – A timeout period in ms. For example, 10 means that after 10 ms, an entry is written to disk in the WAL Redo Log for each of the transactions in the same group that have been committed by their client application in the lats 10 ms.
A commit group is closed after either the configured number of transactions has arrived or after the configured timeout period since the group was opened. After the group is closed, all the transactions in the group wait for a group flush to complete execution and then notify the client that each transaction has ended.
NOTE: You may refer to MOT Logging – WAL Redo Log section for more information about the WAL Redo Log and synchronous group commit logging.
CHECKPOINT (MOT)
checkpoint_dir =
Specifies the directory in which checkpoint data is to be stored. The default location is in the data folder of each data node.
checkpoint_segsize = 16 MB
Specifies the segment size used during checkpoint. Checkpoint is performed in segments. When a segment is full, it is serialized to disk and a new segment is opened for the subsequent checkpoint data.
checkpoint_workers = 3
Specifies the number of workers to use during checkpoint.
Checkpoint is performed in parallel by several MOT engine workers. The quantity of workers may substantially affect the overall performance of the entire checkpoint operation, as well as the operation of other running transactions. To achieve a shorter checkpoint duration, a larger number of workers should be used, up to the optimal number (which varies based on the hardware and workload). However, be aware that if this number is too large, it may negatively impact the execution time of other running transactions. Keep this number as low as possible to minimize the effect on the runtime of other running transactions, but at the cost of longer checkpoint duration.
NOTE: You may refer to the MOT Checkpoints section for more information about configuration settings.
RECOVERY (MOT)
checkpoint_recovery_workers = 3
Specifies the number of workers (threads) to use during checkpoint data recovery. Each MOT engine worker runs on its own core and can process a different table in parallel by reading it into memory. For example, while the default is three-course, you might prefer to set this parameter to the number of cores that are available for processing. After recovery these threads are stopped and killed.
NOTE: You may refer to the MOT Recovery section for more information about configuration settings.
parallel_recovery_workers = 5 Specifies the number of workers to use during redo recovery/replay.
parallel_recovery_queue_size = 512 Specifies the size of the queue used during recovery to hold redo log segments. This parameter also limits the maximum number of transactions that can be active (in progress) during parallel recovery. If this limit is reached, redo replay will wait for some of the transactions to commit before processing the redo log for new transactions.
STATISTICS (MOT)
enable_stats = false
Configures periodic statistics for printing.
print_stats_period = 10 minute
Configures the time period for printing a summary statistics report.
print_full_stats_period = 1 hours
Configures the time period for printing a full statistics report.
The following settings configure the various sections included in the periodic statistics report. If none of them are configured, then the statistics report is suppressed.
enable_log_recovery_stats = false
Log recovery statistics contain various Redo Log recovery metrics.
enable_db_session_stats = false
Database session statistics contain transaction events, such commits, rollbacks and so on.
enable_network_stats = false
Network statistics contain connection/disconnection events.
enable_log_stats = false
Log statistics contain details regarding the Redo Log.
enable_memory_stats = false
Memory statistics contain memory-layer details.
enable_process_stats = false
Process statistics contain total memory and CPU consumption for the current process.
enable_system_stats = false
System statistics contain total memory and CPU consumption for the entire system.
enable_jit_stats = false
JIT statistics contain information regarding JIT query compilation and execution.
ERROR LOG (MOT)
log_level = INFO
Configures the log level of messages issued by the MOT engine and recorded in the Error log of the database server. Valid values are PANIC, ERROR, WARN, INFO, TRACE, DEBUG, DIAG1 and DIAG2.
Log.COMPONENT.LOGGER.log_level=LOG_LEVEL
Configures specific loggers using the syntax described below.
For example, to configure the TRACE log level for the ThreadIdPool logger in system component, use the following syntax –
Log.System.ThreadIdPool.log_level=TRACE
To configure the log level for all loggers under some component, use the following syntax –
Log.COMPONENT.log_level=LOG_LEVEL
For example –
Log.System.log_level=DEBUG
MEMORY (MOT)
enable_numa = true
Specifies whether to use NUMA-aware memory allocation.
When disabled, all affinity configurations are disabled as well.
MOT engine assumes that all the available NUMA nodes have memory. If the machine has some special configuration in which some of the NUMA nodes have no memory, then the MOT engine initialization and hence the database server startup will fail. In such machines, it is recommended that this configuration value be set to false, in order to prevent startup failures and let the MOT engine to function normally without using NUMA-aware memory allocation.
affinity_mode = equal-per-socket
Configures the affinity mode of threads for the user session and internal MOT tasks.
When a thread pool is used, this value is ignored for user sessions, as their affinity is governed by the thread pool. However, it is still used for internal MOT tasks.
Valid values are fill-socket-first, equal-per-socket, fill-physical-first and none –
- Fill-socket-first attaches threads to cores in the same socket until the socket is full and then moves to the next socket.
- Equal-per-socket spreads threads evenly among all sockets.
- Fill-physical-first attaches threads to physical cores in the same socket until all physical cores are employed and then moves to the next socket. When all physical cores are used, then the process begins again with hyper-threaded cores.
- None disables any affinity configuration and lets the system scheduler determine on which core each thread is scheduled to run.
lazy_load_chunk_directory = true
Configures the chunk directory mode that is used for memory chunk lookup.
Lazy mode configures the chunk directory to load parts of it on demand, thus reducing the initial memory footprint (from 1 GB to 1 MB approximately). However, this may result in minor performance penalties and errors in extreme conditions of memory distress. In contrast, using a non-lazy chunk directory allocates an additional 1 GB of initial memory, produces slightly higher performance and ensures that chunk directory errors are avoided during memory distress.
reserve_memory_mode = virtual
Configures the memory reservation mode (either physical or virtual).
Whenever memory is allocated from the kernel, this configuration value is consulted to determine whether the allocated memory is to be resident (physical) or not (virtual). This relates primarily to preallocation, but may also affect runtime allocations. For physical reservation mode, the entire allocated memory region is made resident by forcing page faults on all pages spanned by the memory region. Configuring virtual memory reservation may result in faster memory allocation (particularly during preallocation), but may result in page faults during the initial access (and thus may result in a slight performance hit) and more sever errors when physical memory is unavailable. In contrast, physical memory allocation is slower, but later access is both faster and guaranteed.
store_memory_policy = compact
Configures the memory storage policy (compact or expanding).
When compact policy is defined, unused memory is released back to the kernel, until the lower memory limit is reached (see min_mot_memory below). In expanding policy, unused memory is stored in the MOT engine for later reuse. A compact storage policy reduces the memory footprint of the MOT engine, but may occasionally result in minor performance degradation. In addition, it may result in unavailable memory during memory distress. In contrast, expanding mode uses more memory, but results in faster memory allocation and provides a greater guarantee that memory can be re-allocated after being de-allocated.
chunk_alloc_policy = auto
Configures the chunk allocation policy for global memory.
MOT memory is organized in chunks of 2 MB each. The source NUMA node and the memory layout of each chunk affect the spread of table data among NUMA nodes, and therefore can significantly affect the data access time. When allocating a chunk on a specific NUMA node, the allocation policy is consulted.
Available values are auto, local, page-interleaved, chunk-interleaved and native –
- Auto policy selects a chunk allocation policy based on the current hardware.
- Local policy allocates each chunk on its respective NUMA node.
- Page-interleaved policy allocates chunks that are composed of interleaved memory 4‑kilobyte pages from all NUMA nodes.
- Chunk-interleaved policy allocates chunks in a round robin fashion from all NUMA nodes.
- Native policy allocates chunks by calling the native system memory allocator.
chunk_prealloc_worker_count = 8
Configures the number of workers per NUMA node participating in memory preallocation.
max_mot_global_memory = 80%
Configures the maximum memory limit for the global memory of the MOT engine.
Specifying a percentage value relates to the total defined by max_process_memory configured in postgresql.conf.
The MOT engine memory is divided into global (long-term) memory that is mainly used to store user data and local (short-term) memory that is mainly used by user sessions for local needs.
Any attempt to allocate memory beyond this limit is denied and an error is reported to the user. Ensure that the sum of max_mot_global_memory and max_mot_local_memory do not exceed the max_process_memory configured in postgresql.conf.
min_mot_global_memory = 0 MB
Configures the minimum memory limit for the global memory of the MOT engine.
Specifying a percentage value relates to the total defined by the max_process_memory configured in postgresql.conf.
This value is used for the preallocation of memory during startup, as well as to ensure that a minimum amount of memory is available for the MOT engine during its normal operation. When using compact storage policy (see store_memory_policy above), this value designates the lower limit under which memory is not released back to the kernel, but rather kept in the MOT engine for later reuse.
max_mot_local_memory = 15%
Configures the maximum memory limit for the local memory of the MOT engine.
Specifying a percentage value relates to the total defined by the max_process_memory configured in postgresql.conf.
MOT engine memory is divided into global (long-term) memory that is mainly used to store user data and local (short-term) memory that is mainly used by user session for local needs.
Any attempt to allocate memory beyond this limit is denied and an error is reported to the user. Ensure that the sum of max_mot_global_memory and max_mot_local_memory do not exceed the max_process_memory configured in postgresql.conf.
min_mot_local_memory = 0 MB
Configures the minimum memory limit for the local memory of the MOT engine.
Specifying a percentage value relates to the total defined by the max_process_memory configured in postgresql.conf.
This value is used for preallocation of memory during startup, as well as to ensure that a minimum amount of memory is available for the MOT engine during its normal operation. When using compact storage policy (see store_memory_policy above), this value designates the lower limit under which memory is not released back to the kernel, but rather kept in the MOT engine for later reuse.
max_mot_session_memory = 0 MB
Configures the maximum memory limit for a single session in the MOT engine.
Typically, sessions in the MOT engine can allocate as much local memory as needed, so long as the local memory limit is not exceeded. To prevent a single session from taking too much memory, and thereby denying memory from other sessions, this configuration item is used to restrict small session-local memory allocations (up to 1,022 KB).
Make sure that this configuration item does not affect large or huge session-local memory allocations.
A value of zero denotes no restriction on any session-local small allocations per session, except for the restriction arising from the local memory allocation limit configured by max_mot_local_memory.
Note: Percentage values cannot be set for this configuration item.
min_mot_session_memory = 0 MB
Configures the minimum memory reservation for a single session in the MOT engine.
This value is used to preallocate memory during session creation, as well as to ensure that a minimum amount of memory is available for the session to perform its normal operation.
Note: Percentage values cannot be set for this configuration item.
session_large_buffer_store_size = 0 MB
Configures the large buffer store for sessions.
When a user session executes a query that requires a lot of memory (for example, when using many rows), the large buffer store is used to increase the certainty level that such memory is available and to serve this memory request more quickly. Any memory allocation for a session exceeding 1,022 KB is considered as a large memory allocation. If the large buffer store is not used or is depleted, such allocations are treated as huge allocations that are served directly from the kernel.
Note: Percentage values cannot be set for this configuration item.
session_large_buffer_store_max_object_size = 0 MB
Configures the maximum object size in the large allocation buffer store for sessions.
Internally, the large buffer store is divided into objects of varying sizes. This value is used to set an upper limit on objects originating from the large buffer store, as well as to determine the internal division of the buffer store into objects of various size.
This size cannot exceed 1/8 of the session_large_buffer_store_size. If it does, it is adjusted to the maximum possible.
Note: Percentage values cannot be set for this configuration item.
session_max_huge_object_size = 1 GB
Configures the maximum size of a single huge memory allocation made by a session.
Huge allocations are served directly from the kernel and therefore are not guaranteed to succeed.
This value also pertains to global (meaning not session-related) memory allocations.
Note: Percentage values cannot be set for this configuration item.
GARBAGE COLLECTION (MOT)
reclaim_threshold = 512 KB
Configures the memory threshold for the garbage collector.
Each session manages its own list of to-be-reclaimed objects and performs its own garbage collection during transaction commitment. This value determines the total memory threshold of objects waiting to be reclaimed, above which garbage collection is triggered for a session.
In general, the trade-off here is between un-reclaimed objects vs garbage collection frequency. Setting a low value keeps low levels of un-reclaimed memory, but causes frequent garbage collection that may affect performance. Setting a high value triggers garbage collection less frequently, but results in higher levels of un-reclaimed memory. This setting is dependent upon the overall workload.
reclaim_batch_size = 8000
Configures the batch size for garbage collection.
The garbage collector reclaims memory from objects in batches, in order to restrict the number of objects being reclaimed in a single garbage collection pass. The intent of this approach is to minimize the operation time of a single garbage collection pass.
high_reclaim_threshold = 8 MB
Configures the high memory threshold for garbage collection.
Because garbage collection works in batches, it is possible that a session may have many objects that can be reclaimed, but which were not. In such situations, in order to prevent garbage collection lists from becoming too bloated, this value is used to continue reclaiming objects within a single pass, even though that batch size limit has been reached, until the total size of the still-waiting-to-be-reclaimed objects is less than this threshold, or there are no more objects eligible for reclamation.
JIT (MOT)
enable_mot_codegen = false
Specifies whether to use JIT query compilation and execution for planned queries.
JIT query execution enables JIT-compiled code to be prepared for a prepared query during its planning phase. The resulting JIT-compiled function is executed whenever the prepared query is invoked. JIT compilation usually takes place in the form of LLVM.
enable_mot_codegen_print = false
Specifies whether to print emitted LLVM code for JIT-compiled queries.
mot_codegen_limit = 50000
Limits the number of JIT queries allowed per user session.
enable_mot_query_codegen = true
Specifies whether to use JIT query compilation and execution for planned queries. JIT query execution allows preparing Just-In-Time compiled code for a prepared query during its planning phase. The resulting JIT-compiled function is executed whenever the prepared query is invoked. JIT compilation takes place in the form of LLVM.
enable_mot_sp_codegen = true
Specifies whether to use JIT query compilation and execution for stored procedures. JIT query execution allows preparing Just-In-Time compiled code for a stored procedure during its compilation phase. The resulting JIT-compiled function is executed whenever the stored procedure is invoked.
enable_mot_codegen_profile = true
Specifies whether to use JIT profiling. When using this option, the mot_jit_profile() function can be used to obtain run-time profile data for jitted stored procedures and queries.
Default mot.conf
The minimum settings and configuration specify to point the postgresql.conf file to the location of the mot.conf file –
postgresql.conf
mot_config_file = '/tmp/gauss/mot.conf'
Ensure that the value of the max_process_memory setting is sufficient to include the global (data and index) and local (sessions) memory of MOT tables.
The default content of mot.conf is sufficient to get started. The settings can be optimized later.