gs_check

Background

gs_check has been enhanced to unify the functions of various check tools, such as gs_check and gs_checkos. It helps you fully check the openGauss runtime, OS, network, and database running environments, and perform comprehensive checks on these environments before major operations in openGauss so that the operations run smoothly.

Precautions

  • Parameter -i or -e must be set. -i specifies a single item to be checked, and -e specifies an inspection scenario where multiple items will be checked.
  • If -i does not specify a root item, or the check item list of the scenario specified by -e contains no root items, you do not need to enter the name or password of a user with root permissions.
  • You can specify --skip-root-items to skip root items.
  • If the MTU values are inconsistent, the check may be slow or the check process may fail to respond. When the inspection tool displays a message, change the MTU values of the nodes to be the same and then perform the inspection.
  • If the switch does not support the configured MTU value, communication problems may cause processes to stop responding even if the MTU values are the same. In this case, adjust the MTU to a value that the switch supports.

Syntax

  • Check a single item.

    gs_check -i ITEM [...] [-U USER] [-L] [-l LOGFILE] [-o OUTPUTDIR] [--skip-root-items] [--set] [--routing]
    
  • Check a scenario.

    gs_check -e SCENE_NAME [-U USER] [-L] [-l LOGFILE] [-o OUTPUTDIR] [--skip-root-items] [--time-out=SECS] [--set] [--routing] [--skip-items]
    
  • Display help information.

    gs_check -? | --help
    
  • Display version information.

    gs_check -V | --version
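
    The following Python sketch shows one way a wrapper script might assemble a gs_check command line from the syntax above. The function build_gs_check_cmd and its parameter names are hypothetical. It uses only options listed in this section and assumes that exactly one of -i and -e is given per invocation.

```python
def build_gs_check_cmd(item=None, scenario=None, user=None, local=False,
                       log_file=None, output_dir=None, skip_root_items=False):
    """Return a gs_check argument list built from options shown in the syntax."""
    # Assumption of this sketch: exactly one of -i / -e is supplied.
    if (item is None) == (scenario is None):
        raise ValueError("set exactly one of item (-i) or scenario (-e)")
    cmd = ["gs_check"]
    cmd += ["-i", item] if item else ["-e", scenario]
    if user:
        cmd += ["-U", user]
    if local:
        cmd.append("-L")
    if log_file:
        cmd += ["-l", log_file]
    if output_dir:
        cmd += ["-o", output_dir]
    if skip_root_items:
        cmd.append("--skip-root-items")
    return cmd


# Example values only; pass the list to subprocess.run() to execute it.
print(" ".join(build_gs_check_cmd(item="CheckCPU", user="omm", local=True)))
print(" ".join(build_gs_check_cmd(scenario="inspect", skip_root_items=True)))
```

    Returning a list rather than a single string avoids shell quoting problems when paths contain spaces.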
    

Parameter Description

  • -U

    Specifies the name of the user for running openGauss.

    Value range: Name of the user for running openGauss

  • -L

    Specifies that the check is locally performed.

  • -i

    Specifies a check item. Its format is -i CheckXX. For details about check items, see Table 1.

  • -e

    Specifies scenario check items. Default scenarios include inspect (routine inspection), upgrade (pre-upgrade inspection), binary_upgrade (local pre-upgrade inspection), health (health check inspection), and install (installation). You can also compile scenarios as required.

  • -l

    Specifies a log file path. Add the .log suffix when specifying the path.

  • -o

    Specifies the path of the check result output folder.

  • --skip-root-items

    Skips the check items that require root permissions.

  • --skip-items

    Skips specified check items.

  • --format

    Specifies the format of the result report.

  • --set

    Specifies abnormal items that can be fixed.

  • --time-out

    Specifies the timeout period, in seconds. The default value is 1500 seconds. A user-defined timeout period must not be less than 1500 seconds.

  • --routing

    Specifies the network segment for service IP addresses. The format is IP address:Subnet mask.

  • --disk-threshold="PERCENT"

    Specifies the alarm threshold when you check disk usage. You can specify the integer value that ranges from 1 to 99. The default value is 90. This parameter is not mandatory for other check items.

  • -?, --help

    Displays help information.

  • -V, --version

    Displays version information.
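
    As an illustration of the --routing format (IP address:Subnet mask), the sketch below converts such a value into a network segment and counts how many of a node's addresses fall inside it, which is the quantity the CheckRouting item in Table 1 looks at. The helper names and sample addresses are hypothetical, and IPv4 is assumed. This is not the tool's own implementation.

```python
import ipaddress


def parse_routing(value):
    """Turn 'IP address:Subnet mask' into an IPv4 network (IPv4 assumed)."""
    address, _, mask = value.partition(":")
    if not mask:
        raise ValueError("expected 'IP address:Subnet mask'")
    # strict=False lets the address part be a host address inside the segment.
    return ipaddress.ip_network(f"{address}/{mask}", strict=False)


def count_service_ips(routing, node_ips):
    """Count how many of a node's addresses lie in the service segment."""
    segment = parse_routing(routing)
    return sum(ipaddress.ip_address(ip) in segment for ip in node_ips)


# Sample values only.
print(parse_routing("192.168.0.0:255.255.255.0"))
print(count_service_ips("192.168.0.0:255.255.255.0",
                        ["192.168.0.11", "10.0.0.11"]))
```

    A count greater than 1 for a node corresponds to the warning condition described for CheckRouting.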

Table 1 openGauss status checklist

Status

Check Item

Description

--set Supported or Not

OS

CheckCPU

Checks the CPU usage of the host. If idle is greater than 30% and iowait is less than 30%, this item passes the check. Otherwise, this item fails the check.

No

CheckFirewall

Checks the firewall status of the host. If the firewall is disabled, this item passes the check. Otherwise, this item fails the check.

Yes

CheckTimeZone

Checks whether nodes in openGauss use the same time zone. If they do, this item passes the check. Otherwise, this item fails the check.

No

CheckSysParams

Checks whether the values of OS parameters for each node are as expected. If the parameters do not meet the requirements of the warning field, a warning is reported. If the parameters do not meet the requirements of the NG field, this item fails the check, and the parameters are printed.

For details, see OS Parameters.

Yes

CheckOSVer

Checks the OS version of each node in openGauss. If every version is in the version compatibility list and the OS versions of all nodes belong to the same version list, this item passes the check. Otherwise, this item fails the check.

No

CheckNTPD

Checks the NTPD service. If the service is enabled and the time difference across nodes is within 1 minute, this item passes the check. Otherwise, this item fails the check.

No

CheckTHP

Checks the THP service. If the service is enabled, this item passes the check. Otherwise, this item fails the check.

Yes

CheckSshdService

Checks whether the sshd service is started. If yes, this item passes the check. Otherwise, this item fails the check.

No

CheckCrondService

Checks whether the crontab service is started. If yes, this item passes the check. Otherwise, this item fails the check.

Yes

CheckCrontabLeft

Checks whether the crontab file contains remaining Gauss information. If no, this item passes the check. Otherwise, this item fails the check.

Yes

CheckDirLeft

Checks whether the /opt/huawei/Bigdata/, /var/log/Bigdata/, and /home/omm directories of new nodes remain after scale-out. If they do not exist or exist only in the mount directory, this item passes the check. Otherwise, this item fails the check.

Yes

CheckProcessLeft

Checks whether gaussdb and omm processes remain on a new node after scale-out. If no, this item passes the check. Otherwise, this item fails the check.

Yes

CheckStack

Checks stack depths. If the stack depths across nodes are inconsistent, a warning is reported. If the stack depths are consistent and greater than or equal to 3072, this item passes the check. If the stack depths are consistent but less than 3072, this item fails the check.

Yes

CheckNoCheckSum

Checks the value of nochecksum.

  • When bond NICs are used on Red Hat 6.4 or 6.5, if the check result is Y on every node, this item passes the check. Otherwise, this item fails the check.
  • For other OSs, if the check result is N on every node, this item passes the check. Otherwise, this item fails the check.

No

CheckOmmUserExist

Checks whether user omm exists on new nodes after scale-out. If no, this item passes the check. Otherwise, this item fails the check.

Yes

CheckPortConflict

Checks whether database node ports are occupied. If they are not, this item passes the check. Otherwise, this item fails the check.

Yes

CheckSysPortRange

Checks the value range of the system parameter ip_local_port_range. If the value range is 26000 to 65535, this item passes the check. Otherwise, this item fails the check.

Yes

CheckEtcHosts

Checks the /etc/hosts file. If localhost is not configured, a mapping whose comment contains #openGauss exists, or the same IP address maps to different host names, this item fails the check. This item also fails the check if the same host name maps to different IP addresses. Otherwise, this item passes the check.

No

CheckCpuCount

Checks the number of CPU cores. If the number is different from that of available CPUs, this item fails the check. If the two numbers are the same but unavailability messages exist, a warning is reported. If the CPU information of all nodes is different, this item fails the check.

No

CheckSctpService

Checks the SCTP service. If the service is enabled and written in the startup file, this item passes the check. Otherwise, this item fails the check.

Yes

CheckHyperThread

Checks hyper-threading. If it is started, this item passes the check. Otherwise, this item fails the check.

No

CheckMemInfo

Checks whether the total memory size of each node is the same. If yes, this item passes the check. Otherwise, a warning is reported.

No

CheckSshdConfig

Checks the /etc/ssh/sshd_config file.

(a) PasswordAuthentication=yes;

(b) MaxStartups=1000;

(c) UseDNS=no;

(d) ClientAliveInterval is greater than 10800 or equal to 0.

If all of the above are configured, this item passes the check. If a or c is configured incorrectly, a warning is reported. If b or d is configured incorrectly, this item fails the check.

Yes

CheckMaxHandle

Checks the maximum handle value of the OS. If the value is greater than or equal to 1 million, this item passes the check. Otherwise, this item fails the check.

Yes

CheckKernelVer

Checks the kernel version of each node. If the version information is consistent, this item passes the check. Otherwise, a warning is reported.

No

CheckEncoding

Checks the system code of each node in openGauss. If the codes are consistent, this item passes the check. Otherwise, this item fails the check.

No

CheckBootItems

Checks whether there are manually added startup items. If no, this item passes the check. Otherwise, this item fails the check.

No

CheckDropCache

Checks whether there is a dropcache process running on each node. If there is, this item passes the check. Otherwise, this item fails the check.

No

CheckFilehandle

Checks the following conditions. If both the conditions are met, this item passes the check. Otherwise, this item fails the check.

  • The number of handles opened by each gaussdb process does not exceed 800,000.
  • The number of handles used by the slave process does not exceed that of handles used by the master process.

No

CheckKeyProAdj

Checks all key processes. If the oom_adj value for all key processes is 0, this item passes the check. Otherwise, this item fails the check.

No

CheckMaxProcMemory

Checks whether the value of max_process_memory on the database nodes is greater than 1 GB. If no, this item passes the check. Otherwise, this item fails the check.

Yes

Device

CheckSwapMemory

Checks the swap memory size. If the check result is 0, this item passes the check. Otherwise, a warning is reported. If the result is greater than the total memory, this item fails the check.

Yes

CheckLogicalBlock

Checks the logical block size of a disk. If the result is 512, this item passes the check. Otherwise, this item fails the check.

Yes

CheckIOrequestqueue

Checks the I/O value. If the value is 32768, this item passes the check. Otherwise, this item fails the check.

Yes

CheckMaxAsyIOrequests

Checks the maximum number of asynchronous requests. If the number of asynchronous I/O requests is greater than 104857600 and greater than the number of database instances on the current node x 1048576, this item passes the check. Otherwise, this item fails the check.

Yes

CheckIOConfigure

Checks the I/O configuration. If the result is deadline, this item passes the check. Otherwise, this item fails the check.

Yes

CheckBlockdev

Checks the size of the pre-read block. If the result is 16384, this item passes the check. Otherwise, this item fails the check.

Yes

CheckDiskFormat

Checks the XFS format information about a disk. If the result is rw,noatime,inode64,allocsize=16m, this item passes the check. Otherwise, a warning is reported.

No

CheckInodeUsage

For new nodes, checks all disks.

For old nodes, checks openGauss paths (GAUSSHOME/PGHOST/GPHOME/GAUSSLOG/tmp and instance directories).

Checks the usage of the above directories. If the usage exceeds the warning threshold (60% by default), a warning is reported. If the usage exceeds the NG threshold (80% by default), this item fails the check. If the usage is less than or equal to the NG threshold, this item passes the check.

No

CheckSpaceUsage

For new nodes, checks all disks.

For old nodes, checks openGauss paths (GAUSSHOME/PGHOST/GPHOME/GAUSSLOG/tmp and instance directories).

Checks the usage of the above directories. If the usage exceeds the warning threshold (70% by default), a warning is reported. If the usage exceeds the NG threshold (90% by default), this item fails the check. Also checks the available space of the GAUSSHOME/PGHOST/GPHOME/GAUSSLOG/tmp/data directory. If the space is less than the threshold, this item fails the check. Otherwise, this item passes the check.

No

CheckDiskConfig

Checks whether disk configurations are consistent. If the names, sizes, and mount points of disks are the same, this item passes the check. If any of them is inconsistent, a warning is reported.

No

CheckXid

Checks the value of xid. If the value is greater than 1 billion, a warning is reported. If the value is greater than 1.8 billion, this item fails the check.

No

CheckSysTabSize

Checks the system catalog capacity of each instance. If the available capacity of each disk is greater than the total capacity of system catalogs for all instances on the disk, this item passes the check. Otherwise, this item fails the check.

No

Cluster

CheckClusterState

Checks the fencedUDF status. If it is down, a warning is reported. In this case, check the openGauss status. If it is Normal, this item passes the check. Otherwise, this item fails the check.

No

CheckConfigFileDiff

Checks whether the static configuration file and installation XML file meet the scale-out conditions. If they do, this item passes the check. Otherwise, this item fails the check.

No

CheckDBParams

For the primary database node, checks the size of the shared buffer and the Sem parameter.

For database nodes, checks the size of the shared buffer and the maximum number of connections.

The shared buffer size should be greater than 128 KB, less than shmmax, and less than shmall x PAGESIZE.

If a primary database node exists, Sem must be greater than the rounded-up result of (Maximum number of database node connections + 150)/16.

If the above items are met, this item passes the check. If any of them is not met, this item fails the check.

Yes

CheckDebugSwitch

Checks the value of the log_min_messages parameter in the configuration file of each instance on each node. If the value is empty, the default log level warning is used. In this case, if the actual log level is not warning, a warning is reported.

Yes

CheckUpVer

Checks the version of the upgrade package on each node in openGauss. If the versions are consistent, this item passes the check. Otherwise, this item fails the check. You need to specify the path of the upgrade software package.

No

CheckDirPermissions

Checks permissions for the node directories (instance Xlog path, GAUSSHOME, GPHOME, PGHOST, and GAUSSLOG). If the directories are writable and their permissions do not exceed 750, this item passes the check. Otherwise, this item fails the check.

Yes

CheckEnvProfile

Checks the environment variables ($GAUSSHOME, $LD_LIBRARY_PATH, and $PATH) of nodes and those of the CMS, CMA, and database node processes. If the node environment variables are correctly configured and the process environment variables exist, this item passes the check. Otherwise, this item fails the check.

No

CheckGaussVer

Checks whether the gaussdb version of each node is consistent. If the versions are consistent, this item passes the check. Otherwise, this item fails the check.

No

CheckPortRange

Checks the port range. If the value of ip_local_port_range is within the threshold (26000 to 65535 by default) and an instance port is out of the range, this item passes the check. Otherwise, this item fails the check.

No

CheckReadonlyMode

Checks the read only mode. If the value of default_transaction_read_only on the database nodes in openGauss is off, this item passes the check. Otherwise, this item fails the check.

No

CheckCatchup

Checks whether the CatchupMain function can be found in the gaussdb process stack. If no, this item passes the check. Otherwise, this item fails the check.

No

CheckProcessStatus

Checks the owner of the gaussdb processes. If their owner is only user omm, this item passes the check. Otherwise, this item fails the check.

No

CheckSpecialFile

Checks whether the files in the tmp directory (PGHOST), OM directory (GPHOME), log directory (GAUSSLOG), data directory, and program directory (GAUSSHOME) contain special characters or whether there are files that do not belong to user omm. If none of them exists, this item passes the check. Otherwise, this item fails the check.

No

CheckCollector

Checks whether information is successfully collected in the output directory. If yes, this item passes the check. Otherwise, this item fails the check.

No

CheckLargeFile

Checks whether there is a file over 4 GB in the directory of each database node. If there is such a file in any database node directory and its subdirectories, this item fails the check. Otherwise, this item passes the check.

No

CheckProStartTime

Checks whether the interval for starting key processes exceeds 5 minutes. If no, this item passes the check. Otherwise, this item fails the check.

No

CheckDilateSysTab

Checks whether a system catalog is bloated. If no, this item passes the check. Otherwise, this item fails the check.

Yes

CheckMpprcFile

Checks whether the environment variable isolation file is modified. If no, this item passes the check. Otherwise, this item fails the check.

No

Database

CheckLockNum

Checks the number of database locks. If a result is returned, this item passes the check.

No

CheckArchiveParameter

Checks the database archive parameter. If the parameter is not enabled or is enabled for database nodes, this item passes the check. If it is enabled but not for database nodes, this item fails the check.

Yes

CheckCurConnCount

Checks the number of database connections. If the number is less than 90% of the maximum connection quantity, this item passes the check. Otherwise, this item fails the check.

No

CheckCursorNum

Checks the number of cursors in the database. If a result is returned, this item passes the check. Otherwise, this item fails the check.

No

CheckMaxDatanode

Checks the maximum number of database nodes. If the number is less than the number of nodes configured in the XML file multiplied by the number of database nodes (90 x 5 by default), a warning is reported. Otherwise, this item passes the check.

Yes

CheckPgPreparedXacts

Checks the pgxc_prepared_xacts parameter. If no 2PC transactions are found, this item passes the check. Otherwise, this item fails the check.

Yes

CheckPgxcgroup

Checks the number of redistributed records in the pgxc_group table. If the result is 0, this item passes the check. Otherwise, this item fails the check.

No

CheckLockState

Checks whether openGauss is locked. If no, this item passes the check. Otherwise, this item fails the check.

No

CheckIdleSession

Checks the number of non-idle sessions. If the result is 0, this item passes the check. Otherwise, this item fails the check.

No

CheckDBConnection

Checks whether the database can be connected. If yes, this item passes the check. Otherwise, this item fails the check.

No

CheckGUCValue

Checks the result of [(max_connections + max_prepared_transactions) x max_locks_per_transaction]. If it is greater than or equal to 1 million, this item passes the check. Otherwise, this item fails the check.

Yes

CheckPMKData

Checks whether the PMK schema of the database contains abnormal data. If no, this item passes the check. Otherwise, this item fails the check.

Yes

CheckSysTable

Checks the system catalog. If the check can be performed, this item passes the check.

No

CheckSysTabSize

Checks the system catalog capacity of each instance. If the available capacity of each disk is greater than the total capacity of system catalogs for all instances on the disk, this item passes the check. Otherwise, this item fails the check.

No

CheckTableSpace

Checks tablespace paths. If no tablespace path and openGauss path are nested and no tablespace paths are nested, this item passes the check. Otherwise, this item fails the check.

No

CheckTableSkew

Checks the skew of table data. If a table has unbalanced data distribution among database nodes in openGauss and the database node with the most data has over 100,000 more records than the database node with the least data, this item fails the check. Otherwise, this item passes the check.

No

CheckDNSkew

Checks the skew of table data at the database node level. If the database node with the most amount of data has 5% more than the database node with the smallest amount of data, this item fails the check. Otherwise, this item passes the check.

No

CheckUnAnalyzeTable

Checks for a table that has not been analyzed. If there is such a table and the table contains at least one record, this item fails the check. Otherwise, this item passes the check.

Yes

CheckCreateView

Checks whether the query statement used to create a view contains subqueries whose results, after parsing and rewriting, have duplicate aliases. If yes, this item fails the check. Otherwise, this item passes the check.

No

CheckHashIndex

Checks whether there are hash indexes. If yes, this item fails the check. Otherwise, this item passes the check.

No

CheckNextvalInDefault

Checks whether a DEFAULT expression contains nextval (sequence). If yes, this item fails the check. Otherwise, this item passes the check.

No

CheckNodeGroupName

Checks whether the name of a Node Group contains non-SQL_ASCII characters. If yes, this item fails the check. Otherwise, this item passes the check.

Yes

CheckPgxcRedistb

Checks whether any temporary table remains in the database after data redistribution. If yes, this item fails the check. Otherwise, this item passes the check.

No

CheckReturnType

Checks whether a user-defined function contains invalid return value types. If yes, this item fails the check. Otherwise, this item passes the check.

No

CheckSysadminUser

Checks whether there are database administrators in addition to the owner of openGauss. If yes, this item fails the check. Otherwise, this item passes the check.

No

CheckTDDate

Checks whether the ORC table in a Teradata database contains columns of the date type. If yes, this item fails the check. Otherwise, this item passes the check.

No

CheckDropColumn

Checks whether there are tables on which DROP COLUMN has been performed. If yes, this item fails the check. Otherwise, this item passes the check.

No

CheckDiskFailure

Checks for disk faults. If there is an error during full data query in openGauss, this item fails the check. Otherwise, this item passes the check.

No

Network

CheckPing

Checks the connectivity of all nodes in openGauss. If all their IP addresses can be pinged from each other, this item passes the check. Otherwise, this item fails the check.

No

CheckRXTX

Checks the RX/TX value for backIP of a node. If it is 4096, this item passes the check. Otherwise, this item fails the check.

Yes

CheckMTU

Checks the MTU value of the NIC corresponding to backIP of a node (ensure that the physical NICs are consistent after bonding). If the value is not 8192 or 1500, a warning is reported. In this case, if the MTU values across openGauss are the same, this item passes the check. Otherwise, this item fails the check.

Yes

CheckNetWorkDrop

Checks the packet loss rate of each IP address within 1 minute. If the rate does not exceed 1%, this item passes the check. Otherwise, this item fails the check.

No

CheckBond

Checks whether BONDING_OPTS or BONDING_MODULE_OPTS is configured. If no, a warning is reported. In this case, checks whether the bond mode of each node is the same. If yes, this item passes the check. Otherwise, this item fails the check.

Yes

CheckMultiQueue

Checks cat /proc/interrupts. If multiqueue is enabled for NICs and different CPUs are bound, this item passes the check. Otherwise, this item fails the check.

Yes

CheckUsedPort

Checks the value of net.ipv4.ip_local_port_range. If the value is greater than or equal to the default value of the OS (32768 to 61000), this item passes the check.

Checks the number of random TCP ports. If the number is less than 80% of the total number of random ports, this item passes the check.

Checks the number of random SCTP ports. If the number is less than 80% of the total number of random ports, this item passes the check.

No

CheckNICModel

Checks whether NIC models or driver versions are consistent across nodes. If yes, this item passes the check. Otherwise, a warning is reported.

No

CheckRouting

Checks the number of IP addresses on the service network segment for each node. If the number exceeds 1, a warning is reported. Otherwise, this item passes the check.

No

CheckNetSpeed

When the network is fully loaded, checks whether the average NIC receiving bandwidth is greater than 600 MB. If yes, this item passes the check.

When the network is fully loaded, checks the network ping value. If it is shorter than 1s, this item passes the check.

When the network is fully loaded, checks the NIC packet loss rate. If it is less than 1%, this item passes the check.

No

Others

CheckDataDiskUsage

Checks the disk usage of the database node directory. If the usage is lower than 90%, this item passes the check. Otherwise, this item fails the check.

No

NOTE: Constraints on the CheckNetSpeed check item are as follows:

  • Do not use -L to check CheckNetSpeed, because doing so cannot produce enough network load and the check result will be inaccurate.
  • If the number of nodes is less than six, the network load produced by speed_test may not fully occupy the bandwidth, and the check result will be inaccurate.
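
To illustrate how the pass, warning, and fail outcomes in Table 1 combine, the sketch below re-implements the CheckSshdConfig rule as described in its table row. It is an approximation for explanation only, not the tool's code. The function name evaluate_sshd_config is hypothetical, and the settings are assumed to have been read into a dictionary of strings already.

```python
def evaluate_sshd_config(settings):
    """Classify sshd settings as OK, WARNING, or NG per the CheckSshdConfig row."""
    def as_int(name):
        # Missing or non-numeric values are treated as incorrect.
        try:
            return int(settings.get(name, ""))
        except ValueError:
            return None

    interval = as_int("ClientAliveInterval")
    # Conditions (b) and (d): an incorrect value fails the check.
    hard_ok = (
        as_int("MaxStartups") == 1000
        and interval is not None
        and (interval == 0 or interval > 10800)
    )
    # Conditions (a) and (c): an incorrect value only raises a warning.
    soft_ok = (
        settings.get("PasswordAuthentication") == "yes"
        and settings.get("UseDNS") == "no"
    )
    if not hard_ok:
        return "NG"
    return "OK" if soft_ok else "WARNING"


good = {
    "PasswordAuthentication": "yes",
    "MaxStartups": "1000",
    "UseDNS": "no",
    "ClientAliveInterval": "0",
}
print(evaluate_sshd_config(good))
print(evaluate_sshd_config({**good, "UseDNS": "yes"}))
print(evaluate_sshd_config({**good, "MaxStartups": "10"}))
```

The three calls cover the pass case, a warning caused by condition (c), and a failure caused by condition (b).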

Defining a Scenario

  1. Log in as the OS user omm to the primary node of the database.

  2. Create the scenario configuration file scene_XXX.xml in the script/gspylib/inspection/config directory.

  3. Write check items to the scenario configuration file in the following format:

    <?xml version="1.0" encoding="utf-8" ?>
    <scene name="XXX" desc="check cluster parameters before XXX.">
        <configuration/>
        <allowitems>
            <item name="CheckXXX"/>
            <item name="CheckXXX"/>
        </allowitems>
    </scene>
    

    item name indicates the check item name.

    Note: You need to ensure that the user-defined XML file is correct.

  4. Run the following command in the home/package/script/gspylib/inspection/config directory to deploy the file on each node where the check is to be performed:

    scp scene_upgrade.xml SIA1000068994:home/package/script/gspylib/inspection/config/
    

    NOTE: home/package/script/gspylib/inspection/config is the absolute path of the new scenario configuration file.

  5. Switch to user omm and run the following command on an old node to view the check result:

    gs_check  -e XXX
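
    Because a malformed scenario file is easy to produce by hand, one option is to generate and re-parse it programmatically. The sketch below uses Python's standard xml.etree.ElementTree module to build a file in the layout shown in step 3 and to read the item names back. The function names and the sample scenario name demo are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_scene_xml(scene_name, description, item_names):
    """Build scenario XML in the layout shown in step 3."""
    scene = ET.Element("scene", {"name": scene_name, "desc": description})
    ET.SubElement(scene, "configuration")
    allowitems = ET.SubElement(scene, "allowitems")
    for name in item_names:
        ET.SubElement(allowitems, "item", {"name": name})
    body = ET.tostring(scene, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8" ?>\n' + body


def read_scene_items(xml_text):
    """Parse the XML back; malformed input raises ET.ParseError."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [item.get("name") for item in root.iter("item")]


xml_text = build_scene_xml(
    "demo", "check cluster parameters before demo.", ["CheckCPU", "CheckMTU"]
)
print(xml_text)
print(read_scene_items(xml_text))
```

    Writing xml_text to scene_demo.xml in the configuration directory would then follow step 2. Whether the item names exist in items.xml is not verified by this sketch.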
    

Defining a Check Item

  1. Add a check item. Modify the script/gspylib/inspection/config/items.xml file in the following format:

    <checkitem id="10010" name="CheckCPU">
        <title>
            <zh>Check the CPU usage.</zh>
            <en>Check CPU Idle and I/O wait</en>
        </title>
        <threshold>
            StandardCPUIdle=30;
            StandardWIO=30
        </threshold>
        <suggestion>
            <zh>If the available space is insufficient and the CPU is heavily loaded, scale out the nodes. If iowait is too high, the disk is the current performance bottleneck; expand the disk capacity.</zh>
        </suggestion>
        <standard>
            <zh>Check the CPU usage of the host. If the value of idle is greater than 30% and the value of iowait is less than 30%, this item passes the check. Otherwise, this item fails the check.</zh>
        </standard>
        <category>os</category>
        <permission>user</permission>
        <scope>all</scope>
        <analysis>default</analysis>
    </checkitem>
    
    • id: specifies the check item ID.

    • name: specifies the name of the check script.

    • title: specifies the check item description. It allows multiple languages.

      <zh>: check content in Chinese.

      <en>: check content in English.

    • standard: specifies the check standards. It allows multiple languages.

    • suggestion: provides advice on how to fix check item problems. It allows multiple languages.

    • threshold: specifies the check item threshold. Multiple values are separated using semicolons (;), for example, Key1=Value1;Key2=Value2.

    • category: specifies the check item type. It is optional. Its value can be os, device, network, cluster, database, or other.

    • permission: specifies the permission required for checking an item. It is optional. Its value can be root or user (default).

    • scope: specifies the node scope where an item is checked. It is optional. cn indicates that only the primary database node is checked. local indicates that only the current node is checked. all is the default value, indicating that all nodes in openGauss are checked.

    • analysis: specifies how the check result is analyzed. default is the default value, indicating that the result on every node is checked, and that an item passes the check only if it passes the check on all the nodes. consistent indicates that each node returns a result, and that an item passes the check if all the results are consistent. custom indicates that the result is analyzed in another, user-defined way.

    Note: You need to ensure that the user-defined XML file is correct.

  2. Create a check script named CheckXXXX.py in the script/gspylib/inspection/items directory. This directory contains multiple folders, each storing one type of script. The format is as follows:

    # Import paths assume the gspylib layout of the openGauss OM package.
    from gspylib.inspection.common import SharedFuncs
    from gspylib.inspection.common.CheckItem import BaseItem
    from gspylib.inspection.common.CheckResult import ResultStatus


    class CheckCPU(BaseItem):
        def __init__(self):
            super(CheckCPU, self).__init__(self.__class__.__name__)
            self.idle = None
            self.wio = None
            self.standard = None

        def preCheck(self):
            # Check that the thresholds were set correctly
            if ('StandardCPUIdle' not in self.threshold
                    or 'StandardWIO' not in self.threshold):
                raise Exception("threshold can not be empty")
            self.idle = self.threshold['StandardCPUIdle']
            self.wio = self.threshold['StandardWIO']

            # Format the standard text with the threshold values
            self.standard = self.standard.format(idle=self.idle, iowait=self.wio)

        def doCheck(self):
            # Sample CPU statistics five times at 1-second intervals
            cmd = "sar 1 5 2>&1"
            output = SharedFuncs.runShellCmd(cmd)
            self.result.raw = output
            # Compare the "Average" line with the thresholds
            d = next(n.split() for n in output.splitlines() if "Average" in n)
            iowait = d[-3]
            idle = d[-1]
            rst = ResultStatus.OK
            vals = []
            # Compare as numbers; comparing the raw strings would be lexicographic
            if float(iowait) > float(self.wio):
                rst = ResultStatus.NG
                vals.append("The %s actual value %s is greater than expected value %s" % ("IOWait", iowait, self.wio))
            if float(idle) < float(self.idle):
                rst = ResultStatus.NG
                vals.append("The %s actual value %s is less than expected value %s" % ("Idle", idle, self.idle))
            self.result.rst = rst
            if vals:
                self.result.val = "\n".join(vals)
    

    A script is developed based on the BaseItem class, which defines the common check process, result analysis method, and default output format. Extended parameters:

    • doCheck: contains specific ways to check an item. The check result is in the following format:

      result.rst: (optional) specifies the check result. Its value can be:

      • OK: indicates that the item passes the check.
      • NA: indicates that the check does not cover the node.
      • NG: indicates that the item failed the check.
      • WARNING: indicates that the check is complete and that a warning is reported.
      • ERROR: indicates that the check is interrupted due to an internal error.
    • preCheck: checks prerequisites. Its value can be cnPreCheck, which checks whether a primary database node instance is deployed on the current execution node; or localPreCheck, which checks whether the current execution node is specified for the check. You can set it using scope in the check item configuration file. This method can be overridden to perform customized pre-checks.

    • postAnalysis: specifies how the check result is analyzed. Its value can be default or consistent. You can set it using analysis in the check item configuration file. This method can be overridden to perform customized result analysis.

    Note: The name of a user-defined check item cannot be the same as the name of an existing check item. In addition, you need to ensure that the user-defined check item script is standard.
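
The threshold logic of the doCheck sample can be tried outside the inspection framework. The following is a minimal sketch, not the tool's implementation: the function name evaluate_cpu, the plain string statuses, and the sample text are assumptions made for illustration, and it uses no gspylib classes.

```python
def evaluate_cpu(sar_output, max_iowait, min_idle):
    """Apply the CheckCPU thresholds to the 'Average' line of sar output.

    Hypothetical helper: returns ("OK" or "NG", list of messages).
    """
    fields = next(line.split() for line in sar_output.splitlines()
                  if line.startswith("Average"))
    iowait = float(fields[-3])  # %iowait column
    idle = float(fields[-1])    # %idle column
    messages = []
    if iowait > max_iowait:
        messages.append("IOWait %s is greater than expected value %s"
                        % (iowait, max_iowait))
    if idle < min_idle:
        messages.append("Idle %s is less than expected value %s"
                        % (idle, min_idle))
    return ("NG" if messages else "OK"), messages


# Two lines in the layout printed by "sar 1 5"
SAMPLE = (
    "17:09:29        all      1.00      0.00      0.88      0.00      0.00     98.12\n"
    "Average:        all      0.43      0.00      0.35      0.03      0.03     99.17\n"
)

print(evaluate_cpu(SAMPLE, max_iowait=30, min_idle=30))
```

Converting both columns to float before comparing is the important step: comparing the raw text fields would order them character by character rather than numerically.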

  3. Deploy the script on all execution nodes.

  4. Log in to the nodes added in a scale-out as user root or to old nodes as user omm. Run the following commands as required and view the result:

    To locally perform a check, run the following command:

    gs_check -i CheckXXX -L
    

    To remotely perform a check, run the following command:

    gs_check -i CheckXXX
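
If the check is launched from a script, the two command forms differ only in the -L flag. The sketch below builds the argument list; build_gs_check_cmd is a hypothetical helper name, and only the -i and -L options described in this document are used.

```python
def build_gs_check_cmd(item, local=False):
    """Return the gs_check argument list for a single check item.

    Hypothetical helper: local=True appends -L so that the check runs
    only on the current node.
    """
    cmd = ["gs_check", "-i", item]
    if local:
        cmd.append("-L")
    return cmd


print(" ".join(build_gs_check_cmd("CheckXXX", local=True)))
print(" ".join(build_gs_check_cmd("CheckXXX")))

# To run the command, pass the list to the standard library, for example:
#   subprocess.run(build_gs_check_cmd("CheckXXX", local=True), check=False)
```

Passing a list instead of a single string avoids shell quoting issues when the item name comes from user input.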
    

OS Parameters

Table 2 OS parameters

  • net.ipv4.tcp_max_tw_buckets

    Specifies the maximum number of TCP/IP connections that can be in the TIME_WAIT state at the same time. If this number is exceeded, the excess connections in the TIME_WAIT state are released immediately and a warning is printed.

    Recommended value: 10000

  • net.ipv4.tcp_tw_reuse

    Specifies whether sockets in the TIME_WAIT state can be reused for new TCP connections.

    • 0: This function is disabled.
    • 1: This function is enabled.

    Recommended value: 1

  • net.ipv4.tcp_tw_recycle

    Specifies whether sockets in the TIME_WAIT state are rapidly reclaimed.

    • 0: This function is disabled.
    • 1: This function is enabled.

    Recommended value: 1

  • net.ipv4.tcp_keepalive_time

    Specifies how often, in seconds, keepalive messages are sent through TCP connections when keepalive is enabled.

    Recommended value: 30

  • net.ipv4.tcp_keepalive_probes

    Specifies the number of keepalive probes sent through a TCP connection before the connection is considered invalid. This value multiplied by the value of tcp_keepalive_intvl determines the response timeout after a keepalive message is sent through a connection.

    Recommended value: 9

  • net.ipv4.tcp_keepalive_intvl

    Specifies the interval, in seconds, at which a probe is re-sent when the previous probes are not acknowledged.

    Recommended value: 30

  • net.ipv4.tcp_retries1

    Specifies the maximum number of TCP retry attempts during connection establishment.

    Recommended value: 5

  • net.ipv4.tcp_syn_retries

    Specifies the maximum number of SYN packet retransmissions in TCP.

    Recommended value: 5

  • net.ipv4.tcp_synack_retries

    Specifies the maximum number of SYN response packet retransmissions in TCP.

    Recommended value: 5

  • net.sctp.path_max_retrans

    Specifies the maximum number of SCTP retransmissions.

    Recommended value: 10

  • net.sctp.max_init_retransmits

    Specifies the maximum number of INIT packet retransmissions in SCTP.

    Recommended value: 10

  • net.sctp.association_max_retrans

    Specifies the maximum number of retransmissions for a single logical connection in SCTP.

    Recommended value: 10

  • net.sctp.hb_interval

    Specifies the retransmission interval of heartbeat detection packets in SCTP.

    Recommended value: 30000

  • net.ipv4.tcp_retries2

    Specifies the number of times that the kernel re-sends data to a connected remote host. A smaller value leads to earlier detection of an invalid connection to the remote host, and the server can quickly release this connection.

    If "connection reset by peer" is displayed, increase the value of this parameter to avoid the problem.

    Recommended value: 12

  • vm.overcommit_memory

    Specifies the kernel check method during memory allocation.

    • 0: The system accurately calculates the current available memory.
    • 1: The system returns a success message without a kernel check.
    • 2: The system returns a failure message if the requested memory size exceeds the result of the following formula: Total memory size × Value of vm.overcommit_ratio/100 + Total swap size.

    A value of 2 is too conservative. The recommended value is 0. If system loads are high, set this parameter to 1.

    Recommended value: 0

  • net.sctp.sndbuf_policy

    Specifies the buffer allocation policy on the SCTP sender.

    • 0: The buffer is allocated by connection.
    • 1: The buffer is allocated by association.

    Recommended value: 0

  • net.sctp.rcvbuf_policy

    Specifies the buffer allocation policy on the SCTP receiver.

    • 0: The buffer is allocated by connection.
    • 1: The buffer is allocated by association.

    Recommended value: 0

  • net.sctp.sctp_mem

    Specifies the maximum free memory of the kernel SCTP stack. Three values, in pages, are provided: min, default, and max. When memory usage reaches max, packet loss occurs.

    Recommended value: 94500000 915000000 927000000

  • net.sctp.sctp_rmem

    Specifies the total free memory for receiving data in the kernel SCTP stack. Three values, in bytes, are provided: min, default, and max. When memory usage reaches max, packet loss occurs.

    Recommended value: 8192 250000 16777216

  • net.sctp.sctp_wmem

    Specifies the total free memory for sending data in the kernel SCTP stack. Three values, in bytes, are provided: min, default, and max. When memory usage reaches max, packet loss occurs.

    Recommended value: 8192 250000 16777216

  • net.ipv4.tcp_rmem

    Specifies the free memory in the TCP receiver buffer. Three values, in bytes, are provided: min, default, and max.

    Recommended value: 8192 250000 16777216

  • net.ipv4.tcp_wmem

    Specifies the free memory in the TCP sender buffer. Three values, in bytes, are provided: min, default, and max.

    Recommended value: 8192 250000 16777216

  • net.core.wmem_max

    Specifies the maximum size of the socket sender buffer.

    Recommended value: 21299200

  • net.core.rmem_max

    Specifies the maximum size of the socket receiver buffer.

    Recommended value: 21299200

  • net.core.wmem_default

    Specifies the default size of the socket sender buffer.

    Recommended value: 21299200

  • net.core.rmem_default

    Specifies the default size of the socket receiver buffer.

    Recommended value: 21299200

  • net.ipv4.ip_local_port_range

    Specifies the range of temporary ports that can be used by a physical server.

    Recommended value: 26000-65535

  • kernel.sem

    Specifies the kernel semaphore settings.

    Recommended value: 250 6400000 1000 25600

  • vm.min_free_kbytes

    Specifies the minimum free physical memory reserved for unexpected page faults.

    Recommended value: 5% of the total system memory

  • net.core.somaxconn

    Specifies the maximum length of the listening queue of each port. This is a global parameter.

    Recommended value: 65535

  • net.ipv4.tcp_syncookies

    Specifies whether to enable SYN cookies to guard the OS against SYN attacks when the SYN waiting queue overflows.

    • 0: The SYN cookies are disabled.
    • 1: The SYN cookies are enabled.

    Recommended value: 1

  • net.sctp.addip_enable

    Specifies whether dynamic address reconfiguration of SCTP is enabled.

    • 0: This function is disabled.
    • 1: This function is enabled.

    Recommended value: 0

  • net.core.netdev_max_backlog

    Specifies the maximum number of data packets that can be queued when the network device receives data packets faster than the kernel can process them.

    Recommended value: 65535

  • net.ipv4.tcp_max_syn_backlog

    Specifies the maximum number of unacknowledged connection requests to be recorded.

    Recommended value: 65535

  • net.ipv4.tcp_fin_timeout

    Specifies the default timeout, in seconds, for connections in the FIN_WAIT_2 state.

    Recommended value: 60

  • kernel.shmall

    Specifies the total amount of shared memory available in the kernel.

    Recommended value: 1152921504606846720

  • kernel.shmmax

    Specifies the maximum size of a shared memory segment.

    Recommended value: 18446744073709551615

  • net.ipv4.tcp_sack

    Specifies whether selective acknowledgment is enabled. Selective acknowledgment of out-of-order packets lets the sender retransmit only the lost packets, which can improve system performance. It should be enabled for wide area networks, but it increases CPU usage.

    • 0: This function is disabled.
    • 1: This function is enabled.

    Recommended value: 1

  • net.ipv4.tcp_timestamps

    Specifies whether TCP timestamps are enabled. The timestamp (12 bytes added to the TCP packet header) enables a more accurate RTT calculation than the retransmission timeout method (for details, see RFC 1323), which improves performance.

    • 0: This function is disabled.
    • 1: This function is enabled.

    Recommended value: 1

  • vm.extfrag_threshold

    When system memory is insufficient, Linux scores the current memory fragmentation. If the score is higher than the value of vm.extfrag_threshold, kswapd triggers memory compaction. When the value of this parameter is close to 1000, the system tends to swap out old pages when handling memory fragments to meet application requirements. When the value is close to 0, the system tends to perform memory compaction instead.

    Recommended value: 500

  • vm.overcommit_ratio

    When the system uses the algorithm in which memory usage never exceeds the threshold, the total memory address space of the system cannot exceed the swap size plus the RAM size multiplied by the percentage specified by this parameter. This parameter takes effect only when vm.overcommit_memory is set to 2.

    Recommended value: 90

  • /sys/module/sctp/parameters/no_checksums

    Specifies whether checksum is disabled in SCTP.

    Recommended value: 0

  • MTU

    Specifies the maximum transmission unit (MTU) of a node NIC. The default value in the OS is 1500. You can set it to 8192 to improve the performance of sending and receiving data using SCTP.

    Recommended value: 8192
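
To compare a node against these recommendations by hand, the current values can be read from /proc/sys, where Linux exposes kernel parameters as files. The following is a minimal sketch under stated assumptions: it covers only a few rows of Table 2, the helper names are hypothetical, and it is not how gs_check itself performs the comparison. It also works through two derived values from the table: the keepalive response timeout (probes multiplied by interval) and 5% of total memory for vm.min_free_kbytes.

```python
from pathlib import Path

# A few recommended values copied from Table 2 (subset, for illustration).
RECOMMENDED = {
    "net.ipv4.tcp_max_tw_buckets": "10000",
    "net.ipv4.tcp_keepalive_time": "30",
    "net.ipv4.tcp_keepalive_probes": "9",
    "net.ipv4.tcp_keepalive_intvl": "30",
    "net.core.somaxconn": "65535",
    "kernel.sem": "250 6400000 1000 25600",
}


def read_sysctl(name, root="/proc/sys"):
    """Read one kernel parameter; return None if it is unavailable."""
    path = Path(root, *name.split("."))
    try:
        # Collapse tabs and repeated spaces so multi-value entries compare cleanly.
        return " ".join(path.read_text().split())
    except OSError:
        return None


def find_mismatches(actual, recommended=RECOMMENDED):
    """Return {name: (actual, recommended)} for every differing parameter."""
    return {
        name: (actual.get(name), want)
        for name, want in recommended.items()
        if actual.get(name) != want
    }


def keepalive_timeout(probes, intvl):
    """Seconds of unanswered probing before a connection is considered invalid."""
    return probes * intvl


def min_free_kbytes(total_memory_kb):
    """5% of the total system memory, in KB."""
    return total_memory_kb * 5 // 100


if __name__ == "__main__":
    current = {name: read_sysctl(name) for name in RECOMMENDED}
    for name, (got, want) in find_mismatches(current).items():
        print("%s: actual %s, recommended %s" % (name, got, want))
    print("keepalive timeout:", keepalive_timeout(9, 30), "seconds")
```

With the recommended values of 9 probes and a 30-second interval, an unresponsive peer is detected after 270 seconds of probing.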

File System Parameters

  • soft nofile

    Indicates the soft limit. The number of file handles used by a user can exceed this value, but a warning is reported.

    Recommended value: 1000000

  • hard nofile

    Indicates the hard limit. The number of file handles used by a user cannot exceed this parameter value.

    Recommended value: 1000000

  • stack size

    Specifies the thread stack size.

    Recommended value: 3072
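
The limits of the current process can be inspected with Python's standard resource module (Unix only). This is a minimal sketch: the helper name below_recommended is hypothetical, treating the recommended values as minimums is an assumption, and so is reading the stack size of 3072 as KB.

```python
import resource

RECOMMENDED_NOFILE = 1000000
RECOMMENDED_STACK_KB = 3072  # assumption: the unit is KB, as in "ulimit -s"


def below_recommended(value, recommended):
    """True if a limit is finite and lower than the recommended value."""
    return value != resource.RLIM_INFINITY and value < recommended


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft nofile too low:", below_recommended(soft, RECOMMENDED_NOFILE))
print("hard nofile too low:", below_recommended(hard, RECOMMENDED_NOFILE))

# getrlimit reports the stack limit in bytes, so convert the KB value first.
stack_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)
print("stack size too low:",
      below_recommended(stack_soft, RECOMMENDED_STACK_KB * 1024))
```

The values reflect the process that runs the script, so run it as the user whose limits you want to verify.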

Examples

Check result of a single item:

perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU
Parsing the check items config file successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:1 Nodes:3

Checking...               [=========================] 1/1
Start to analysis the check result
CheckCPU....................................OK
The item run on 3 nodes.  success: 3

Success. All check items run completed. Total:1  Success:1  Failed:0
For more information please refer to /opt/huawei/wisequery/script/gspylib/inspection/output/CheckReport_201902193704661604.tar.gz

Local execution result:

perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU -L

2017-12-29 17:09:29 [NAM] CheckCPU
2017-12-29 17:09:29 [STD] Check the CPU usage of the host. If the value of idle is greater than 30% and the value of iowait is less than 30%, this item passes the check. Otherwise, this item fails the check.
2017-12-29 17:09:29 [RST] OK

2017-12-29 17:09:29 [RAW]
Linux 4.4.21-69-default (lfgp000700749)  12/29/17  _x86_64_

17:09:24        CPU     %user     %nice   %system   %iowait    %steal     %idle
17:09:25        all      0.25      0.00      0.25      0.00      0.00     99.50
17:09:26        all      0.25      0.00      0.13      0.00      0.00     99.62
17:09:27        all      0.25      0.00      0.25      0.13      0.00     99.37
17:09:28        all      0.38      0.00      0.25      0.00      0.13     99.25
17:09:29        all      1.00      0.00      0.88      0.00      0.00     98.12
Average:        all      0.43      0.00      0.35      0.03      0.03     99.17
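
The local execution output above labels each part with a tag: [NAM] for the item name, [STD] for the pass criterion, [RST] for the result, and [RAW] for the raw command output. A script that post-processes this output could split it on those tags. The sketch below assumes the line layout is exactly as printed in this example; parse_local_result is a hypothetical helper, not part of gs_check.

```python
import re

# Matches lines such as "2017-12-29 17:09:29 [RST] OK"
TAG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[(\w+)\]\s?(.*)$")


def parse_local_result(text):
    """Split tagged local-check output into a {tag: content} dictionary."""
    fields = {}
    current = None
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None:
            # Untagged lines belong to the most recent tag (for example RAW).
            fields[current] += "\n" + line
    return {tag: content.strip() for tag, content in fields.items()}


SAMPLE_LOG = "\n".join([
    "2017-12-29 17:09:29 [NAM] CheckCPU",
    "2017-12-29 17:09:29 [RST] OK",
    "",
    "2017-12-29 17:09:29 [RAW]",
    "Average:        all      0.43      0.00      0.35      0.03      0.03     99.17",
])

parsed = parse_local_result(SAMPLE_LOG)
print(parsed["NAM"], parsed["RST"])
```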

Check result of a scenario:

[perfadm@SIA1000131072 Check]$ gs_check -e inspect
Skip CheckHdfsForeignTabEncoding because it only applies to V1R5 upgrade V1R6 with cluster.
Parsing the check items config file successfully
The below items require root privileges to execute:[CheckBlockdev CheckIOConfigure CheckMTU CheckRXTX CheckMultiQueue CheckFirewall CheckSshdService CheckSshdConfig CheckCrondService CheckNoCheckSum CheckSctpService CheckMaxProcMemory CheckBootItems CheckFilehandle CheckNICModel CheckDropCache]
Please enter root privileges user[root]:
Please enter password for user[root]:
Check root password connection successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:64 Nodes:3
Checking...               [=========================] 64/64
Start to analysis the check result
CheckClusterState...........................OK
The item run on 3 nodes.  success: 3
CheckDBParams...............................OK
.........................................................................
CheckMpprcFile..............................OK
The item run on 3 nodes.  success: 3

Analysis the check result successfully
Failed. All check items run completed. Total:64   Success:56   Warning:5   NG:3   Error:0
For more information please refer to /opt/huawei/wisequery/script/gspylib/inspection/output/CheckReport_inspect_201902207129254785.tar.gz
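
Both examples end with a summary line of Name:count pairs. When the result feeds a monitoring script, the counters can be extracted with a regular expression. This is a sketch that assumes the summary format shown in the examples above; parse_summary is a hypothetical helper.

```python
import re


def parse_summary(line):
    """Extract the Name:count pairs from a gs_check summary line."""
    return {name: int(count)
            for name, count in re.findall(r"(\w+):\s*(\d+)", line)}


scenario = ("Failed. All check items run completed. "
            "Total:64   Success:56   Warning:5   NG:3   Error:0")
single = ("Success. All check items run completed. "
          "Total:1  Success:1  Failed:0")

print(parse_summary(scenario))
print(parse_summary(single))
```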

Helpful Links

gs_checkos and gs_checkperf

    openGauss 2024-05-25 00:42:51