Data Types

Numeric Types

Table 1 lists all available types.

Table 1 Integer types

Name

Description

Storage Space

Value Range

TINYINT

Tiny integer, also called INT1.

1 byte

0-255

SMALLINT

Small integer, also called INT2.

2 bytes

–32,768 to +32,767

INTEGER

Typical choice for integers, also called INT4.

4 bytes

–2,147,483,648 to +2,147,483,647

BINARY_INTEGER

Alias of INTEGER.

4 bytes

–2,147,483,648 to +2,147,483,647

BIGINT

Big integer, also called INT8.

8 bytes

–9,223,372,036,854,775,808 to +9,223,372,036,854,775,807

int16

A 16-byte certificate cannot be used to create tables.

16 bytes

–170,141,183,460,469,231,731,687,303,715,884,105,728 to +170,141,183,460,469,231,731,687,303,715,884,105,727

Table 2 Arbitrary precision types

Name

Description

Storage Space

Value Range

NUMERIC[(p[,s])],

DECIMAL[(p[,s])]

The value range of p (precision) is [1,1000], and the value range of s (scale) is [0,p].

NOTE:

p indicates the total digits, and s indicates the decimal digit.

The precision is specified by users. Every four decimal digits occupy two bytes, and an extra eight-byte overhead is added to the entire data.

Up to 131,072 digits before the decimal point and up to 16,383 digits after the decimal point when no precision is specified

NUMBER[(p[,s])]

Alias of the NUMERIC type.

The precision is specified by users. Every four decimal digits occupy two bytes, and an extra eight-byte overhead is added to the entire data.

Up to 131,072 digits before the decimal point and up to 16,383 digits after the decimal point when no precision is specified

Table 3 Serial integers

Name

Description

Storage Space

Value Range

SMALLSERIAL

2-byte serial integer.

2 bytes

–32,768 to +32,767

SERIAL

4-byte serial integer.

4 bytes

–2,147,483,648 to +2,147,483,647

BIGSERIAL

8-byte serial integer.

8 bytes

–9,223,372,036,854,775,808 to +9,223,372,036,854,775,807

LARGESERIAL

16-byte serial integer.

16 bytes

–170,141,183,460,469,231,731,687,303,715,884,105,728 to +170,141,183,460,469,231,731,687,303,715,884,105,727

Table 4 Floating point types

Name

Description

Storage Space

Value Range

REAL,

FLOAT4

Single precision floating points, which is not very precise.

4 bytes

–3.402E+38 to 3.402E+38, 6-bit decimal digits

DOUBLE PRECISION,

FLOAT8

Double precision floating points, which is not very precise.

8 bytes

–1.79E+308 to 1.79E+308, 15-bit decimal digits

FLOAT[(p)]

Floating points, which is not very precise. The value range of p (precision) is [1,53].

NOTE:

p is the precision, indicating the total decimal digits.

4 or 8 bytes

REAL or DOUBLE PRECISION is selected as an internal identifier based on precision (p). If no precision is specified, DOUBLE PRECISION is used as the internal identifier.

BINARY_DOUBLE

Alias of DOUBLE PRECISION.

8 bytes

–1.79E+308 to 1.79E+308, 15-bit decimal digits

DEC[(p[,s])]

The value range of p (precision) is [1,1000], and the value range of s (scale) is [0,p].

NOTE:

p indicates the total digits, and s indicates the decimal digit.

The precision is specified by users. Every four decimal digits occupy two bytes, and an extra eight-byte overhead is added to the entire data.

Up to 131,072 digits before the decimal point and up to 16,383 digits after the decimal point when no precision is specified

INTEGER[(p[,s])]

The value range of p (precision) is [1,1000], and the value range of s (scale) is [0,p].

The precision is specified by users. Every four decimal digits occupy two bytes, and an extra eight-byte overhead is added to the entire data.

N/A

Boolean Types

Table 5 Boolean types

Name

Description

Storage Space

Value

BOOLEAN

Boolean type.

1 byte

  • true
  • false
  • null (unknown)

Character Types

Table 6 lists the character data types supported by openGauss.

Table 6 Character types

Name

Description

Storage Space

CHAR(n)

CHARACTER(n)

NCHAR(n)

Fixed-length character string. Empty characters are filled in with blank spaces. n indicates the string length. If it is not specified, the default precision 1 is used.

The maximum size is 10 MB.

VARCHAR(n)

CHARACTER VARYING(n)

Variable-length string. n indicates the string length.

The maximum size is 10 MB.

VARCHAR2(n)

Variable-length string. It is the alias of the VARCHAR(n) type. n indicates the string length.

The maximum size is 10 MB.

NVARCHAR2(n)

Variable-length string. n indicates the string length.

The maximum size is 10 MB.

TEXT

Variable-length string.

The maximum size is 1 GB minus 1 byte. However, the size of the column description header and the size of the tuple (less than 1 GB minus 1 byte) where the column is located must also be considered. Therefore, the maximum size of the TEXT type may be less than 1 GB minus 1 byte.

CLOB

Big text object. It is the alias of the TEXT type.

The maximum size is 1 GB minus 1 byte. However, the size of the column description header and the size of the tuple (less than 1 GB minus 1 byte) where the column is located must also be considered. Therefore, the maximum size of the CLOB type may be less than 1 GB minus 1 byte.

NOTE:

  1. In addition to the restriction on the size of each column, the total size of each tuple cannot exceed 1 GB minus 1 byte and is affected by the control header information of the column, the control header information of the tuple, and whether null columns exist in the tuple.
  2. NCHAR is the alias of the bpchar type, and NCHAR(n) is the alias of the b(n) type bpchar(n).

In openGauss, there are two other fixed-length character types, as shown in Table 7. The name type exists only for the storage of identifiers in the internal system catalogs and is not intended for general users. Its length is currently defined as 64 bytes (63 usable characters plus a terminator). The type “char” only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.

Table 7 Special character types

Name

Description

Storage Space

name

Internal type for object names.

64 bytes

"char"

Single-byte internal type.

1 byte

Binary Types

Table 8 lists the binary data types supported by openGauss.

Table 8 Binary data types

Name

Description

Storage Space

BLOB

Binary large object (BLOB).

NOTE:

Column storage cannot be used for the BLOB type.

Its maximum length is 1073733621 bytes (1 GB – 8203 bytes).

RAW

Variable-length hexadecimal string.

NOTE:

Column store cannot be used for the raw type.

4 bytes plus the actual hexadecimal string. Its maximum length is 1073733621 bytes (1 GB – 8203 bytes).

BYTEA

Variable-length binary string.

4 bytes plus the actual binary string. Its maximum length is 1073733621 bytes (1 GB – 8203 bytes).

BYTEAWITHOUTORDERWITHEQUALCOL

Variable-length binary character string (new type for the encryption feature. If the encryption type of the encrypted column is specified as deterministic encryption, the column type is BYTEAWITHOUTORDERWITHEQUALCOL). The original data type is displayed when the encrypted table is printed by running the meta command.

4 bytes plus the actual binary string. The maximum value is 1073741771 bytes (1 GB – 53 bytes).

BYTEAWITHOUTORDERCOL

Variable-length binary character string (new type for the encryption feature. If the encryption type of the encrypted column is specified as random encryption, the column type is BYTEAWITHOUTORDERCOL). The original data type is displayed when the encrypted table is printed by running the meta command.

4 bytes plus the actual binary string. The maximum value is 1073741771 bytes (1 GB – 53 bytes).

_BYTEAWITHOUTORDERWITHEQUALCOL

Variable-length binary character string, which is a new type for the encryption feature.

4 bytes plus the actual binary string. The maximum value is 1073741771 bytes (1 GB – 53 bytes).

_BYTEAWITHOUTORDERCOL

Variable-length binary character string, which is a new type for the encryption feature.

4 bytes plus the actual binary string. The maximum value is 1073741771 bytes (1 GB – 53 bytes).

NOTE:

  • In addition to the size limitation on each column, the total size of each tuple is 1073733621 bytes (1 GB – 8203 bytes).
  • BYTEAWITHOUTORDERWITHEQUALCOL, BYTEAWITHOUTORDERCOL, _BYTEAWITHOUTORDERWITHEQUALCOL, and _BYTEAWITHOUTORDERCOL cannot be directly used to create a table.

Date/Time Types

Table 9 lists the date/time types supported by openGauss.

NOTE: If the time format of another database is different from that of openGauss, modify the value of the DateStyle parameter to keep them consistent.

Table 9 Date/Time types

Name

Description

Storage Space

DATE

Date and time.

4 bytes (The actual storage space is 8 bytes.)

TIME [(p)] [WITHOUT TIME ZONE]

Time within one day.

p indicates the precision after the decimal point. The value ranges from 0 to 6.

8 bytes

TIME [(p)] [WITH TIME ZONE]

Time within one day (with time zone).

p indicates the precision after the decimal point. The value ranges from 0 to 6.

12 bytes

TIMESTAMP[(p)] [WITHOUT TIME ZONE]

Date and time.

p indicates the precision after the decimal point. The value ranges from 0 to 6.

8 bytes

TIMESTAMP[(p)][WITH TIME ZONE]

Date and time (with time zone). TIMESTAMP is also called TIMESTAMPTZ.

p indicates the precision after the decimal point. The value ranges from 0 to 6.

8 bytes

SMALLDATETIME

Date and time (without time zone).

The precision is minute. A duration between 30s and 60s is rounded into 1 minute.

8 bytes

INTERVAL DAY (l) TO SECOND (p)

Time interval (X days X hours X minutes X seconds).

  • l: indicates the precision of days. The value ranges from 0 to 6. For compatibility, the precision functions are not supported.
  • p: indicates the precision of seconds. The value ranges from 0 to 6. The digit 0 at the end of a decimal number is not displayed.

16 bytes

INTERVAL [FIELDS] [ (p) ]

Time interval.

  • FIELDS: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, DAY TO HOUR, DAY TO MINUTE, DAY TO SECOND, HOUR TO MINUTE, HOUR TO SECOND, or MINUTE TO SECOND.
  • p: indicates the precision of seconds. The value ranges from 0 to 6. p takes effect only when FIELDS is SECOND, DAY TO SECOND, HOUR TO SECOND, or MINUTE TO SECOND. The digit 0 at the end of a decimal number is not displayed.

12 bytes

reltime

Relative time interval. The format is as follows:

X years X months X days XX:XX:XX

The Julian calendar is used. It specifies that a year has 365.25 days and a month has 30 days. The relative time interval needs to be calculated based on the input value. The output format is POSTGRES.

4 bytes

abstime

Date and time. The format is as follows:

YYYY-MM-DD hh:mm:ss+timezone

The value range is from 1901-12-13 20:45:53 GMT to 2038-01-18 23:59:59 GMT. The precision is second.

4 bytes

Geometric Types

Table 10 lists the geometric types that can be used in openGauss. The most basic type is point which forms the basis for all of the other types.

Table 10 Geometric types

Name

Storage Space

Description

Representation

point

16 bytes

Point on a plane

(x,y)

lseg

32 bytes

Finite line segment

((x1,y1),(x2,y2))

box

32 bytes

Box

((x1,y1),(x2,y2))

path

16 + 16n bytes

Closed path (similar to polygon)

((x1,y1),...)

path

16+16n bytes

Open path

[(x1,y1),...]

polygon

40+16n bytes

Polygon (similar to closed path)

((x1,y1),...)

circle

24 bytes

Circle

<(x,y),r> (center point and radius)

A rich set of functions and operators is available in openGauss to perform various geometric operations, such as scaling, translation, rotation, and determining intersections.

  • Points

    Points are the fundamental two-dimensional building block for geometric types. Values of the point type are specified using either of the following syntaxes:

    ( x , y )
    x , y
    

    x and y are the respective coordinates, as floating-point numbers.

    Points are output using the first syntax.

  • Line Segments

    Line segments (lseg) are represented by pairs of points. Values of the lseg type are specified using any of the following syntaxes:

    [ ( x1 , y1 ) , ( x2 , y2 ) ]
    ( ( x1 , y1 ) , ( x2 , y2 ) )
    ( x1 , y1 ) , ( x2 , y2 )
    x1 , y1   ,   x2 , y2
    

    (x1,y1) and (x2,y2) are the end points of the line segment.

    Line segments are output using the first syntax.

  • Boxes

    Boxes are represented by pairs of points that are opposite corners of the box. Values of the box type are specified using any of the following syntaxes:

    ( ( x1 , y1 ) , ( x2 , y2 ) )
    ( x1 , y1 ) , ( x2 , y2 )
    x1 , y1   ,   x2 , y2
    

    (x1,y1) and (x2,y2) are any two opposite corners of the box.

    Boxes are output using the second syntax.

    Any two opposite corners can be supplied on input, but in this order, the values will be reordered as needed to store the upper right and lower left corners.

  • Paths

    Paths are represented by lists of connected points. Paths can be open, where the first and last points in the list are considered not connected, or closed, where the first and last points are considered connected.

    Values of the path type are specified using any of the following syntaxes:

    [ ( x1 , y1 ) , ... , ( xn , yn ) ]
    ( ( x1 , y1 ) , ... , ( xn , yn ) )
    ( x1 , y1 ) , ... , ( xn , yn )
    ( x1 , y1   , ... ,   xn , yn )
    x1 , y1   , ... ,   xn , yn
    

    The points are the end points of the line segments comprising the path. Square brackets ([]) indicate an open path, while parentheses (()) indicate a closed path. When the outermost parentheses are omitted, as in the third through fifth syntax, a closed path is assumed.

    Paths are output using the first or second syntax.

  • Polygons

    Polygons are represented by lists of points (the vertexes of the polygon). Polygons are very similar to closed paths, but are stored differently and have their own set of support functions.

    Values of the polygon type are specified using any of the following syntaxes:

    ( ( x1 , y1 ) , ... , ( xn , yn ) )
    ( x1 , y1 ) , ... , ( xn , yn )
    ( x1 , y1   , ... ,   xn , yn )
    x1 , y1   , ... ,   xn , yn
    

    The points are the end points of the line segments comprising the boundary of the polygon.

    Polygons are output using the first syntax.

  • Circles

    Circles are represented by a center point and radius. Values of the circle type are specified using any of the following syntaxes:

    < ( x , y ) , r >
    ( ( x , y ) , r )
    ( x , y ) , r
    x , y   , r
    

    (x,y) is the center point and r is the radius of the circle.

    Circles are output using the first syntax.

Network Address Types

openGauss offers data types to store IPv4, IPv6, and MAC addresses.

It is better to use these types instead of plain text types to store network addresses, because these types offer input error check as well as specialized operators and functions.

Table 11 Network address types

Name

Storage Space

Description

cidr

7 or 19 bytes

IPv4 or IPv6 networks

inet

7 or 19 bytes

IPv4 or IPv6 hosts and networks

macaddr

6 bytes

MAC address

When sorting inet or cidr data types, IPv4 addresses will always sort before IPv6 addresses, including IPv4 addresses encapsulated in or mapped to IPv6 addresses, such as ::10.2.3.4 or ::ffff:10.4.3.2.

  • cidr

    The cidr type (Classless Inter-Domain Routing) holds an IPv4 or IPv6 network address. The format for specifying networks is address/y where address is the network represented as an IPv4 or IPv6 address, and y is the number of bits in the netmask. If y is omitted, it is calculated using assumptions from the older classful network numbering system, except it will be at least large enough to include all of the bytes written in the input.

    Table 12 cidr type input examples

    cidr Input

    cidr Output

    abbrev(cidr)

    192.168.100.128/25

    192.168.100.128/25

    192.168.100.128/25

    192.168/24

    192.168.0.0/24

    192.168.0/24

    192.168/25

    192.168.0.0/25

    192.168.0.0/25

    192.168.1

    192.168.1.0/24

    192.168.1/24

    192.168

    192.168.0.0/24

    192.168.0/24

    10.1.2

    10.1.2.0/24

    10.1.2/24

    10.1

    10.1.0.0/16

    10.1/16

    10

    10.0.0.0/8

    10/8

    10.1.2.3/32

    10.1.2.3/32

    10.1.2.3/32

    2001:4f8:3:ba::/64

    2001:4f8:3:ba::/64

    2001:4f8:3:ba::/64

    2001:4f8:3:ba:2e0:81ff:fe22:d1f1/128

    2001:4f8:3:ba:2e0:81ff:fe22:d1f1/128

    2001:4f8:3:ba:2e0:81ff:fe22:d1f1

    ::ffff:1.2.3.0/120

    ::ffff:1.2.3.0/120

    ::ffff:1.2.3/120

    ::ffff:1.2.3.0/128

    ::ffff:1.2.3.0/128

    ::ffff:1.2.3.0/128

  • inet

    The inet type holds an IPv4 or IPv6 host address, and optionally its subnet, all in one field. The subnet is represented by the number of network address bits present in the host address (the “netmask”). If the netmask is 32 and the address is an IPv4 address, then the value does not indicate a subnet but a single host. In IPv6, because the address length is 128 bits, 128 bits specify a unique host address.

    The input format for this type is address/y where address is an IPv4 or IPv6 address and y is the number of bits in the netmask. If the /y portion is omitted, the netmask is 32 for an IPv4 address and 128 for an IPv6 address, and the value represents just a single host. On display, the /y portion is suppressed if the netmask specifies a single host.

    The essential difference between the inet and cidr data types is that inet accepts values with nonzero bits to the right of the netmask, whereas cidr does not.

  • macaddr

    The macaddr type stores MAC addresses, known for example from Ethernet card hardware addresses (although MAC addresses are used for other purposes as well). Input is accepted in the following formats:

    '08:00:2b:01:02:03'
    '08-00-2b-01-02-03'
    '08002b:010203'
    '08002b-010203'
    '0800.2b01.0203'
    '08002b010203'
    

    These examples specify the same address. Upper and lower cases are accepted for the digits a through f. Output is always in the first of the forms shown.

Bit String Types

Bit strings are strings of 1's and 0's. They can be used to store bit masks.

openGauss supports two bit string types: bit(n) and bit varying(n), in which n is a positive integer.

The data of the bit type must match the length n exactly. An error will be reported if shorter or longer bit strings are stored. The data of the bit varying type is of variable length up to the maximum length n. Longer strings will be rejected. Writing bit without a length is equivalent to bit(1), while bit varying without a length limit means unlimited length.

NOTE: If one explicitly casts a bit-string value to bit(n), it will be truncated or zero-padded on the right to be exactly n bits, without raising an error. Similarly, if one explicitly casts a bit-string value to bit varying(n), it will be truncated on the right if it has more than n bits.

-- Create a table.
openGauss=# CREATE TABLE bit_type_t1 
(
    BT_COL1 INTEGER,
    BT_COL2 BIT(3),
    BT_COL3 BIT VARYING(5)
) ;

-- Insert data.
openGauss=# INSERT INTO bit_type_t1 VALUES(1, B'101', B'00');

-- Specify the type length. An error is reported if an inserted string exceeds this length.
openGauss=# INSERT INTO bit_type_t1 VALUES(2, B'10', B'101');
ERROR:  bit string length 2 does not match type bit(3)
CONTEXT:  referenced column: bt_col2

-- Specify the type length. Data is converted if it exceeds this length.
openGauss=# INSERT INTO bit_type_t1 VALUES(2, B'10'::bit(3), B'101');

-- View data.
openGauss=# SELECT * FROM bit_type_t1;
 bt_col1 | bt_col2 | bt_col3 
---------+---------+---------
       1 | 101     | 00
       2 | 100     | 101
(2 rows)

-- Delete the table.
openGauss=# DROP TABLE bit_type_t1;

Text Search Types

openGauss offers two data types that are designed to support full text search. The tsvector type represents a document in a form optimized for text search. The tsquery type similarly represents a text query.

  • tsvector

    The tsvector type represents a retrieval unit, usually a textual column within a row of a database table, or a combination of such columns. A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word. Sorting and deduplication are done automatically during input. The to_tsvector function is used to parse and normalize a document string.

    A tsvector value is a sorted list of distinct lexemes, which are words that have been formatted different entries. During segmentation, tsvector automatically performs duplicate-elimination to the entries for input in a certain order. Example:

    openGauss=# SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
                          tsvector                      
    ----------------------------------------------------
     'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'
    (1 row)
    

    It can be seen from the preceding example that tsvector segments a string by spaces, and segmented lexemes are sorted based on their length and alphabetical order. To represent lexemes containing whitespaces or punctuations, surround them with quotation marks:

    openGauss=# SELECT $$the lexeme '    ' contains spaces$$::tsvector;
                     tsvector                  
    -------------------------------------------
     '    ' 'contains' 'lexeme' 'spaces' 'the'
    (1 row)
    

    Use double dollar signs ($$) to mark entries containing single quotation marks ('').

    openGauss=# SELECT $$the lexeme 'Joe''s' contains a quote$$::tsvector;
                        tsvector                    
    ------------------------------------------------
     'Joe''s' 'a' 'contains' 'lexeme' 'quote' 'the'
    (1 row)
    

    Optionally, integer positions can be attached to lexemes:

    openGauss=# SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
                                       tsvector                                    
    -------------------------------------------------------------------------------
     'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4
    (1 row)
    

    A position normally indicates the source word's location in the document. Positional information can be used for proximity ranking. Position values range from 1 to 16383. The maximum value is 16383. Duplicate positions for the same lexeme are discarded.

    Lexemes that have positions can further be labeled with a weight, which can be A, B, C, or D. D is the default and therefore is not shown in output.

    openGauss=# SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
              tsvector          
    ----------------------------
     'a':1A 'cat':5 'fat':2B,4C
    (1 row)
    

    Weights are typically used to reflect the document structure, for example, by marking title words differently from body words. Text search ranking functions can assign different priorities to the different weight markers.

    The following example is the standard usage of the tsvector type. Example:

    openGauss=# SELECT 'The Fat Rats'::tsvector;
          tsvector      
    --------------------
     'Fat' 'Rats' 'The'
    (1 row)
    

    For most English-text-searching applications, the above words would be considered non-normalized, which should usually be passed through to_tsvector to normalize the words appropriately for searching:

    openGauss=# SELECT to_tsvector('english', 'The Fat Rats');
       to_tsvector   
    -----------------
     'fat':2 'rat':3
    (1 row)
    
  • tsquery

    The tsquery type represents a retrieval condition. A tsquery value stores lexemes that are to be searched for, and combines them honoring the Boolean operators & (AND), | (OR), and ! (NOT). Parentheses can be used to enforce grouping of the operators. The to_tsquery and plainto_tsquery functions will normalize lexemes before the lexemes are converted to the tsquery type.

    openGauss=# SELECT 'fat & rat'::tsquery;
        tsquery    
    ---------------
     'fat' & 'rat'
    (1 row)
    
    openGauss=# SELECT 'fat & (rat | cat)'::tsquery;
              tsquery          
    ---------------------------
     'fat' & ( 'rat' | 'cat' )
    (1 row)
    
    openGauss=# SELECT 'fat & rat & ! cat'::tsquery;
            tsquery         
    ------------------------
     'fat' & 'rat' & !'cat'
    (1 row)
    

    In the absence of parentheses, ! (NOT) binds most tightly, and & (AND) binds more tightly than | (OR).

    Lexemes in a tsquery can be labeled with one or more weight letters, which restrict them to match only tsvector lexemes with matching weights:

    openGauss=# SELECT 'fat:ab & cat'::tsquery;
         tsquery      
    ------------------
     'fat':AB & 'cat'
    (1 row)
    

    Also, lexemes in a tsquery can be labeled with asterisks (*) to specify prefix matching:

    openGauss=# SELECT 'super:*'::tsquery;
      tsquery  
    -----------
     'super':*
    (1 row)
    

    This query will match any word in a tsvector that begins with “super”.

    Note that prefixes are first processed by text search configurations, which means that the following example returns true:

    openGauss=# SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' ) AS RESULT;
      result  
    ----------
     t
    (1 row)
    

    This is because postgres gets stemmed to postgr.

    openGauss=# SELECT to_tsquery('postgres:*');
     to_tsquery 
    ------------
     'postgr':*
    (1 row)
    

    It then matches postgraduate.

    'Fat:ab & Cats' is normalized to the tsquery type as follows:

    openGauss=# SELECT to_tsquery('Fat:ab & Cats');
        to_tsquery    
    ------------------
     'fat':AB & 'cat'
    (1 row)
    

Universally Unique Identifier (UUID) Types

This data type stores universally unique identifiers defined by RFC 4122, ISO/IEF 9834-8:2005 and related standards. This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm.

A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits. An example of a UUID in this standard form is:

a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11

openGauss also accepts the following alternative forms for input: use of upper-case letters and digits, standard format surrounded by braces, omitting some or all hyphens, and adding a hyphen after any group of four digits. Example:

A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
a0eebc999c0b4ef8bb6d6bb9bd380a11
a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11

Output is always in the standard form.

JSON/JSONB Types

JavaScript Object Notation (JSON) data can be a single scalar, an array, or a key-value pair object. The array and object can be called a container:

  • Scalar: a number, Boolean, string, or null
  • Array: defined in a pair of square brackets ([]), in which elements can be any type of JSON data, and are not necessarily of the same type.
  • Object: defined in a pair of braces ({}), in which objects are stored in the format of key:value. Each key must be a string enclosed by a pair of double quotation marks (""), and its value can be any type of JSON data. In case of duplicate keys, the last key-value pair will be used.

openGauss supports two types JSON and JSONB to store JSON data. JSON is a complete copy of the entered character string and is parsed when used. The entered spaces, duplicate keys, and sequence are retained. JSONB parses the input binary data. During parsing, JSONB deletes semantic-irrelevant details and duplicate keys, and sorts key-values. Therefore, JSONB does not need to parse the binary data again when it is used.

So you can see that both JSON and JSONB are of JSON data type, and the same strings can be entered as input. The main difference between JSON and JSONB is the efficiency. Because JSON data is an exact copy of the input text, the data must be parsed on every execution; in contrast, JSONB data is stored in a decomposed binary form and can be processed faster, though this makes it slightly slower in input due to the conversion mechanism. In addition, because the JSONB data form is unified, it better supports more powerful functions, for example, comparing sizes according to a specific rule. JSONB also supports indexing, which is a significant advantage.

  • Input format

    An input must be a JSON-compliant string, which is enclosed in single quotation marks ('').

    Null (null-json): Only null is supported, and all letters are in lowercase.

    select 'null'::json;   -- suc
    select 'NULL'::jsonb;  -- err
    

    Number (num-json): The value can be a positive or negative integer, decimal fraction, or 0. The scientific notation is supported.

    select '1'::json;
    select '-1.5'::json;
    select '-1.5e-5'::jsonb, '-1.5e+2'::jsonb;
    select '001'::json, '+15'::json, 'NaN'::json;  
    -- Redundant leading zeros, positive plus signs (+), NaN, and infinity are not supported.
    

    Boolean (bool-json): The value can only be true or false in lowercase.

    select 'true'::json;
    select 'false'::jsonb;  
    

    String (str-json): The value must be a string enclosed in double quotation marks ("").

    select '"a"'::json;
    select '"abc"'::jsonb;  
    

    Array (array-json): Arrays are enclosed in square brackets ([]). Elements in the array can be any valid JSON data, and are unnecessarily of the same type.

    select '[1, 2, "foo", null]'::json;
    select '[]'::json;
    select '[1, 2, "foo", null, [[]], {}]'::jsonb; 
    

    Object (object-json): The value is enclosed in braces ({}). The key must be a JSON-compliant string, and the value can be any valid JSON string.

    select '{}'::json;
    select '{"a": 1, "b": {"a": 2,  "b": null}}'::json;
    select '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::jsonb;  
    

    CAUTION:

    • Note that 'null'::json and null::json are different, which are similar to the strings str="" and str=null.
    • For numbers, when scientific notation is used, JSONB expands them, while JSON stores an exact copy of the input text.
  • JSONB advanced features

    • Precautions
      • It does not support row-store tables.
      • It cannot be used as a partition key.
      • It does not support foreign tables and MOTs.

    The main difference between JSON and JSONB lies in the storage mode. JSONB stores parsed binary data, which reflects the JSON hierarchy and facilitates direct access. Therefore, JSONB has many advanced features that JSON does not have.

    • Format normalization

      • After the input object-json string is parsed into JSONB binary data, semantically irrelevant details are naturally discarded, for example, spaces.

        openGauss=# select '   [1, " a ", {"a"   :1    }]  '::jsonb;
                jsonb
        ----------------------
         [1, " a ", {"a": 1}]
        (1 row)
        
      • For object-json, duplicate key-values are deleted and only the last key-value is retained. For example:

        openGauss=# select '{"a" : 1, "a" : 2}'::jsonb;
          jsonb
        ----------
         {"a": 2}
        (1 row)
        
      • For object-json, key-values will be re-sorted. The sorting rule is as follows: 1. Longer key-values are sorted last. 2. If the key-values are of the same length, the key-values with a larger ASCII code are sorted after the key-values with a smaller ASCII code:

        openGauss=# select '{"aa" : 1, "b" : 2, "a" : 3}'::jsonb;
                   jsonb
        ---------------------------
         {"a": 3, "b": 2, "aa": 1}
        (1 row)
        
    • Size comparison

      Format normalization ensures that only one form of JSONB data exists in the same semantics. Therefore, sizes may be compared according to a specific rule.

      • First, compare the types: object-jsonb > array-jsonb > bool-jsonb > num-jsonb > str-jsonb > null-jsonb.

      • Compare the content if data types are the same:

        • str-json: The default text sorting rule of the database is used for comparison. A positive value indicates greater than, a negative value indicates less than, and 0 indicates equal.
        • num-json: numeric comparison
        • bool-json: true > false
        • array-jsonb: long elements > short elements. If the lengths are the same, compare each element in sequence.
        • object-jsonb: If the length of a key-value pair is longer than that of a short key-value pair, the key is compared first, and then the value is compared.

        CAUTION: For comparison within the object-jsonb type, the final result after format sorting is used for comparison. Therefore, the comparison result may not be intuitive compared with the direct input.

    • Creating indexes, primary keys, and foreign keys

      • B-tree index

        B-tree indexes, primary keys, and foreign keys can be created for the JSONB type.

      • GIN index

        GIN indexes can be used to effectively search for keys or key-value pairs that appear in a large number of JSONB documents (datums). Two GIN operator classes (jsonb_ops and jsonb_hash_ops) are provided for different performance and flexibility choices. The default GIN operator class supports @>, <@, ?, ?& and ?| operator query. The non-default GIN operator class jsonb_path_ops supports only the @> and <@ operators.

    • Containment and existence

      Querying whether a JSON contains some elements or whether some elements exist in a JSON is an important capability of JSONB.

      -- Simple scalar/primitive values contain only the identical value:
      SELECT '"foo"'::jsonb @> '"foo"'::jsonb;
      -- The array on the left contains the character string on the right.
      SELECT '[1, "aa", 3]'::jsonb ? 'aa';
      -- The array on the left contains all elements of the array on the right, regardless of the sequence and repetition.
      SELECT '[1, 2, 3]'::jsonb @> '[1, 3, 1]'::jsonb;
      -- The object-json on the left contains all key-value pairs of object-json on the right.
      SELECT '{"product": "PostgreSQL", "version": 9.4, "jsonb":true}'::jsonb @> '{"version":9.4}'::jsonb;
      -- The array on the left does not contain all elements of the array on the right. The three elements on the left are 1, 2, and [1,3], but the elements on the right are 1 and 3.
      SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb;  --false
      -- Similarly, no containment relationship exists and false is returned.
      SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"bar": "baz"}'::jsonb; -- false
      

HyperLoglog (HLL) Types

HLL is an approximation algorithm for efficiently counting the number of distinct values in a dataset. It features faster computing and lower space usage. You only need to store HLL data structures, instead of data sets. When new data is added to a dataset, make hash calculation on the data and insert the result to an HLL. Then, you can obtain the final result based on the HLL.

Table 13 compares HLL with other algorithms.

Table 13 Comparison between HLL and other algorithms

Item

Sorting Algorithm

Hash Algorithm

HLL

Time complexity

O(nlogn)

O(n)

O(n)

Space complexity

O(n)

O(n)

log(logn)

Error rate

0

0

≈0.8%

Storage space requirement

Size of original data

Size of original data

The maximum size is 16 KB by default.

HLL has advantages over others in the computing speed and storage space requirement. In terms of time complexity, the sorting algorithm needs O(nlogn) time for sorting, and the hash algorithm and HLL need O(n) time for full table scanning. In terms of storage space requirements, the sorting algorithm and hash algorithm need to store raw data before collecting statistics, whereas the HLL algorithm needs to store only the HLL data structures rather than the raw data, thereby occupying a fixed space of about 16 KB.

NOTICE:

  • In the current default specifications, the maximum number of distinct values that can be calculated is about 1.1e + 15, and the error rate is 0.8%. If the calculation result exceeds the maximum, the error rate of the calculation result will increase, or the calculation will fail and an error will be reported.
  • When using this feature for the first time, you need to evaluate the distinct values of the service, properly select configuration parameters, and perform verification to ensure that the accuracy meets requirements.
  • By default, the distinct value is 1.1e + 15. If the distinct value is NaN, you need to adjust log2m or use another algorithm to calculate the distinct value.
  • The hash algorithm has an extremely low probability of collision. However, you are still advised to select 2 or 3 hash seeds for verification when using the hash algorithm for the first time. If there is only a small difference between the distinct values, you can select any one of the seeds as the hash seed.

Table 14 describes main HLL data structures.

Table 14 Main HLL data structures

Data Type

Function

hll

The HLL header is a 27-byte field. By default, the data length ranges from 0 KB to 16 KB. The distinct value can be obtained.

When you create an HLL data type, 0 to 4 input parameters are supported. The parameter meanings and specifications are the same as those of the hll_empty function. The first parameter is log2m, indicating the logarithm of the number of buckets, and its value ranges from 10 to 16. The second parameter is log2explicit, indicating the threshold in explicit mode, and its value ranges from 0 to 12. The third parameter is log2sparse, indicating the threshold of the Sparse mode, and its value ranges from 0 to 14. The fourth parameter is duplicatecheck, indicating whether to enable duplicate check, and its value ranges from 0 to 1. When the input parameter is set to –1, the default value of the HLL parameter is used. You can run the \d or \d+ command to view the parameters of the HLL type.

NOTE: When the HLL data type is created, the result varies depending on the input parameter behavior:

  • When creating an HLL type, do not set the input parameter or set it to –1. Use the default value of the corresponding HLL parameter.
  • If a valid value is set for the input parameter, the corresponding HLL parameter uses the input value.
  • If the input value is invalid, an error is reported when the HLL type is created.
-- Create an HLL table without specifying input parameters.
openGauss=# create table t1 (id integer, set hll);
openGauss=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 set    | hll     |

-- Create an HLL table, specify the first two input parameters, and use the default values for the last two input parameters.
openGauss=# create table t2 (id integer, set hll(12,4));
openGauss=# \d t2
          Table "public.t2"
 Column |      Type      | Modifiers
--------+----------------+-----------
 id     | integer        |
 set    | hll(12,4,12,0) |

-- Create an HLL table, specify the third input parameter, and use default values for other parameters.
openGauss=# create table t3(id int, set hll(-1,-1,8,-1));
openGauss=# \d t3
          Table "public.t3"
 Column |      Type      | Modifiers
--------+----------------+-----------
 id     | integer        |
 set    | hll(14,10,8,0) |

-- When a user creates an HLL table and specifies an invalid input parameter, an error is reported.
openGauss=# create table t4(id int, set hll(5,-1));
ERROR:  log2m = 5 is out of range, it should be in range 10 to 16, or set -1 as default

NOTE: When inserting an HLL object to an HLL table, ensure that the parameters of the HLL type are the same as those of the inserted object. Otherwise, an error is reported.

-- Create an HLL table.
openGauss=# create table t1(id integer, set hll(14));
 
-- Insert an HLL object to a table. The insertion succeeds because parameter types are consistent.
openGauss=# insert into t1 values (1, hll_empty(14,-1));

-- Insert an HLL object to a table. The insertion fails because parameter types are inconsistent.
openGauss=# insert into t1(id, set) values (1, hll_empty(14,5));
ERROR:  log2explicit does not match: source is 5 and dest is 10

The following describes HLL application scenarios.

  • Scenario 1: “Hello World”

    The following example shows how to use the HLL data type:

    -- Create a table with the HLL type.
    openGauss=# create table helloworld (id integer, set hll);
    
    -- Insert an empty HLL to the table.
    openGauss=# insert into helloworld(id, set) values (1, hll_empty());
    
    -- Add a hashed integer to the HLL.
    openGauss=# update helloworld set set = hll_add(set, hll_hash_integer(12345)) where id = 1;
    
    -- Add a hashed string to the HLL.
    openGauss=# update helloworld set set = hll_add(set, hll_hash_text('hello world')) where id = 1;
    
    -- Obtain the number of distinct values of the HLL.
    openGauss=# select hll_cardinality(set) from helloworld where id = 1;
     hll_cardinality 
    -----------------
                   2
    (1 row)
    
    -- Delete the table.
    openGauss=#  drop table helloworld;
    
  • Scenario 2: Collect statistics about website visitors.

    The following example shows how an HLL collects statistics on the number of users visiting a website within a period of time:

    -- Create a raw data table to show that a user has visited the website at a certain time.
    openGauss=# create table facts (
             date            date,
             user_id         integer
    );
    
    -- Create a raw data table to show that a user has visited the website at a certain time.
    openGauss=# insert into facts values ('2019-02-20', generate_series(1,100));
    openGauss=# insert into facts values ('2019-02-21', generate_series(1,200));
    openGauss=# insert into facts values ('2019-02-22', generate_series(1,300));
    openGauss=# insert into facts values ('2019-02-23', generate_series(1,400));
    openGauss=# insert into facts values ('2019-02-24', generate_series(1,500));
    openGauss=# insert into facts values ('2019-02-25', generate_series(1,600));
    openGauss=# insert into facts values ('2019-02-26', generate_series(1,700));
    openGauss=# insert into facts values ('2019-02-27', generate_series(1,800));
    
    -- Create another table and specify an HLL column.
    openGauss=# create table daily_uniques (
        date            date UNIQUE,
        users           hll
    );
    
    -- Group data by date and insert the data into the HLL.
    openGauss=# insert into daily_uniques(date, users)
        select date, hll_add_agg(hll_hash_integer(user_id))
        from facts
        group by 1;
    
    -- Calculate the numbers of users visiting the website every day.
    openGauss=# select date, hll_cardinality(users) from daily_uniques order by date;
        date    | hll_cardinality
    ------------+------------------
     2019-02-20 |              100
     2019-02-21 | 200.217913059312
     2019-02-22 |  301.76494508014
     2019-02-23 | 400.862858326446
     2019-02-24 | 502.626933349694
     2019-02-25 | 601.922606454213
     2019-02-26 | 696.602316769498
     2019-02-27 | 798.111731634412
    (8 rows)
    
    -- Calculate the number of users who had visited the website in the week from February 20, 2019 to February 26, 2019.
    openGauss=# select hll_cardinality(hll_union_agg(users)) from daily_uniques where date >= '2019-02-20'::date and date <= '2019-02-26'::date;
     hll_cardinality  
    ------------------
     696.602316769498
    (1 row)
    
    -- Calculate the number of users who visited the website yesterday but have not visited the website today.
    openGauss=# SELECT date, (#hll_union_agg(users) OVER two_days) - #users AS lost_uniques FROM daily_uniques WINDOW two_days AS (ORDER BY date ASC ROWS 1 PRECEDING);                                                                                                             
        date    | lost_uniques
    ------------+--------------
     2019-02-20 |            0
     2019-02-21 |            0
     2019-02-22 |            0
     2019-02-23 |            0
     2019-02-24 |            0
     2019-02-25 |            0
     2019-02-26 |            0
     2019-02-27 |            0
    (8 rows)
    
    -- Delete the table.
    openGauss=# drop table facts;
    openGauss=# drop table daily_uniques;
    
  • Scenario 3: The data to be inserted does not meet the requirements of the HLL data structure.

    When inserting data into a column of the HLL type, ensure that the data meets the requirements of the HLL data structure. If the data does not meet the requirements after being parsed, an error will be reported. In the following example, E\\1234 to be inserted does not meet the requirements of the HLL data structure after being parsed. As a result, an error is reported.

    openGauss=# create table test(id integer, set hll);
    openGauss=# insert into test values(1, 'E\\1234');
    ERROR:  not a hll type, size=6 is not enough
    openGauss=# drop table test;
    

Range Types

A range type is a data type that represents the range of a value of an element type (called the subtype of a range). For example, the range of timestamp may be used to express a time range in which a conference room is reserved. In this case, the data type is tsrange and timestamp is the subtype. The subtype must have an overall order so that the element value can be clearly specified within, before, or after a range.

Range types are useful because they can express multiple element values in a single range value and can clearly express concepts such as range overlapping. The time and date range used for scheduling is a good example, and the price range and the range of an instrument are also examples of range type.

  • Built-in ranges

    The following built-in ranges are available:

    • int4range: integer range
    • int8range: bigint range
    • numrange: numeric range
    • tsrange: range of timestamp without the time zone
    • tstzrange: range of timestamp with the time zone
    • daterange: date range
  • Including and excluding bounds

    Each non-empty range has two bounds, a lower bound and an upper bound. All values between the upper and lower bounds are included in the range. An inclusion bound means that the bound value itself is included in the range, while an exclusion bound means that the bound value is not included in the range.

    In a textual form of a range, the inclusion lower bound is expressed as “[” and an exclusion lower bound is expressed as “(”. Similarly, one containing the upper bound is expressed as “]” and one excluding the upper bound is expressed as “)”.

    The lower_inc and upper_inc functions test the upper and lower bounds of a range value, respectively.

  • Infinite (Unbounded) range

    When the lower bound of a range is unbounded, it means that all values less than the upper bound are included in the range, for example, (,3], indicating all values less than the upper bound 3 are included in the range. Similarly, when the upper bound of a range is unbounded, all values greater than the upper bound are included in the range. When both the upper and lower bounds are unbounded, all values of the element type are considered within the range. The missing bounds are automatically converted to exclusions, for example, [,] is converted to (,). You can consider these missing values as positive infinity or negative infinity, but they are special range type values and are considered to be positive and negative infinity values that go beyond any range element type.

    Element types with the infinity values can be used as explicit bound values. For example, in the timestamp range, [today, infinity) does not include a special timestamp value infinity.

    The** lower_inf** and upper_inf functions test the infinite upper and lower bounds of a range, respectively.

  • Range input/output

    The input of a range value must follow one of the following formats:

    (lower-bound, upper-bound)
    (lower-bound, upper-bound]
    [lower-bound, upper-bound)
    [lower-bound, upper-bound]
    Empty
    

    Parentheses () or square brackets [] indicate whether the upper and lower bounds are excluded or included. Note that the last format is empty, which represents an empty range (a range that does not contain values).

    The value of lower-bound can be a valid input character string of the subtype or null, indicating that there is no lower bound. Similarly, upper-bound can be a valid input character string of the subtype or null, indicating that there is no upper bound.

    Each bound value can be referenced using the quotation marks(""). This is necessary if the bounds value contains parentheses (), square brackets [], commas (,), quotation marks (""), or backslashes (\), because otherwise those characters will be considered part of the range syntax. To put the quotation mark or backslash in a referenced bound value, put a backslash in front of it (and a pair of double quotation marks in its referenced bound value represents one quotation mark, which is similar to the single quotation mark rule in SQL character strings). In addition, you can avoid referencing and use backslash escapes to protect all data characters; otherwise they will be used as part of the return syntax. Also, if you want to write a bound value that is an empty string, write "", indicating infinite bounds.

    Spaces are allowed before and after a range value, but any space between parentheses() or square brackets[] is used as part of the upper or lower bound value (depending on the element type, the space may or may not represent a value).

    Examples:

    -- 3 is included, 7 is not included, and all values between 3 and 7 are included.
    SELECT '[3,7)'::int4range;  
    -- Neither 3 nor 7 is included, but all values between them are included.
    SELECT '(3,7)'::int4range;  
    -- Only 4 is included.
    SELECT '[4,4]'::int4range;  
    -- Exclude any value (and will be normalized to empty)
    SELECT '[4,4)'::int4range; 
    
  • Constructing range

    Each range type has a constructor function with the same name. Using constructor functions is often more convenient than writing a range literal constant because it avoids extra references to bound values. Constructor functions accept two or three parameters. Two parameters form a range in the standard form, where the lower bound is included and the upper bound is excluded, and three parameters form a range according to the bound specified by the third parameter. The third parameter must be one of the following character strings: (), (], [], or []. For example:

    -- The complete form is: lower bound, upper bound, and textual parameters indicating the inclusion/exclusion of bounds.
    SELECT numrange(1.0, 14.0, '(]');  
    -- If the third parameter is omitted, it is assumed to be '[)'.
    SELECT numrange(1.0, 14.0);  
    -- Although '(]' is specified here, the value will be converted to the standard format when displayed, because int8range is a discrete range type (see below).
    SELECT int8range(1, 14, '(]');  
    -- Using NULL for a bound causes the range to be unbounded on that side.
    SELECT numrange(NULL, 2.2); 
    
  • Discrete range

    A range element type has a well-defined “step” such as integer or date. In these types, if there is no valid value between two elements, they can be said to be adjacent. This is in contrast to a continuous range in which other element values can always be identified between two given values. For example, a range above the numeric type is continuous, and the range of timestamp is also continuous. (Although timestamp has limited precision and can be considered as discrete in theory, it is better to consider it as continuous because the step is not normally considered.)

    Another way to consider discrete range types is to have a clear “next” or “previous” value for each element value. With this idea in mind, you can switch between inclusion and exclusion expressions of a range bound by replacing it with the original given next or previous element value. For example, in the integer range type, [4,8] and (3,9) represent the same set of values, but not for numeric ranges.

    A discrete range type should have a regularization function that knows the expected step size of the element type. The regularization function can convert the equivalents of the range type to the same expression, in particular consistent with the inclusion or exclusion bounds. If you do not specify a regularization function, ranges with different formats will always be considered as unequal, even if they actually express the same set of values.

    The built-in range types int4range, int8range, and daterange use a regularized form that includes the lower bound and excludes the upper bound, that is, [). However, user-defined range types can use other conventions.

  • Defining a new nange

    Users can define their own range types. A common reason is to use the range on the subtype that is not provided in the built-in range type. For example, to create the range type subtype float8, run the following command:

    CREATE TYPE floatrange AS RANGE (
         subtype = float8,
         subtype_diff = float8mi 
    );
    SELECT '[1.234, 5.678]'::floatrange; 
    

    Because float8 does not have a meaningful “step”, we did not define a regularization function in this example.

    Defining your own range type also allows you to specify a different subtype B-tree operator class or collection to change the sort order to determine which values fall within the given range.

    If the subtype is considered to have a discrete value instead of a continuous value, the CREATE TYPE command should specify a canonical function. The regularization function receives an input range value and must return an equivalent range value that may have different bounds and formats. For two ranges, for example, [1, 7] and [1, 8) that represent the same value set, the output must be the same. There is no relationship between choosing which expression to use as the regularization function, as long as two values of equal values in different formats can always be mapped to the same value in the same format. In addition to adjusting the inclusion/exclusion bound format, if the expected compensation is larger than the subtype can store, a regularization function may round the bound value. For example, a range type above a timestamp might be defined as having a one-hour epoch, so the regularization function might need to round off bounds that are not multiples of an hour, or might throw an error directly.

    In addition, any range type intended to be used with a GiST or SP-GiST index should define a subtype difference or subtype_diff function (the index can still work without subtype_diff, but may be less efficient than when a difference function is provided). The subtype difference function uses two subtype input values and returns a difference expressed as a float8 value (X minus Y). In the example above, we can use functions under the regular float8 subtraction operator. However, for any other subtype, some type conversion may be required. There may also be a need for innovative ideas on how to express differences as numbers. For maximum extensibility, the subtype_diff function should agree with the sort order of the selected operator class and sort rules. That is, if the first parameter of the sort order is greater than the second parameter, the result should be a positive value.

    The following is an example of a less simplified subtype_diff function:

    CREATE FUNCTION time_subtype_diff(x time, y time) RETURNS float8 AS 'SELECT EXTRACT(EPOCH FROM (x - y))' LANGUAGE sql STRICT IMMUTABLE;  
    CREATE TYPE timerange AS RANGE (     
         subtype = time,     
         subtype_diff = time_subtype_diff
     );  
    SELECT '[11:10, 23:00]'::timerange; 
    

Indexes

GiST and SP-GiST indexes can be created for table columns of the range type. For example, to create a GiST index, run the following command:

CREATE INDEX reservation_idx ON reservation USING GIST (during); 

A GiST or SP-GiST index can accelerate queries involving the following range operators: =, &&, <@, @>, «, >>, -|-, &<, and &>.

In addition, the B-tree and hash index can be created on table columns of the range type. For these index types, basically the only useful range operation is equivalence. Using the corresponding < and > operators, there is a B-tree sort oder for range value definitions, but that order is fairly arbitrary and is often less useful in the reality. The B-tree and hash support for range types is primarily designed to allow sorting and hashing within a query, rather than creating an index.

OID Types

OIDs are used internally by openGauss as primary keys for various system catalogs. OIDs are not added to user-created tables by the system. The OID type represents an object identifier.

The OID type is currently implemented as an unsigned four-byte integer. So, using a user-created table's OID column as a primary key is discouraged.

Table 15 OID types

Name

Reference

Description

Example

OID

N/A

Numeric object identifier

564182

CID

N/A

Command identifier. This is the data type of the system columns cmin and cmax. Command identifiers are 32-bit quantities.

N/A

XID

N/A

Transaction identifier. This is the data type of the system columns xmin and xmax. Transaction identifiers are also 64-bit quantities.

N/A

TID

N/A

Row identifier. This is the data type of the system column ctid. A row ID is a pair (block number, tuple index within block) that identifies the physical location of the row within its table.

N/A

REGCONFIG

pg_ts_config

Text search configuration

english

REGDICTIONARY

pg_ts_dict

Text search dictionary

simple

REGOPER

pg_operator

Operator name

N/A

REGOPERATOR

pg_operator

Operator with parameter types

*(integer,integer) or -(NONE,integer)

REGPROC

pg_proc

Function name

sum

REGPROCEDURE

pg_proc

Function with parameter types

sum(int4)

REGCLASS

pg_class

Relationship name

pg_type

REGTYPE

pg_type

Data type name

integer

The OID type is used for a column in the database system catalog.

Example:

openGauss=# SELECT oid FROM pg_class WHERE relname = 'pg_type';
 oid  
------
 1247
(1 row)

The alias type of the OID is REGCLASS which allows simplified search for OID values.

Example:

openGauss=# SELECT attrelid,attname,atttypid,attstattarget FROM pg_attribute WHERE attrelid = 'pg_type'::REGCLASS;
 attrelid |  attname   | atttypid | attstattarget 
----------+------------+----------+---------------
     1247 | xc_node_id     |       23 |             0
     1247 | tableoid       |       26 |             0
     1247 | cmax           |       29 |             0
     1247 | xmax           |       28 |             0
     1247 | cmin           |       29 |             0
     1247 | xmin           |       28 |             0
     1247 | oid            |       26 |             0
     1247 | ctid           |       27 |             0
     1247 | typname        |       19 |            -1
     1247 | typnamespace   |       26 |            -1
     1247 | typowner       |       26 |            -1
     1247 | typlen         |       21 |            -1
     1247 | typbyval       |       16 |            -1
     1247 | typtype        |       18 |            -1
     1247 | typcategory    |       18 |            -1
     1247 | typispreferred |       16 |            -1
     1247 | typisdefined   |       16 |            -1
     1247 | typdelim       |       18 |            -1
     1247 | typrelid       |       26 |            -1
     1247 | typelem        |       26 |            -1
     1247 | typarray       |       26 |            -1
     1247 | typinput       |       24 |            -1
     1247 | typoutput      |       24 |            -1
     1247 | typreceive     |       24 |            -1
     1247 | typsend        |       24 |            -1
     1247 | typmodin       |       24 |            -1
     1247 | typmodout      |       24 |            -1
     1247 | typanalyze     |       24 |            -1
     1247 | typalign       |       18 |            -1
     1247 | typstorage     |       18 |            -1
     1247 | typnotnull     |       16 |            -1
     1247 | typbasetype    |       26 |            -1
     1247 | typtypmod      |       23 |            -1
     1247 | typndims       |       23 |            -1
     1247 | typcollation   |       26 |            -1
     1247 | typdefaultbin  |      194 |            -1
     1247 | typdefault     |       25 |            -1
     1247 | typacl         |     1034 |            -1
(38 rows)

Pseudo-Types

openGauss type system contains a number of special-purpose entries that are collectively called pseudo-types. A pseudo-type cannot be used as a column data type, but it can be used to declare a function's argument or result type.

Each of the available pseudo-types is useful in situations where a function's behavior does not correspond to simply taking or returning a value of a specific SQL data type. Table 16 lists all pseudo-types.

Table 16 Pseudo-types

Name

Description

any

Indicates that a function accepts any input data type.

anyelement

Indicates that a function accepts any data type.

anyarray

Indicates that a function accepts any array data type.

anynonarray

Indicates that a function accepts any non-array data type.

anyenum

Indicates that a function accepts any enum data type.

anyrange

Indicates that a function accepts any range data type.

cstring

Indicates that a function accepts or returns a null-terminated C string.

internal

Indicates that a function accepts or returns a server-internal data type.

language_handler

Indicates that a procedural language call handler is declared to return language_handler.

fdw_handler

Indicates that a foreign-data wrapper handler is declared to return fdw_handler.

record

Identifies a function returning an unspecified row type.

trigger

Indicates that a trigger function is declared to return trigger.

void

Indicates that a function returns no value.

opaque

Indicates an obsolete type name that formerly served all the above purposes.

Functions coded in C (whether built in or dynamically loaded) can be declared to accept or return any of these pseudo-types. It is up to the user to ensure that the function will behave safely when a pseudo-type is used as an argument type.

Functions coded in procedural languages can use pseudo-types only as allowed by their implementation languages. At present, the procedural languages all forbid use of a pseudo-type as argument type, and allow only void and record as a result type. Some also support polymorphic functions using the anyelement, anyarray, anynonarray, anyenum, and anyrange types.

The internal pseudo-type is used to declare functions that are meant only to be called internally by the database system, and not by direct invocation in an SQL query. If a function has at least one internal-type argument then it cannot be called from SQL. You are advised not to create any function that is declared to return internal unless it has at least one internal argument.

Example:

-- Create a table.
openGauss=# create table t1 (a int);

-- Insert two data records.
openGauss=# insert into t1 values(1),(2);

-- Create the showall() function.
openGauss=# CREATE OR REPLACE FUNCTION showall() RETURNS SETOF record
AS $$ SELECT count(*) from t1; $$
LANGUAGE SQL;

-- Call the showall() function.
openGauss=# SELECT showall();
 showall 
---------
 (2)
(1 row)

-- Delete the function.
openGauss=# DROP FUNCTION showall();

-- Delete the table.
openGauss=# drop table t1;

Data Types Supported by Column-store Tables

Table 17 lists the data types supported by column-store tables.

Table 17 Data types supported by column-store tables

Category

Data Type

Length

Supported or Not

Numeric types

smallint

2

Supported

integer

4

Supported

bigint

8

Supported

decimal

–1

Supported

numeric

–1

Supported

real

4

Supported

double precision

8

Supported

smallserial

2

Supported

serial

4

Supported

bigserial

8

Supported

largeserial

–1

Supported

Monetary types

money

8

Supported

Character types

character varying(n), varchar(n)

–1

Supported

character(n), char(n)

n

Supported

character, char

1

Supported

text

–1

Supported

nvarchar2

–1

Supported

name

64

Not supported

Date/Time types

timestamp with time zone

8

Supported

timestamp without time zone

8

Supported

date

4

Supported

time without time zone

8

Supported

time with time zone

12

Supported

interval

16

Supported

Big objects

clob

–1

Supported

blob

–1

Not supported

Other types

...

...

Not supported

XML Types

openGauss supports the XML types. The following is an example:

openGauss= CREATE TABLE xmltest ( id int, data xml ); 
openGauss= INSERT INTO xmltest VALUES (1, 'one');
openGauss= INSERT INTO xmltest VALUES (2, 'two'); 
openGauss= SELECT * FROM xmltest ORDER BY 1;
 id | data 
----+--------------------
1 | one 
2 | two 
(2 rows)
openGauss= SELECT xmlconcat('', NULL, ''); 
xmlconcat
(1 row)
openGauss= SELECT xmlconcat('', NULL, ''); 
xmlconcat
(1 row)

NOTE:

  • This function is disabled by default. To use this function, you need to use the build.sh script to compile the database again and add the** --with-libxml** parameter to the ./configure configuration parameter.
  • Before compilation, you need to run the yum install -y libxml2-devel command. Otherwise, the “configure: error: library 'xml2' (version >= 2.6.23) is required for XML support” error will be reported.
  • Before compilation, you need to add the dependencyoperating system environment/libobs/comm/lib in the binary file of the third-party library to the system environment variable LD_LIBRARY_PATH. Otherwise, an error message is displayed, indicating that libiconv.so does not exist.

Data Types Used by the Ledger Database

The ledger database uses the hash16 data type to store row-level hash digests or table-level hash digests, and uses the hash32 data type to store global hash digests or history table verification hashes.

Table 18 Hash types used by the ledger database

Name

Description

Storage Space

Value Range

HASH16

Stored as an unsigned 64-bit integer

8 bytes

0 to +18446744073709551615

HASH32

Stored as a group of 16 unsigned integer elements

16 bytes

Value range of an unsigned integer array of 16 elements

The hash16 data type is used to store row-level or table-level hash digests in the ledger database. After obtaining the hash sequence of a 16-character hexadecimal string, the system calls the hash16in function to convert the sequence into an unsigned 64-bit integer and stores the integer in a hash16 variable. For example:

Hexadecimal string: e697da2eaa3a775b; 64-bit unsigned integer: 16615989244166043483
Hexadecimal string: ffffffffffffffff; 64-bit unsigned integer: 18446744073709551615

The hash32 data type is used to store the global hash digest or history table verification hash in the ledger database. After obtaining the hash sequence of a 32-character hexadecimal string, the systemcalls the hash32in function to convert the sequence to an array containing 16 unsigned integer elements. For example:

Hexadecimal string: 685847ed1fe38e18f6b0e2b18c00edee
Hash32 array: [104,88,71,237,31,227,142,24,246,176,226,177,140,0,237,238]
Feedback
编组 3备份
    openGauss 2024-04-15 00:46:08
    cancel