Dynamic Data Masking

Availability

This feature is available since openGauss 1.1.0.

Introduction

Data masking is an effective database privacy protection solution, which can prevent attackers from snooping on private data. The dynamic data masking mechanism is a technology that protects privacy data by customizing masking policies. It can effectively prevent unauthorized users from accessing sensitive information while retaining original data. After the administrator specifies the object to be anonymized and customizes a data masking policy, if the database resources queried by a user are associated with a masking policy, data is anonymized based on the user identity and masking policy to restrict attackers' access to privacy data.

Benefits

Data privacy protection is one of the required database security capabilities. It can restrict attackers' access to privacy data, ensuring privacy data security. The dynamic data masking mechanism can protect the privacy of specified database resources by configuring masking policies. In addition, the masking policy configuration is flexible and can implement targeted privacy protection in specific user scenarios.

Description

The dynamic data masking mechanism customizes masking policies based on resource labels. It can select masking modes based on the site requirements or customize masking policies for specific users. The SQL syntax for creating a complete masking policy is as follows:

CREATE RESOURCE LABEL label_for_creditcard ADD COLUMN(user1.table1.creditcard);
CREATE RESOURCE LABEL label_for_name ADD COLUMN(user1.table1.name);
CREATE MASKING POLICY msk_creditcard creditcardmasking ON LABEL(label_for_creditcard);
CREATE MASKING POLICY msk_name randommasking ON LABEL(label_for_name) FILTER ON IP(local), ROLES(dev);

label_for_creditcard and msk_name are the resource labels for masking, and each label is allocated to two column objects. creditcardmasking and randommasking are preset masking functions. msk_creditcard specifies that the masking policy creditcardmasking will be applied when any user accesses resources with label_for_creditcard, regardless of the access source. msk_name specifies that the masking policy randommasking will be applied when local user dev accesses resources with label_for_name. If FILTER is not specified, the setting takes effect for all users. Otherwise, the setting takes effect only for specified users.

The following table shows the preset masking functions:

Masking Function	Example
creditcardmasking	'4880-9898-4545-2525' will be anonymized as 'xxxx-xxxx-xxxx-2525'. This function anonymizes digits except the last four digits.
basicemailmasking	'abcd@gmail.com' will be anonymized as 'xxxx@gmail.com'. This function anonymizes text before the first @.
fullemailmasking	'abcd@gmail.com' will be anonymized as 'xxxx@xxxxx.com'. This function anonymizes text before the first dot (.) (except @).
alldigitsmasking	'alex123alex' will be anonymized as 'alex000alex'. This function anonymizes only digits in the text.
shufflemasking	'hello word' will be anonymized as 'hlwoeor dl'. This weak masking function is implemented through character dislocation. You are not advised to use this function to anonymize strings with strong semantics.
randommasking	'hello word' will be anonymized as 'ad5f5ghdf5'. This function randomly anonymizes text by character.
regexpmasking	You need to enter four parameters in sequence. reg indicates the character string to be replaced, replace_text indicates the character string after replacement, pos indicates the position where the target character string starts to be replaced, and reg_len indicates the replacement length. Both pos and reg_len are of the integer type. reg and replace_text can be expressed by regular expressions. If pos is not specified, the default value is 0. If reg_len is not specified, the default value is –1, indicating that all character strings after pos will be replaced. CREATE MASKING POLICY msk_creditcard regexpmasking('[\d+]', 'x', 5, 9 ) ON LABEL(label_for_creditcard); '4880-9898-4545-2525' will be anonymized as '4880-xxxx-xxxx-2525'.
maskall	'4880-9898-4545-2525' will be anonymized as 'xxxxxxxxxxxxxxxxxxx'.

The data types supported by each masking function are as follows:

Masking Function	Supported Data Types
creditcardmasking	BPCHAR, VARCHAR, NVARCHAR, TEXT (character data in credit card format only)
basicemailmasking	BPCHAR, VARCHAR, NVARCHAR, TEXT (character data in email format only)
fullemailmasking	BPCHAR, VARCHAR, NVARCHAR, TEXT (character data in email format only)
alldigitsmasking	BPCHAR, VARCHAR, NVARCHAR, TEXT (character data containing digits only)
shufflemasking	BPCHAR, VARCHAR, NVARCHAR, TEXT (text data only)
randommasking	BPCHAR, VARCHAR, NVARCHAR, TEXT (text data only)
maskall	BOOL, RELTIME, TIME, TIMETZ, INTERVAL, TIMESTAMP, TIMESTAMPTZ, SMALLDATETIME, ABSTIME, TEXT, BPCHAR, VARCHAR, NVARCHAR2, NAME, INT8, INT4, INT2, INT1, NUMRIC, FLOAT4, FLOAT8, CASH

For unsupported data types, the maskall function is used for data masking by default. The data of the BOOL type is masked as '0'. The RELTIME type is masked as '1970'. The TIME, TIMETZ, and INTERVAL types are masked as '00:00:00.0000+00'. The TIMESTAMP, TIMESTAMPTZ, SMALLDATETIME, and ABSTIME types are masked as '1970-01-01 00:00:00.0000'. The TEXT, CHAR, BPCHAR, VARCHAR, NVARCHAR2, and NAME type are masked as 'x'. The INT8, INT4, INT2, INT1, NUMERIC, FLOAT4, FLOAT8 types are masked as '0'. If the data type is not supported by maskall, the masking policy cannot be created. If implicit conversion is involved in the masking column, the data type after implicit conversion is used for masking. In addition, if the masking policy is applied to a data column and takes effect, operations on the data in the column are performed based on the masking result.

Dynamic data masking applies to scenarios closely related to actual services. It provides users with proper masking query APIs and error handling logic based on service requirements to prevent raw data from being obtained through credential stuffing.

Enhancements

None.

Constraints

The dynamic data masking policy must be created by a user with the POLADMIN or SYSADMIN attribute, or by the initial user. Common users do not have the permission to access the security policy system catalog and system view.
Dynamic data masking takes effect only on data tables for which masking policies are configured. Audit logs are not within the effective scope of the masking policies.
In a masking policy, only one masking mode can be specified for a resource label.
Multiple masking policies cannot be used to anonymize the same resource label, except when FILTER is used to specify user scenarios where the policies take effect and there is no intersection between user scenarios of different masking policies that contain the same resource label. In this case, you can identify the policy that a resource label is anonymized by based on the user scenario.
It is recommended that APP in FILTER be set to applications in the same trusted domain. Since a client may be forged, a security mechanism must be formed on the client when APP is used to reduce misuse risks. Generally, you are not advised to set APP. If it is set, pay attention to the risk of client spoofing.
For INSERT or MERGE INTO operations with the query clause, if the source table contains anonymized columns, the inserted or updated result in the preceding two operations is the anonymized value and cannot be restored.
When the built-in security policy is enabled, the ALTER TABLE EXCHANGE PARTITION statement fails to be executed if the source table is in the anonymized column.
If a dynamic data masking policy is configured for a table, grant the trigger permission of the table to other users with caution to prevent other users from using the trigger to bypass the masking policy.
A maximum of 98 dynamic data masking policies can be created.
Only the preceding seven preset masking policies can be used.
Only data with the resource labels containing the COLUMN attribute can be anonymized.
Only columns in base tables can be anonymized.
Only the data queried using SELECT can be anonymized.
Taking an IPv4 address as an example, the following formats are supported:
IP Address Format
Example
Single IP address
127.0.0.1
IP address with mask
127.0.0.1|255.255.255.0
CIDR IP address
127.0.0.1/24
IP address segment
127.0.0.1-127.0.0.5

Dependencies

None.

IP Address Format	Example
Single IP address	127.0.0.1
IP address with mask	127.0.0.1\|255.255.255.0
CIDR IP address	127.0.0.1/24
IP address segment	127.0.0.1-127.0.0.5