Data security has become a top priority, and organizations throughout the world are looking for effective solutions to protect their expanding volumes of sensitive data. As the volume of sensitive data grows, so does the need for robust data protection solutions. This is where data governance comes in, guaranteeing that data is correctly handled and used to preserve accuracy, security, and quality.
In this article, we are going to discuss in depth on Snowflake Dynamic Data Masking, a Snowflake Data Governance Feature. We'll go through the concept, benefits, and implementation of this feature, as well as provide step-by-step instructions on how to build and apply masking policies. We will also explore advanced data masking techniques, how to manage and retrieve masking policy information, and the limitations of Snowflake's data masking capabilities.
Let’s dive right in!!
Overview of built-in Snowflake governance features
Snowflake offers robust data governance capabilities to ensure the security and compliance of your data. There are several built-in built-in Snowflake data governance features, including:
- Snowflake Column-level security: This feature enables the application of a masking policy to a specific column in a table or view. It offers two distinct features, they are:
- Snowflake Dynamic Data Masking
- External Tokenization
- Row-level access policies/security: This feature defines row access policies to filter visible rows based on user permissions.
- Object tagging: Tags objects to classify and track sensitive data for compliance and security.
- Object tag-based masking policies: This feature enables the protection of column data by assigning a masking policy to a tag, which can then be set on a database object or the Snowflake account.
- Data classification: This feature allows users to automatically identify and classify columns in their tables containing personal or sensitive data.
- Object dependencies: This feature allows users to identify dependencies among Snowflake objects.
- Access History: This feature provides a record of all user activity related to data access and modification within a Snowflake account. Essentially, it tracks user queries that read column data and SQL statements that write data. The Access History feature is particularly useful for regulatory compliance auditing and also provides insights into frequently accessed tables and columns.
Snowflake's Column Level Security Features
Now that we are familiar with various built-in Snowflake data governance features, let's shift our focus to the main center of this article, Snowflake Column-Level Security. Snowflake column-level security feature is available only in the Enterprise edition or higher tiers. It provides enhanced measures to safeguard sensitive data in tables or views. It offers two distinct features, which are:
- Snowflake Dynamic Data Masking: Snowflake Dynamic Data Masking is a feature that enables organizations to hide sensitive data by masking it with other characters. It allows users to create Snowflake masking policies to conceal data in specific columns of tables or views. Dynamic Data Masking is applied in real-time, ensuring that unauthorized users or roles only see masked data.
- External Tokenization: Before we delve into External Tokenization, let's first understand what Tokenization is. Tokenization is a process that replaces sensitive data with ciphertext, rendering it unreadable. It involves encoding and decoding sensitive information, such as names, into ciphertext. On the other hand, External Tokenization enables the masking of sensitive data before it is loaded into Snowflake, which is achieved by utilizing an external function to tokenize the data and subsequently loading the tokenized data into Snowflake.
While both Snowflake Dynamic Data Masking and External Tokenization are column-level security features in Snowflake, Dynamic Data Masking is more commonly used as it allows users to easily implement data masking without the need for external functions. External Tokenization, on the other hand, involves a more complex setup and is typically not widely implemented in organizations.
What exactly is Snowflake Dynamic Data Masking?
Snowflake Dynamic Data Masking (DDM) is a column-level security feature that uses masking policies to selectively mask plain-text sensitive data in table and view columns at query time. This means the underlying data is not altered in the database, but rather masked as it is retrieved.
DDM policies in Snowflake are defined at the schema level, and can be applied to any number of tables or views within that schema. Each policy specifies which columns should be masked as well as the masking method to use.
Masking methods can include:
- Redaction: Replaces data with a fixed set of characters, like XXX, ***, &&&.
- Random data: Replaces with random fake data based on column data type.
- Shuffling: Scrambles the data while preserving format.
- Encryption: Encrypts the data, allowing decryption for authorized users.
When a user queries a table or view protected by a Snowflake dynamic data masking policy, the masking rules are applied before the results are returned, ensuring users only see the masked version of sensitive data, even if their permissions allow viewing the actual data.
Snowflake dynamic data masking is a powerful tool for protecting sensitive data. It is easy to use, scalable, and can be applied to any number of tables or views. Snowflake Dynamic Data Masking can help organizations to comply with data privacy regulations, such as the General Data Protection Regulation (GDPR), HIPAA, SOC, and PCI DSS.
What are the reasons for Snowflake Dynamic Data Masking ?
Here are the primary reasons for Snowflake dynamic data masking:
- Risk Mitigation: The main purpose of Snowflake Dynamic Data Masking is to reduce the risk of unauthorized access to sensitive data. So by masking sensitive columns in query results, Snowflake Dynamic Data Masking prevents potential leaks of data to unauthorized users.
- Confidentiality: Snowflake may contain financial data, employee data, intellectual property or other information that should remain confidential. Snowflake Dynamic Data Masking ensures this sensitive data is not exposed in query results to unauthorized users.
- Regulatory Compliance: Regulations like GDPR, HIPAA, SOC, and PCI DSS require strong safeguards for sensitive and personally identifiable information. Snowflake Dynamic Data Masking helps meet compliance requirements by protecting confidential data from bad actors.
- Snowflake Governance Initiatives: Snowflake Data governance and security teams typically drive initiatives to implement controls like Snowflake Dynamic Data Masking to better manage and protect sensitive Snowflake data access.
- Privacy and Legal Requirements: Privacy regulations and legal obligations may require Snowflake to mask sensitive data from unauthorized parties. Dynamic Data Masking provides the technical controls to enforce privacy requirements for data access.
Implementing Snowflake Dynamic Data Masking—Step-by-Step Guide
Creating a Custom Role with Masking Privileges
Firstly, let's start by creating a custom role with the necessary masking privileges. This role will be responsible for managing the Snowflake masking policies.
To create the custom role, execute the following SQL statement:
CREATE ROLE dynamic_masking_admin;
Let’s grant privileges to create Snowflake masking policies to the role dynamic_masking_admin
GRANT CREATE masking policy ON SCHEMA my_db.my_schema TO ROLE dynamic_masking_admin;
Now, let’s grant privileges to apply Snowflake masking policies to the role dynamic_masking_admin.
GRANT apply masking policy ON account TO ROLE dynamic_masking_admin;
Assign a Masking Role to an Existing Role/User
Next, assign the masking role to an existing role or user who will be responsible for managing and applying Snowflake masking policies.
Granting the masking role can enable individuals to inherit the masking privileges and effectively implement data masking on the desired columns.
To assign the masking role to an existing role, execute the following SQL statement:
GRANT ROLE dynamic_masking_admin TO school_principal;
Note: dynamic_masking_admin role, by default, will not have access to any database or warehouse. The role needs to be assigned to another Custom Role or a User who has privileges to access a database and warehouse.
To assign the masking role to an individual user, execute the following SQL statement:
GRANT ROLE dynamic_masking_admin TO USER [USERNAME];
Steps for creating Snowflake masking policies
With the custom role and privileges in place, it's time to create a masking policy. A masking policy defines how data should be masked based on specific conditions or rules. Snowflake offers flexibility in defining masking policies to suit your data protection needs.
Here is what making policy should look like:
CREATE OR replace masking policy [POLICY_NAME] AS (val [COLUMN_TYPE]) returns [COLUMN_TYPE] BEGIN CASE WHEN current_role() IN ('[AUTHORIZED_ROLE]') THEN val ELSE '[MASKING_VALUE]' END END;
- [POLICY_NAME] with a suitable name for your masking policy
- [COLUMN_TYPE] with the data type of the column you wish to mask.
- [AUTHORIZED_ROLE] as the role that should have unmasked access
- [MASKING_VALUE] as the value to mask the data.
Here is an example:
The below SQL statement creates a masking policy, data_masking that can be applied to columns of type string.
CREATE OR replace masking policy data_masking AS (val string) returns string -> CASE WHEN current_role() IN (school_principal) THEN val ELSE '*************' END;
This masking policy masks the data applied on a column when queried from a role other than school_principal.
Applying the Masking Policy to a Table or View Column
After creating the masking policy, it's time to apply it to the desired column within a table or view. By applying the masking policy, you ensure that sensitive data in that column is appropriately masked, while authorized roles can still access the original data.
To apply the masking policy, execute the following SQL statement:
ALTER TABLE [TABLE_NAME] MODIFY COLUMN [COLUMN_NAME] SET MASKING POLICY [POLICY_NAME];
- [TABLE_NAME] with the name of the table or view where the column is located.
- [COLUMN_NAME] with the name of the column to be masked
- [POLICY_NAME] with the name of the masking policy created in the previous step.
Here is an example:
ALTER TABLE IF EXISTS student_records MODIFY COLUMN email SET masking policy data_masking;
Verifying the Masking Rules by Querying Data
To make sure the masking rules are correctly applied, it is crucial to verify the results by querying the data.
By testing the data retrieval from different roles, you can see the masking effects and confirm that sensitive information remains hidden from unauthorized access.
Execute queries from different roles to verify the masking rules:
Querying Data from school_principal Role:
When queried from the school_principal role, the data in the student_records table appears unmasked. Here is an image showing the unmasked data:
use role school_principal; select first_name, last_name, gender, email from student_records;
Querying Data from student Role:
When queried from the student role, the data in the student_records table has masking applied to the email column.
Here is an image showing the masked data:
use role student; select first_name, last_name, gender, email from student_records;
Unsetting Masking Policy on a Column
If we want to remove the masking policy applied to a specific column, we can use the following SQL statement:
ALTER TABLE IF EXISTS student_records MODIFY email UNSET MASKING POLICY;
This statement removes the masking policy from the email column in the student_records table.
Managing and Extracting Information of Snowflake Masking Policies
Altering Masking Policies
Snowflake allows us to modify existing masking policies by adding new rules or renaming the policy. Any changes made to the masking policy will take effect when the next SQL query that uses the policy runs.
To alter an existing masking policy in Snowflake, we use the following syntax:
ALTER MASKING POLICY [IF EXISTS] <name> SET BODY -> <expression_on_arg_name> ALTER MASKING POLICY [IF EXISTS] <name> RENAME TO <new_name> ALTER MASKING POLICY [IF EXISTS] <name> SET COMMENT = 'strings'
Extracting Information of Existing Masking Policies
We can extract information about existing masking policies in Snowflake using the following SQL statements:
Listing All Masking Policies:
The following SQL statement lists all the masking policies present in the current session's database and schema:
SHOW MASKING POLICIES;
This command provides information such as the creation date, database, schema names, owner, and any available comments for each masking policy.
Describing a Masking Policy:
The following SQL statement describes the details of a specific masking policy, including its creation date, name, data type, and SQL expression:
DESCRIBE MASKING POLICY <policy_name>;
This command extracts information about the specified masking policy.
Here is an example:
DESCRIBE MASKING POLICY DATA_MASKING;
Step by step process of dropping a Snowflake masking policy
To drop a masking policy in Snowflake, we need to follow these steps:
Find the Columns with Applied Policy:
First, we need to identify the columns where the masking policy is currently applied. We can use the following SQL statement to list all the columns on which the DATA_MASKING masking policy is applied:
SELECT * FROM TABLE(INFORMATION_SCHEMA.POLICY_REFERENCES(POLICY_NAME => 'DATA_MASKING'));
This statement retrieves information about the columns where the specified masking policy is applied.
Unset the Masking Policy:
Once we identify the columns where the masking policy is applied, we need to unset the masking policy from those columns.
This can be done using the following SQL statement:
ALTER TABLE IF EXISTS <table_name> MODIFY <column_name> UNSET MASKING POLICY;
Drop the Masking Policy:
Finally, to drop the masking policy, we use the following SQL statement:
DROP MASKING POLICY <policy_name>;
Replace <policy_name> with the name of the masking policy that you want to drop.
Here is an example of dropping of the DATA_MASKING masking policy:
Advanced Snowflake Dynamic Data Masking Techniques in Snowflake:
Partial Data Masking
Snowflake also supports partially masking column data. We can create a masking policy that partially masks the email data for the student role, leaving the email domain unmasked.
Creating the Partial Data Masking Policy:
We can create a masking policy called dynamic_email_masking using the following SQL statement:
create or replace masking policy dynamic_email_masking as (val string) returns string -> case when current_role() in ('SCHOOL_PRINCIPAL') then val else regexp_replace(val,'. +\@','*****@') -- leave email domain unmasked end;
This particular masking policy will mask the email address by replacing everything after the first period with asterisks (*). But, the email domain will be left unmasked, meaning that users with the SCHOOL_PRINCIPAL role will be able to see the full email address, while users with other roles will only be able to see the first part of the email address, followed by asterisks.
Applying the Masking Policy:
To apply the dynamic_email_masking policy to the email column in the student_records table, we can use the following SQL statement:
ALTER TABLE IF EXISTS student_records MODIFY COLUMN email SET MASKING POLICY dynamic_email_masking;
This statement applies the masking policy to the email column. Once you have applied the masking policy, users with the SCHOOL_PRINCIPAL role will be able to see the full email address for all students in the student_records table. Noet that users with other roles will only be able to see the first part of the email address, followed by asterisks.
Conditional Data Masking
Conditional Data Masking allows us to selectively apply masking to a column based on the value of another column. We can create conditional data masking in Snowflake using the student_records table for the email column, where users with the SCHOOL_PRINCIPAL role can see the full email address and users with other roles will see the first five characters and the last two characters of the email address:
Creating the Conditional Data Masking Policy:
We can create a masking policy called conditional_email_masking using the following SQL statement:
create or replace masking policy CONDITIONAL_EMAIL_MASKING as (val string) returns string -> case when current_role() in ('SCHOOL_PRINCIPAL') then val else substring(val, 1, 5) || '***' || substring(val, -2) end;
This masking policy will only be applied to the email column in the student_records table. Only users with the SCHOOL_PRINCIPAL role will be able to see the full email address, while users with other roles will only see the first five characters and the last two characters of the email address.
Applying the Masking Policy:
To apply the dynamic_email_masking policy to the email column based on the value of the student_id column in the student_records table, we use the following SQL statement:
ALTER TABLE IF EXISTS student_records MODIFY COLUMN email SET MASKING POLICY CONDITIONAL_EMAIL_MASKING USING (email, student_id);
This statement applies the masking policy to the email column, considering the values in the email and student_id columns.
Limitations of Snowflake Dynamic Data Masking
Here are some key limitations of Snowflake Dynamic Data Masking:
- Snowflake masking features require at least an Enterprise Edition subscription(or Higher).
- Masking can impact query performance since Snowflake has to evaluate the masking rules for each row returned in the result set. More complex rules can slow down query response times.
- Masking does not hide data in columns that are not selected in the query. For example, if a query selects only name and age columns, the masking rules will apply only to name and age. Other columns will be returned unmasked.
- Masking conditions cannot be based on encrypted column values since Snowflake cannot evaluate conditions on encrypted data. Masking rules can only use unencrypted columns.
- It does not mask data in temporary tables or unmanaged external tables. It only works for managed tables in Snowflake.
- It only works on SELECT queries. It does not mask data for INSERT, UPDATE or DELETE queries. So if a user has DML access to tables, they will still see the actual data. It only masks data for read-only access.
- It cannot be applied to virtual columns. Virtual columns are derived columns that are not stored in the database, which means that Dynamic Data Masking cannot be used to mask data in virtual columns.
- It cannot be applied to shared objects. Shared objects are objects that are stored in a Snowflake account and can be shared with other users or accounts.
- Dynamic Data Masking can be complex to set up and manage, especially if you have a large number of tables and columns. You need to create a masking policy for each column that you want to mask, and you need to make sure that the masking policy is applied to the correct tables and columns.
Points to Remember—Critical Do's and Don'ts—When Working With Snowflake Dynamic Data Masking
Here are some additional points to remember while working with Snowflake dynamic data masking:
- Snowflake dynamic data masking policies obfuscate data at query runtime, original data is unchanged
- Snowflake dynamic data masking prevents unauthorized users from seeing real data
- Take backup data before applying masking
- Masking applies only when reading data, not DML
- Snowflake dynamic data masking policy names must be unique within a database schema.
- Masking policies are inherited by cloned objects, ensuring consistent data protection across replicated data.
- Masking policies cannot be directly applied to virtual columns in Snowflake. To apply a dynamic data masking policy to a virtual column, you can create a view on the virtual columns and then apply the policy to the corresponding view columns.
- Snowflake records the original query executed by the user on the History page of the web interface. The query details can be found in the SQL Text column, providing visibility into the original query even with data masking applied.
- Masking policy names used in a specific query can be found in the Query Profile, which helps in tracking the applied policies for auditing and debugging purposes.
At last, data security is a critical concern for organizations, and Snowflake's Dynamic Data Masking feature offers a powerful solution to protect sensitive Snowflake data. Snowflake's Dynamic Data Masking is an extremely powerful tool that empowers organizations to bring sensitive data into Snowflake platforms while effectively managing it at scale. Snowflake dynamic data masking combines policy-based approaches and role-based access control (RBAC) and makes sure that only authorized individuals can access sensitive data, protecting it from prying eyes and mitigating the risk of data breaches. Throughout this article, we explored the concept, benefits, and implementation of Dynamic Data Masking, covering step-by-step instructions for building and applying masking policies. We also delved into advanced techniques like partial and conditional data masking, discussed policy management, and highlighted the limitations as well as its benefits.
Just as a skilled locksmith carefully safeguards valuable treasures in a secure vault, Snowflake's Dynamic Data Masking feature acts as a trustworthy guardian for organizations' sensitive data.
What is Snowflake Dynamic Data Masking?
Snowflake Dynamic Data Masking is a security feature in Snowflake that allows the masking of sensitive data in query results.
How does Dynamic Data Masking work in Snowflake?
It works by applying masking policies to specific columns in tables and views, which replace the actual data with masked data in query results.
Can I apply Dynamic Data Masking to any column in Snowflake?
Yes, you can apply it to any table or view column that contains sensitive data. It cannot be applied directly to virtual columns.
Is the original data altered when using Dynamic Data Masking?
No, the original data in the micro-partitions is unchanged. Only the query results are masked.
Who can define masking policies in Snowflake?
Only users with the necessary privileges, such as ACCOUNTADMIN or SECURITYADMIN roles, can define masking policies.
Can I use Dynamic Data Masking with third-party tools?
Yes, as long as the tool can connect to Snowflake and execute SQL queries.
How can I test my Snowflake masking policies?
You can test them by running SELECT queries and checking if the returned data is masked as expected.
Can I use Dynamic Data Masking to mask data in real-time?
Yes, the data is masked in real-time during query execution.
Can I use different Snowflake masking policies for different users?
Yes, you can define different masking policies and grant access to them based on roles in Snowflake.
What types of data can I mask with Dynamic Data Masking?
You can mask any type of data, including numerical, string, and date/time data.
What happens if I drop a masking policy?
Only future queries will show unmasked data. Historical query results from before the policy was dropped remain masked.
Can I use Dynamic Data Masking with Snowflake's Materialized Views feature?
Yes, masking will be applied at query time on the materialized view, not during its creation.