HOW TO: Create and Manage Snowflake File Formats

Snowflake file formats are like duct tape—they bind together the messy process of loading and unloading data. These database objects define the structure and organization of files in Snowflake stages, making staged data easy to query and ingest into tables. Snowflake file formats contain metadata about the data file, such as its type (CSV, JSON, AVRO, PARQUET, and more), formatting options, and compression method. They are valuable when loading/unloading data from Snowflake stages into tables, or when creating external tables on staged files.

In this article, we will walk you through how to create and manage Snowflake file formats. We will explore how this powerful feature can simplify the process of accessing staged data, making it easier than ever to load data into and unload data out of your Snowflake tables. We'll also delve into the various file types and formatting options.

Snowflake File Formats—Understanding the Depths

A Snowflake file format is a named database object that encapsulates information about a data file. This information includes the file's type (CSV, JSON, etc.), formatting options, and compression method.

Snowflake file formats are used to simplify the process of loading and unloading data from Snowflake tables. When you load data from a file into a table, you can specify the file format to use. This tells Snowflake how to interpret the data in the file and load it into the table correctly.

Here are the list of supported Snowflake file formats:

  • CSV (Comma-separated values): This is the most common file format for loading data into Snowflake.
  • JSON (JavaScript Object Notation): This is a flexible and lightweight file format that is often used for semi-structured data.
  • Avro: A binary file format that is efficient for storing and querying large datasets.
  • ORC (Optimized Row Columnar): A columnar file format that is optimized for analytical queries.
  • Parquet: A columnar file format that is similar to ORC, but it is more widely supported.
  • XML (Extensible Markup Language): This is a text-based file format that is often used for storing structured data.

Please note that Snowflake does not support unloading or exporting tables to all of the above file formats. The below table describes this.

Supported Snowflake file formats for loading and unloading
Supported Snowflake file formats for loading and unloading - snowflake file formats
Note: When loading data from files into tables, Snowflake supports either NDJSON (“Newline Delimited JSON”) standard format or comma-separated JSON format. However, when unloading data from tables to files, Snowflake exclusively outputs in NDJSON format.

How to Create Snowflake File Formats?

Step-by-step guide to creating Snowflake file format

Step 1:  Login to Snowflake and locate the "Databases" option and then Click on it to display a list of available databases. Choose the appropriate database and select the desired schema where you want to create the File Format.

Step 2: Now, at the top right-hand corner of the interface, you'll find a "Create" button. Click on this button to reveal a drop-down menu containing various options for creating different Snowflake objects.

Step 3: In the drop-down menu, look for the "File Format" option and click on it. This action will open a new window dedicated to creating a Snowflake File Format.

Creating a Snowflake File Format in the interface - snowflake file formats
Creating a Snowflake File Format in the interface

Step 5: Finally, you can specify the details and formatting options for your Snowflake File Format.

Creating a Snowflake File Format in the interface - snowflake file formats
Creating a Snowflake File Formats in the interface

Alternatively:

Step 1: Instead of navigating through the database and schema options, you can directly open a Snowflake worksheet by clicking on the "Worksheet" option.

Step 2: Now, within the opened worksheet, select the appropriate database and schema, or you can also specify the database and schema names in the file format name itself.

Step 3: In the worksheet, execute a SQL statement to create the Snowflake File Format. The syntax for creating a File Format in Snowflake follows the pattern:

CREATE FILE FORMAT <file_format_name>
TYPE = <file_type>
[FORMAT_OPTIONS = (<format_options>)]
[COMPRESSION = <compression_type>];

Snowflake File Formats Examples

Here are some examples of how to create Snowflake file formats for CSV, JSON, AVRO, ORC, PARQUET, and XML file type:

1) Snowflake File Format CSV:

CREATE FILE FORMAT my_csv 
  TYPE = 'CSV' 
  FIELD_DELIMITER = ',' 
  RECORD_DELIMITER = '\n' 
  FIELD_OPTIONALLY_ENCLOSED_BY = '\"';
Creating CSV file format in Snowflake with specific format type options - snowflake file formats
Creating CSV Snowflake file format with specific format type options

Here is a table summarizing the Snowflake File Format Type Options:

For CSV files:

Table summarizing the Snowflake File Format Type Options for CSV files - snowflake file formats
Table summarizing the Snowflake File Formats Type Options for CSV files

2) Snowflake File Format JSON:

CREATE FILE FORMAT my_json 
  TYPE = 'JSON'; 
Creating JSON file format in Snowflake - snowflake file formats
Creating JSON Snowflake file formats

Here is a table summarizing the Snowflake File Format Type Options:

For JSON files:

Table summarizing the Snowflake File Format Type Options for JSON files - snowflake file formats
Table summarizing the Snowflake File Format Type Options for JSON files

3) Snowflake File Format Avro:

CREATE FILE FORMAT my_avro 
  TYPE = 'AVRO' 
  COMPRESSION = 'auto';
Creating AVRO file format in Snowflake with specific format type options - snowflake file formats
Creating AVRO file format in Snowflake with specific format type options

Here is a table summarizing the Snowflake File Format Type Options:

For AVRO files:

Table summarizing the Snowflake File Format Type Options for AVRO files - snowflake file formats
Table summarizing the Snowflake File Format Type Options for AVRO files

4) Snowflake File Format ORC:

CREATE FILE FORMAT my_orc
   TYPE = 'ORC'; 
Creating ORC file format in Snowflake - snowflake file formats
Creating ORC file format in Snowflake - snowflake file formats

Here is a table summarizing the Snowflake File Format Type Options:

For ORC files:

Table summarizing the Snowflake File Format Type Options ORC files - snowflake file formats
Table summarizing the Snowflake File Format Type Options ORC files

5) Snowflake File Format Parquet:

CREATE FILE FORMAT my_parquet
  TYPE = 'PARQUET'
  COMPRESSION = 'snappy';
Creating Parquet file format in Snowflake - snowflake file formats
Creating Parquet file format in Snowflake

Here is a table summarizing the Snowflake File Format Type Options:

For PARQUET files:

Table summarizing the Snowflake File Format Type Options for PARQUET files - snowflake files formats
Table summarizing the Snowflake File Format Type Options for PARQUET files

6) Snowflake File Format XML:

CREATE FILE FORMAT my_xml 
  TYPE = 'XML' 
  STRIP_OUTER_ELEMENT = 'true' 
  ENABLE_SNOWFLAKE_DATA = 'true';
Creating XML file format in Snowflake with specific format type options - snowflake file formats
Creating XML file format in Snowflake with specific format type options

Here is a table summarizing the Format Type Options:

For XML files:

Table summarizing the Format Type Options for XML files - snowflake file formats
Table summarizing the Format Type Options for XML files
These are examples of how to create Snowflake file formats. For more in-depth information, please refer to the Snowflake documentation.

Use Cases for Snowflake File Formats

Snowflake File Formats are commonly utilized in three primary use cases:

  • Loading data into Snowflake tables: When you load data from internal Snowflake stage or external Snowflake stage into Snowflake tables, you specify a file format that matches your data files. This tells Snowflake how to parse and interpret your data.
  • Unloading data from Snowflake tables: Whenever you unload data from Snowflake tables into internal Snowflake stage or external Snowflake stage, you specify a file format to determine how the data files should be structured and formatted.
  • Creating Snowflake External Tables: Snowflake External Tables allow you to query data stored in external Snowflake stages without loading it into Snowflake tables. You specify a file format when creating the external table to define how Snowflake should parse the external data files.

Here is one simple example that shows loading data into the student table from Snowflake stage named my_stage using file format my_csv.

COPY INTO STUDENTS from @my_stage/input.csv
file_format = (format_name = my_csv);

Here is another example that shows creating an external table ext_table on top of student files using file format my_csv.

CREATE OR REPLACE EXTERNAL TABLE ext_table
  WITH LOCATION = @my_stage_location/
  FILE_FORMAT = (format_name = my_csv)
  PATTERN='.*students.*[.]csv';

Limitations of Snowflake File Formats:

Here are some of the limitations and restriction of Snowflake file formats:

  • Snowflake File format types are limited to - CSV, JSON, AVRO, ORC, PARQUET, XML. Snowflake currently only supports these format types.
  • File compression formats are limited to - GZIP, BZ2, ZSTD, DEFLATE, RAW (no compression). Snowflake only supports these compression codecs for file formats.
  • Schema must match the table schema. The schema defined in the file format (if any) must match the schema of the Snowflake table. Otherwise the COPY or CREATE EXTERNAL TABLE statement will fail.
  • Partitioning is only supported for a few types. Partitioning (loading subsets of data into tables based on partition keys) is only supported for ORC, PARQUET and JSON file formats currently.

Managing Snowflake File Formats

Snowflake file formats provides several operations for managing file formats, which allows users to modify, drop, show, and describe existing Snowflake file formats. These operations offer flexibility and control over the formatting options used for loading and unloading data in Snowflake.

ALTER Snowflake File Format:

ALTER FILE FORMAT command is used to modify existing file formats in Snowflake. It allows users to update designated properties of a file format to align with their specific needs. However, certain constraints apply; not all parameters are modifiable using this command. In cases where certain parameters must be changed, it becomes necessary to drop and subsequently recreate the file format.

Here is one example to change the compression method of a file format named "my_format" to GZIP, the following query can be executed:

ALTER FILE FORMAT my_format SET COMPRESSION = GZIP;

Let’s modify our Snowflake file format for the example we provided earlier:

ALTER FILE FORMAT my_csv SET COMPRESSION = GZIP;
Modifying file format to use GZIP compression - snowflake file formats
Modifying file format to use GZIP compression

SHOW Snowflake File Format:

SHOW FILE FORMATS command provides a list of all the Snowflake file formats available. This query is useful for users to view the existing Snowflake file formats and their properties. For example, to retrieve the list of file formats, the following query can be executed:

SHOW FILE FORMATS;
Showing list of all the Snowflake file formats available - snowflake file formats
Showing list of all the Snowflake file formats available

DESCRIBE Snowflake File Format:

DESCRIBE FILE FORMAT provides detailed information about a specific file format in Snowflake. It displays the properties and settings associated with the file format. For example, to describe the file format named "my_format," the following query can be used:

DESCRIBE FILE FORMAT my_format;

Let’s take an example we provided earlier:

DESCRIBE FILE FORMAT my_csv;
Displaying file format properties - snowflake file formats
Displaying file format properties

DROP Snowflake File Format:

DROP FILE FORMAT command allows users to delete an existing file format from the Snowflake database. This operation is useful when a file format is no longer needed or has become obsolete. To drop a file format named "my_format" the following query can be used:

DROP FILE FORMAT my_format;

Let’s delete our previously created Snowflake file format;

DROP FILE FORMAT MY_CSV;
-- DROP FILE FORMAT MY_JSON;
-- DROP FILE FORMAT MY_AVRO;
-- DROP FILE FORMAT MY_ORC;
-- DROP FILE FORMAT MY_PARQUET;
-- DROP FILE FORMAT MY_XML;
Deleting an existing file format from Snowflake - snowflake file formats
Deleting an existing file format from Snowflake

These operations allow users to easily manage their Snowflake file formats, allowing them to tailor the formatting options to their specific needs.

Conclusion

Snowflake file formats are a handy tool for simplifying and streamlining data ingestion in Snowflake. They provide a structured method for defining the type, format, and compression of files in Snowflake stages, making them easily accessible as tables. Using Snowflake file formats, you can turn a jumble of files in a Snowflake stage into neatly organized, query-ready data.

Imagine Snowflake file formats as the GPS of a large ship navigating the vast ocean of data. Each data type or format represents a different destination, and the GPS—our Snowflake file format—guides us to them. It's not just about reaching the destination, but understanding the route, knowing which destinations to choose, and how to best navigate the journey.

FAQs

What is Snowflake stage?

A Snowflake stage is a location where data can be stored within the Snowflake data warehouse. It can be thought of as a folder or directory within the Snowflake environment where data files in various formats (such as CSV, JSON, or Parquet) can be stored and accessed by Snowflake users.

What is the difference between Snowflake stage and External Tables?

A Snowflake stage is a storage location for data files, whereas an external table is a virtual table that points to data stored outside Snowflake. The key difference is that a stage loads data into Snowflake, while an external table enables querying of data located external to Snowflake.

What is the difference between external table and regular table in Snowflake?

External tables reference external data sources. Regular tables store data natively within Snowflake.

What is the difference between Snowpipe and Snowflake external table?

Snowpipe in Snowflake is an automated data ingestion service that continuously loads data from external sources into Snowflake tables. On the other hand, an external table is a virtual table that references data stored outside of Snowflake.

What are the supported Snowflake file formats?

Snowflake supports a variety of file formats, including CSV, JSON, Avro, Parquet, ORC, and XML.

How do you use a file format in Snowflake?

Create it with CREATE FILE FORMAT, specifying the format type, compression, encoding, etc. Use it when loading/unloading data.

What is default Snowflake file format?

The default Snowflake file format is CSV. This means that if you do not specify a file format when loading or unloading data, Snowflake will use the CSV format.

What is the best Snowflake file format?

It depends on use case. Parquet/ORC for structured data, JSON/Avro for semi-structured, CSV for simple use cases. Consider size, performance, compression.

How do I get a list of Snowflake file formats in a Snowflake?

Use SHOW FILE FORMATS to display names, types, and properties of existing file formats.

Pramit Marattha

Technical Content Lead

Pramit is a Technical Content Lead at Chaos Genius.

People who are also involved

Great! You've successfully subscribed.
Great! Next, complete checkout for full access.
Welcome back! You've successfully signed in.
Success! Your account is fully activated, you now have access to all content.