Snowflake Data Transfer Costs 101 - An In-Depth Guide to Manage and Optimize Data Transfer Costs

Snowflake, a popular cloud data platform, offers superior scalability, low latency, advanced analytics, and flexible pay-per-use pricing, but managing its data transfer costs can be daunting for businesses. To get the most value out of Snowflake, it's important to understand how its data transfer costs work, including both data ingress and egress costs.

In this article, we'll explain Snowflake's data transfer costs, including data ingress and egress fees, how they vary depending on cloud providers and regions, and how to optimize Snowflake costs and maximize ROI.

Understanding Snowflake Data Transfer Costs

Snowflake calculates data transfer costs based on criteria such as data size, transfer rate, and the region and cloud provider the data is transferred from and to. Before tackling how to reduce these costs, it is important to understand how Snowflake calculates them.

Snowflake Data Ingress and Egress Costs

Snowflake data transfer cost structure depends on two factors: ingress and egress.

Ingress, transferring data into Snowflake, is free of charge. Data egress, sending data from Snowflake to another region or cloud platform, incurs a fee. Transfers within the same region are usually free, but when transferring data externally via external tables, external functions, or Data Lake Export, there are per-byte charges that vary across cloud providers and regions.

Take a look at some Snowflake pricing tables for comparison to get an idea of what the difference in cost could look like:

Snowflake Data Transfer Costs in AWS:

[Image: Snowflake data transfer costs in AWS]

Snowflake Data Transfer Costs in Microsoft Azure:

[Image: Snowflake data transfer costs in Microsoft Azure]

Snowflake Data Transfer Costs in GCP:

[Image: Snowflake data transfer costs in GCP]

Snowflake Features That Trigger Data Transfer Costs

Transferring data from a Snowflake account to a different region within the same cloud platform, or to a different cloud platform, using certain Snowflake features incurs data transfer costs. These costs may be triggered by several activities, such as:

1) Unloading Data in Snowflake

Unloading data in Snowflake is the process of extracting information from a database and exporting it to an external file or storage system such as Amazon S3, Azure Blob Storage, or Google Cloud Storage. This makes the extracted data available for use in other applications, whether for analytics, reporting, or visualizations. Unloading can be done manually or automatically; however, it's important to first consider factors such as data backup frequency, data volume, storage, and transfer costs.

To unload data from Snowflake to another cloud storage provider, use the COPY INTO <location> command. This requires setting up a stage, which can have associated costs. The data can be unloaded to a location different from where the Snowflake account is hosted. The complete syntax for the command is as follows:

COPY INTO { internalStage | externalStage | externalLocation }
     FROM { [<namespace>.]<table_name> | ( <query> ) }
[ PARTITION BY <expr> ]
[ FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
                    TYPE = { CSV | JSON | PARQUET } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ VALIDATION_MODE = RETURN_ROWS ]
[ HEADER ]

Let's break this syntax down line by line to understand it better.

To indicate where the data will be unloaded, use the COPY INTO command followed by the stage location, which can be internal or external. For instance:

Internal or external stage: COPY INTO @your_stage/your_data.csv

Google Cloud Bucket: COPY INTO 'gcs://bucket-name/your_folder/'

Amazon S3: COPY INTO 's3://bucket-name/your_folder/'

Azure: COPY INTO 'azure://account.blob.core.windows.net/bucket-name/your_folder/'

The FROM clause specifies the source of the data, which can be a table or a query. For example:

Copy entire table: FROM SNOWFLAKE_SAMPLE_DATA.USERS

Copy using query: FROM (SELECT USER_ID, USER_FIRST_NAME, USER_LAST_NAME, USER_EMAIL FROM SNOWFLAKE_SAMPLE_DATA.USERS)

When unloading to an external storage location, you need to specify the STORAGE_INTEGRATION parameter, which contains authorization for Snowflake to write data to the external location.

The PARTITION BY parameter splits the unloaded data into separate files based on the supplied expression, while the FILE_FORMAT parameter sets the file type and compression. The copyOptions parameter includes optional parameters that further customize the results of the COPY INTO command.

To ensure data consistency, consider creating a named file format where you can define file type, compression, and other parameters upfront. The VALIDATION_MODE parameter is used to return the rows instead of writing to your destination, allowing you to confirm that you have the correct data before unloading it.

Lastly, the HEADER parameter specifies whether you want the data to include headers.
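Before the full example, here is a minimal sketch of the named file format mentioned above; the format name and option values are illustrative rather than prescribed:

-- Hypothetical named file format that fixes file type and compression upfront
CREATE OR REPLACE FILE FORMAT my_csv_unload_format
  TYPE = 'CSV'
  COMPRESSION = 'GZIP'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';

Once created, it can be referenced in COPY INTO with FILE_FORMAT = (FORMAT_NAME = 'my_csv_unload_format'), so every unload uses the same settings.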

Example query:

COPY INTO 's3://bucket-name/your_folder/'
FROM (SELECT USER_ID, USER_FIRST_NAME, USER_LAST_NAME, USER_EMAIL FROM SNOWFLAKE_SAMPLE_DATA.USERS)
STORAGE_INTEGRATION = YOUR_INTEGRATION
PARTITION BY LEFT(USER_LAST_NAME, 1)
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'AUTO')
DETAILED_OUTPUT = TRUE
HEADER = TRUE;
Note: Always ensure that the data has been unloaded successfully at the destination and take appropriate measures to prevent any potential data leaks.
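One way to confirm the unload succeeded is to list what landed at the stage. A minimal sketch, assuming a named external stage called my_unload_stage:

-- List the files at the stage path to verify the unload produced output
LIST @my_unload_stage/your_folder/;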

2) Replicating Data in Snowflake

Replicating data is the process of creating a copy of data and storing it in a separate location. This process can be highly valuable for disaster recovery or for providing read-only access for reporting purposes. There are different methods of database replication that can be used, such as taking an up-to-date data snapshot of the primary database and copying it to the secondary database.

For Snowflake accounts, replication can involve copying data to another Snowflake account hosted on a different platform or region than the origin account. This approach helps ensure data availability, even in case of an outage or other issues with the primary account. Also, replicating data to different locations can bring benefits like lower latency and improved performance for users accessing data from various regions worldwide.
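At a high level, replication involves enabling it on the primary database, creating a secondary replica in the target account, and refreshing that replica periodically. Here is a hedged sketch using Snowflake's database replication commands; the organization, account, and database names are illustrative:

-- On the source account: allow my_db to be replicated to a target account
ALTER DATABASE my_db ENABLE REPLICATION TO ACCOUNTS myorg.target_account;

-- On the target account: create a secondary database as a replica of the primary
CREATE DATABASE my_db AS REPLICA OF myorg.source_account.my_db;

-- Refresh the replica; each refresh copies changed data across
-- regions or clouds and therefore incurs data transfer costs
ALTER DATABASE my_db REFRESH;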

3) Writing External Functions in Snowflake

External functions in Snowflake are user-defined functions that are stored and executed outside of Snowflake. This feature allows users to easily access external application programming interfaces (APIs) and services, including geocoders, machine learning models, and custom code running outside the Snowflake environment. External functions eliminate the need to export and re-import data when accessing third-party services, significantly simplifying data pipelines: Snowflake sends data to the external service, receives the results, and processes them within the Snowflake environment. They also extend Snowflake's capabilities by integrating with external services, such as machine learning models or other complex algorithms, for more advanced data analysis. Because each call sends data out of Snowflake, external functions can trigger data transfer costs.
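As a sketch of how this looks in practice, an external function is typically backed by an API integration; the integration name, IAM role ARN, and endpoint URL below are illustrative placeholders:

-- Hypothetical API integration pointing at an AWS API Gateway endpoint
CREATE OR REPLACE API INTEGRATION my_api_integration
  API_PROVIDER = aws_api_gateway
  API_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_external_fn_role'
  API_ALLOWED_PREFIXES = ('https://example.execute-api.us-east-1.amazonaws.com/prod/')
  ENABLED = TRUE;

-- Hypothetical external function; each call sends row data out of Snowflake,
-- which is what triggers data transfer costs
CREATE OR REPLACE EXTERNAL FUNCTION geocode_address(address VARCHAR)
  RETURNS VARIANT
  API_INTEGRATION = my_api_integration
  AS 'https://example.execute-api.us-east-1.amazonaws.com/prod/geocode';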

We will delve deeper into this topic in an upcoming article, but in the meantime, you can refer to this documentation to learn more about it.

Delving into Snowflake Data Transfer Cost

By now, you may be aware that importing data into your Snowflake account is free, but there is a per-byte fee to transfer data across regions on the same cloud platform or to a different cloud platform. To gain a more thorough understanding of historical data transfer costs, users can use Snowsight, the Snowflake web interface, or execute queries against the ACCOUNT_USAGE and ORGANIZATION_USAGE schemas. Snowsight provides a visual dashboard for obtaining cost overviews quickly. If you require more depth, you can run queries against the usage views to analyze cost data in greater detail and even generate custom reports and dashboards.

How to Access Snowflake Data Transfer Costs?

In Snowflake, only the account administrator (a user with the ACCOUNTADMIN role) has default access to view cost and usage data in Snowsight and in the ACCOUNT_USAGE and ORGANIZATION_USAGE schemas. However, if you have the USERADMIN role or higher, you can grant access to other users by assigning them SNOWFLAKE database roles.

The following SNOWFLAKE database roles can be used to access cost and usage data:

  • USAGE_VIEWER: This role provides access to cost and usage data for a single account in Snowsight and related views in the ACCOUNT_USAGE schema.
  • ORGANIZATION_USAGE_VIEWER: If the current account is the ORGADMIN account, this role provides access to cost and usage data for all accounts in Snowsight, along with views in the ORGANIZATION_USAGE schema that are related to cost and usage but not billing.

In short, only the account administrator has access to cost and usage data by default, but SNOWFLAKE database roles can be granted to other users who need to view and analyze that data.
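For example, a minimal sketch of granting one of these roles; the custom role name is a hypothetical placeholder:

-- Grant the built-in SNOWFLAKE.USAGE_VIEWER database role to an analyst role
GRANT DATABASE ROLE SNOWFLAKE.USAGE_VIEWER TO ROLE cost_analyst_role;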

How to View the Overall Snowflake Data Transfer Cost Using Snowsight?

As mentioned above, only the account administrator (a user with the ACCOUNTADMIN role) can use Snowsight to obtain an overview of the overall cost of using Snowflake for any given day, week, or month.

Here are the steps to use Snowsight to explore the overall cost:

  • Navigate to the "Admin" section and select "Usage"
[Image: Admin section and Usage dropdown menu]
  • From the drop-down list, select "All Usage Types."
[Image: Usage dashboard]

How to Query Data for Snowflake Data Transfer Cost?

Before we begin querying data for Snowflake data transfer costs, note that Snowflake exposes two schemas for this purpose: ORGANIZATION_USAGE and ACCOUNT_USAGE. These schemas contain usage- and cost-related data, providing detailed, granular, analytics-ready usage data for building custom reports or dashboards.

  • ORGANIZATION_USAGE schema contains data about usage and costs at the organization level, including storage usage and cost data for all organizational databases and stages, in a shared database named SNOWFLAKE. The ORGANIZATION_USAGE schema is only accessible to users with the ORGADMIN or ACCOUNTADMIN roles.
  • ACCOUNT_USAGE schema contains usage and cost data at the account level, including storage consumption and cost for all tables inside an account.

Most views in the ORGANIZATION_USAGE and ACCOUNT_USAGE schemas express data transfer cost in terms of volume, that is, how much data was transferred. To view cost in currency rather than volume, query the USAGE_IN_CURRENCY_DAILY view, which converts the amount of data transferred into a currency cost based on the daily price of transferring a terabyte.
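A minimal sketch of such a query; the 30-day window is an illustrative choice, and the usage_type filter value is an assumption about how transfer rows are labeled in this view:

-- Daily data transfer cost in the organization's currency, last 30 days
SELECT usage_date,
       usage,               -- volume transferred
       usage_in_currency    -- converted at the daily price per TB
FROM snowflake.organization_usage.usage_in_currency_daily
WHERE usage_type = 'data transfer'   -- assumed label; check your view's values
  AND usage_date >= DATEADD(day, -30, CURRENT_DATE())
ORDER BY usage_date;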

The views listed below contain usage and cost information for data transfers from your Snowflake account to another region or cloud provider, and can be used to gain insight into the costs associated with data transfer.

  • ORGANIZATION_USAGE.DATA_TRANSFER_DAILY_HISTORY: Provides the number of bytes transferred on a given day.
  • ORGANIZATION_USAGE.DATA_TRANSFER_HISTORY: Provides the number of bytes transferred, along with the source and target cloud and region, and the type of transfer.
  • ACCOUNT_USAGE.DATABASE_REPLICATION_USAGE_HISTORY: Provides the number of bytes transferred and credits consumed during database replication.
  • ORGANIZATION_USAGE.REPLICATION_USAGE_HISTORY: Provides the number of bytes transferred and credits consumed during database replication.
  • ORGANIZATION_USAGE.REPLICATION_GROUP_USAGE_HISTORY: Provides the number of bytes transferred and credits consumed during replication for a specific replication group.
  • ORGANIZATION_USAGE.USAGE_IN_CURRENCY_DAILY: Provides daily data transfer in TB, along with the cost of that usage in the organization's currency.

The views above expose data transfer usage and cost information at different levels of granularity, letting you drill into how data transfer is being used in your Snowflake account and what it costs.
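For example, a sketch of a query that breaks down recent transfer volume by route and transfer type, using the documented columns of ACCOUNT_USAGE.DATA_TRANSFER_HISTORY; the 30-day window is illustrative:

-- Bytes transferred in the last 30 days, by source, target, and transfer type
SELECT source_cloud,
       source_region,
       target_cloud,
       target_region,
       transfer_type,
       SUM(bytes_transferred) AS total_bytes
FROM snowflake.account_usage.data_transfer_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3, 4, 5
ORDER BY total_bytes DESC;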

5 Best Practices for Optimizing Data Transfer Costs

Searching for strategies to reduce Snowflake data transfer costs? Knowing how the platform calculates them is a good start, but there are also best practices to keep in mind if you want meaningful savings, such as:

1) Understanding Snowflake Data Usage Patterns

Understanding your data usage patterns is one of the most effective ways to optimize Snowflake data transfer costs. By carefully analyzing how you access and use data, you can identify places where transfers can be reduced without affecting data quality or availability, which can bring transfer costs down significantly.

For example, if you discover that certain datasets are accessed infrequently, you can reduce the frequency of their data transfers, lowering transfer costs while preserving data integrity. This way, you cut down on data transfer costs while keeping your data available.
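One way to surface rarely accessed tables is to query the ACCOUNT_USAGE.ACCESS_HISTORY view (available on Enterprise edition and above); a sketch, with the 90-day window as an illustrative choice:

-- Tables ranked by how rarely they were read over the last 90 days
SELECT f.value:objectName::STRING AS table_name,
       COUNT(DISTINCT ah.query_id) AS read_count
FROM snowflake.account_usage.access_history ah,
     LATERAL FLATTEN(input => ah.base_objects_accessed) f
WHERE ah.query_start_time >= DATEADD(day, -90, CURRENT_TIMESTAMP())
GROUP BY table_name
ORDER BY read_count ASC;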

2) Reducing Unnecessary Snowflake Data Transfer

Reducing unnecessary data transfers is another way to optimize Snowflake data transfer costs. This can be achieved by minimizing data duplication and consolidating or merging datasets where possible. Doing so reduces the amount of data you move, keeping transfer costs, and overall costs, down.

For example, if you have multiple copies of the same dataset stored in separate places, you can merge those copies to save on data transfer costs. Similarly, if you have datasets that are duplicated across multiple accounts, you can merge those accounts to reduce data transfer costs.

3) Using Snowflake Data Compression and Encryption Features

Snowflake offers built-in data compression and encryption features that can help you reduce data transfer costs. Data compression helps minimize the amount of data that needs to be transferred, while encryption ensures that your data is secure during transit. Encryption can also help you comply with industry regulations and prevent potential data breaches. By leveraging these cost-saving features, users can make sure that their data is transferred safely and securely without breaking the bank.
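As a hedged illustration, compressing files during an unload shrinks the bytes that cross the network; this reuses the illustrative bucket and integration names from the earlier example:

-- Unload with GZIP compression so fewer bytes leave Snowflake
COPY INTO 's3://bucket-name/your_folder/'
FROM SNOWFLAKE_SAMPLE_DATA.USERS
STORAGE_INTEGRATION = YOUR_INTEGRATION
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP');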

4) Optimizing Your Snowflake Data Transfer Using Snowpipe

Snowpipe is a Snowflake feature that can help you significantly reduce data transfer costs. This loading process seamlessly loads streaming data into your Snowflake account, removing the need to preprocess the data before transferring it. Snowpipe also uses Snowflake's auto-scaling capability to allocate compute resources as needed to process incoming data, helping you maximize efficiency while cutting costs.
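A minimal Snowpipe sketch; the pipe, table, and stage names are illustrative, and AUTO_INGEST assumes cloud event notifications have been configured on the stage's storage:

-- Hypothetical pipe that continuously loads new files from a stage
CREATE OR REPLACE PIPE my_ingest_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_events_table
  FROM @my_ingest_stage
  FILE_FORMAT = (TYPE = 'JSON');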

5) Monitoring Your Data Transfer Costs and Usage

Keeping a close eye on your Snowflake data transfer costs is essential for optimizing performance and managing your budget. Snowflake gives you access to comprehensive reports on data transfers, indicating the amount of data transferred, where it moved from and to, and the associated costs. This detailed information can be used to control your costs and adjust how you move data around.
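For ongoing monitoring, a sketch that aggregates the per-transfer rows of ACCOUNT_USAGE.DATA_TRANSFER_HISTORY into a daily trend; the 60-day window is illustrative, and the result could feed a dashboard or alert:

-- Daily transfer volume in GB over the last 60 days
SELECT DATE_TRUNC('day', start_time) AS usage_day,
       SUM(bytes_transferred) / POWER(1024, 3) AS gb_transferred
FROM snowflake.account_usage.data_transfer_history
WHERE start_time >= DATEADD(day, -60, CURRENT_TIMESTAMP())
GROUP BY usage_day
ORDER BY usage_day;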

Conclusion

Understanding how Snowflake calculates data transfer costs is the key to keeping them as low as possible. In this guide, we provided an overview of what Snowflake data transfer costs are, how they are calculated, and how to minimize them through proactive measures. Armed with these practices, you can make effective cost-optimization decisions and keep your Snowflake data transfer costs low.


FAQs

What factors influence the cost of Snowflake data transfer?

Snowflake's data transfer cost structure depends on two factors: ingress and egress. For egress, costs also vary by the source and target cloud provider and region.

Is data ingress free in Snowflake?

Yes, transferring data into Snowflake (data ingress) is generally free of charge.

Does Snowflake charge for data egress?

Yes, there are fees associated with data egress in Snowflake. Sending data from Snowflake to another region or cloud platform incurs a certain fee.

Will I incur data transfer costs if I move data within the same region?

Generally, no. Transfers within the same region in Snowflake are usually free.

Can I dispute an unexpected spike in charges?

Yes, you can contact Snowflake support to review and dispute any unexpected data transfer charges on your Snowflake bill.


Pramit Marattha

Technical Content Lead

Pramit is a Technical Content Lead at Chaos Genius.
