Cost Efficiency in Snowflake: Optimize Warehouses & Storage

Introduction

Managing cost efficiency in Snowflake is crucial for organizations looking to optimize cloud expenses without compromising performance. Snowflake’s virtual warehouses, data storage, and table configurations directly impact billing. Understanding these factors helps businesses achieve optimal performance while controlling costs. This blog explores best practices to enhance cost efficiency in Snowflake.

1. Snowflake’s Virtual Warehouses and Cost Optimization

Virtual warehouses are a fundamental component of Snowflake’s architecture. Selecting the right configurations helps minimize costs while maintaining performance.

1.1 Choosing the Optimal Warehouse Size

Warehouse size affects both cost and performance. Selecting an extra-small or small warehouse is ideal for simple queries, while complex queries benefit from larger warehouse sizes. Performance improves as the warehouse size increases, but after a certain point, the benefits plateau.

To determine the optimal warehouse, test query execution times across different warehouse sizes. Because Snowflake bills a minimum of 60 seconds each time a warehouse resumes, a warehouse should ideally run for at least one minute of work per resume to maximize cost efficiency.
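One practical way to gather these numbers is to run a representative workload on warehouses of different sizes and then compare timings from the query history. Below is a minimal sketch, assuming hypothetical test warehouses named WH_TEST_XS, WH_TEST_S, and WH_TEST_M (note that ACCOUNT_USAGE views can lag real time by a short interval):

```
-- Compare average execution and elapsed times per warehouse size
-- for queries run over the last 7 days on the test warehouses.
SELECT
    warehouse_name,
    warehouse_size,
    COUNT(*)                        AS query_count,
    AVG(execution_time) / 1000      AS avg_execution_sec,
    AVG(total_elapsed_time) / 1000  AS avg_elapsed_sec
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name IN ('WH_TEST_XS', 'WH_TEST_S', 'WH_TEST_M')  -- hypothetical test warehouses
GROUP BY warehouse_name, warehouse_size
ORDER BY warehouse_size;
```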

1.2 Auto Suspension of Warehouse

Auto Suspend allows a warehouse to suspend automatically when it is not in use, so you are not billed for idle time.

The Auto Suspend property can be disabled when:

– There are frequent, steady workloads on the warehouse 24/7.

– You need the virtual warehouse readily available at all times for faster query results from the warm cache.

Note that disabling auto suspend can lead to heavy billing, so choose wisely. The general practice is to keep auto suspension enabled so that you only pay for active time, not idle time.

By default, the AUTO_SUSPEND interval is 600 seconds. This may not be optimal if, say, you run queries once every 10 minutes and each query takes about 1 minute to execute. In such cases, it is better to set the AUTO_SUSPEND interval according to your needs.
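For illustration, here is a minimal sketch of tuning this setting; the warehouse name and size are hypothetical:

```
-- Create a warehouse that suspends after 60 seconds of inactivity
-- and resumes automatically when the next query arrives.
CREATE WAREHOUSE IF NOT EXISTS WH_REPORTING
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60       -- seconds of inactivity before suspension
  AUTO_RESUME = TRUE;

-- Or adjust the setting on an existing warehouse:
ALTER WAREHOUSE WH_REPORTING SET AUTO_SUSPEND = 60;
```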

The Auto Suspend interval can be decided based on the following factors:

1. Time interval between two subsequent queries executed in the warehouse.
2. Average query execution time.

Cost vs. Performance While Setting the Auto_Suspend Limit

Let’s assume there is a recurring pattern of executing similar queries every 75 seconds, with an average query execution time of 10 seconds, and the warehouse has been set to AUTO_SUSPEND after 60 seconds. Here is what would happen:

0th sec → Query is initiated and starts the warehouse

10th sec → Query is executed successfully

70th sec → Warehouse auto suspends

75th sec → Query is initiated and starts the warehouse

85th sec → Query is executed successfully

145th sec → Warehouse auto suspends

150th sec → Query is initiated and starts the warehouse

160th sec → Query is executed successfully

220th sec → Warehouse auto suspends

And so on…

Notice that the AUTO_SUSPEND of 60 seconds is not benefiting us much when we consider the cost vs. performance trade-off. The total uptime of the warehouse in the above case is 210 seconds, whereas the total uptime with AUTO_SUSPEND disabled would have been 220 seconds. The benefit of disabling AUTO_SUSPEND in this scenario, however, would have been faster query processing: every time the warehouse restarts, data has to be fetched from remote storage into the local cache before the query is processed.

With AUTO_SUSPEND disabled, since the queries were similar, they could be processed against the warm local disk cache, resulting in faster performance: perhaps a couple of seconds instead of 10. And if the exact same query were reissued with no underlying data change, the result would be returned in milliseconds directly from the result cache.

So always consider the trade-off between saving credits by suspending a warehouse versus maintaining the cache of data from previous queries to help with performance.

Now that we understand the costs associated with warehouses in Snowflake, let’s take a look at how data storage in Snowflake affects the overall billing.

2. Data Storage Costs in Snowflake

Data storage costs are frequently overlooked in Snowflake as they are considered inexpensive. However, it is crucial to carefully consider the type of tables to create in Snowflake, taking into account the costs associated with time travel, fail-safe, data sharing, and cloning strategies. This understanding will aid in developing effective approaches for managing internal tables.

If you come from an RDBMS background, you may assume that executing “create table” in Snowflake will create a normal table. However, this is not the case. Instead, it creates a permanent table with time travel enabled, which can lead to increased costs if such a table is not needed. Any inserts, updates, and deletes on these tables are accounted for in data storage, and with frequent DML operations, the volume of time travel data can grow in no time.

Hence, if you have not decided on which kind of table you need to create, then always use:

```
CREATE TRANSIENT TABLE
```

This has to be communicated to developers, as the general habit is to always use CREATE TABLE. For a standard (permanent) table that is large and has high churn, costs can grow exponentially. Note that not all tables need time travel; hence, use CREATE TABLE wisely.
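As a minimal sketch, assuming a hypothetical staging table, a transient table can also disable time travel entirely; transient tables have no fail-safe period and allow at most 1 day of time travel:

```
-- Transient table: no fail-safe, time travel limited to 0 or 1 day.
CREATE TRANSIENT TABLE STG_ORDERS (
    ORDER_ID    NUMBER,
    CUSTOMER_ID NUMBER,
    ORDER_TS    TIMESTAMP_NTZ,
    AMOUNT      NUMBER(12, 2)
)
DATA_RETENTION_TIME_IN_DAYS = 0;  -- disable time travel entirely
```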

For example, suppose we have a table that is 200GB in size and receives frequent updates. This table is configured for time travel, specifically for 90 days, and it’s assumed that each record in the table undergoes an update at least 20 times during this period. After the 90-day period, the table will be moved to fail-safe storage by Snowflake, where it will be stored for 7 days.

Although the active table is only 0.2 TB in size, the total storage billed comes to about 32.2 TB when 90-day time travel is enabled for such a high-churn table.

Had the same table been a transient table with 0 days of time travel, no time travel or fail-safe copies would accumulate, and the billed storage would stay close to the active 0.2 TB.

Although you can enable time travel for up to 90 days, choose the number of days that suits your needs. For example, in production, if I know that any data-related issue can be identified and fixed within 7 days, I would set the time travel period to 7 days. Even in the worst case, if an issue persists beyond 7 days, you can contact Snowflake support and get a copy of the fail-safe data.
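For instance, assuming a hypothetical production table, the retention period of a permanent table can be lowered from the default like this (values above 1 day require Enterprise Edition or higher):

```
-- Keep 7 days of time travel instead of the maximum 90 days.
ALTER TABLE PROD_DB.SALES.FACT_ORDERS
  SET DATA_RETENTION_TIME_IN_DAYS = 7;
```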

If you are going with the transient table approach for critical tables, then the best practice is always to keep a backup at periodic intervals.

Although the backup table costs the same to store as the actual table, the total cost of both combined will usually be much less than that of a high-churn table with long time travel.
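A simple sketch of such a periodic backup, assuming hypothetical table names and a schedule managed outside this statement, is to recreate a transient copy with CTAS:

```
-- Recreate a transient backup copy; only its active bytes are billed,
-- with no time travel or fail-safe overhead.
CREATE OR REPLACE TRANSIENT TABLE SALES.BACKUP.FACT_ORDERS_BKP AS
SELECT * FROM SALES.PUBLIC.FACT_ORDERS;
```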

3. Use Cloning Instead of CTAS for Cost Savings

Cloning in Snowflake is a cost-effective alternative to CREATE TABLE AS SELECT (CTAS) for creating table copies.

3.1 Benefits of Cloning:

– Storage Efficiency: Cloning shares micro-partitions between the original and cloned table instead of duplicating data.

– Faster Debugging: Developers can create temporary clones for troubleshooting without additional storage costs.

– Schema & Database Level Cloning: Snowflake allows cloning at multiple levels to save costs on redundant data.

3.2 When to Use Cloning:

– Creating backups for debugging

– Duplicating tables for testing environments

– Minimizing storage costs while preserving historical snapshots
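As an illustration with hypothetical object names, a zero-copy clone is a single statement, and it consumes extra storage only for micro-partitions that change after the clone is created:

```
-- Zero-copy clone of a production table for testing or debugging.
CREATE TABLE DEV_DB.PUBLIC.FACT_ORDERS_CLONE
  CLONE PROD_DB.PUBLIC.FACT_ORDERS;

-- Cloning also works at the schema and database level:
CREATE SCHEMA DEV_DB.SALES_CLONE CLONE PROD_DB.SALES;
```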

4. Cost-Efficient Data Sharing in Snowflake

Cloning is not possible across Snowflake accounts, but data sharing provides a cost-effective solution.

4.1 How Data Sharing Works:

Instead of replicating data, Snowflake shares metadata references, ensuring:

– No additional storage costs

– Real-time data availability across accounts

– Minimal performance overhead

The production database is in Snowflake Account A1 under Org O. The dev database is in Snowflake Account A2 under the same Org O. You need to test the dev data engineering pipeline with the same source tables as in the production account. Since cloning across accounts is not possible, in such cases we can opt for data sharing between accounts.

Let’s see how this works:

In the production account:

Use the AccountAdmin role.

```
use role ACCOUNTADMIN;
```

Create a share object.

```
create share AWSCUR_MAIN_NON_PROD;
```

Grant the necessary privileges to the share.

```
grant usage on database PAYER_PROD to share AWSCUR_MAIN_NON_PROD;
grant usage on schema PAYER_PROD.PAYER_ANALYTICS_MASTER to share AWSCUR_MAIN_NON_PROD;
grant select on table PAYER_PROD.PAYER_ANALYTICS_MASTER.AWSCUR_MAIN to share AWSCUR_MAIN_NON_PROD;
```

Add the account identifier of the development Snowflake account to the share (the value below is a placeholder).

```
alter share AWSCUR_MAIN_NON_PROD add accounts=<org_name>.<dev_account_name>;  -- placeholder: replace with the dev account identifier
```

In the development account:

Use the AccountAdmin role.

```
use role ACCOUNTADMIN;
```

Create a database from the share (the provider org and account names below are placeholders).

```
CREATE DATABASE PAYER_PROD FROM SHARE <org_name>.<prod_account_name>."AWSCUR_MAIN_NON_PROD";
```

Grant the imported database to the respective roles in the development account (the role name below is a placeholder).

```
GRANT IMPORTED PRIVILEGES ON DATABASE PAYER_PROD TO ROLE <role_name>;
```

Using data sharing instead of replication ensures cost efficiency in Snowflake by eliminating redundant storage.

Conclusion

Optimizing cost efficiency in Snowflake requires strategic decisions regarding warehouse configurations, data storage, and sharing mechanisms. By implementing Auto Suspend, transient tables, cloning, and data sharing, businesses can significantly reduce expenses while maintaining performance.

As Snowflake continues to evolve, understanding its cost structures and optimization techniques will become even more critical for organizations. By leveraging these best practices, businesses can achieve maximum efficiency in their cloud-based data operations.

For a deeper dive into Snowflake’s architecture, refer to our guide on Introduction To Snowflake: A Scalable Data Warehouse For Modern Analytics.

Do you like to read more educational content? Read our blogs at Cloudastra Technologies or contact us for business enquiry at Cloudastra Contact Us
