Introducing Chaos Genius for Databricks Cost Optimization

Join the waitlist

HOW TO: Use Snowflake LISTAGG to Concatenate Multiple Input Values (2024)

Snowflake offers a variety of functions to perform analysis and manipulation. One of these functions is the Snowflake LISTAGG function, a column aggregation function that concatenates all values within a column to a list, using a defined delimiter to separate the elements. Snowflake LISTAGG function can be useful for creating comma-separated lists, combining values based on conditions, ordering elements in the list—and so much more. All of these capabilities assist in transforming/formatting your data to make it easier to read and understand.

In this article, we will provide an in-depth overview of Snowflake LISTAGG function. We will cover everything from syntax, arguments, data types, practical examples, limitations, best practices—and so much more!!

What Does LISTAGG() Do in Snowflake?

Snowflake LISTAGG function in Snowflake concatenates multiple string values from input rows into a single string. The concatenated values are separated by a specified <delimiter> such as a comma or space.

Essentially, Snowflake LISTAGG function takes all the data that may be spread across multiple rows and “aggregates” it into a single string output. This is extremely useful for condensing large data sets into a single value that can be easily exported, processed, or even analyzed.

Some example use cases are:

  • Combining values from multiple rows into a single cell in reports
  • Aggregating long text inputs
  • Creating delimited lists of IDs, names—or other attributes from table rows

So, if you use Snowflake LISTAGG function carefully, you can save significant time in data manipulation and gain better insights into aggregated string data.

How does Snowflake LISTAGG function Work?

As we've covered already, Snowflake LISTAGG function is a column aggregation function that concatenates all values within a column into a list, using a defined delimiter to separate the elements. But, it's important to note that the Snowflake LISTAGG function can be used either as an aggregate function or a window function, depending on the context.

The basic syntax of Snowflake LISTAGG function is as follows:

Aggregate function:

LISTAGG(<expr1>, <delimiter> [, <expr2>]) [[WITHIN GROUP] (ORDER BY <orderby_clause>)]

Window function:

LISTAGG( <expr1> [, <delimiter> ] )
    [ WITHIN GROUP ( <orderby_clause> ) ]
    OVER ( [ PARTITION BY <expr2> ] )

Let’s break down Snowflake LISTAGG() syntax:

  • <expr1>: The column or expression containing the values to concatenate
  • <delimiter>: The delimiter used to separate the string values (e.g. comma, space). It can be any string literal or expression that evaluates to a string. If omitted, the default delimiter is a comma (,).
  • <expr2>: Optional secondary sort expression for ordering
  • <orderby_clause>: Optional statement to define ordering before concatenation. It can be any valid ORDER BY clause, such as ORDER BY <expr2> [ASC|DESC]. If omitted, the order of the values is arbitrary.

When Snowflake LISTAGG() is called, it processes rows in groups defined by the query (similar to SUM() or COUNT()) and returns a single string per group.

Here is one simple example of how to use Snowflake LISTAGG function:

First, let’s create a sample table called STUDENTS and insert some dummy data:

CREATE TABLE STUDENTS (
    ID NUMBER,
    FIRST_NAME STRING, 
    LAST_NAME STRING
);

INSERT INTO STUDENTS 
VALUES (1, 'chaos', 'genius'),
       (2, 'elon', 'musk'),
       (3, 'larry', 'ellision'),
       (4, 'bill', 'gates');
Snowflake LISTAGG example

As you can see, this table has student ID, first name, and last name.

Now lets use Snowflake LISTAGG function to concatenate the last names together, separated by commas:

SELECT LISTAGG(LAST_NAME, ', ') WITHIN GROUP (ORDER BY LAST_NAME) AS NAMES
FROM STUDENTS;
Snowflake LISTAGG example
Using Snowflake LISTAGG function to concatenate last names together
Using Snowflake LISTAGG function to concatenate last names together

As you can see, this groups all the rows and sorts alphabetically by last name before concatenating.

To break this down:

  • LAST_NAME is the column value I want to concatenate
  • (,) provides a comma and space <delimiter> between values
  • ORDER BY LAST_NAME defines the sort order before concatenating alphabetically.

The final output is a string with all the last names combined together in a defined order. Take a look at the last part of this section to learn more about the output format of the Snowflake LISTAGG function.

Supported Data Types

Snowflake LISTAGG function supports the following data types as input:

1) STRING, VARCHAR, CHAR, TEXT

These are string data types that can be directly concatenated by the function.

Example:

The example of the string data type is exactly the one we mentioned above ☝️.

2) NUMERIC, INTEGER, FLOAT, DOUBLE, DECIMAL

These are numeric data types that can be implicitly converted to strings and concatenated by Snowflake LISTAGG function.

Example:

Here is one simple example:

First, let’s create a sample table called numbers and insert some dummy data:

CREATE TABLE numbers (
    id INTEGER, 
    int_col INTEGER,
    double_col DOUBLE
);

INSERT INTO numbers VALUES 
    (1, 10, 1.5),
    (2, 25, 2.75);
Snowflake LISTAGG example

Now let's use Snowflake LISTAGG function to concatenate integer column together, separated by commas:

SELECT LISTAGG(int_col::STRING, ',') AS concatenated  
FROM numbers;
Snowflake LISTAGG example
Using Snowflake LISTAGG function to concatenate integer column together
Using Snowflake LISTAGG function to concatenate integer column together

Again, let's use Snowflake LISTAGG function to concatenate double column together, separated by commas:

SELECT LISTAGG(double_col::STRING, ',') AS concatenated
FROM numbers;
Snowflake LISTAGG example
Using Snowflake LISTAGG function to concatenate double column together
Using Snowflake LISTAGG function to concatenate double column together

3) DATE, TIME, TIMESTAMP

These are date and time data types that can be implicitly converted to strings and concatenated by Snowflake LISTAGG function. The format of the output string depends on the session parameters and the data type.

Example:

Here is one simple example:

First, let’s create a sample table called dates and insert some dummy data:

CREATE TABLE dates (
    id INTEGER,
    date_col DATE,
    time_col TIME
);

INSERT INTO dates VALUES
    (1, '2023-12-15', '11:00:00'), 
    (2, '2023-12-16', '12:30:00');
Snowflake LISTAGG example

Now let's use Snowflake LISTAGG function to concatenate dates column together, separated by commas:

SELECT LISTAGG(date_col::STRING, ',') AS concatenated
FROM dates;
Snowflake LISTAGG example
Using Snowflake LISTAGG function to concatenate dates column together
Using Snowflake LISTAGG function to concatenate dates column together

Again, let's use Snowflake LISTAGG function to concatenate time column together, separated by commas:

Using Snowflake LISTAGG function to concatenate time column together
Using Snowflake LISTAGG function to concatenate time column together

4) BOOLEAN

This is a logical data type that can be implicitly converted to strings and concatenated by the function. The output string is either TRUE or FALSE.

Example:

Here is one simple example:

First, let’s create a sample table called bools and insert some dummy data:

CREATE TABLE bools (
    id INTEGER, 
    boolean_col BOOLEAN
);

INSERT INTO bools VALUES
   (1, TRUE),
   (2, FALSE);
Snowflake LISTAGG example

Now let's use Snowflake LISTAGG function to concatenate boolean column together, separated by commas:

SELECT LISTAGG(boolean_col::STRING, ',') AS concatenated
FROM bools;
Snowflake LISTAGG example
Using Snowflake LISTAGG function to concatenate boolean column together
Using Snowflake LISTAGG function to concatenate boolean column together

Remember this: the delimiter parameter must be a single-character string. Common delimiters are commas (‘,’), spaces (‘ ‘), hyphens (‘-’), and line breaks ('\n' or ''), but any string of characters is valid and allowed.

Snowflake LISTAGG function does not support the following data types as input:

  • GEOSPATIAL(GEOMETRY, GEOGRAPHY): These are geospatial data types that cannot be concatenated by the Snowflake LISTAGG function (Error will be raised if these data types are used as input).
  • BINARY, VARBINARY: These are binary data types that cannot be concatenated by the Snowflake LISTAGG function (Error will be raised if these data types are used as input).
  • VARIANT, ARRAY, OBJECT: These are complex data types that cannot be concatenated by the Snowflake LISTAGG function (Error will be raised if these data types are used as input).

Output of Snowflake LISTAGG function

Snowflake LISTAGG function returns a string that includes all of the non-NULL input values, separated by the <delimiter>. The output string has a maximum size of 16,777,216 bytes or 16 MB. If the output string exceeds this limit, an error will be raised. To avoid this error, you can use the DISTINCT option to remove duplicate values, or the WITHIN GROUP option to limit the number of values in the list.

How Snowflake LISTAGG function Handles Null or Empty Values?

Snowflake LISTAGG function handles NULL and empty values as follows:

  • If the input is empty, meaning that there are no rows to aggregate, the function returns an empty string ('').
  • If all input expressions evaluate to NULL, meaning that there are no non-NULL values to concatenate, the function also returns an empty string ('').
  • If some but not all input expressions evaluate to NULL, meaning that there are some non-NULL values to concatenate, the function returns a string that contains all non-NULL values and excludes the NULL values. The delimiter is only inserted between the non-NULL values, and not before or after them.

What Are the Restrictions of Snowflake Listagg When Used as Window Function?

When used as a window function, the LISTAGG function does not support the following features:

  • ORDER BY sub-clause in the OVER() clause: Snowflake LISTAGG function cannot use an ORDER BY sub-clause within the OVER() clause to specify the order of the values in the list. The order of the values is determined by the partition and the order of the rows in the result set.
  • Window frames: Snowflake LISTAGG function cannot use window frames to define the subset of rows to be aggregated within each partition. The function aggregates all the rows within each partition.

Practical Examples and Use Cases of Snowflake LISTAGG function

Now, let's delve into a practical example of the Snowflake LISTAGG function. In this section, we will demonstrate how to use the Snowflake LISTAGG function in various scenarios with different parameters.

To demo the usage of the Snowflake LISTAGG function, first, we will create a table called orders table and insert some sample data into it. We will use this sample table and data throughout this example.

CREATE TABLE orders_table (  
  order_id INT,  
  customer_id INT,
  customer_name VARCHAR,
  product_id VARCHAR,  
  quantity INT,  
  price INT
);
   
INSERT INTO orders_table VALUES
  (1, 1001, 'Chaos Genius', 'P1001', 1, 100),
  (2, 1001, 'Chaos Genius', 'P1002', 5, 50),
  (3, 1002, 'Elon Musk', 'P1003', 3, 75),
  (4, 1002, 'Elon Musk', 'P1004', 7, 275),
  (5, 1003, 'Larry', 'P1005', 9, 65),
  (6, 1003, 'Larry', 'P1006', 12, 35);
Snowflake LISTAGG example
Creating a table called orders table and inserting some sample data into it
Creating a table called orders table and inserting some sample data into it

Example 1 — Concatenating data with a hyphen "-" and comma ","

In this example, we will use Snowflake LISTAGG function to concatenate the product IDs and the quantities for each customer, using a hyphen (-) to separate the product ID and the quantity, and a comma (,) to separate the pairs.

To do so, we can use the following SQL query:

SELECT 
  customer_name,
  LISTAGG(product_id || '-' || quantity, ', ') WITHIN GROUP (ORDER BY order_id) AS products
FROM orders_table
GROUP BY customer_name;
Snowflake LISTAGG example

As you can see, we use the concatenation operator (||) to combine the product ID and the quantity with a hyphen, and then we pass this expression as the first argument to the Snowflake LISTAGG function. We also specify the comma and the space as the delimiter for the Snowflake LISTAGG function. Finally, we then group the rows by the customer name and select the customer name and the products as the output columns.

The output of this query is:

Concatenating data using Snowflake LISTAGG with a hyphen "-" and comma ","
Concatenating data using Snowflake LISTAGG with a hyphen "-" and comma ","

Example 2 — Concatenating data with empty space

Now, in this example, we will use Snowflake LISTAGG function to concatenate the product IDs separated by a space for each customer_id group.

SELECT
  customer_id,
  LISTAGG(product_id, ' ') AS products
FROM orders_table
GROUP BY customer_id;
Snowflake LISTAGG example

As you can see, this query concatenates product_ids separated by a space for each customer_id group. Unlike previous example, no ORDER BY is included so it preserves the original order of rows. The output is a space-separated list of all products purchased by each unique customer ID.

The output of this query is:

Concatenating data using Snowflake LISTAGG with empty space
Concatenating data using Snowflake LISTAGG with empty space

Example 3 — Concatenating order details in specific order

Now, in this example, we will use Snowflake LISTAGG function to concatenate the order IDs for each customer, using a comma (,) delimiter. But, we want to order the order IDs by the price in descending order, so that the most expensive order comes first.

To do so, we can use the following SQL query:

SELECT
  customer_name, LISTAGG(order_id, ', ') WITHIN GROUP (ORDER BY price DESC) AS order_list
FROM orders_table
GROUP BY customer_name;
Snowflake LISTAGG example

As you can see, in this query, we use WITHIN GROUP clause to specify the order of the values in the list. We then use the ORDER BY sub-clause to order the order IDs by the price in descending order. The rest of the query is the same as in the previous examples.

The output of this query is:

Concatenating order details in specific order - Snowflake LISTAGG
Concatenating order details in specific order

Example 4 — Concatenating values based on conditions

Finally, in this example we will demonstrate conditional concatenation capability of Snowflake LISTAGG function.

As you can see below, this query demonstrates conditional concatenation by checking the purchase quantity and classifying products as either "HIGH" or "LOW" product quantity. These labels are concatenated along with the product_id by customer name, which allows segmentation of products into high/low volume for analysis. The final output string groups each customer's products by purchase frequency.

SELECT
  customer_name,
  LISTAGG(CASE 
    WHEN quantity > 2 THEN product_id || ':HIGH'  
    ELSE product_id || ':LOW'
  END, ',') AS product_status
FROM orders_table
GROUP BY customer_name;
Snowflake LISTAGG example

The output of this query is:

Concatenating values based on conditions - Snowflke LISTAGG
Concatenating values based on conditions

Advanced Tips and Techniques with Snowflake LISTAGG()

Snowflake LISTAGG function offers robust data aggregation capabilities. If you aim to maximize its potential, here are some advanced techniques to follow. These tips will lift your concatenation skills to the next level.

1) Use Snowflake DISTINCT

Snowflake DISTINCT allows you to remove duplicate values from the list, so that each value appears only once, which can be useful when you want to create a unique list of values from a column that contains repeated values.

2) Use WITHIN GROUP Option

WITHIN GROUP option allows you to order the values in the list, using a valid ORDER BY clause, which can be extremely useful when you want to sort the values in the list by some criteria, such as alphabetical order, numerical order—or custom order.

3) Combine Snowflake LISTAGG() with Other Aggregate Functions

Snowflake LISTAGG function can be combined with other functions in Snowflake to perform more complex operations and transformations on the data.

For example, you can use the following functions with Snowflake LISTAGG function:

SELECT 
  MAX(LISTAGG(---)) WITHIN GROUP (ORDER BY --- DESC)
  FROM <table_name>
  GROUP BY ---;
Snowflake LISTAGG example
  • Snowflake CONCAT or || operator to concatenate the values with other strings or expressions before or after the list.
SELECT CONCAT('(', LISTAGG(----, ', '), ')') AS ----
FROM <table>
GROUP BY ----;
Snowflake LISTAGG example
  • SPLIT or SPLIT_PART functions to split the output string into an array or a single element based on the delimiter.
SELECT ----, SPLIT(LISTAGG(----, ', '), ', ') AS ----
FROM <table>
GROUP BY ----;
Snowflake LISTAGG example

4) Use TABLESAMPLE to test Snowflake LISTAGG()

When dealing with extremely large tables, sampling data with TABLESAMPLE can help test and refine your Snowflake LISTAGG formulas faster. Let’s try this one out in one of our previous examples.

SELECT 
  customer_name,
  LISTAGG(product_id || '-' || quantity, ', ') WITHIN GROUP (ORDER BY order_id) AS products
FROM orders_table TABLESAMPLE (2 ROWS)
GROUP BY customer_name;
Snowflake LISTAGG example
Use TABLESAMPLE to test Snowflake LISTAGG
Use TABLESAMPLE to test Snowflake LISTAGG

As you can see, this samples just 2 rows to test the concatenation.

Limitations and Best Practices of Snowflake LISTAGG()

Now, in this section, we will discuss some of the limitations and best practices of the Snowflake LISTAGG function, and how to overcome or avoid them.

Limitations of Snowflake LISTAGG()

Snowflake LISTAGG function has the following limitations that you should be aware of:

1) Max size of the output string

The output string of the LISTAGG function has a maximum size of 16,777,216 bytes or 16 MB. If the output string exceeds this limit, an error will be raised and the query will fail. This can often happen when you have a massive number of values to concatenate, or when the values are extremly long.

2) NULL impact

Snowflake LISTAGG function handles NULL values differently depending on the context. If the input is empty, or all input expressions evaluate to NULL, the output is an empty string. If some but not all input expressions evaluate to NULL, the output contains all non-NULL values and excludes the NULL values. To handle NULL values, you can use the COALESCE functions or the CASE expression to conditionally return a value or NULL.

3) Performance implication

Snowflake LISTAGG function can have a significant impact on the performance of your queries, especially when you use it on large tables or with complex expressions. It requires a lot of memory and CPU resources to process and concatenate the values—and it can cause data skew and spilling issues if the output string is too large or unevenly distributed. To improve the performance, follow the tips below:

  • Use DISTINCT
  • Use WITHIN GROUP
  • Use PARTITION BY clause
  • Use LIMIT clause
  • Use FILTER clause
  • Use OVER clause to use Snowflake LISTAGG function as a window function and avoid grouping the entire table.
  • Use TABLESAMPLE

Best Practices of Snowflake LISTAGG()

To use the function effectively and efficiently, you should follow some best practices, such as:

1) Use DISTINCT to remove duplicate values

DISTINCT option allows you to remove duplicate values from the list, so that each value appears only once. This can help you to create a unique list of values from a column that contains repeated values.

2) Use appropriate delimiters

Delimiter is the expression that specifies the separator to be used to separate the values in the list. It can be any string literal or expression that evaluates to a string. You should choose a delimiter that is suitable for your data and your purpose—and that does not conflict with the values in the list.

3) Expressions within the LISTAGG function must be convertible to string

Snowflake LISTAGG function accepts any data type that can be implicitly converted to a string, such as numeric, date, or boolean. But, if you use a data type that cannot be converted to a string, such as geospatial, binary, or variant, an error will be raised and the query will fail. To avoid this, you should always use a data type that can be converted to a string, or use an explicit conversion function, such as TO_VARCHAR, TO_DATE, or TO_BOOLEAN, to convert the values to strings before passing ‘em.

4) Document each and every query

Snowflake LISTAGG function can be used to create complex and dynamic queries that perform various operations and transformations on the data.But remember that, these queries can also be quite difficult to understand and maintain, especially when you use multiple parameters and expressions. To make your queries more readable and understandable, you should document your queries by adding comments, using descriptive names.

What is the difference between Snowflake ARRAY_AGG and Snowflake LISTAGG?

The ARRAY_AGG function also aggregates row data into a single output, but returns an array rather than string. Some differences:

Snowflake LISTAGG Snowflake ARRAY_AGG
Returns the concatenated input values, separated by the delimiter string. Returns the input values, pivoted into an ARRAY.
Syntax for Snowflake LISTAGG is: LISTAGG( [ DISTINCT ] <expr1> [, <delimiter> ] ) [ WITHIN GROUP ( <orderby_clause> ) ] Syntax for Snowflake ARRAY_AGG is: ARRAY_AGG( [ DISTINCT ] <expr1> ) [ WITHIN GROUP ( <orderby_clause> ) ]
Output size limit is 16MB Output size limit is 16MB
NULL values are omitted from the output. NULL values are omitted from the output.
Delimiter is required, default is an empty string Delimiter is not required
Follows the collation of the input and the session. Collation is not applicable
Performance is very fast on large data sets Performance might get slower with more array overhead

As you can see, Snowflake LISTAGG outputs a simple string perfectly suited for concatenation tasks like reporting. ARRAY_AGG allows post-processing like indexing but adds more overhead.

Conclusion

And that’s a wrap! Snowflake LISTAGG function is extremely powerful if you use it carefully. It is an extremely versatile function that helps for concatenating values within a column to a list, using a defined delimiter to separate the elements. It can help you to transform and format your data in a way that is easier to read and understand—and to solve common problems and scenarios.

In this article, we covered:

  • What Does LISTAGG() Do in Snowflake?
  • How Snowflake LISTAGG function Works?
  • How Snowflake LISTAGG function Handles Null or Empty Values?
  • Practical Examples and Use Cases of Snowflake LISTAGG function
  • Advanced Tips and Techniques with Snowflake LISTAGG()
  • Limitations and Best Practices of Snowflake LISTAGG()
  • Difference between Snowflake ARRAY_AGG and Snowflake LISTAGG?

—and so much more!

Think of the Snowflake LISTAGG function as a glue stick that sticks the values together into one single string, using a separator/delimiter of your choice.


FAQs

What is the LISTAGG function in Snowflake?

Snowflake LISTAGG is an aggregate function that concatenates multiple string values from input rows into a single delimited string.

How does Snowfklake LISTAGG function work ?

Snowflake LISTAGG processes rows in groups based on the query and returns a single concatenated string per group. The ORDER BY clause sorts the rows before concatenating them.

What data types can be used as input to Snowflake LISTAGG?

LISTAGG supports STRING, VARCHAR, NUMERIC, DATE, TIME, TIMESTAMP and BOOLEAN data types.

What is the maximum size for Snowflake LISTAGG output?

Output string is limited to 16 MB by default.

How does Snowflake LISTAGG handle NULL values in input data?

NULL values are excluded from the final concatenated output string.

Can Snowflake LISTAGG remove duplicate values?

Yes, the DISTINCT keyword can eliminate duplicates in the concatenated list.

When should ARRAY_AGG be used over Snowflake LISTAGG?

If post-processing of output as an array is needed, ARRAY_AGG may be a better choice.

What are some common use cases for the Snowflake LISTAGG function?

Common uses include combining multiple rows of data into a single cell, aggregating text strings like comments/notes, and creating delimited IDs or names.

Can Snowflake LISTAGG concatenate values from multiple columns?

Yes, multiple columns can be concatenated by using the concatenation operator || between column names.

Can Snowflake LISTAGG output be ordered without affecting the query groups?

No, LISTAGG ordering always applies within the existing groups defined by the query.

Can window functions be used for Snowflake LISTAGG?

Yes, Snowflake LISTAGG can act as a window function using the OVER() clause.

Does Snowflake LISTAGG allow concatenation of geospatial data?

No, geospatial data types cannot be handled by LISTAGG.

Does Snowflake LISTAGG guarantee order without ORDER BY clause?

No, without ORDER BY the concatenate order should not be relied upon.

Pramit Marattha

Technical Content Lead

Pramit is a Technical Content Lead at Chaos Genius.

People who are also involved

Great! You've successfully subscribed.
Great! Next, complete checkout for full access.
Welcome back! You've successfully signed in.
Success! Your account is fully activated, you now have access to all content.