HOW TO: Add a Column to a Databricks Table — Fast and Simple (2025)

Data workflows are always changing, aren't they? What begins as a straightforward dataset often balloons in complexity as your needs shift. Databricks provides solid tools for managing and tweaking your data structures. Databricks handles several types of data objects, the main five being Catalog, Database, Table, View, and Function. Tables are really the heart of it, controlling access to your tabular data. Since data is always on the move, schema updates like adding new columns are pretty common. For instance, if new incoming data brings an extra field, you'll likely need to add a Column to a Databricks Table to store it.

In this article, you will learn how to add a column to a Databricks table using multiple approaches (SQL and Databricks PySpark), compare techniques across different table types, and pick up practical recommendations to make your schema changes smooth and reliable. At the end, we cover best practices like schema evolution, validation, and rollback strategies.

Databricks Tables 101

So, what exactly is a Databricks table? A Databricks table is a structured collection of data, neatly organized into rows and columns. Databricks keeps track of these tables using a metastore, either the traditional Hive metastore or the more modern Unity Catalog metastore. The metastore holds all the metadata: the table's schema (what the columns are and their types), where the actual data files are stored, and other properties.
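
If you want to peek at what the metastore actually tracks, here is a quick sketch (assuming a table such as the employees_delta table we create later in this guide already exists in your current schema):

-- Show columns, data types, storage location, and table properties recorded in the metastore
DESCRIBE TABLE EXTENDED employees_delta;

-- Show the statement Databricks would use to recreate the table
SHOW CREATE TABLE employees_delta;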

Types of Databricks Tables

It's good to know the different kinds of tables you might be working with, as it affects how you can modify them. When you're working in Databricks, tables generally fall into two main categories:

Databricks Delta Tables vs Non-Delta Tables

Databricks Delta Tables are the standard in Databricks. They are built on Apache Parquet files but include a transaction log. This log is a big deal; it provides ACID transactions (Atomicity, Consistency, Isolation, Durability), schema enforcement (which helps keep your data clean), and time travel (allowing you to query or restore older data versions). Delta Lake is the default and recommended format in Databricks because it makes data management much more reliable.

Non-Delta Tables include formats like Parquet, ORC, CSV, and JSON. These lack the advanced features of Delta Tables, such as built-in transaction logs or time travel.

Databricks Managed Table vs Databricks External Tables

Databricks offers two ways to manage table data and metadata:

  • Storage: Managed tables store their data within the Databricks File System (DBFS), while unmanaged (external) tables store data in an external location, such as cloud object storage (e.g., Amazon S3, Azure Blob Storage) or a distributed file system (e.g., HDFS).
  • Lifecycle management: Databricks handles the data lifecycle for managed tables, including data replication, backups, and optimizations; for external tables, users are responsible for managing the data lifecycle, including backups and replication.
  • Deletion: Deleting a Databricks managed table also deletes the underlying data files; deleting a Databricks external table does not delete the underlying data files.
  • Performance: Managed table performance is optimized for the Databricks platform; external table performance depends on the characteristics of the external storage system.
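
To make the comparison above concrete, here's a minimal sketch; the storage path is a placeholder, so swap in a location you actually have write access to:

-- Managed table: Databricks decides where the data files live
CREATE TABLE sales_managed (
  sale_id INT,
  amount DOUBLE
);

-- External (unmanaged) table: you point Databricks at your own storage location
CREATE TABLE sales_external (
  sale_id INT,
  amount DOUBLE
)
LOCATION 's3://your-bucket/path/to/sales_external';

-- The Type and Location fields reveal whether a table is MANAGED or EXTERNAL
DESCRIBE TABLE EXTENDED sales_external;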

Supported Table Formats in Databricks

Databricks is quite versatile and supports a range of table formats:

  • Delta Lake (the default and highly recommended)
  • Parquet
  • ORC
  • CSV
  • JSON
  • Avro
  • Text
  • Binary

The type and format of your table really dictate how you can add a Column to a Databricks Table. Databricks Delta tables offer the most flexibility for schema changes. For other formats, you might face more complex steps or even need to rewrite the entire table.


Databricks ALTER TABLE Basics

Databricks supports the ANSI SQL ALTER TABLE statement with extensions for Delta and other formats. You can use it to add or drop columns, rename columns, set default values, add or drop partitions, and more.

Here is the full syntax for Databricks ALTER TABLE:

ALTER TABLE table_name
    { RENAME TO clause |
      ADD COLUMN clause |
      ALTER COLUMN clause |
      DROP COLUMN clause |
      RENAME COLUMN clause |
      DEFAULT COLLATION clause |
      ADD CONSTRAINT clause |
      DROP CONSTRAINT clause |
      DROP FEATURE clause |
      ADD PARTITION clause |
      DROP PARTITION clause |
      PARTITION SET LOCATION clause |
      RENAME PARTITION clause |
      RECOVER PARTITIONS clause |
      SET { ROW FILTER clause } |
      DROP ROW FILTER |
      SET TBLPROPERTIES clause |
      UNSET TBLPROPERTIES clause |
      SET SERDE clause |
      SET LOCATION clause |
      SET OWNER TO clause |
      SET TAGS clause |
      UNSET TAGS clause |
      CLUSTER BY clause |
      PREDICTIVE OPTIMIZATION clause}

Now that you know about Databricks ALTER TABLE, let's jump into how you can use this command to add a column to Databricks.

The basic syntax to add a Column to a Databricks Table using SQL is pretty straightforward:

ALTER TABLE table_name 
ADD COLUMN column_name data_type [options]

Or, to add multiple columns (for Delta Lake tables):

ALTER TABLE table_name
ADD COLUMNS (column_name1 data_type1 [COMMENT 'comment_string1'], column_name2 data_type2 [COMMENT 'comment_string2']);

Let's break down some of the useful options (a combined example follows this breakdown):

COMMENT 'your_comment' — Adds a text description to the column.

FIRST or AFTER existing_column_name — New columns are typically added to the end. FIRST places the column at the beginning, and AFTER puts it after a specified existing column.

DEFAULT default_value (for column definitions) — This is where things get a bit different depending on your table type:

  • For Databricks Delta Tables: You cannot define a DEFAULT value directly within the Databricks ADD COLUMN statement itself for existing rows. When you add a column, existing rows will have NULL values for this new column. However, Delta Lake supports default values for new rows inserted after the column is added and the default is defined. To use this, you first enable a table feature:
ALTER TABLE your_delta_table_name
SET TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'supported');

Once that's enabled (it's an irreversible protocol upgrade for the table), you can then set the default for future rows:

ALTER TABLE your_delta_table_name
ALTER COLUMN new_column_name SET DEFAULT your_default_value;

This default will not automatically backfill existing rows; they will remain NULL unless you explicitly UPDATE them.

  • For Non-Delta Tables (e.g., Parquet or CSV used directly): The ALTER TABLE ADD COLUMN command only updates the table's metadata. It doesn't rewrite data files. Existing rows will show NULL for the new column when queried. To populate these values, you'd typically rewrite the table.
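
Here's a quick sketch that combines these options on a hypothetical Delta table named employees (the column name, position, and comment are just illustrations):

-- Add a documented column positioned right after the name column
ALTER TABLE employees
ADD COLUMN hire_date DATE COMMENT 'Date the employee joined' AFTER name;

-- Existing rows get NULL, so backfill explicitly if you need real values
UPDATE employees SET hire_date = DATE'2020-01-01' WHERE hire_date IS NULL;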

Table Limitations and Some Important Notes

When adding a column to a Databricks table, remember:

  • Databricks Delta Table Specifics — New columns are always nullable at first. Default values only kick in for new rows and require enabling a feature.
  • Non-Delta Table Realities — For formats like Parquet or CSV, ALTER TABLE ADD COLUMN is a metadata-only operation. Existing data files aren't changed. Queries will show NULL for the new column in existing rows. To populate these, you'll often read the table into a DataFrame, add the column with values, and rewrite the table.
  • JDBC Data Sources — You can't use ALTER TABLE ADD COLUMN for tables based on JDBC data sources. Schema changes must be made directly in the source systems.
  • Column Position for Existing Columns — You can position a new column with FIRST or AFTER during addition (mainly for Delta), but changing an existing column's position is a Delta-specific feature that may involve a table rewrite.

Now that you've got the basics of Databricks ALTER TABLE, let's get into the practical steps.

Step-By-Step Guide to Add a Column to Databricks Table

Prerequisites:

Before you jump in and start adding columns, a little prep can save you a lot of headaches. What do you need?

  • For tables managed by Unity Catalog (UC), you'll need the MODIFY permission on the table to add a column.  
  • For non-UC tables (those in the older Hive metastore), you generally need to be the owner of the table.
  • If you're dealing with external tables, you'll also need write access to the underlying cloud storage location where the data files live.
  • Access to a Databricks workspace.
  • Basic knowledge of Databricks concepts like Databricks clusters, Databricks notebooks, and a bit of SQL or Databricks PySpark will be helpful.
  • Understanding of the table structure you're modifying. What's its current schema? What's its format (Delta, Parquet, etc.)? See the quick check after this list.
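
A quick way to answer those last questions before you touch anything (replace my_table with your table name):

-- Current schema, provider, partitioning, and table properties
DESCRIBE TABLE EXTENDED my_table;

-- For Delta tables: format, location, size, and file counts in one shot
DESCRIBE DETAIL my_table;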

🚨 Backup! Seriously, Backup Your Table First!

For any table that holds important data, creating a backup before you make schema changes is just plain smart. If something unexpected happens, you can roll back. How?

For a quick backup of any table, you can use the Databricks CREATE TABLE AS SELECT (CTAS) statement:

CREATE TABLE table_name_backup
AS SELECT * FROM original_table_name;

If you're working with Delta tables, you have an even cooler option: CLONE.

a) Databricks Shallow Clone

Shallow Clone creates a copy of the metadata of the Delta table. It doesn't copy the data files themselves, so it's super fast and doesn't cost much in storage. It's great for quick experiments or development.

CREATE TABLE table_name_backup
SHALLOW CLONE original_delta_table_name;

b) Databricks Deep Clone

Deep Clone creates a full, independent copy of metadata and data files. It takes longer and uses more storage, but provides a complete backup:

CREATE TABLE table_name_backup
DEEP CLONE original_delta_table_name;

Choose what fits your needs. Now, let's get to it.

Let's Do This! Adding Columns in Databricks Step-by-Step

We'll explore a couple of primary ways to add a Column to a Databricks Table: good old SQL and the more programmatic approach with Databricks PySpark.

🔮 Method 1—Add Columns via Databricks SQL ALTER TABLE ADD COLUMN

This is often the most direct approach. Run these commands in a Databricks SQL editor (connected to a SQL Warehouse) or a notebook cell (attached to a cluster).

Step 1—Create a Databricks Cluster

First things first, log into your Databricks workspace and create a new cluster. In the Databricks UI, click Compute and then Create Compute.

Creating Databricks Cluster - Databricks ADD COLUMN

Give the cluster a name, pick a Databricks Runtime, and select the number and type of worker nodes. Finally, click Create compute.

Creating Databricks Cluster - Databricks ADD COLUMN

The cluster will spin up and show a Running status when it's ready.

Step 2—Create and Attach a Databricks Notebook

With the Databricks cluster running, create a new Databricks notebook to run commands. In the sidebar, go to Workspace > your user folder (or the shared workspace), click Create > Notebook.  Databricks Notebooks provide an interactive environment for running code and SQL queries. They're perfect for exploring data and testing schema changes.

Creating Databricks Notebook - Databricks ADD COLUMN

Name it something like Databricks_ADD_COLUMN. For language, choose Python or SQL (we'll use both).

Naming Databricks Notebook - Databricks ADD COLUMN
Tip: In Databricks Notebooks you can mix %sql and Python cells. You can use %sql at the top of a cell to run SQL commands, or use the Spark session in Python (spark.sql(...) or DataFrame operations).

After creating the Databricks notebook, attach it to the Databricks cluster you just created by selecting the cluster name in the top toolbar. You now have an interactive workspace to run Databricks PySpark and SQL commands.

Attaching Databricks Cluster to Databricks Notebook - Databricks ADD COLUMN

Step 3—Create Sample Databricks Tables

Let's create sample tables in different formats to demonstrate column addition. 

Here we create one Delta Lake table, one Parquet-based table, and one standard (managed) table. By default, in Databricks, tables created by SQL or DataFrame APIs use Delta Lake as the storage format. Databricks Delta tables have ACID transactions and time-travel capability, which we’ll leverage later.

Step 3.1—Create a Delta Lake Table

Now let's start by creating a Delta table. To do so:

CREATE TABLE employees_delta (
  id INT,
  name STRING,
  department STRING
) USING DELTA;

INSERT INTO employees_delta VALUES 
  (1, 'Elon Musk', 'Engineering'),
  (2, 'Jeff Bezos', 'Marketing'),
  (3, 'Mark Zuckerberg', 'Finance');

DESCRIBE TABLE employees_delta;

SELECT * FROM employees_delta;

Output:

Creating Delta Lake Table - Databricks ADD COLUMN

Step 3.2—Create a Parquet Table

Let's now create another table, this time a Parquet table. To do so:

CREATE TABLE products_parquet (
  product_id INT,
  product_name STRING,
  category STRING
) USING PARQUET;

INSERT INTO products_parquet VALUES 
  (101, 'Laptop', 'Electronics'),
  (102, 'Desk', 'Furniture'),
  (103, 'Coffee Machine', 'Appliances');

DESCRIBE TABLE products_parquet;

SELECT * FROM products_parquet;

Output:

Creating Databricks Parquet Table - Databricks ADD COLUMN

Step 3.3—Create a Standard Databricks SQL Table

Finally, let's create a standard table in Databricks. Note that, by default, it uses the Delta format. To do so:

CREATE TABLE customers_standard (
  customer_id INT,
  customer_name STRING,
  region STRING
);

INSERT INTO customers_standard VALUES 
  (201, 'Microsoft', 'West'),
  (202, 'Apple', 'East'),
  (203, 'Google', 'Central');

DESCRIBE TABLE customers_standard;

SELECT * FROM customers_standard;

Output:

Creating a standard Databricks table - Databricks ADD COLUMN
Note: Creating tables with USING PARQUET explicitly creates a non-Delta Parquet table. If you omit the USING clause, it defaults to Delta.

Step 4—Add Columns via Databricks ALTER TABLE ADD COLUMNS

Now let's get straight into the core of this article and add some columns to our Databricks tables:

Step 4.1—Example for Databricks Delta Table

Let's start with the Delta table. We'll add a column called salary:

-- Add a salary column to the employees_delta table
ALTER TABLE employees_delta ADD COLUMN salary DOUBLE;

DESCRIBE TABLE employees_delta;

SELECT * FROM employees_delta;

UPDATE employees_delta SET salary = 100000 WHERE id = 1;
UPDATE employees_delta SET salary = 90000 WHERE id = 2;
UPDATE employees_delta SET salary = 95000 WHERE id = 3;

SELECT * FROM employees_delta;

Output:

Adding Column to Databricks Delta Table - Databricks ADD COLUMN

To set a default value for new rows (Delta only):

ALTER TABLE employees_delta SET TBLPROPERTIES (
  'delta.feature.allowColumnDefaults' = 'supported'
);

ALTER TABLE employees_delta ALTER COLUMN salary SET DEFAULT 85000;

INSERT INTO employees_delta (id, name, department) VALUES (4, 'Larry Ellison', 'HR');

SELECT * FROM employees_delta;

If you try ALTER COLUMN ... SET DEFAULT without enabling allowColumnDefaults, you'll get an error. Make sure to set:

SET TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'supported')

Otherwise it will throw this error:

AnalysisException: [WRONG_COLUMN_DEFAULTS_FOR_DELTA_FEATURE_NOT_ENABLED] Failed to execute ALTER TABLE command because it assigned a column DEFAULT value, but the corresponding table feature was not enabled. Please retry the command again after executing ALTER TABLE tableName SET TBLPROPERTIES('delta.feature.allowColumnDefaults' = 'supported').

Output:

Adding Column to Databricks Delta Table - Databricks ADD COLUMN

Step 4.2—Example for a Parquet Table

Adding a column to a Parquet table only updates the table metadata; the existing data files are not rewritten.

Also remember that running an UPDATE on a Parquet table is not possible; UPDATE is only supported on Delta tables. Native Parquet tables in Spark (or Databricks) are immutable, so they don't support DML operations like UPDATE, DELETE, or MERGE.

To run DML like UPDATE, you must first convert the Parquet table to Delta. You can convert by path or by table name (run whichever fits your setup):

CONVERT TO DELTA parquet.`dbfs:/user/hive/warehouse/products_parquet`;

CONVERT TO DELTA products_parquet;

ALTER TABLE products_parquet ADD COLUMN price DOUBLE;

DESCRIBE TABLE products_parquet;

SELECT * FROM products_parquet;

UPDATE products_parquet SET price = 1299 WHERE product_id = 101;
UPDATE products_parquet SET price = 250.99 WHERE product_id = 102;
UPDATE products_parquet SET price = 49.99 WHERE product_id = 103;

SELECT * FROM products_parquet;

Output:

Adding Column to Databricks Parquet Table - Databricks ADD COLUMN
The CONVERT TO DELTA command can take a table name or a path (e.g., CONVERT TO DELTA parquet.`/path/to/files`;).

Step 4.3—Example for a Standard Table

Since standard tables in Databricks default to Delta format, adding a column works the same as with explicit Databricks Delta tables:

ALTER TABLE customers_standard ADD COLUMN email STRING;

DESCRIBE TABLE customers_standard;

SELECT * FROM customers_standard;

UPDATE customers_standard SET email = 'contact@chaosgenius.io' WHERE customer_id = 201;
UPDATE customers_standard SET email = 'info@chaosgenius.io' WHERE customer_id = 202;
UPDATE customers_standard SET email = 'support@chaosgenius.io' WHERE customer_id = 203;

SELECT * FROM customers_standard;

Output:

Adding Column to Databricks Standard Table - Databricks ADD COLUMN

🔮 Method 2—Add Columns Programmatically via Spark APIs

Now let's move on to the next method, where we will programmatically add a column to a Databricks table via the Apache Spark APIs.

Option A—Using Databricks Spark SQL with Databricks PySpark

You can use PySpark to execute SQL commands programmatically:

spark.sql("""
CREATE TABLE IF NOT EXISTS inventory (
  item_id INT,
  item_name STRING,
  quantity INT
) USING DELTA
""")

spark.sql("""
INSERT INTO inventory VALUES 
  (301, 'Widget A', 100),
  (302, 'Widget B', 150),
  (303, 'Widget C', 75)
""")

display(spark.sql("DESCRIBE TABLE inventory"))

spark.sql("ALTER TABLE inventory ADD COLUMN last_updated TIMESTAMP")

display(spark.sql("DESCRIBE TABLE inventory"))

display(spark.sql("SELECT * FROM inventory"))

spark.sql("UPDATE inventory SET last_updated = current_timestamp()")

display(spark.sql("SELECT * FROM inventory"))

Output:

Adding columns programmatically using Databricks Spark SQL with Databricks PySpark

Option B—DataFrame Operations for Delta Tables (with Schema Evolution)

For Delta tables, you can read the table, add a column to the DataFrame, and write it back using mergeSchema. This is useful if the new column's values are derived.

from pyspark.sql.functions import lit

# Assuming 'employees_delta' table exists
df_employees = spark.read.table("employees_delta")

# Add a new column 'bonus_percentage' with a literal value
df_employees_with_bonus = df_employees.withColumn("bonus_percentage", lit(0.05))

# Overwrite the table, merging the schema to include the new column
df_employees_with_bonus.write.format("delta").option("mergeSchema", "true").mode("overwrite").saveAsTable("employees_delta")

display(spark.sql("SELECT * FROM employees_delta"))

This overwrites the table but intelligently merges the new schema. Existing data in other columns is preserved.

Option C—Adding a Column to a Parquet Table (by Rewriting or Converting)

If you have a Parquet table and want to add a column while keeping it as Parquet, you can rewrite it with DataFrame operations.

Before you proceed, make sure your Parquet table exists and is still in Parquet format. If you previously converted products_parquet to Delta, you can recreate a fresh Parquet copy for this example:

spark.sql("DROP TABLE IF EXISTS products_parquet_py")
data = [(101, 'Laptop', 'Electronics'), (102, 'Desk', 'Furniture'), (103, 'Coffee Machine', 'Appliances')]
schema = "product_id INT, product_name STRING, category STRING"
spark.createDataFrame(data, schema).write.format("parquet").saveAsTable("products_parquet_py") # or your original Parquet table name

Once you have done that, fire the code below:

from pyspark.sql.functions import lit
from pyspark.sql.types import IntegerType

df_products = spark.read.table("products_parquet_py")

# Add a new column 'stock_remaining'
df_products_new_schema = df_products.withColumn("stock_remaining", lit(None).cast(IntegerType()))

# Overwrite the Parquet table with the new schema and data
df_products_new_schema.write.format("parquet").mode("overwrite").saveAsTable("products_parquet_py")

display(spark.sql("DESCRIBE TABLE products_parquet_py"))
display(spark.sql("SELECT * FROM products_parquet_py"))

Note that stock_remaining will be NULL.

Alternatively, a common pattern is to convert Parquet to Delta to gain schema evolution benefits, as shown previously or using PySpark:

df_products_new_schema.write.format("delta").option("mergeSchema", "true").mode("overwrite").saveAsTable("products_parquet_delta_converted")

Best Practices for Adding Columns in Databricks

Successfully adding columns to Databricks tables involves more than just executing a command; it requires careful planning, adherence to best practices, and an understanding of how to troubleshoot potential issues.

1) Permissions and Privileges

Before adding columns, verify you have the necessary permissions:

  • For Unity Catalog tables, you need the MODIFY permission
  • For non-UC tables, you typically need to be the table owner
  • For external tables, you need write access to the underlying storage

2) Verification Steps

Always verify your changes after adding columns:

-- Check the updated schema
DESCRIBE TABLE table_name;

-- View sample data to confirm the new column exists
SELECT * FROM table_name LIMIT 10;

3) Use Delta Format When Possible

Databricks Delta tables offer the most flexibility for schema evolution:

  • Schema enforcement prevents accidental schema changes
  • Time travel allows rollback if needed (see the rollback sketch after this list)
  • Transaction log provides an audit trail
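
For example, if a schema change goes sideways on a Delta table, a minimal rollback sketch looks like this (pick the version number from the DESCRIBE HISTORY output):

-- Find the version you want to return to
DESCRIBE HISTORY table_name;

-- Roll the table back to that version
RESTORE TABLE table_name TO VERSION AS OF 3;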

4) Backup Before Major Changes

For critical tables, create a backup before making schema changes:

  • Create a backup using CTAS
CREATE TABLE table_name_backup AS SELECT * FROM table_name;
  • For Databricks Delta tables, use Databricks clone for efficiency
CREATE TABLE table_name_backup DEEP CLONE table_name;

5) Use Time Travel for Rollback

Databricks Delta tables support time travel, allowing you to access previous versions:

  • View the table history
DESCRIBE HISTORY table_name;
  • Access a previous version if needed
SELECT * FROM table_name VERSION AS OF 3;

6) Adding Multiple Columns

For efficiency, add multiple columns in a single operation when possible:

ALTER TABLE table_name ADD COLUMNS (
  column1 data_type1,
  column2 data_type2,
  column3 data_type3
);

7) Column Positioning

By default, new columns are added at the end of the table schema. You can specify the column position using FIRST or AFTER:

  • Add a column as the first column
ALTER TABLE table_name ADD COLUMN new_column data_type FIRST;
  • Add a column after a specific column
ALTER TABLE table_name ADD COLUMN new_column data_type AFTER existing_column;

8) Column Defaults and Non-Null Constraints

When adding a column, consider default values for new rows:

ALTER TABLE table_name ADD COLUMN new_column data_type;
ALTER TABLE table_name ALTER COLUMN new_column SET DEFAULT default_value;
Note that default values only apply to new rows, not existing ones, and on Delta tables you must first enable the delta.feature.allowColumnDefaults table property (shown earlier). To enforce a NOT NULL constraint, backfill existing rows and then apply ALTER COLUMN ... SET NOT NULL.

9) Schema Evolution with DataFrames

When using DataFrames with Databricks Delta tables, enable schema evolution explicitly:

df.write.format("delta").option("mergeSchema", "true").mode("append").saveAsTable("table_name")

10) Document Your Schema Changes

Document schema changes for future reference:

ALTER TABLE table_name ADD COLUMN new_column data_type COMMENT 'Description of this column';

ALTER TABLE table_name ALTER COLUMN column_name COMMENT 'Updated description';


Conclusion

And that’s a wrap! Adding columns to Databricks tables is generally straightforward, especially with Delta Lake offering the most flexibility. Remember, ALTER TABLE ADD COLUMN is your main SQL tool. Always back up production data before changes. Adding columns is usually a metadata change, but populating existing rows often requires an UPDATE (for Delta) or a table rewrite (for non-Delta, or when using DataFrames to derive values).

In this article, we have covered:

  • What Databricks tables are and how Delta, managed, and external tables differ
  • Databricks ALTER TABLE basics and the ADD COLUMN syntax
  • Step-by-step examples of adding columns via SQL and Databricks PySpark
  • How NULLs, default values, and schema evolution behave across table formats
  • Best practices for permissions, backups, verification, and rollback

… and so much more!

FAQs

How do I add a column in a specific position?

Use the FIRST or AFTER keywords in your ALTER TABLE statement:

ALTER TABLE my_table ADD COLUMN new_col STRING FIRST;
ALTER TABLE my_table ADD COLUMN another_col INT AFTER existing_col;

What value do existing rows get when I add a column?

Existing rows will have NULL values in the newly added columns by default.

Can I add a column that cannot be NULL?

When adding a column to an existing table with data, the column must initially allow NULL values since existing rows need to have some value. After adding the column and populating it with data, you can add a NOT NULL constraint for Delta tables.
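
A minimal sketch of that sequence, assuming the employees_delta table and the salary column added earlier in this guide:

-- 1. Backfill the column so no NULLs remain (the value here is just an example)
UPDATE employees_delta SET salary = 0 WHERE salary IS NULL;

-- 2. Now the constraint can be enforced
ALTER TABLE employees_delta ALTER COLUMN salary SET NOT NULL;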

Is it easy to drop a column later?

Yes, you can drop columns using the ALTER TABLE command:

ALTER TABLE table_name DROP COLUMN column_name;

For Delta tables, DROP COLUMN requires the column mapping table feature and simply marks the column as dropped in the transaction log. Queries will no longer see the column, but the underlying data remains in the data files until they are rewritten and vacuumed.
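
Here's a hedged sketch of the full sequence on a Delta table (the property values follow the column mapping upgrade documented by Databricks; table_name and column_name are placeholders):

-- Enable column mapping (an irreversible protocol upgrade for the table)
ALTER TABLE table_name SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
);

-- Drop the column (metadata-only; data files are untouched)
ALTER TABLE table_name DROP COLUMN column_name;

-- Optionally rewrite the files to physically purge the dropped column's data
REORG TABLE table_name APPLY (PURGE);
VACUUM table_name;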

How does adding columns affect table performance?

Adding columns is generally a metadata operation that doesn't immediately impact performance. However, consider these factors:

  • For Delta tables, adding many columns over time can increase the size of the transaction log
  • Wide tables (many columns) may have lower query performance if all columns are selected
  • Non-Delta formats may require table rewrites which can be resource-intensive

How can I see the history of schema changes?

  • Use DESCRIBE HISTORY table_name. This shows metadata about each transaction, including operations like adding columns.