HOW TO: Master Databricks Notebook Features (2024)

Databricks is a powerful platform for performing various tasks related to data engineering, data science, and ML workloads. It provides a unified space called Databricks Workspace—a one-stop solution for data teams to collaboratively build end-to-end analytics solutions, all within the platform itself. The workspace contains a variety of objects, including clusters, notebooks, libraries, and models, to streamline the data lifecycle. In particular, Databricks Notebook is a key component that enables you to integrate code, visualizations, and documentation within an interactive canvas. Notebooks are where all the magic happens—you can perform tasks ranging from exploration to modeling to orchestration collaboratively using languages like Python, R, Scala, and SQL directly inside notebooks. You can schedule notebooks to deploy reproducible workflows with just a few clicks, integrate ‘em with version control systems, simplify collaboration—and much more!

In this article, we will cover everything you need to know about Databricks notebooks, and we will walk you through a step-by-step guide to using notebooks for data analysis. This will cover aspects like the notebook interface, creating and managing notebooks, writing and executing code, working with outputs, automating with jobs, and more. On top of that, we will also dive into 7 powerful tricks to master notebooks.

What Are Notebooks in Databricks?

Databricks Notebook is an interactive coding canvas within the Databricks platform where you can write, execute, and collaborate on code. Notebooks allow you to easily visualize, document, and share your data analysis process in an organized manner.

Databricks Notebook Logo

Here are some of the notable features of Databricks notebooks:

  1. Cell-based structure: Databricks notebooks consist of individual code cells that can contain SQL, Python, Scala, or R code.
  2. Interactive Execution, Cell by Cell: Each cell can be run independently to view live outputs such as text, plots, or tables.
  3. Notebook interface: Databricks notebooks provide features like collapsible headings, a toolbar, and sidebar navigation for effective organization.
  4. Collaboration: Multiple users can collaborate and co-author notebooks in real time using comments and mentions.
  5. Code modularization: Reusable snippets can be created using the %run magic command or standalone Repos.
  6. Scaling notebooks: You can attach a Databricks notebook to clusters that provide the compute resources for running your code.
  7. Automation: Databricks notebooks can easily be automated using the Databricks jobs feature. You can schedule your notebooks to run periodically or trigger them on demand or by events.
  8. Output visualization: Databricks notebooks provide flexible options to view outputs as text, tables, or plots.
  9. Version history: Every change in Databricks notebooks is tracked, allowing the restoration of previous states.
  10. Formatting: Databricks notebooks provide syntax highlighting options for code blocks and markdown support for documentation.

…and more!

These are just a glimpse of what you can do with Databricks notebooks. They are more than just code execution environments; they are powerful tools that help you unlock the full potential of your data.

Step-by-Step Guide on Using Databricks Notebooks for Comprehensive Data Analysis

Now that we have covered the basics of Databricks notebooks, without further ado, let's jump right in and discuss how to leverage notebooks step-by-step for end-to-end data analysis.

Step 1—Getting Familiar With Databricks Notebook Interface and Controls

Upon logging into your Databricks workspace, you will see a "+ New" button on the left sidebar.

Creating a new Databricks notebook in the Databricks workspace

All you have to do is click that button, and a menu will appear with options to create various objects. In that menu, select "Notebook", and you will be redirected to a new Databricks notebook.

Opening Databricks notebook

As you can see, now you will have access to the Databricks notebook interface, where you can interact with your notebook, write and run code, view outputs, and access various notebook features and settings.

The Databricks notebook interface consists of the following components:

1) Notebook Toolbar

The toolbar at the top contains menus, icons, and buttons to manage and edit the notebook.

Databricks notebook Toolbar
a) Notebook Name

At the very top of the Databricks notebook interface, you will see a randomly generated name with a timestamp and date. You can change the notebook name by double-clicking it.

Renaming Databricks notebook name
b) Language dropdown

Next to the notebook name, there is a dropdown button to change the default language used in that particular notebook.

Setting default language for Databricks notebook
c) File Menu

Right below the notebook name, you will see a “File” menu. This menu provides shortcut actions to share the notebook, schedule it, create a new notebook, import, clone, rename, move to trash, or delete it. Also, users can upload external data, export the notebook, view recent activity, check version history, clear version history, and change the default language of the notebook.

File menu option - Databricks Notebook
d) Edit Menu

Next to the File menu is the “Edit” menu. This menu allows you to undo/redo actions, cut/copy/paste, delete cells, insert cells, move cells, add widgets, format cells/code/notebook, and clear cell outputs. Keyboard shortcuts are listed for each action.

Edit menu option - Databricks Notebook
e) View Menu

Next to the Edit menu is the “View” menu. Here you can configure output formatting, open a web terminal, view logs, toggle line numbers, command numbers, table of contents, collapsible headings, centered layout, and switch between light and dark themes. You can also change the code editor theme.

View menu option - Databricks Notebook
f) Run Menu

The “Run” menu provides shortcuts to run selected cells, run all cells, cancel running cells, and clear cell outputs.

Run menu option - Databricks Notebook
g) Help Menu

Next to the Run menu is the “Help” menu. This menu lists all keyboard shortcuts and provides links to submit feedback, get support, request features—and more.

Help menu option - Databricks Notebook
h) Last Edit

Click the "Last edit was..." text to view previous versions of the Databricks notebook.

Clicking Last edit message to view previous versions of the Databricks notebook
i) New UI

Next to the "Last edit..." text, there is an option to toggle between the new UI and the original UI. Click the dropdown, toggle on the new cells UI, and reload the page. The new cell UI provides an updated look and feel, improved performance, and new features.

Toggling the new cell UI in Databricks Notebook
j) Run All/Interrupt

This button runs all cells or stops execution; its name changes depending on the notebook state.

Running all cells - Databricks notebook
Stopping cells execution - Databricks notebook
k) Compute

Next to “Run all”, click on “Connect” to open the compute selector and attach the notebook to a cluster or SQL warehouse. It displays the attached resource.

Attaching compute cluster to Databricks notebook
l) Schedule

Next to “Connect”, click on “Schedule” to open a dialog to schedule notebook execution.

Scheduling Databricks notebook
m) Sharing

Finally, you can share the notebook. Next to "Schedule" there is a "Share" option; clicking it opens a permissions dialog from which you can share your notebook with anyone.

Sharing Databricks notebooks

2) Right Sidebar Actions

There are several actions available from the Databricks notebook’s right sidebar:

a) Notebook Comments

The notebook comments icon opens a panel on the right sidebar where you can add and view comments on the notebook.

Databricks notebooks comments
b) MLflow Experiment

The MLflow experiment icon appears if the notebook is part of an MLflow experiment for tracking machine learning models. Clicking this icon opens the experiment tracking UI. You can view run details, metrics, parameters, tags, artifacts, and more.

MLflow Experiments - Databricks Notebook
c) Notebook Version History

This opens the notebook version history panel, allowing you to view previous versions of the notebook, compare changes between versions, and restore earlier versions.

Databricks notebook version history
d) Variable Explorer

The variable explorer shows all defined variables and their values for the Python kernel. It allows inspecting and modifying variables dynamically from the sidebar. This is available in Databricks Runtime 12.1 and above for Python notebooks.

Variable explorer - Databricks Notebook
e) Python Environment Panel

This panel shows all of the installed Python libraries and versions available to the notebook kernel. It includes libraries installed on the cluster, those installed in the notebook environment specifically, and libraries included with the Databricks Runtime like Spark, NumPy, Pandas, etc.

Python environment panel - Databricks Notebook

3) Notebook Cells

Notebooks contain code cells and markdown cells:

  • Code Cells: Contain executable code in Python, R, Scala, SQL, etc.
Databricks notebook code cells
  • Markdown Cells: Contain text, images, and media rendered from Markdown formatting. It is used for documentation purposes.
Databricks notebook markdown cells

You can add, remove, and rearrange cells as necessary. Also, you can execute a single cell, a group of cells, or the entire notebook at once.

Note: Each notebook cell has a maximum capacity of 10MB, and the output generated by a notebook cell is limited to 20MB.

4) Cell Actions Menu

Cells are organized within notebooks, and there are several options for configuring cells. The cell actions menu located in the top-right of each cell provides options to cut, copy, paste, delete, run, and manage cells.

In the original UI, the menu can be accessed by clicking the down arrow icon in the top right corner of the cell. It lets you cut, copy, run, minimize, hide code/results, or delete a cell.

Databricks notebook cell action menu (original UI)

From this menu, you can also run code cells:

Databricks notebook cell action menu (original UI)

In the new UI, the menu is accessed by clicking the three-dot icon (Kebab icon). It provides similar options along with additional features like adding a cell title, hiding code/results, collapsing the cell, etc.

Databricks notebook cell action menu (new UI)

Beyond the cell actions menu, there are various keyboard shortcuts, display options, and features for managing cells. New cells can be created by hovering over existing cells and clicking the "+" icon or language buttons for code or markdown cells. The cell language or type can be changed by selecting from the button in the top-right. Cells can be selected individually or as a group using the shift+up/down arrow keys.

5) Running Notebook Cells

Running cells is a core part of notebooks. Code cells are executed by clicking the run icon or down arrow in the top-left. After running, elapsed time and performance metrics are visible. Any values assigned in code cells become variables that can be accessed in subsequent cells. Cell outputs like matrices, plots, or tables appear below the code.

Running Databricks Notebook Cells

6) Notebooks and Cell Customization

The display of notebooks and cells can be easily customized. Options under the View menu allow toggling line numbers, command numbers, sidebar layout, light/dark theme, and formatting cells. Standard, results-only, or side-by-side views define how code and outputs are rendered.

Databricks notebook and cell customization

7) Cell Language

Each cell has a language associated with it that specifies whether it contains code or Markdown. The language can be changed using the button next to the cell number, which provides an easy way to toggle a cell between code and Markdown. Remember that a proper language setting is required for cells to render and execute as intended.

Selecting cell language - Databricks notebooks

8) Create and Manage Cells

Notebooks consist of cells that act as building blocks. New cells can be added using the "+" icon in the original UI, or the "Code" and "Text" buttons in the new UI, which create a code or Markdown cell, respectively.

Creating cells in Original UI - Databricks notebooks
Creating cells in New UI - Databricks notebooks

You can cut, copy, and paste cells in Databricks notebooks using the cell menu, keyboard shortcuts, or edit menu.

To cut or copy, select the cell menu or use keyboard shortcuts (Ctrl/Cmd+X to cut, Ctrl/Cmd+C to copy). Then paste (Ctrl/Cmd+V) below the current cell.

To select multiple cells, click a cell and use the Shift+Up/Down arrows. Or use Edit ➤ Select All Cells (Ctrl/Cmd+A). You can then cut, copy, paste, or delete the selected cells together.

You can also perform undo actions with Ctrl/Cmd+Z.

9) Databricks Assistant

Databricks Assistant is an AI tool that provides help and suggestions for code in cells. To activate it, click the assistant icon in the top right corner of a code cell (next to the language selector). This opens a text box where you can type a text prompt.

Databricks assistant - Databricks notebook

10) Hide and Show Cell Content

The content of cells can be hidden or collapsed for brevity using options in the cell menu. This includes hiding code or results separately.

Hiding and showing cell content - Databricks notebooks

Expanding collapsed cells makes the content visible again. This feature helps focus on specific parts while hiding unnecessary details from the viewer.

11) Compute Resources

You have the choice to execute a notebook either on a Databricks cluster or, for SQL commands specifically, you can opt to utilize a SQL warehouse—a specialized computing resource optimized for SQL analytics.

a) Attaching notebooks to clusters

To attach a notebook to a cluster, you need the CAN ATTACH TO CLUSTER permission on that cluster.

Attaching a notebook to a cluster grants any user with CAN RUN permission on the notebook access to the cluster.

To attach a notebook to a cluster:

  • Click the compute selector in the notebook toolbar and select a cluster from the dropdown menu. This shows recently used clusters and running clusters.
Attaching clusters to Databricks notebooks
  • To see all available clusters, click “More” and select a cluster from the full list.
Viewing all the available clusters - Databricks notebooks
  • You can also create a new cluster by selecting Create new resource...
Note: The attached notebooks have the Spark variables sc, sqlContext, and spark pre-defined. Do not re-declare these to avoid inconsistent behavior.
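
For example, you can use the predefined spark session immediately in a fresh cell. A quick sanity check, assuming the notebook is attached to a running cluster:

# `spark` (SparkSession) is predefined in attached notebooks, no import needed
df = spark.range(5)           # tiny DataFrame with a single `id` column
df.show()                     # prints rows 0 through 4

# `sc` (SparkContext) is predefined as well
print(sc.defaultParallelism)  # default parallelism of the attached cluster
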
b) Using notebooks with SQL warehouses

When attached to a SQL warehouse, notebooks can only run SQL and Markdown cells. Running other languages, like Python, will cause errors.

SQL queries run on a warehouse appear in its query history. Users can view query profiles from the notebook by clicking the elapsed time.

Attaching to a SQL warehouse requires a Pro or Serverless tier. You need access to the workspace and warehouse.

To attach to a SQL warehouse:

  • Click the compute selector and select a warehouse from the dropdown menu or click the More… option.
  • In the dialog, select SQL Warehouse and choose a warehouse, then click Attach.
Attaching a SQL warehouse - Databricks notebooks

Limitations when using SQL warehouses:

  • Idle timeout after 8 hours
  • Max results size of 10,000 rows or 2MB
  • No support for notebook parameters or widgets

12) Detaching Clusters From Databricks notebooks

To detach a notebook, click the compute selector and hover over the attached resource. Select Detach from the side menu.

You can also detach from the cluster details page.

Detaching removes the execution context and clears computed variables.

Detaching Clusters From Databricks Notebook

Now, let's move on to the next step.

Step 2—Creating and Managing Databricks Notebooks

In this section, we will learn how to actually create and manage Databricks notebooks.

1) Creating Databricks notebooks

You can initiate the creation of a new notebook by clicking the "+ New" button and selecting the Notebook option in your Databricks workspace.

Creating and Managing Databricks notebook

Alternatively,

You can create a notebook by navigating to your workspace, clicking on the "Add" dropdown menu, and then selecting "Notebook" to initiate the creation of a new one.

Creating and Managing Databricks notebooks

You can then specify the name and language (Python, R, SQL, or Scala) of your notebook. Note that you can also change the language later using magic commands.

To create a new notebook in any folder (for example, in the Shared folder), follow these steps:

  • On the left sidebar, click the Workspace icon.
  • Right-click on the name of the folder where you want to create the notebook and select Create ➤ Notebook. Databricks will open a new, blank notebook in the selected folder.
  • Enter the name of your notebook, and select the language and the cluster for your notebook.

2) Opening an existing notebook

You can open an existing notebook by clicking on the notebook name on the left sidebar. You can also hover over the notebook name to see the notebook path, which is the location of the notebook in the workspace or the repos.

3) Deleting a notebook

You can delete a notebook by following these steps:

  • On the left sidebar, click the Workspace icon.
  • Right-click on the name of the notebook that you want to delete, and select Move to Trash. Databricks will ask you to confirm your action.
Deleting Databricks Notebook
  • Click “Confirm and move to Trash” to remove the notebook from the workspace or the repos. Databricks will move the notebook to the Trash.
Deleting Databricks Notebook
  • To restore or permanently delete the notebook from the trash, go to the Trash page in the Databricks workspace and select the appropriate option. You can either permanently delete it, restore it, or empty the trash.
Deleting Databricks Notebook

4) Copying the notebook path or URL

You can copy the notebook path or URL without opening the notebook. To do so, right-click the notebook and select Copy Path or Copy URL from the menu.

Copying the notebook path or URL - Databricks notebooks

5) Renaming Databricks notebook

You can rename a notebook by following these steps:

  • Open the notebook that you want to rename.
  • Click on the notebook name located at the top left corner of the notebook, and enter the new name of the notebook.
Renaming Databricks notebook
  • You can also click on File ➤ Rename and enter the new name of the notebook.
Renaming Databricks notebook

6) Controlling access to a notebook

If your Databricks account has the Premium plan or above, you can use workspace access control to control who has access to a notebook. You can assign different roles and permissions to different users or groups, such as view, edit, run, or manage.

7) Configuring editor settings

You can configure the editor settings of your notebook to customize various aspects of your notebook editor, such as the font size, the line numbers, the word wrap, the auto-completion, the auto-indentation, the syntax highlighting, the code formatting, the bracket matching, the whitespace visibility—and more.

To configure the editor settings of your notebook, follow these steps:

  • Click on your username located at the top right corner of the notebook, and select User Settings from the drop-down menu.
Configuring editor settings - Databricks Notebooks
  • On the Settings sidebar, select Developer.
Configuring editor settings - Databricks Notebooks
  • On the Developer page, you can see and change the editor settings of your notebook.
Configuring editor settings - Databricks Notebooks

8) Viewing notebooks attached to a cluster

You can view the notebooks that are attached to a cluster by following these steps:

  • Open the notebook that is attached to the cluster that you want to view.
  • Click on the Cluster icon on the top right corner of the notebook, and select View notebooks attached to this cluster from the menu. You can also click on the Notebooks tab on the right sidebar and select View Notebooks attached to this cluster from the pop-up menu.
  • On the Notebooks page, you can see the list of notebooks that are attached to the same cluster as your notebook, and access their outputs and logs. You can also detach or attach your notebook to a different cluster from the same page.
Viewing notebooks attached to a cluster - Databricks notebooks

Step 3—Writing Code in Databricks notebooks

Databricks notebooks allow you to write code (and queries) using different programming languages. In this section, we will dive deep into how you can write and execute code in Databricks notebooks. But first, you need to open the notebook; for that, you can follow the earlier step. Once that notebook is opened, you can start coding.

1) Supported languages—Python, R, SQL, and Scala

Databricks notebooks support four languages: Python, R, SQL, and Scala. You can use any of these languages in your notebooks and switch between them using the magic commands. Magic commands are special commands that start with a % symbol and allow you to perform various tasks, such as changing the language, running external scripts, executing shell commands, accessing file systems—and more.

To change the language of a cell, you can use the following magic commands:

  • %python: to use Python
  • %r: to use R
  • %sql: to use SQL
  • %scala: to use Scala

For example, if you want to use Python in a cell, you can write:

%python
# Python cell
print("Hello, There!")
Using Magic command to run Python Databricks notebook
Note: By default, cells use the notebook's default language.

2) Running code cells and selected code

You can run code cells and selected code in your notebooks and see the output below the cell. You can run each cell individually or run the entire notebook at once. You can also run selected code within a cell without running the whole cell.

To run a cell, you can either click on the run icon on the right side of the cell, or directly press Shift + Enter on your keyboard, or use the Run menu on the toolbar and select "Run Cell".

Running code cells and selected code - Databricks Notebook

To run multiple cells, you can click "Run All Above" or "Run All Below" from a cell's menu, or use the Run menu on the toolbar and select "Run All" to run the entire notebook.

3) Browsing Available Datasets

To browse available datasets in the Databricks notebook, use the schema browser panel to explore tables, views, and files available in the notebook. Click the notebook Catalog icon (or press Ctrl + Alt + C) on the left to open the schema browser.

Browsing data in Databricks notebook

There are two options that you will see: one is “All”, and the other is “For You”. The “All” option shows you all the data, whereas the “For You” section displays tables you've recently used or marked as favorites.

To view actions for an item, hover over or click on its name to open the kebab menu. For tables, the available actions are as follows:

  • Copy table path: Copies the path to the table.
  • Open in Catalog Explorer: Opens the table in a new Catalog Explorer tab.
  • Add to favorites: Adds the table to your favorites list.
  • Preview in new cell: Automatically creates a cell to display the table data.
Table actions - Databricks notebook

For catalogs and schemas, you can copy the path or open it in Catalog Explorer.

To insert a name into a cell, place your cursor where you want to insert it, hover over the name in the browser, and click the double arrow icon.

4) Keyboard Shortcuts

To see the list of available keyboard shortcuts in Databricks, you can either press "H" on the keyboard or head over to the Help menu and select "Keyboard shortcuts". Here is the list of keyboard shortcuts that Databricks provides:

Databricks notebook Keyboard shortcuts

5) Find and Replace Code/Text

To find/replace code or text in a notebook, go to Edit ➤ Find and Replace. Matching text is highlighted. Using this tool, you can:

  • Click Replace to replace the current match.
  • Click Replace All to replace all matches in the notebook.
  • Use the Prev and Next buttons to navigate between matches.
  • Press Shift+Enter or Enter to go to the previous or next match.
  • Press Escape or click the delete icon to close the find/replace tool.
Finding and Replacing Code/Text - Databricks Notebook

6) Observing Variables in Variable Explorer

For Python, R, and Scala, you can directly view notebook variables and their values/types in the variable explorer panel. To do this, first, click the variable explorer icon in the right sidebar to open it. It will display all currently defined variables in the notebook. Variable values update automatically when you run cells, and you can type in the search box to filter the variable list.

Observing Variables in Variable Explorer - Databricks Notebook

7) Format Python (or SQL) Code

You can format the Python and SQL code cells in your notebooks to make them more readable and consistent.

Formatting Python Code

The notebook must be attached to a cluster with the black and tokenize-rt libraries installed. These are preinstalled on Databricks Runtime 11.2 and above.

For older runtimes, you must install black==22.3.0 and tokenize-rt==4.2.1 from PyPI in the notebook or on the cluster.
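
For example, on an older runtime you can install the pinned versions mentioned above right at the top of your notebook:

%pip install black==22.3.0 tokenize-rt==4.2.1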

How to Format Cells?

You need the CAN EDIT permission on a notebook to format it.

To format code:

  • Single-cell: Use the keyboard shortcut Cmd/Ctrl+Shift+F, or select Format SQL/Python in the cell menu.
Format Python (or SQL) Code - Databricks notebook
  • Multiple cells: Select the cells, then go to Edit ➤ Format Cells.
  • All cells: Go to Edit ➤ Format Notebook to format all Python and SQL cells.

8) Tracking/Accessing Version History

Databricks notebooks maintain a version history, allowing you to view, restore, and delete previous snapshots. You can also add comments to specific versions.

To access it, click the version history icon in the right sidebar or go to File ➤ Version History.

Tracking/Accessing version history - Databricks Notebook
Adding Comments

To add a comment to the most recent version, follow these steps: First, select the latest version entry. Next, click on "Save Now". A dialog will appear, prompting you to input your comment text. Once you've entered your comment, click on "Save" to finalize and append your comment to the latest version.

Adding Comments to version history - Databricks notebook
Restoring a Version

To revert to a previous version, follow these steps: Start by clicking on the version entry that you wish to restore. Subsequently, click on "Restore This Version" to initiate the restoration process. Confirm the action by verifying the restoration of the snapshot. As a result, the chosen version will now be set as the most recent version.

Restoring Version History - Databricks notebook
Delete a Version

To permanently remove a particular version, follow these steps: First, click on the version entry you intend to delete. Then, locate and click on the trash 🗑️ icon. Confirm the deletion by acknowledging the prompt that appears. As a result, the selected version will be permanently removed from the version history.

Deleting Version History - Databricks notebook
Clearing All Versions

To erase the entire version history, follow these steps: Navigate to the "File" menu and select "Clear Version History". Confirm it when prompted.

Clearing All Version History - Databricks notebook
Note: The version history cannot be recovered once cleared.

Step 4—Working with Outputs

After running cells in a notebook, you will see outputs, and your notebook will retain its state. Now, let's delve in and see how the output format will appear. Also, let's explore how to manage these outputs, handle the state, utilize download options, and adjust settings related to notebook results.

Clearing Databricks Notebook Outputs/State

As you run cells in a notebook, the runtime state accumulates, including variables, functions, imported libraries, and cell outputs. At times, you may want to clear this state to start with a clean environment. Databricks notebooks provide options to clear state and outputs.

  • Clear All Cell Outputs clears only the outputs generated by each cell while retaining notebook state like imported libraries and variable definitions. This can be useful for sharing the notebook logic without visualizations or outputs.
  • Clear State clears all notebook runtime state, including function definitions, data variables, and imported libraries. This resets the environment but retains cell outputs.
  • To clear both state and outputs, select the Clear State and Outputs option. This removes all notebook state and all cell outputs, providing a completely fresh runtime environment.
  • Clear State and Run All clears state and then re-runs all cells from top to bottom, re-executing the entire notebook with a fresh state.
Clearing Databricks Notebook Outputs/State

Displaying Cell Outputs and Results

By default, Databricks notebooks limit the amount of data returned to the client to avoid performance issues or browser crashes. For tables, a maximum of 10,000 rows or 2MB of data is returned, whichever limit is reached first. For text results, the limit is 50,000 characters.

To increase the text result limit, you (as an admin) can configure the Spark parameter spark.databricks.driver.maxReplOutputLength (supported only on Databricks Runtime 12.1 and above).
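
spark.databricks.driver.maxReplOutputLength is a cluster-level Spark setting, so it is typically added to the cluster's Spark config (Compute ➤ your cluster ➤ Advanced options ➤ Spark) rather than set from a cell. The value below is only an illustrative example:

spark.databricks.driver.maxReplOutputLength 100000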

Accessing SQL Results in Python

You can load data using SQL and explore it using Python. In a Databricks Python notebook, table results from a SQL language cell are automatically made available as a Python DataFrame.
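
Concretely, on recent Databricks Runtimes, the table result of the most recently executed SQL cell is exposed to Python as the implicit _sqldf DataFrame. A minimal sketch, assuming the built-in samples catalog is available in your workspace:

%sql
-- SQL cell: this table result is captured automatically
SELECT * FROM samples.nyctaxi.trips LIMIT 10

Then, in the next (Python) cell:

# `_sqldf` holds the result of the SQL cell above as a PySpark DataFrame
display(_sqldf.select("trip_distance", "fare_amount").describe())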

New Cell Result Table (Public Preview)

Databricks has released a preview of a new interactive cell result table. To enable this, click the “New result table” toggle in the upper right of any table output.

New Cell Result Table - Databricks Notebook

This new table helps to improve the notebook output experience with enhanced interactivity and control. As this feature is in preview, interfaces may change before general availability.

Downloading Notebook Results

By default, users can download CSV exports of any tabular output in a notebook. This allows full access to result data instead of just the sampled displayed output.

To download, click the down arrow icon in the table output header and select “Download all rows”. This will export the entire table contents to a file named export.csv on your local file system.

Downloading Databricks Notebook Results

Downloading output can be disabled by admins if needed. This may be relevant for notebooks exposed publicly or containing sensitive data.

Note that users who can view a shared notebook can also download results unless downloading is explicitly disabled.

Multiple Outputs per Python Cell

Unlike earlier versions, modern Python notebooks in Databricks allow multiple outputs in a single cell.

For example, you can display both a plot and a table in one cell:

from sklearn.datasets import load_diabetes
import pandas as pd
import matplotlib.pyplot as plt

# Load the diabetes dataset
diabetes_data = load_diabetes()

# Create a DataFrame from the data
df = pd.DataFrame(diabetes_data.data, columns=diabetes_data.feature_names)

# Display the first rows as a table
display(df.head())

# Create a scatter plot between two features
plt.scatter(df['bmi'], df['s1'])

# Add axis labels
plt.xlabel('BMI')
plt.ylabel('S1')
plt.show()
Multiple Outputs per Python Cell - Databricks notebook

Step 5—Automating with Databricks notebooks

You can automate and schedule Databricks notebooks to run as jobs on a regular basis. Jobs can be triggered manually or on a schedule.

To schedule a notebook job:

  • In the notebook, click the "Schedule" button in the top right corner.
  • If no jobs exist for the notebook, the "Schedule Notebook" dialog will appear. If jobs already exist, the "Jobs List" dialog will appear instead. Click "Add a Schedule" to open the "Schedule Notebook" dialog.
Scheduling Databricks notebook job
  • Enter a name for the job. The default name is the notebook name.
  • Select "Manual" to trigger runs manually, or "Scheduled" to set a schedule. If scheduled, choose the frequency, time, and time zone.
  • Choose the compute resource to run the job. By default, it runs on the cluster the notebook is attached to or a new job cluster if you have permission.
  • Optionally enter parameters to pass to the job as key-value pairs. Use dynamic value references to pass limited dynamic values.
  • Optionally set up email alerts for job events.
  • Click "Submit" to create the scheduled job.
Scheduling Databricks notebook job

To execute a job manually, click the "Schedule" button, and then select "Run Now" for the job.

Scheduling Databricks notebook job manually

To manage jobs, click the "Schedule" button, and then click the vertical ellipsis next to a specific job. From the options presented, you can choose actions such as editing, cloning, viewing details, pausing, resuming, or deleting the job.

Managing Databricks notebook job manually
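
Beyond the Schedule UI, you can also trigger one notebook from another programmatically, which pairs well with jobs. A minimal sketch using dbutils.notebook.run, where the notebook path and parameter are hypothetical:

# Run a child notebook with a 60-second timeout, passing one parameter
# (inside the child notebook, read it with dbutils.widgets.get("date"))
result = dbutils.notebook.run("/Shared/daily_ingest", 60, {"date": "2024-01-01"})

# `result` holds whatever the child returned via dbutils.notebook.exit(...)
print(result)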

Step 6—Collaborating on notebooks

Databricks notebooks support collaboration features like sharing, permissions, and comments. These features require Databricks Premium or above.

Share Notebooks

To manage sharing settings for a notebook, click the "Share" button at the top of the notebook to access the Sharing dialog. From there, select the users or groups you want to share the notebook with and choose an appropriate permission level.

Sharing Databricks notebook

Manage Permissions with Folders

You can control notebook permissions by placing them into folders. Notebooks within a folder inherit all the permission settings assigned to that particular folder.

Commenting on Code

To manage comments in your workspace, you can use the Comments icon, which serves as a toggle for the comments sidebar. If you click this icon, you can easily access and navigate the comments section. To add comments to specific code sections, highlight the code and click the comment icon.

Commenting on code - Databricks notebook

You also have the option to edit, delete, or reply to comments directly from the sidebar.

Step 7—Importing and Exporting Databricks Notebooks

Databricks supports importing and exporting notebooks in various formats. Here is the list of supported Databricks notebook formats:

  • Source File (.scala, .py, .sql, .r)
  • HTML (.html)
  • DBC Archive
  • IPython Notebook (.ipynb)
  • RMarkdown (.Rmd)

How to import Databricks notebook?

To import notebooks in Databricks, the process is fairly simple and straightforward. First, click on "Workspace", then choose "Import" from the menu of the folder you want. When prompted, provide a URL or select a file in a supported format as the source. The notebooks will then be imported into the folder you've selected.

Importing Databricks notebook

How to export Databricks notebook?

To export your Databricks notebooks, follow these simple steps:

Click on "Export" in the toolbar of a specific notebook if you want to export that notebook alone. If you need to export multiple notebooks, right-click on a folder and select "Export". When prompted, choose the desired format, such as Source, HTML, or DBC Archive.

Exporting Databricks notebook

Step 8—Testing Databricks notebooks Effectively

Thorough testing is key before deploying notebooks to production. Databricks provides the following testing capabilities:

  • Unit testing: Libraries like PyTest can be used to write unit tests for Python code cells (see the sketch after this list).
  • Isolated environments: Notebooks can be tested on different clusters through attach/detach.
  • Parameterization: Notebooks can be executed with different inputs using widgets.
  • Scheduled tests: Automated testing jobs can be configured via Cron schedules.
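
To make the unit-testing point concrete, here is a minimal sketch: write a pytest-style test file to the driver's local disk and run it with pytest.main. The add_tax function is hypothetical, and pytest may need to be installed first (%pip install pytest):

import pathlib
import pytest

# Write a tiny pytest file to the driver's local disk
test_code = '''
def add_tax(amount, rate=0.1):
    return round(amount * (1 + rate), 2)

def test_add_tax_default_rate():
    assert add_tax(100) == 110.0

def test_add_tax_custom_rate():
    assert add_tax(100, rate=0.2) == 120.0
'''
pathlib.Path("/tmp/test_example.py").write_text(test_code)

# Run the tests; an exit code of 0 means every test passed
exit_code = pytest.main(["/tmp/test_example.py", "-v"])
assert exit_code == 0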

Testing early and often while incrementally building notebooks enables you to identify bugs early. It also prevents production failures.

By now, you should have a good grasp of the basics of working with Databricks notebooks. We've gone through the notebook creation process from start to finish, covering essential features. Now that we've covered the basics, let's dive into the next section where we will discuss 7 tips and tricks to effectively utilize Databricks notebooks.

7 Must-Know Tricks to Master Databricks Notebook

Let's discuss some power user tips and tricks to boost your productivity with Databricks notebooks:

Trick 1—Using Databricks Magic Commands

Databricks magic commands are special commands that start with a % symbol, and allow you to perform various tasks, such as changing the language, running external scripts, executing shell commands, accessing file systems, and more. You can use magic commands in any cell of your notebook, and they will override the default language of the cell.

Some of the commonly used magic commands are:

  • %run: Run another notebook within the current notebook
  • %lsmagic: List all available magic commands
  • %sh: Run Linux shell commands
  • %fs: Interact with DBFS file system
  • %sql: Run SQL query on cluster
  • %scala: Write Scala code
  • %python: Run Python code
  • %md: Render Markdown text
  • %r: Run R code
  • %pip: Install Python libraries
  • %matplotlib: Render Matplotlib charts inline
  • %who: Display variables defined in notebook
  • %env: Show environment variables

These are some of the magic commands that you can use in your notebooks, but there are many more. You can use the %lsmagic command to see the full list, and experiment with them in your notebooks.
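
To make a few of these concrete, here is what they look like in practice. Each snippet below is its own cell, since a magic command must appear at the top of a cell, and the DBFS and notebook paths are hypothetical:

%sh
# Shell cell: run Linux commands on the driver node
ls -la /tmp

%fs ls /FileStore/shared_uploads

%run ./helpers/shared_setup

The %fs cell lists files in DBFS, and the %run cell executes another notebook inline, making its variables and functions available in the current notebook.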

Databricks magic command - Databricks notebook

Trick 2—Schedule and Parameterize Databricks notebooks

Databricks notebooks can be scheduled and parameterized, which means that you can automate your notebooks to run periodically or trigger them on demand or by events. You can also pass parameters to your notebooks and use widgets to customize the input and output of your notebooks.

Scheduling your notebook is a straightforward process. We have covered all the steps above, so feel free to check ‘em out.

Now, let's move on to the next step and learn how to parameterize the notebook.

To parameterize your notebook, you can use the following steps:

  • Open the notebook you want to parameterize.
  • Click on the Edit menu in the top right corner of the notebook and select "Add widgets" from the menu.
Adding widgets to the Databricks notebook
  • In the "Add Widget" popup, specify the parameter name, widget label, widget type, and default parameter value. Choose from different types of widgets like text, number, dropdown, combo-box, multi-select, etc.
  • Click on the "Add widget" button and wait for Databricks to create your widget. You can then see your widget at the top of your notebook and use it to input or output values in your notebook.
Parameterize Databricks notebook
  • You can also use the dbutils.widgets library to create and access widgets in your notebook. For example, dbutils.widgets.text("name", "Chaos Genius") will create a text widget named "name" with the default value "Chaos Genius", and dbutils.widgets.get("name") will return the value of the widget "name".
Parameterize Databricks notebook
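
Putting the dbutils.widgets API together, here is a minimal sketch, with illustrative widget names, values, and labels:

# Create widgets: a free-text box and a dropdown
dbutils.widgets.text("name", "Chaos Genius", "Name")
dbutils.widgets.dropdown("env", "dev", ["dev", "staging", "prod"], "Environment")

# Read the current values anywhere in the notebook
name = dbutils.widgets.get("name")
env = dbutils.widgets.get("env")
print(f"Running for {name} in the {env} environment")

# Remove a single widget, or all of them, when done
dbutils.widgets.remove("name")
dbutils.widgets.removeAll()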

Trick 3—Collaborate on Databricks notebooks

As we have already covered above, Databricks notebooks can be shared and collaborated on with other users using various features such as sharing, Git integration, comments, and co-authoring. Here are some tips to enhance collaboration in Databricks notebooks:

  • Use fine-grained controls to securely share notebooks with other users or groups.
  • Integrate notebooks with your Git-based version control system.
  • Leave comments and @ mention colleagues for discussions.
  • Track edit history and seamlessly roll back changes.

Trick 4—Upload Local Data Directly to Databricks Notebook

Databricks notebooks can access data from various sources, such as the Databricks file system (DBFS), the Databricks data lake, the Databricks SQL warehouse, and external data sources. However, sometimes you might want to upload your local data directly to your notebook without using any intermediate storage. You can do this by using the upload data feature in your notebook. This feature helps you transfer local data to your workspace. While the default target directory is set to /shared_uploads/<user-name>, you have the flexibility to choose a different destination and utilize the code provided in the Upload File dialog to access your files.

Uploading Local Data Directly to Databricks Notebook
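
Once the upload completes, you can read the file back with Spark. A sketch assuming a CSV named sales.csv was uploaded; the exact path is shown in the Upload File dialog, and the one below is hypothetical:

# Read the uploaded CSV from DBFS into a Spark DataFrame
df = spark.read.csv(
    "dbfs:/FileStore/shared_uploads/user@example.com/sales.csv",
    header=True,
    inferSchema=True,
)
display(df)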

Trick 5—Utilizing Handy UI Enhancements

Databricks notebooks have some handy UI enhancements that can make your notebook experience more convenient and enjoyable. Some of these UI enhancements are:

Undo Delete Option to Recover Deleted Cells

The undo delete feature allows you to restore a cell that you have deleted by mistake. You can use the undo delete feature by heading over to Edit ➤ Undo delete cells, or by pressing Ctrl + Z on your keyboard.

Undoing delete to recover deleted cells - Databricks notebook

Integrated Web-Based Terminal

The web terminal feature allows you to access a terminal session in your cluster and execute shell commands from your notebook. You can access the web terminal feature by clicking on the attached cluster and selecting the web terminal. You don't have to close your notebook.

Integrated web-based terminal in Databricks Notebook

Markdown Editing

Markdown editing feature enables you to edit and preview markdown text in your notebook, which is quite helpful for writing documentation for your code. You can use the markdown editing feature by selecting the markdown language located at the right-hand corner of the cell or by using the %md magic command in your cell. Then, you can write markdown text in your cell and use the preview button to see how it will look in your notebook.

Markdown editing in Databricks notebook
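
For instance, a markdown cell like the following renders as formatted documentation; the content is illustrative:

%md
## Data Cleaning Notes
The cells below **deduplicate** the raw events table:
- drop exact duplicate rows
- normalize all timestamps to UTC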

Code Formatting

Tired of copying your code elsewhere just to get it formatted? Databricks has an inbuilt feature for that. The code formatting feature formats the code in your notebook according to the language style guide. You can invoke it directly by pressing Cmd/Ctrl + Shift + F on your keyboard or by using the Edit menu on the toolbar and selecting Format Cell(s).

Code formatting in Databricks notebook

These examples merely scratch the surface of the useful UI enhancements that Databricks provides. There are many more new features and enhancements, with updates regularly introduced.

Trick 6—Searching notebooks Effectively

Databricks notebooks can be searched effectively by using the global search or the search within a folder or workspace.

To utilize the global search feature in Databricks, follow these steps. First, navigate to the left sidebar and click on the search icon. In the search box that appears, input the desired keyword or phrase.

Searching Databricks notebooks

Also, you can use the advanced search option to fine-tune your search by applying filters such as language, cluster, date—and more.

Advanced search Databricks notebook

After entering your search criteria, click on the search button and wait for the results.

You can also search for text within a specific folder or only search for notebook names.

Trick 7—Restore Previously Saved Revisions

All edits made to a notebook are automatically version-controlled.

You can view the revision history timeline, preview old versions, restore an older version, or permanently delete old revisions no longer needed.

This enables an effortless rollback to previous points in time.

Restoring Previously Saved Revisions - Databricks notebook

Refer to the above section for a detailed, step-by-step guide on how you can effectively use this feature.

Conclusion

And that's a wrap! If you work with data in Databricks, notebooks are where you'll spend most of your time. They provide a flexible and interactive canvas for data teams to streamline their end-to-end analytics workflows.

In this article, we have covered:

  • What Are Notebooks in Databricks?
  • Step-by-Step Guide on Using Databricks Notebooks for Comprehensive Data Analysis
  • 7 Must-Know Tricks to Master Databricks Notebook

…and so much more!

FAQs

What is a Databricks notebook?

Databricks Notebook is an interactive coding canvas within the Databricks platform where you can write, execute, and collaborate on code.

What languages are supported in Databricks notebooks?

Databricks notebooks support four main programming languages - Python, R, SQL, and Scala. Cells can be switched between these languages using magic commands.

How do I open an existing notebook?

To open an existing notebook, click on the notebook name listed on the left sidebar. You can also hover over the name to view the notebook path.

How do I run code cells in a notebook?

You can run individual code cells by clicking the run icon on the right side of the cell. To run the entire notebook, click "Run All" at the top of the page. The keyboard shortcut is Shift + Enter.

How do I delete a notebook?

Right-click the notebook name and select "Move to Trash". Then confirm the deletion. Notebooks moved to the trash can be restored or permanently deleted.

How do I change the default language of a notebook?

You can change the default language of an existing notebook by clicking on the language button at the top of any cell and selecting the desired option.

How do I browse available datasets in a notebook?

Use the catalog icon on the left sidebar to open the schema browser panel. Here you can explore tables, files, and other data available in your workspace.

How do I format code in a notebook?

Select the code cells and either use the Format drop-down in the cell menu or press Cmd/Ctrl + Shift + F. This will auto-format Python/SQL code based on language conventions.

How do I share a notebook with other users?

Click the share button at the top and specify users/groups to share with along with their permission levels in the sharing dialog box.

How do I schedule a notebook to run as a job?

Use the Schedule button and configure the frequency, time, parameters, and compute resources for automated execution.

How do I view previous versions of a notebook?

Click the version history icon on the right sidebar to see the revision timeline. Older versions can be restored.

How do I import an external notebook?

Click Import from the Workspace menu and select a file or provide a URL for the source notebook.

How do I export a notebook?

Click Export from the notebook toolbar or right-click a notebook/folder and select Export to download it in a file format.

How do I attach a notebook to a cluster?

Click the Connect button and select a cluster from the dropdown menu or click More to see all available options.

How do I parameterize a notebook using widgets?

You can add widgets from the Edit menu to define parameters.

How do I add comments to a notebook?

Use the comments icon on the right sidebar to leave and view comments on specific code sections.

How do I clear outputs/state in a notebook?

You can use the Clear All Outputs or Clear All options under the Edit menu to clear outputs or reset the state respectively.

How do I search for text within notebooks?

You can use the search icon on the left sidebar and enter keywords to search globally or within a specific folder.

Can you run notebooks on an automated schedule?

Yes, Databricks notebooks can be scheduled using Cron syntax via the Jobs UI.

Where are notebook execution results stored?

Results are stored on the worker nodes of the attached cluster.

Can you co-author notebooks in real time?

Yes, Databricks enables multiple concurrent users to edit the same notebook.

What formats can you export a notebook in?

HTML, Jupyter Notebook (.ipynb), R Markdown (.Rmd), Python (.py), etc.

Pramit Marattha
Technical Content Lead at Chaos Genius
