24 May 2025

Nifty Databricks Tricks

by Gordon Weakliem

Some nifty Databricks tricks I’ve learned over the last couple of years:

Convert any .py file to a notebook

Do this by by putting this comment at the first line:

# Databricks notebook source

Then you divide the notebook into cells with this marker:

# COMMAND ----------

This is really powerful when you need to debug a file in place. I generally have a separate file with all the transformations: DataFrame(s) in, DataFrame out. You can open the file in Databricks as a notebook and run your code in place. I generally have a COMMAND section at the end with some commented out test code to load up a sample DataFrame.

Dataflint

I discovered Dataflint while debugging a slow job. I suspected a skew problem but I was mostly trying random things. Dataflint is fantastic - it pinpointed my skew problem but also gave some good suggestions on executor memory and cluster size. It’s a pretty new project so hopefully it will get even better over time.

Editor Tricks

Possibly the biggest lifehack here is the fact that Databricks’ worksheet editor is based on the Monaco engine, which is the same engine that powers VS Code. What that means is that many of the same keyboard accelerators work! CMD+D selects words, CMD+Shift+Arrow does multiselect, etc. I haven’t documented how much of the VS Code keymap is availible, but there are definitely some editing improvements available.

tags: [spark - databricks]

permalink

navigation

Is MCP Necessary? - TIR 2025-06-28

Eighty/Twenty

.plan for Gordon Weakliem

Nifty Databricks Tricks

Convert any .py file to a notebook

Dataflint

Editor Tricks