Skip to main content
Build models are data transformations that run during the build process (such as the sqrl build CLI command) and materialize their results in the Virtual Data Lake (VDL). They are used to create reusable data layers that can be referenced by other models. Build models can be written in SQL (DuckDB dialect) or Python, and are stored in the models/builds/ directory.

File structure

The logic of the build model can be written in SQL or Python. Optionally, the metadata of the build model (such as column descriptions) can be specified in a YAML file with the same name.
models/
└── builds/
    ├── my_build_model.sql   # SQL build model
    ├── my_build_model.yml   # optional configuration
    ├── my_python_model.py   # Python build model
    └── my_python_model.yml  # optional configuration

SQL build models

SQL build models use the DuckDB SQL dialect and support Jinja templating.

Example SQL model

models/builds/build_transactions.sql
{#- DuckDB dialect -#}

SELECT
    t.id,
    t.date,
    t.amount,
    c.category_name
FROM {{ ref("src_transactions") }} t
LEFT JOIN {{ ref("seed_categories") }} c 
    ON t.category_id = c.category_id

Available Jinja variables

The following variables are available in SQL build models:
VariableDescription
proj_varsDictionary of project variables from squirrels.yml
env_varsDictionary of environment variables
ref(model_name)Macro that returns the table name for the referenced model

The ref() macro

The ref() macro is used to reference other models. It can reference:
  • Sources (with load_to_vdl: true or DuckDB connection type)
  • Seeds
  • Other build models
-- Reference a source
FROM {{ ref("src_customers") }}

-- Reference a seed
FROM {{ ref("seed_categories") }}

-- Reference another build model
FROM {{ ref("build_transactions") }}

Python build models

Python build models define a main() function that receives a BuildModelArgs object and returns a Polars LazyFrame (or DataFrame).

Example Python model

models/builds/build_transactions.py
from squirrels import arguments as args
import polars as pl


def main(sqrl: args.BuildModelArgs) -> pl.LazyFrame:
    transactions = sqrl.ref("src_transactions")
    categories = sqrl.ref("seed_categories")
    
    result = transactions.join(
        categories, on="category_id", how="left"
    ).select(["id", "date", "amount", "category_name"])
    
    return result

YAML configuration

An optional YAML file with the same name provides additional configuration for the model.
models/builds/build_transactions.yml
description: |
  Enriched transaction data with category names joined.

materialization: TABLE     # TABLE or VIEW - default is VIEW for SQL, ignored for Python (always TABLE)

depends_on:                # required for Python models, optional for SQL models
  - src_transactions
  - seed_categories

columns:
  - name: id
    depends_on:
      - src_transactions.id
    pass_through: true     # inherit metadata from upstream column
  
  - name: date
    type: string
    description: Transaction date
    category: dimension
  
  - name: amount
    type: decimal
    description: Transaction amount
    category: measure
  
  - name: category_name
    type: string
    description: Category name
    category: dimension

Configuration fields

description
string
default:""
A description of the model for documentation purposes.
materialization
string
default:"VIEW"
How the model is stored in the VDL. Options are TABLE or VIEW.
Python models are always materialized as tables regardless of this setting.
depends_on
list[string]
default:"[]"
List of model names this build model depends on. Optional for SQL models (derived from ref() calls), but required for Python models.
columns
list[object]
default:"[]"
Column metadata definitions as a list.

Materialization

Build models can be materialized as either tables or views in the VDL:
MaterializationDescription
TABLEData is stored physically. Faster for repeated queries, but takes more storage.
VIEWData is computed on-demand. Saves storage, but slower if queried multiple times.
# Materialize as a table
materialization: TABLE

# Materialize as a view (default)
materialization: VIEW
Python build models are always materialized as tables, regardless of the materialization setting. This is because DuckDB only supports view definitions in SQL, not Python.

Best practices

  1. Use descriptive names: Prefix build model names with build_ to distinguish them from other model types.
  2. Declare dependencies explicitly: Even though SQL models can auto-detect dependencies via ref(), explicitly listing them in YAML helps with documentation.
  3. Use pass_through for inherited columns: When a column passes through unchanged from an upstream model, use pass_through: true to automatically inherit its metadata.
  4. Choose materialization wisely: Use TABLE for models that are queried frequently or have expensive computations. Use VIEW for simple transformations or infrequently accessed models.