Build models (builds/)

Build models are data transformations that run during the build process (such as the sqrl build CLI command) and materialize their results in the Virtual Data Lake (VDL). They are used to create reusable data layers that can be referenced by other models. Build models can be written in SQL (DuckDB dialect) or Python, and are stored in the models/builds/ directory.

File structure

The logic of the build model can be written in SQL or Python. Optionally, the metadata of the build model (such as column descriptions) can be specified in a YAML file with the same name.

models/
└── builds/
    ├── my_build_model.sql   # SQL build model
    ├── my_build_model.yml   # optional configuration
    ├── my_python_model.py   # Python build model
    └── my_python_model.yml  # optional configuration

SQL build models

SQL build models use the DuckDB SQL dialect and support Jinja templating.

Example SQL model

models/builds/build_transactions.sql

{#- DuckDB dialect -#}

SELECT
    t.id,
    t.date,
    t.amount,
    c.category_name
FROM {{ ref("src_transactions") }} t
LEFT JOIN {{ ref("seed_categories") }} c 
    ON t.category_id = c.category_id

Available Jinja variables

The following variables are available in SQL build models:

Variable	Description
`proj_vars`	Dictionary of project variables from `squirrels.yml`
`env_vars`	Dictionary of environment variables
`ref(model_name)`	Macro that returns the table name for the referenced model

The `ref()` macro

The ref() macro is used to reference other models. It can reference:

Sources (with load_to_vdl: true or DuckDB connection type)
Seeds
Other build models

-- Reference a source
FROM {{ ref("src_customers") }}

-- Reference a seed
FROM {{ ref("seed_categories") }}

-- Reference another build model
FROM {{ ref("build_transactions") }}

Python build models

Python build models define a main() function that receives a BuildModelArgs object and returns a Polars LazyFrame (or DataFrame).

Example Python model

models/builds/build_transactions.py

from squirrels import arguments as args
import polars as pl


def main(sqrl: args.BuildModelArgs) -> pl.LazyFrame:
    transactions = sqrl.ref("src_transactions")
    categories = sqrl.ref("seed_categories")
    
    result = transactions.join(
        categories, on="category_id", how="left"
    ).select(["id", "date", "amount", "category_name"])
    
    return result

YAML configuration

An optional YAML file with the same name provides additional configuration for the model.

models/builds/build_transactions.yml

description: |
  Enriched transaction data with category names joined.

materialization: TABLE     # TABLE or VIEW - default is VIEW for SQL, ignored for Python (always TABLE)

depends_on:                # required for Python models, optional for SQL models
  - src_transactions
  - seed_categories

columns:
  - name: id
    depends_on:
      - src_transactions.id
    pass_through: true     # inherit metadata from upstream column
  
  - name: date
    type: string
    description: Transaction date
    category: dimension
  
  - name: amount
    type: decimal
    description: Transaction amount
    category: measure
  
  - name: category_name
    type: string
    description: Category name
    category: dimension

Configuration fields

description

string

default:""

A description of the model for documentation purposes.

materialization

string

default:"VIEW"

How the model is stored in the VDL. Options are TABLE or VIEW.

Python models are always materialized as tables regardless of this setting.

depends_on

list[string]

default:"[]"

List of model names this build model depends on. Optional for SQL models (derived from ref() calls), but required for Python models.

columns

list[object]

default:"[]"

Column metadata definitions as a list.

Hide column metadata fields

name

string

default:""

Column name

type

string

default:""

Data type (e.g., string, integer, decimal)

description

string

default:""

Column description

Materialization

Build models can be materialized as either tables or views in the VDL:

Materialization	Description
`TABLE`	Data is stored physically. Faster for repeated queries, but takes more storage.
`VIEW`	Data is computed on-demand. Saves storage, but slower if queried multiple times.

# Materialize as a table
materialization: TABLE

# Materialize as a view (default)
materialization: VIEW

Python build models are always materialized as tables, regardless of the materialization setting. This is because DuckDB only supports view definitions in SQL, not Python.

Best practices

Use descriptive names: Prefix build model names with build_ to distinguish them from other model types.
Declare dependencies explicitly: Even though SQL models can auto-detect dependencies via ref(), explicitly listing them in YAML helps with documentation.
Use pass_through for inherited columns: When a column passes through unchanged from an upstream model, use pass_through: true to automatically inherit its metadata.
Choose materialization wisely: Use TABLE for models that are queried frequently or have expensive computations. Use VIEW for simple transformations or infrequently accessed models.

Sources - Configure source tables from external databases
Seeds - Static CSV data files
Federate models - Dynamic models that run at query time
Dbview models - Models that run on external databases
BuildModelArgs - API reference for Python build model arguments
sqrl build - CLI reference for sqrl build

Get started

Concepts

Project files

File structure

SQL build models

Example SQL model

Available Jinja variables

The `ref()` macro

Python build models

Example Python model

YAML configuration

Configuration fields

Materialization

Best practices

Get started

Concepts

Project files

​File structure

​SQL build models

​Example SQL model

​Available Jinja variables

​The ref() macro

​Python build models

​Example Python model

​YAML configuration

​Configuration fields

​Materialization

​Best practices

​Related pages

File structure

SQL build models

Example SQL model

Available Jinja variables

The `ref()` macro

Python build models

Example Python model

YAML configuration

Configuration fields

Materialization

Best practices

Related pages