Skip to main content
Federate models are data transformations that run at query time on the embedded DuckDB engine. They can reference sources, seeds, build models, dbview results, and other federate models, making them ideal for creating the final dataset layer that responds to user parameter selections. Federate models are stored in the models/federates/ directory and can be written in SQL (DuckDB dialect) or Python.

File structure

models/
└── federates/
    ├── my_federate.sql          # SQL federate model
    ├── my_federate.yml          # configuration (optional for SQL, required for Python)
    ├── my_python_federate.py    # Python federate model
    └── my_python_federate.yml   # configuration (required for Python)

SQL federate models

SQL federate models use the DuckDB dialect and support full Jinja templating with access to parameters, context variables, and user information.

Example SQL model

models/federates/fed_transactions_summary.sql
{#- DuckDB dialect -#}

SELECT 
    {{ ctx.group_by_column }} AS dimension,
    CAST(SUM(amount) AS DECIMAL(15, 2)) AS total_amount,
    COUNT(*) AS transaction_count

FROM {{ ref("build_transactions") }}

WHERE date >= '{{ ctx.start_date }}'
    AND date <= '{{ ctx.end_date }}'
{%- if ctx.has_category_filter %}
    AND category_id IN ({{ ctx.selected_categories | quote_and_join }})
{%- endif %}

GROUP BY {{ ctx.group_by_column }}

ORDER BY total_amount DESC

Available Jinja variables

The following variables are commonly used in federate models:
VariableDescription
ctxDictionary of context variables from context.py
ref(model_name)Macro that returns the table name for the referenced model
is_placeholder(name)Check if a placeholder exists
The following variables are also available in federate models, but are less commonly used given the availability of ctx:
VariableDescription
proj_varsDictionary of project variables from squirrels.yml
env_varsDictionary of environment variables
userThe authenticated user object with username and custom_fields
prmsDictionary of parameter objects
param_exists(param_name)Check if a parameter exists and is enabled
set_placeholder(name, value)Register a parameterized query placeholder
configurablesDictionary of configurable values set by the application

The ref() macro

The ref() macro is used to reference other models. Federate models can reference:
  • Sources (with load_to_vdl: true or DuckDB connection type)
  • Seeds
  • Build models
  • Dbview models
  • Other federate models
-- Reference a source (must have load_to_vdl: true or use DuckDB connection)
FROM {{ ref("src_customers") }}

-- Reference a seed
FROM {{ ref("seed_categories") }}

-- Reference a build model
FROM {{ ref("build_transactions") }}

-- Reference a dbview model
FROM {{ ref("dbv_recent_sales") }}

-- Reference another federate model
FROM {{ ref("fed_base_metrics") }}

Python federate models

Python federate models define a main() function that receives a ModelArgs object and returns a Polars LazyFrame (or DataFrame).

Example Python model

models/federates/fed_transactions_summary.py
from squirrels import arguments as args
import polars as pl


def main(sqrl: args.ModelArgs) -> pl.LazyFrame:
    """
    Create a federate model that summarizes transactions based on parameter selections.
    """
    df = sqrl.ref("build_transactions")
    
    # Apply date filters from context
    df = df.filter(
        (pl.col("date") >= sqrl.ctx["start_date"]) &
        (pl.col("date") <= sqrl.ctx["end_date"])
    )
    
    # Apply category filter if selected
    if sqrl.ctx.get("has_category_filter"):
        categories = sqrl.ctx["selected_categories"]
        df = df.filter(pl.col("category_id").is_in(categories))
    
    # Group by the selected dimension
    group_col = sqrl.ctx["group_by_column"]
    df = df.group_by(group_col).agg([
        pl.sum("amount").cast(pl.Decimal(precision=15, scale=2)).alias("total_amount"),
        pl.count().alias("transaction_count")
    ])
    
    # Sort by total amount descending
    df = df.sort("total_amount", descending=True)
    
    return df

YAML configuration

The YAML configuration file specifies dependencies and other options:
models/federates/fed_transactions_summary.yml
description: |
  Summarizes transactions by a selected dimension with optional category filtering.

depends_on:                # required for Python models, optional for SQL
  - build_transactions

eager: false               # whether to materialize as TABLE instead of VIEW

columns:
  - name: dimension
    type: string
    description: The selected dimension for grouping
    category: dimension
  
  - name: total_amount
    type: decimal
    description: Sum of transaction amounts
    category: measure
    depends_on:
      - build_transactions.amount
  
  - name: transaction_count
    type: integer
    description: Number of transactions
    category: measure

Configuration fields

description
string
default:""
A description of the model for documentation purposes or for use by data consumers (such as AI agents).
depends_on
list[string]
default:"[]"
List of model names this federate depends on. Optional for SQL models (derived from ref() calls), but required for Python models.
eager
boolean
default:"false"
If true, the SQL model result is materialized as a TABLE in memory. If false, it’s created as a VIEW. This only applies to SQL models. See Eager vs lazy evaluation for details.
columns
list[object]
default:"[]"
Column metadata definitions as a list.

Eager vs lazy evaluation

The eager setting controls how SQL federate models are created in DuckDB:
SettingDuckDB ObjectUse Case
eager: false (default)VIEWSingle-use results, saves memory
eager: trueTABLEResults referenced multiple times, complex calculations
When enabled:
  1. The federate model result is materialized as a TABLE in the temporary in-memory DuckDB database per request
  2. Subsequent references to this model read from the materialized table
  3. This prevents redundant computation when the same result is used multiple times
The eager setting only applies to SQL federate models. Python federate models use polars LazyFrames or DataFrames (depending on what is returned by the model). However, they are registered with DuckDB when referenced by downstream SQL models, or collected as a DataFrame if used as the dataset response.

Using placeholders for SQL injection prevention

For user-provided values, use placeholders to prevent SQL injection:
models/federates/fed_transactions.sql
SELECT *
FROM {{ ref("build_transactions") }}
WHERE description LIKE $description
The $placeholder_name syntax (for DuckDB) is used in the query for parameterized execution. The placeholder can be set using context variables in pyconfigs/context.py. For instance:
pyconfigs/context.py
from typing import Any
from squirrels import arguments as args, parameters as p


def main(ctx: dict[str, Any], sqrl: args.ContextArgs) -> None:
    if sqrl.param_exists("description"):
        description_param = sqrl.prms["description"]
        assert isinstance(description_param, p.TextParameter)
        
        description = description_param.get_entered_text()
        
        # Set a placeholder for use in SQL queries
        sqrl.set_placeholder("description", description.apply_percent_wrap())

Connecting to datasets

In squirrels.yml, you reference a data model as the model for a dataset. The target model can be any type of data model (source, seed, build, dbview, or federate):
squirrels.yml
datasets:
  - name: transaction_analytics
    label: Transaction Analytics
    model: fed_transactions_summary  # can reference any model type
    parameters:
      - date_range
      - category_filter
      - group_by

Best practices

  1. Use descriptive names: Prefix federate names with fed_ to distinguish them from other model types.
  2. Use context variables: Define complex logic in context.py and use simple context variable references in your SQL/Python models.
  3. Declare dependencies in YAML: Unlike SQL models which auto-detect dependencies via ref(), Python models require explicit depends_on in the YAML configuration. It is also recommended to declare dependencies in YAML for SQL models.
  4. Document conditional columns: If a specific column only exists based on parameter selections, use the condition field to document it.
  5. Use placeholders for user input: Always use set_placeholder() for values that come from user text input to prevent SQL injection.
  6. Mask sensitive data: Use conditional logic based on user fields to mask sensitive columns.
  • Sources - Configure source tables from external databases if loading to the VDL
  • Seeds - Static CSV data files
  • Build models - Materialized models in the VDL
  • Dbview models - Dynamic models that run on external databases
  • Context variables - Set context variables that transform parameter selections into meaningful values at runtime
  • ModelArgs - API reference for Python federate model arguments