Overview
The five types of data models in Squirrels are seeds, sources, build models, dbview models, and federate models.Seeds
Seeds are static CSV files stored and version-controlled in your project. They’re ideal for lookup tables, reference data, or small datasets that don’t change frequently. Key characteristics:- Stored as CSV files in the
seeds/directory - Loaded into the Virtual Data Lake (VDL) during sqrl build
- Best for small, static datasets (typically < 10,000 rows)
- Can be referenced by build models and federate models
Sources
Sources define metadata for tables that exist in external databases. Key characteristics:- Configured in
models/sources.yml - Can load data to VDL with
load_to_vdl: true - Supports incremental loading for large tables
- Can be referenced by dbview models that share the same database connection
- Can be referenced by build models and federate models if loaded to VDL
Build models
Build models are data transformations that run during the build process. They create reusable data layers that can be referenced by other models. Key characteristics:- Stored in
models/builds/directory - Can be written in SQL (DuckDB dialect) or Python
- Materialized as tables or views in the VDL
- Executed during sqrl build
- Can reference sources, seeds, and other build models
- Can be referenced by other build models and federate models
Dbview models
Dbview models are SQL queries that run directly on external databases at query time. They provide real-time access to data without materializing to storage. Key characteristics:- Stored in
models/dbviews/directory - Written in SQL (using the target database’s dialect)
- Typically executes queries on external databases at runtime
- Responds to user parameter selections dynamically
- Supports query translation to DuckDB for performance optimization
- Can reference sources that share the same database connection
- Can be referenced by federate models
If the connection type of the dbview model is DuckDB (as opposed to SQLAlchemy for example), then the SQL dialect would always be DuckDB even if the connection URI points to a non-DuckDB database.
Federate models
Federate models are dynamic data transformations that run on DuckDB at query time. They can reference any other model type, making them ideal for creating the final dataset layer. Key characteristics:- Stored in
models/federates/directory - Can be written in SQL (DuckDB dialect) or Python
- Execute queries on DuckDB at runtime
- Respond to user parameter selections dynamically
- Can reference sources, seeds, build models, dbview results, and other federate models
Categorizing the model types
In addition to describing data models by one of the five model types, we use additional terminology to categories these model types. These categories include “static models” vs “dynamic models”, or whether the models are “query models”.Static vs dynamic models
The five model types fall into two categories based on whether they are prepared ahead of time (static) or executed at request time (dynamic): Static models Static models are seeds, sources, and build models. These models are defined and configured during development, and their data or transformations are prepared ahead of time:- Seeds: CSV files stored in the Squirrels project. They are also loaded into the VDL during sqrl build
- Sources: Metadata definitions for external database tables (may optionally load data to VDL with
load_to_vdl: true) - Build models: Transformations executed and materialized in the VDL during sqrl build
Buildtime models is a more specific term that refers to the tables / views that get loaded to the VDL during sqrl build. These models are created from seeds, sources with
load_to_vdl: true, and build models, and do not include sources without load_to_vdl: true.- Dbview models: Run SQL queries directly on external databases
- Federate models: Run transformations on DuckDB in the API server, and can reference the VDL
Query models
Query models are data models defined with SQL or Python code that perform transformations. This category includes:- Build models: SQL or Python transformations materialized in the VDL
- Dbview models: SQL queries executed on external databases
- Federate models: SQL or Python transformations executed on the API server using DuckDB
Basic example of a SQL federate model
- Build models (DuckDB dialect - does not have access to context variables
ctx) - Dbview models (target database dialect - has access to context variables
ctx) - Federate models (DuckDB dialect - has access to context variables
ctx)
main() function that receives model arguments and returns a Polars LazyFrame or DataFrame (pandas DataFrame is also supported):
Basic example of a Python federate model
- Build models (does not have access to context variables
sqrl.ctx) - Federate models (has access to context variables
sqrl.ctx)
Dbview models can only be written in SQL, as they typically execute queries directly on external databases.
Model dependencies
Models can reference other models based on their type:| Model Type | Can Reference |
|---|---|
| Seeds | None (leaf nodes) |
| Sources | None (leaf nodes) |
| Build models | Sources (with load_to_vdl: true), seeds, other build models |
| Dbview models | Sources only (same connection) |
| Federate models | Sources (with load_to_vdl: true), seeds, build models, dbview models, other federate models |
Choosing the right model type
Use this decision tree to choose the appropriate model type:- Is it static reference data with a small number of rows? → Use Seeds
- Is it a table in an external database? → Use Sources
- Does it need to run on an external database at query time? → Use Dbview models
- Does it need to be materialized ahead of time for reuse? → Use Build models
- Otherwise → Use Federate models
Related pages
- Virtual Data Lake (VDL) - Understanding where buildtime models are stored
- Seeds - Static CSV data files
- Sources - External database tables
- Build models - Materialized transformations
- Dbview models - Dynamic queries on external databases
- Federate models - Dynamic transformations on DuckDB
- sqrl build - CLI reference for building the VDL
- sqrl run - CLI reference for running the API server