models/sources.yml file and can be loaded into the Virtual Data Lake (VDL) for use by other data models.
Configuration structure
Thesources.yml file contains a list of source definitions under the sources key:
Configuration fields
A unique identifier for the source. This name is used to reference the source in your models using the
ref() function.A human-readable description of the source table. Used for documentation and data consumers (such as AI agents).
The name of the database connection to use, as defined in
squirrels.yml or pyconfigs/connections.py. If not specified, uses the default connection defined by the SQRL_CONNECTIONS__DEFAULT_NAME_USED environment variable, or “default” if that is not set.The name of the table in the external database. If not specified, defaults to the source
name.Whether to load this source’s data into the Virtual Data Lake (VDL) during the build process. When
true, the source data is extracted from the external database and stored in the VDL, making it available for use by build and federate models. See VDL loading behavior for details on how columns are loaded.A list of column names that form the primary key of the source table. Used for incremental loading and upsert operations.
Configuration for incremental data loading.
A list of column configurations that define which columns to load and their metadata.
If no columns are specified and load_to_vdl is set to true, all columns from the source table are loaded with types inferred by DuckDB.
Example configurations
VDL loading behavior
Whenload_to_vdl: true is set, the source data is extracted from the external database and stored in the Virtual Data Lake during sqrl build. The behavior depends on whether columns are specified:
| Columns specified? | Behavior |
|---|---|
| Yes | Only the specified columns are loaded, and each column is cast to the specified type. |
| No | All columns from the source table are loaded with types inferred by DuckDB. |
When specifying columns, make sure the column types match or are compatible with the actual database column types to avoid casting errors during the build process.
Using sources in models
Sources can be referenced in your data models, but the method depends on the model type and whetherload_to_vdl is enabled.
In build models (SQL)
Build models written in SQL can reference sources using theref() macro which provides the fully qualified name of the source model:
In build models (Python)
Build models written in Python can reference sources using thesqrl.ref() function which returns a LazyFrame:
In federate models
Similar to build models, sources withload_to_vdl: true can be referenced directly in federate models using the ref() macro for SQL models or the sqrl.ref() function for Python models.
In dbview models
Dbview models reference sources using thesource() function (though ref() is also supported as an alias). Unlike build and federate models, dbview models have specific rules for referencing sources:
- Same connection required: A dbview model can only reference sources that share the same database connection. This is because dbview queries run directly on the external database (or at least give the illusion of doing so).
-
translate_to_duckdb restriction: If a dbview model has
translate_to_duckdb: true, it can only reference sources withload_to_vdl: true. This is because the translated query actually runs on DuckDB using VDL data.
Best practices
-
Use descriptive names: Prefix your source names with
src_to distinguish them from other model types. - Document columns: Add descriptions to all columns to help other developers and data consumers (including AI agents) understand the data.
- Specify column types: Always provide explicit column types for sources to ensure data is loaded correctly.
-
Consider incremental loading: For large or frequently updated tables, use
update_hintsto enable incremental loading and reduce build times. -
Keep VDL loading selective: Only set
load_to_vdl: truefor sources that need to be used by build or federate models. Sources used only by dbview models don’t need VDL loading.
Related pages
- Project structure - Overview of files in a Squirrels project
- Seeds - For static CSV data files
- Environment variables - Configuration through environment variables
- Build models - Models that materialize data in the VDL
- Dbview models - Models that run on external databases
- Federate models - Dynamic models that run in memory of the API server at query time