Seeds are stored in the seeds/ directory at the root of your Squirrels project.
File structure
Seeds consist of two types of files:
- CSV files (.csv) - The data files containing your static data
- YAML configuration files (.yml) - Optional configuration files that define column types and metadata
The YAML file must have the same name as its CSV file (with the .yml extension) and be located in the same directory.
Seeds are loaded recursively from the seeds/ directory and its subdirectories. The seed name used in your models is the filename without the extension (e.g., seed_my_lookup for seed_my_lookup.csv).
CSV files
Seed CSV files should follow standard CSV formatting:
- The first row must contain column headers
- Values can optionally be quoted with double quotes
- Dates are automatically parsed when schema inference is enabled
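The loading and formatting rules above can be sketched with Python's standard library. This is only an illustration, not how Squirrels itself loads seeds: `load_seeds` is a hypothetical helper, and it leaves every value as a string (no schema inference or date parsing).

```python
import csv
from pathlib import Path


def load_seeds(seeds_dir):
    """Return {seed_name: list of row dicts} for every CSV under seeds_dir.

    Hypothetical helper for illustration; not part of the Squirrels API.
    """
    seeds = {}
    # rglob walks the directory and all of its subdirectories.
    for csv_path in sorted(Path(seeds_dir).rglob("*.csv")):
        # The seed name is the filename without its extension.
        seed_name = csv_path.stem
        with csv_path.open(newline="", encoding="utf-8") as f:
            # DictReader treats the first row as column headers and
            # handles optional double quotes around values.
            seeds[seed_name] = list(csv.DictReader(f))
    return seeds
```

Calling `load_seeds("seeds")` on a project containing seeds/seed_categories.csv would return a dictionary with the key `seed_categories`.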
Example
seeds/seed_categories.csv
YAML configuration
The optional YAML configuration file allows you to define metadata and column types for your seed.
Configuration fields
- description - A description of the seed data. This is used for documentation purposes.
- cast_column_types - When set to true, columns are cast to the types specified in the columns configuration. This overrides the SQRL_SEEDS__INFER_SCHEMA environment variable for this specific seed.
- columns - Column metadata definitions as a list.
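The precedence between the per-seed cast_column_types setting and the SQRL_SEEDS__INFER_SCHEMA environment variable can be sketched as below. `resolve_type_mode` and its return labels are hypothetical, and the way the environment variable string is parsed is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_type_mode(seed_config: Mapping, env: Optional[Mapping] = None) -> str:
    """Return "explicit", "inferred", or "strings" for one seed.

    Hypothetical helper for illustration; not part of the Squirrels API.
    """
    env = os.environ if env is None else env
    # A per-seed cast_column_types: true takes precedence over the env var.
    if seed_config.get("cast_column_types") is True:
        return "explicit"
    # Assumption: the variable is read as a case-insensitive "true"/"false"
    # string. The documentation states only that it defaults to true.
    raw = env.get("SQRL_SEEDS__INFER_SCHEMA", "true")
    return "inferred" if raw.strip().lower() == "true" else "strings"
```

For example, a seed with `cast_column_types: true` resolves to "explicit" even when the environment variable is set to false.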
Example
seeds/seed_categories.yml
Environment variables
Two environment variables control how seeds are loaded:
- SQRL_SEEDS__INFER_SCHEMA - Whether to automatically infer column types when loading CSV seed files. Set to false to treat all columns as strings by default. This setting is ignored for individual seeds where cast_column_types: true is set in the YAML configuration.
- The other variable takes a JSON array of strings to treat as null/NA values when parsing seed CSV files. Example: ["", "NA", "N/A", "null"]
Using seeds in models
Seeds are available to use in your data models after they are loaded. You can reference seeds using the ref() function in both Jinja SQL templates and Python models.
In Jinja SQL templates
Use the ref() function to reference a seed:
models/federates/fed_transactions.sql
In Python models
Use the sqrl.ref() method to get a seed as a Polars LazyFrame:
models/federates/fed_transactions.py
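As a dependency-free sketch of the pattern (not the actual contents of fed_transactions.py), the example below looks up a seed by name and joins it to other rows. `StubModelArgs`, the `main(sqrl)` entry point, the `src_transactions` name, and all column names are assumptions made for illustration, and the stub returns plain lists of dicts where Squirrels would return a Polars LazyFrame.

```python
class StubModelArgs:
    """Hypothetical stand-in for the object passed to a Python model."""

    def __init__(self, tables):
        self._tables = tables

    def ref(self, name):
        # Look up a seed (or other model) by name, as sqrl.ref() does.
        return self._tables[name]


def main(sqrl):
    # Build a lookup from the seed: category_id -> category_name.
    categories = {
        row["category_id"]: row["category_name"]
        for row in sqrl.ref("seed_categories")
    }
    # Left-join each transaction to its category name (None if unmatched).
    return [
        {**txn, "category_name": categories.get(txn["category_id"])}
        for txn in sqrl.ref("src_transactions")
    ]
```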
Schema inference vs. explicit types
Squirrels provides two approaches for determining column types.
Schema inference (default)
When SQRL_SEEDS__INFER_SCHEMA is true (the default), Squirrels automatically infers column types from the CSV data. This includes:
- Numeric types for columns containing numbers
- Date/datetime types for columns with date patterns
- String types for everything else
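The inference order in the list above, together with the null/NA handling described under Environment variables, can be illustrated with a naive stdlib sketch. This is not the inference Squirrels actually performs; `infer_column_type` is a hypothetical helper, and the ISO 8601 date patterns it recognizes are an assumption.

```python
import json
from datetime import date, datetime


def _parses(parser, value):
    """Return True if parser(value) succeeds without a ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_column_type(values, na_values=("",)):
    """Naively classify a column of raw CSV strings.

    Hypothetical helper for illustration; not part of the Squirrels API.
    """
    # Values listed as null/NA are ignored when deciding the type.
    present = [v for v in values if v not in na_values]
    if not present:
        return "string"
    # Try the narrowest type first and fall back to string.
    for type_name, parser in [
        ("integer", int),
        ("float", float),
        ("date", date.fromisoformat),
        ("datetime", datetime.fromisoformat),
    ]:
        if all(_parses(parser, v) for v in present):
            return type_name
    return "string"


# The NA list arrives as a JSON array of strings.
na_values = json.loads('["", "NA", "N/A", "null"]')
```

With that list, a column such as `["1", "2", "NA"]` is classified as an integer column because the "NA" entry is skipped.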
Explicit type casting
For more control, set cast_column_types: true in your YAML configuration and specify the type of each column explicitly in the columns configuration.
Supported types
The following table shows all supported Squirrels column types and their corresponding Polars data types when loading the seeds into memory:
| Squirrels Type Aliases | Polars Type |
|---|---|
| string, varchar, char, text | pl.String |
| tinyint, int1 | pl.Int8 |
| smallint, short, int2 | pl.Int16 |
| integer, int, int4 | pl.Int32 |
| bigint, long, int8 | pl.Int64 |
| float, float4, real | pl.Float32 |
| double, float8 | pl.Float64 |
| decimal | pl.Decimal(precision=18, scale=2) |
| decimal(x, y) | pl.Decimal(precision=x, scale=y) |
| boolean, bool, logical | pl.Boolean |
| date | pl.Date |
| time | pl.Time |
| timestamp, datetime | pl.Datetime |
| interval | pl.Duration |
| blob, binary, varbinary | pl.Binary |
Best practices
- Keep seeds small: Seeds are loaded into memory. For large datasets (more than a few thousand rows), consider using sources or build models instead.
- Use descriptive names: Prefix your seed files with seed_ to distinguish them from other model types in your project.
- Document your columns: Add descriptions to columns in the YAML configuration to help other developers and data consumers (such as AI agents) understand the data.
- Version control your seeds: Since seeds contain static data, they should be committed to version control so changes are tracked.
- Use explicit types for critical data: If column types are important for your business logic, define them explicitly in the YAML configuration rather than relying on inference.
Related pages
- Environment variables - Seed-related environment variables
- Sources - For connecting to existing database tables
- Build models - For larger datasets that need to be materialized
- Federate models - For models that reference seeds with ref()