This step-by-step tutorial will walk you through creating your own Squirrels project!
We will modify a working Squirrels project to create a different one for weather analytics. Once you’ve completed the tutorial, you will understand many of the key features of Squirrels!
Step 1: Complete the quickstart
Create a new folder for your project, and open it up in your favourite coding editor (such as VS Code). The name of the folder is your choice (one example is squirrels-tutorial).
Follow instructions in Quickstart to quickly get a sample working project going.
Then, in the .env file, set the SQRL_SECRET__ADMIN_PASSWORD environment variable to something of your choice (it is randomly generated by default).
Step 2: Add the weather database
Now add the SQLite database we will use for the rest of the tutorial. Simply run:
This adds a weather.db file in the assets folder. Feel free to delete the other files in the assets folder.
Note that this is mainly done for tutorial purposes. For most production use cases, you would simply specify the database connection string (more details on this soon) to an external database and not bring a copy of the database into your project.
Step 3: Define project configurations
Open the squirrels.yml file. This is the project configuration file that exists for all Squirrels projects, and it is used to configure many properties of the Squirrels project in yaml.
In this step, we will focus on the project_variables, connections, and datasets sections.
Setting the project variables
The project variables name and major_version are required. The label and description are optional. You are also free to add any of your own project variables here.
In this tutorial, we will be making datasets for historical weather data. Change the name to weather, label to Weather Analytics, and provide a description. We will leave major_version as is.
The project_variables section should now look like this:
project_variables:
name: weather
label: Weather Analytics
description: Analyze weather statistics from 2012 to 2015.
major_version: 1
Setting the database connections
The connections section is where we set all the database connection details that we need. We provide a list of connection names here and refer to them in other files. The connection name default must be provided for models that don’t specify a connection name explicitly.
Under default, change the uri field to sqlite:///{project_path}/assets/weather.db. Change the label to “SQLite Weather Database”.
You can also substitute environment variables defined in the .env file using Jinja. For instance, if there is an environment variable called “SQLITE_CONN_STR” in .env, then you can also set the url to:{{ env_vars.SQLITE_CONN_STR }}
The connections section should now look like this:
connections:
- name: default
label: SQLite Weather Database
type: sqlalchemy
uri: sqlite:///{project_path}/assets/weather.db
Defining the datasets
The datasets section is where we define the attributes for all datasets created by the Squirrels project. Every dataset defined will have their own “parameters API” and “dataset result API”.
Currently, you may see two datasets configured. We will only have one dataset for this tutorial.
Change the datasets section to look like the following instead:
datasets:
- name: weather_by_period
label: Weather by Time of Year
description: Gather weather statistics (precipitation, temperature, wind speed, etc.) by time of year or condition
model: weather_by_period # optional if same as dataset name
parameters:
- group_by_dim
The model field is the name of the target data model that we will create later. We will create the “group_by_dim” parameter that this dataset uses in the next step.
Step 4: Create the dataset parameters
Go into the pyconfigs/parameters.py file. This file contains the definitions of all the widget parameters used in the dataset.
Define the parameters
We will rewrite this file. Remove all the existing code in the file and replace it with the following:
from squirrels import parameters as p, parameter_options as po
@p.SingleSelectParameter.create_simple(
name="group_by_dim", label="Group By",
description="Dimension(s) to aggregate by"
)
def group_by_options():
return [
po.SelectParameterOption(
id="year", label="Year",
dim_col="year"
),
po.SelectParameterOption(
id="quarter", label="Quarter",
dim_col="quarter"
),
po.SelectParameterOption(
id="month", label="Month",
dim_col="month_name", order_by_col="month_order"
),
po.SelectParameterOption(
id="day", label="Day of Year",
dim_col="day_of_year"
),
po.SelectParameterOption(
id="cond", label="Condition",
dim_col="condition"
)
]
In this initial step, we defined a function decorated with @p.SingleSelectParameter.create_simple to create a single-select parameter. The function returns a list of parameter options as SelectParameterOption objects.
The id and label parameters to the SelectParameterOption constructor are required. Arbitrary keyword arguments such as “dim_col” and “order_by_col” can be specified to the SelectParameterOption constructor, which will be treated as custom fields to the parameter option.
The SelectParameterOption class has an “is_default” attribute to specify the parameter option(s) that are selected by default. By default, “is_default” is set to False. When none of the parameter options have “is_default” set as True, the first option is selected by default for single-select parameters, and nothing is selected by default for multi-select parameters.
The possible widget parameter types supported today are SingleSelectParameter, MultiSelectParameter, DateParameter, DateRangeParameter, NumberParameter, NumberRangeParameter, and TextParameter. Each parameter type can be created with one of the following decorators: create_simple, create_with_options, or create_from_source. Every decorator takes “name” and “label” as required arguments.For SingleSelectParameter, the arguments for create_simple and create_with_options are similar. The difference is that create_with_options lets you specify a parent parameter for cascading the shown options. For non-select parameter types like DateParameter, there are more differences.
Step 5: Create the context file
The context file is a Python file that runs in real-time to transform parameter selections and/or authenticated user attributes into meaningful values that can be used by dynamic data models.
Change the pyconfigs/context.py file to look like the following:
from typing import cast, Any
from squirrels import ContextArgs, parameters as p
from pyconfigs.user import CustomUserFields
def main(ctx: dict[str, Any], sqrl: ContextArgs) -> None:
"""
Define context variables AFTER parameter selections are made by adding entries to the dictionary "ctx".
These context variables can then be used in the models.
"""
# The param_exists method confirms whether the "group_by_dim" parameter exists for the current dataset.
# If it does, we define two context variables called "dim_col" and "order_col".
if sqrl.param_exists("group_by_dim"):
group_by_param = cast(p.SingleSelectParameter, sqrl.prms["group_by_dim"])
ctx["dim_col"] = group_by_param.get_selected("dim_col")
ctx["order_col"] = group_by_param.get_selected("order_by_col", default_field="dim_col")
# Define the context variable "role" based on whether the user is authenticated and its fields(s). See "user.py" for defining custom user fields.
# This is shown for demonstration purposes - the "role" context variable will not be used in any data models in this tutorial.
custom_user_fields = cast(CustomUserFields, sqrl.user.custom_fields)
ctx["role"] = custom_user_fields.role
In this example, we define context variables “dim_col” and “order_col” based on the “group_by_dim” parameter selection. We also define the “role” context variable based on the authenticated user’s attribute(s), which is done for demonstration purposes (the “role” context variable is not used in the rest of this tutorial).
The available methods on the parameter object depends on the parameter type. For example, SingleSelectParameter objects have a get_selected method to get a custom field from the selected option.By casting the parameter object to type SingleSelectParameter, this makes it easier to explore the available methods on the parameter through an IDE’s linting and auto-complete features.
Step 6: Create sources
The models/sources.yml file lets us document the metadata of sources from our database tables.
The weather.db SQLite database we retrieved earlier contains a table called “weather”. Replace the sources.yml file with the following contents:
sources:
- name: src_weather
description: Source table for weather metrics by day over time
connection: default
table: weather
load_to_vdl: true
columns:
- name: date
type: string
description: The date of the weather statistics in YYYY-MM-DD format
category: dimension
- name: precipitation
type: float
description: The amount of precipitation for the time period in centimeters
category: measure
- name: temp_max
type: float
description: The maximum temperature for the time period in degrees Celsius
category: measure
- name: temp_min
type: float
description: The minimum temperature for the time period in degrees Celsius
category: measure
- name: wind
type: float
description: The average wind speed for the time period in km/h
category: measure
- name: condition
type: string
description: The weather condition of the time period (e.g. "sun", "fog", "rain")
category: dimension
Step 7: Create seeds
Seeds are CSV data files that can be used by other data models. For example, we can create a seed that maps month numbers to month names.
Create a file named seed_month_names.csv in the seeds/ folder with the following contents. Feel free to delete the other files in the seeds/ folder.
month_order,month_name
1,January
2,February
3,March
4,April
5,May
6,June
7,July
8,August
9,September
10,October
11,November
12,December
In addition, create a seeds/seed_month_names.yml file to add metadata for the seed.
description: Month number to month name mapping
columns:
- name: month_order
type: integer
description: The month number
category: dimension
- name: month_name
type: string
description: The human-readable month name
category: dimension
Information in the yaml file may be useful for the Squirrels framework as well.
Step 8: Create data models from SQL queries
For data models that are created by code, the Squirrels framework supports:
- Creating build models (tables/views to be built offline) from SQL (DuckDB dialect) or Python files
- Creating dbview models (queries that run on an external database in real-time) from SQL files (dialect of the database connection used)
- Creating federate models (queries that run in the API server in real-time) from SQL (DuckDB dialect) or Python files
Sources, seeds, and build models are known as “static data models”.
Dbview models and federate models are known as “dynamic data models”.
We have already configured the “source” and “seed” models. For this tutorial, we will create a build model and a federate model with SQL, but will not create any dbview models.
Since we have replaced the source model in sources.yml, data models that are downstream of the replaced source model will no longer work.
Delete the existing files in the models/builds/, models/dbviews/, and models/federates/ folders. Feel free to delete the existing files in the macros/ folder as well.
Define a macro
First, create a macros/metrics.sql file with the following contents.
{%- macro get_metrics() -%}
CAST(SUM(precipitation) AS DECIMAL(6, 1)) AS precipitation,
CAST(MAX(temp_max) AS DECIMAL(4, 1)) AS temp_max,
CAST(MIN(temp_min) AS DECIMAL(4, 1)) AS temp_min,
CAST(AVG(wind) AS DECIMAL(6, 4)) AS wind
{%- endmacro -%}
Macros allow us to reuse the same SQL text in multiple places, including across different data models.
Define the build model
“Build models” are defined in the models/builds/ folder.
Create a models/builds/weather_by_date.sql file with the following contents.
WITH
weather_by_date AS (
SELECT
date::VARCHAR AS date,
EXTRACT(YEAR FROM date::DATE)::INT AS year,
EXTRACT(MONTH FROM date::DATE)::INT AS month_order,
EXTRACT(DOY FROM date::DATE)::INT AS day_of_year,
'Q' || CEIL(EXTRACT(MONTH FROM date::DATE) / 3)::INT AS quarter,
MODE(condition)::VARCHAR AS condition,
{{ get_metrics() | indent(4) }}
FROM {{ ref("src_weather") }}
GROUP BY date
)
SELECT
date,
condition,
year,
m.month_order,
m.month_name,
day_of_year,
quarter,
precipitation,
temp_max,
temp_min,
wind
FROM weather_by_date
LEFT JOIN {{ ref("seed_month_names") }} AS m USING (month_order)
This query finds the total precipitation, max/min temperature, and average wind speed for each date in the source table.
The SQL file is templated with Jinja. It calls the macros:
get_metrics() which is defined in the macros/metrics.sql file we created earlier
ref("src_weather") which references the “src_weather” source we defined earlier
Build models are able to call ref on sources (that have load_to_vdl: true), seeds, and other build models.
Let’s also add metadata for the build model in the models/builds/weather_by_date.yml file.
description: |
This build model aggregates weather data by date and adds additional time-based attributes.
materialization: TABLE # one of "TABLE" or "VIEW" - defaults to "VIEW" for SQL models if not specified
depends_on: # optional for SQL models - Squirrels can automatically track dependencies for SQL models via the `ref()` macro
- src_weather
- seed_month_names
columns:
- name: date
depends_on:
- src_weather.date
pass_through: true
- name: year
type: integer
description: The year extracted from the date
category: dimension
depends_on:
- src_weather.date
- name: month_order
type: integer
description: The order of the month
category: dimension
depends_on:
- src_weather.date
- seed_month_names.month_order
- name: month_name
type: string
description: The name of the month
category: dimension
depends_on:
- src_weather.date
- seed_month_names.month_name
- name: day_of_year
type: integer
description: The day of the year (1-366)
category: dimension
depends_on:
- src_weather.date
- name: quarter
type: string
description: The quarter (Q1, Q2, Q3, or Q4) derived from the month
category: dimension
depends_on:
- src_weather.date
- name: condition
type: string
description: The most common weather condition of the date
category: dimension
depends_on:
- src_weather.condition
- name: precipitation
type: decimal
description: Total precipitation of the date in centimeters, rounded to 1 decimal place
category: measure
depends_on:
- src_weather.precipitation
- name: temp_max
type: decimal
description: Maximum temperature of the date in degrees Celsius, rounded to 1 decimal place
category: measure
depends_on:
- src_weather.temp_max
- name: temp_min
type: decimal
description: Minimum temperature of the date in degrees Celsius, rounded to 1 decimal place
category: measure
depends_on:
- src_weather.temp_min
- name: wind
type: decimal
description: Average wind speed of the date in km/h, rounded to 4 decimal places
category: measure
depends_on:
- src_weather.wind
Define the federate model
Federate models are defined in the models/federates/ folder.
Create a models/federates/weather_by_period.sql file with the following contents.
WITH
cte_weather_grouped AS (
SELECT
{{ ctx.dim_col }} AS dim_value,
{{ ctx.order_col }} AS ordering,
{{ get_metrics() | indent(4) }}
FROM {{ ref("weather_by_date") }}
GROUP BY dim_value, ordering
)
SELECT
'{{ ctx.dim_col }}' AS dimension_type,
dim_value AS dimension_value,
precipitation,
temp_max AS temperature_max,
temp_min AS temperature_min,
wind
FROM cte_weather_grouped
ORDER BY ordering
This query finds the total precipitation, max/min temperature, and average wind speed for each group based on the “Group By” parameter selection in real-time.
The {{ ctx.dim_col }} and {{ ctx.order_col }} variables are used to reference the context variables defined in the context.py file.Just like the build model, we use the get_metrics macro again. We also use the ref macro to reference the build model created earlier.Federate models are able to call the ref macro on sources (that have load_to_vdl: true), seeds, build models, dbview models, and other federate models.
Also, create the models/federates/weather_by_period.yml file to add metadata for the federate model.
description: |
This model aggregates weather data by time of year or condition to show weather statistics.
depends_on: # optional for SQL models - Squirrels can automatically track dependencies for SQL models via the `ref()` macro
- weather_by_date
columns:
- name: dimension_type
type: string
description: The type of dimension (e.g. "year", "month", "condition")
category: dimension
- name: dimension_value
type: string
description: The value of the dimension (e.g. "2020", "January", "sun")
category: dimension
- name: precipitation
type: decimal
description: Total precipitation of the time period or condition in centimeters, rounded to 1 decimal place
category: measure
depends_on:
- weather_by_date.precipitation
- name: temperature_max
type: decimal
description: Maximum temperature of the time period or condition in degrees Celsius, rounded to 1 decimal place
category: measure
depends_on:
- weather_by_date.temp_max
- name: temperature_min
type: decimal
description: Minimum temperature of the time period or condition in degrees Celsius, rounded to 1 decimal place
category: measure
depends_on:
- weather_by_date.temp_min
- name: wind
type: decimal
description: Average wind speed of the time period or condition in km/h, rounded to 4 decimal places
category: measure
depends_on:
- weather_by_date.wind
For SQL models that use the ref macro, the depends_on field is optional. However, it is required for Python models, and it is recommended to specify it for SQL models as well.
Similar to federate models, dbview models are also run in real-time and can change behaviour based on parameter selections or authenticated user.The main difference is that dbview models run on an external database (instead of the API server), and can only reference sources that share the same database connection.
Step 9: Development testing
You can build the static data models by running:
Then, activate the API server by running:
Take the time now to explore the project in Squirrels Studio before proceeding.
Use AI to ask questions about your datasets! 🤖While the API server is active, you have an MCP server running at:http://localhost:4465/mcp
Check out our guide to Connect MCP Clients! This allows you to use your favourite AI agent to interact with your Squirrels project.
Congratulations, you have reached the end of the tutorial!
What’s next?
TBA