Tutorial - Squirrels Docs

This step-by-step tutorial will walk you through creating your own Squirrels project! We will modify a working Squirrels project to create a different one for weather analytics. Once you’ve completed the tutorial, you will understand many of the key features of Squirrels!

Step 1: Complete the quickstart

Create a new folder for your project, and open it up in your favourite coding editor (such as VS Code). The name of the folder is your choice (one example is squirrels-tutorial). Follow instructions in Quickstart to quickly get a sample working project going. Then, in the .env file, set the SQRL_SECRET__ADMIN_PASSWORD environment variable to something of your choice (it is randomly generated by default).

Step 2: Add the weather database

Now add the SQLite database we will use for the rest of the tutorial. Simply run:

sqrl get-file weather.db

This adds a weather.db file in the assets folder. Feel free to delete the other files in the assets folder.

Note that this is mainly done for tutorial purposes. For most production use cases, you would simply specify the database connection string (more details on this soon) to an external database and not bring a copy of the database into your project.

Step 3: Define project configurations

Open the squirrels.yml file. This is the project configuration file that exists for all Squirrels projects, and it is used to configure many properties of the Squirrels project in yaml. In this step, we will focus on the project_variables, connections, and datasets sections. Setting the project variables The project variables name and major_version are required. The label and description are optional. You are also free to add any of your own project variables here. In this tutorial, we will be making datasets for historical weather data. Change the name to weather, label to Weather Analytics, and provide a description. We will leave major_version as is. The project_variables section should now look like this:

project_variables:
  name: weather
  label: Weather Analytics
  description: Analyze weather statistics from 2012 to 2015.
  major_version: 1

Setting the database connections The connections section is where we set all the database connection details that we need. We provide a list of connection names here and refer to them in other files. The connection name default must be provided for models that don’t specify a connection name explicitly. Under default, change the uri field to sqlite:///{project_path}/assets/weather.db. Change the label to “SQLite Weather Database”.

You can also substitute environment variables defined in the .env file using Jinja. For instance, if there is an environment variable called “SQLITE_CONN_STR” in .env, then you can also set the url to:

{{ env_vars.SQLITE_CONN_STR }}

The connections section should now look like this:

connections:
  - name: default
    label: SQLite Weather Database
    type: sqlalchemy
    uri: sqlite:///{project_path}/assets/weather.db

Defining the datasets The datasets section is where we define the attributes for all datasets created by the Squirrels project. Every dataset defined will have their own “parameters API” and “dataset result API”. Currently, you may see two datasets configured. We will only have one dataset for this tutorial. Change the datasets section to look like the following instead:

datasets:
  - name: weather_by_period
    label: Weather by Time of Year
    description: Gather weather statistics (precipitation, temperature, wind speed, etc.) by time of year or condition
    model: weather_by_period  # optional if same as dataset name
    parameters:
      - group_by_dim

The model field is the name of the target data model that we will create later. We will create the “group_by_dim” parameter that this dataset uses in the next step.

Step 4: Create the dataset parameters

Go into the pyconfigs/parameters.py file. This file contains the definitions of all the widget parameters used in the dataset. Define the parameters We will rewrite this file. Remove all the existing code in the file and replace it with the following:

from squirrels import parameters as p, parameter_options as po


@p.SingleSelectParameter.create_simple(
    name="group_by_dim", label="Group By", 
    description="Dimension(s) to aggregate by"
)
def group_by_options():
    return [
        po.SelectParameterOption(
          id="year", label="Year", 
          dim_col="year"
        ),
        po.SelectParameterOption(
          id="quarter", label="Quarter", 
          dim_col="quarter"
        ),
        po.SelectParameterOption(
          id="month", label="Month", 
          dim_col="month_name", order_by_col="month_order"
        ),
        po.SelectParameterOption(
          id="day", label="Day of Year", 
          dim_col="day_of_year"
        ),
        po.SelectParameterOption(
          id="cond", label="Condition", 
          dim_col="condition"
        )
    ]

In this initial step, we defined a function decorated with @p.SingleSelectParameter.create_simple to create a single-select parameter. The function returns a list of parameter options as SelectParameterOption objects. The id and label parameters to the SelectParameterOption constructor are required. Arbitrary keyword arguments such as “dim_col” and “order_by_col” can be specified to the SelectParameterOption constructor, which will be treated as custom fields to the parameter option.

The SelectParameterOption class has an “is_default” attribute to specify the parameter option(s) that are selected by default. By default, “is_default” is set to False. When none of the parameter options have “is_default” set as True, the first option is selected by default for single-select parameters, and nothing is selected by default for multi-select parameters.

The possible widget parameter types supported today are SingleSelectParameter, MultiSelectParameter, DateParameter, DateRangeParameter, NumberParameter, NumberRangeParameter, and TextParameter. Each parameter type can be created with one of the following decorators: create_simple, create_with_options, or create_from_source. Every decorator takes “name” and “label” as required arguments.For SingleSelectParameter, the arguments for create_simple and create_with_options are similar. The difference is that create_with_options lets you specify a parent parameter for cascading the shown options. For non-select parameter types like DateParameter, there are more differences.

Step 5: Create the context file

The context file is a Python file that runs in real-time to transform parameter selections and/or authenticated user attributes into meaningful values that can be used by dynamic data models. Change the pyconfigs/context.py file to look like the following:

from typing import cast, Any
from squirrels import ContextArgs, parameters as p

from pyconfigs.user import CustomUserFields


def main(ctx: dict[str, Any], sqrl: ContextArgs) -> None:
    """
    Define context variables AFTER parameter selections are made by adding entries to the dictionary "ctx". 
    These context variables can then be used in the models.
    """
    
    # The param_exists method confirms whether the "group_by_dim" parameter exists for the current dataset.
    # If it does, we define two context variables called "dim_col" and "order_col".
    if sqrl.param_exists("group_by_dim"):
        group_by_param = cast(p.SingleSelectParameter, sqrl.prms["group_by_dim"])
        ctx["dim_col"] = group_by_param.get_selected("dim_col")
        ctx["order_col"] = group_by_param.get_selected("order_by_col", default_field="dim_col")
    
    # Define the context variable "role" based on whether the user is authenticated and its fields(s). See "user.py" for defining custom user fields.
    # This is shown for demonstration purposes - the "role" context variable will not be used in any data models in this tutorial.
    custom_user_fields = cast(CustomUserFields, sqrl.user.custom_fields)
    ctx["role"] = custom_user_fields.role

In this example, we define context variables “dim_col” and “order_col” based on the “group_by_dim” parameter selection. We also define the “role” context variable based on the authenticated user’s attribute(s), which is done for demonstration purposes (the “role” context variable is not used in the rest of this tutorial).

The available methods on the parameter object depends on the parameter type. For example, SingleSelectParameter objects have a get_selected method to get a custom field from the selected option.By casting the parameter object to type SingleSelectParameter, this makes it easier to explore the available methods on the parameter through an IDE’s linting and auto-complete features.

Step 6: Create sources

The models/sources.yml file lets us document the metadata of sources from our database tables. The weather.db SQLite database we retrieved earlier contains a table called “weather”. Replace the sources.yml file with the following contents:

sources:
  - name: src_weather
    description: Source table for weather metrics by day over time
    connection: default
    table: weather
    load_to_vdl: true

    columns:
      - name: date
        type: string
        description: The date of the weather statistics in YYYY-MM-DD format
        category: dimension
      
      - name: precipitation
        type: float
        description: The amount of precipitation for the time period in centimeters
        category: measure

      - name: temp_max
        type: float
        description: The maximum temperature for the time period in degrees Celsius
        category: measure
      
      - name: temp_min
        type: float
        description: The minimum temperature for the time period in degrees Celsius
        category: measure
      
      - name: wind
        type: float
        description: The average wind speed for the time period in km/h
        category: measure
      
      - name: condition
        type: string
        description: The weather condition of the time period (e.g. "sun", "fog", "rain")
        category: dimension

Step 7: Create seeds

Seeds are CSV data files that can be used by other data models. For example, we can create a seed that maps month numbers to month names. Create a file named seed_month_names.csv in the seeds/ folder with the following contents. Feel free to delete the other files in the seeds/ folder.

month_order,month_name
1,January
2,February
3,March
4,April
5,May
6,June
7,July
8,August
9,September
10,October
11,November
12,December

In addition, create a seeds/seed_month_names.yml file to add metadata for the seed.

description: Month number to month name mapping

columns:
  - name: month_order
    type: integer
    description: The month number
    category: dimension
  
  - name: month_name
    type: string
    description: The human-readable month name
    category: dimension

Information in the yaml file may be useful for the Squirrels framework as well.

Step 8: Create data models from SQL queries

For data models that are created by code, the Squirrels framework supports:

Creating build models (tables/views to be built offline) from SQL (DuckDB dialect) or Python files
Creating dbview models (queries that run on an external database in real-time) from SQL files (dialect of the database connection used)
Creating federate models (queries that run in the API server in real-time) from SQL (DuckDB dialect) or Python files

Sources, seeds, and build models are known as “static data models”. Dbview models and federate models are known as “dynamic data models”. We have already configured the “source” and “seed” models. For this tutorial, we will create a build model and a federate model with SQL, but will not create any dbview models. Since we have replaced the source model in sources.yml, data models that are downstream of the replaced source model will no longer work. Delete the existing files in the models/builds/, models/dbviews/, and models/federates/ folders. Feel free to delete the existing files in the macros/ folder as well. Define a macro First, create a macros/metrics.sql file with the following contents.

{%- macro get_metrics() -%}
    CAST(SUM(precipitation) AS DECIMAL(6, 1)) AS precipitation,
    CAST(MAX(temp_max) AS DECIMAL(4, 1)) AS temp_max,
    CAST(MIN(temp_min) AS DECIMAL(4, 1)) AS temp_min,
    CAST(AVG(wind) AS DECIMAL(6, 4)) AS wind
{%- endmacro -%}

Macros allow us to reuse the same SQL text in multiple places, including across different data models. Define the build model “Build models” are defined in the models/builds/ folder. Create a models/builds/weather_by_date.sql file with the following contents.

WITH
weather_by_date AS (

    SELECT
        date::VARCHAR AS date,
        EXTRACT(YEAR FROM date::DATE)::INT AS year,
        EXTRACT(MONTH FROM date::DATE)::INT AS month_order, 
        EXTRACT(DOY FROM date::DATE)::INT AS day_of_year,
        'Q' || CEIL(EXTRACT(MONTH FROM date::DATE) / 3)::INT AS quarter,
        MODE(condition)::VARCHAR AS condition,
        {{ get_metrics() | indent(4) }}
    
    FROM {{ ref("src_weather") }}
    
    GROUP BY date

)
SELECT
    date,
    condition,
    year,
    m.month_order,
    m.month_name,
    day_of_year,
    quarter,
    precipitation,
    temp_max,
    temp_min,
    wind

FROM weather_by_date
    LEFT JOIN {{ ref("seed_month_names") }} AS m USING (month_order)

This query finds the total precipitation, max/min temperature, and average wind speed for each date in the source table.

The SQL file is templated with Jinja. It calls the macros:

get_metrics() which is defined in the macros/metrics.sql file we created earlier
ref("src_weather") which references the “src_weather” source we defined earlier

Build models are able to call ref on sources (that have load_to_vdl: true), seeds, and other build models.

Let’s also add metadata for the build model in the models/builds/weather_by_date.yml file.

description: |
  This build model aggregates weather data by date and adds additional time-based attributes.

materialization: TABLE  # one of "TABLE" or "VIEW" - defaults to "VIEW" for SQL models if not specified

depends_on:             # optional for SQL models - Squirrels can automatically track dependencies for SQL models via the `ref()` macro
  - src_weather
  - seed_month_names

columns:
  - name: date
    depends_on:
      - src_weather.date
    pass_through: true

  - name: year
    type: integer
    description: The year extracted from the date
    category: dimension
    depends_on:
      - src_weather.date

  - name: month_order
    type: integer
    description: The order of the month
    category: dimension
    depends_on:
      - src_weather.date
      - seed_month_names.month_order

  - name: month_name
    type: string
    description: The name of the month
    category: dimension
    depends_on:
      - src_weather.date
      - seed_month_names.month_name

  - name: day_of_year
    type: integer
    description: The day of the year (1-366)
    category: dimension
    depends_on:
      - src_weather.date

  - name: quarter
    type: string
    description: The quarter (Q1, Q2, Q3, or Q4) derived from the month
    category: dimension
    depends_on:
      - src_weather.date

  - name: condition
    type: string
    description: The most common weather condition of the date
    category: dimension
    depends_on:
      - src_weather.condition

  - name: precipitation
    type: decimal
    description: Total precipitation of the date in centimeters, rounded to 1 decimal place
    category: measure
    depends_on:
      - src_weather.precipitation

  - name: temp_max
    type: decimal
    description: Maximum temperature of the date in degrees Celsius, rounded to 1 decimal place
    category: measure
    depends_on:
      - src_weather.temp_max

  - name: temp_min
    type: decimal
    description: Minimum temperature of the date in degrees Celsius, rounded to 1 decimal place
    category: measure
    depends_on:
      - src_weather.temp_min

  - name: wind
    type: decimal
    description: Average wind speed of the date in km/h, rounded to 4 decimal places
    category: measure
    depends_on:
      - src_weather.wind

Define the federate model Federate models are defined in the models/federates/ folder. Create a models/federates/weather_by_period.sql file with the following contents.

WITH
cte_weather_grouped AS (

    SELECT
        {{ ctx.dim_col }} AS dim_value,
        {{ ctx.order_col }} AS ordering,
        {{ get_metrics() | indent(4) }}

    FROM {{ ref("weather_by_date") }}

    GROUP BY dim_value, ordering

)
SELECT
    '{{ ctx.dim_col }}' AS dimension_type,
    dim_value AS dimension_value,
    precipitation,
    temp_max AS temperature_max,
    temp_min AS temperature_min,
    wind

FROM cte_weather_grouped

ORDER BY ordering

This query finds the total precipitation, max/min temperature, and average wind speed for each group based on the “Group By” parameter selection in real-time.

The {{ ctx.dim_col }} and {{ ctx.order_col }} variables are used to reference the context variables defined in the context.py file.Just like the build model, we use the get_metrics macro again. We also use the ref macro to reference the build model created earlier.Federate models are able to call the ref macro on sources (that have load_to_vdl: true), seeds, build models, dbview models, and other federate models.

Also, create the models/federates/weather_by_period.yml file to add metadata for the federate model.

description: |
  This model aggregates weather data by time of year or condition to show weather statistics.

depends_on:             # optional for SQL models - Squirrels can automatically track dependencies for SQL models via the `ref()` macro
  - weather_by_date

columns:
  - name: dimension_type
    type: string
    description: The type of dimension (e.g. "year", "month", "condition")
    category: dimension
  
  - name: dimension_value
    type: string
    description: The value of the dimension (e.g. "2020", "January", "sun")
    category: dimension

  - name: precipitation
    type: decimal
    description: Total precipitation of the time period or condition in centimeters, rounded to 1 decimal place
    category: measure
    depends_on:
      - weather_by_date.precipitation

  - name: temperature_max
    type: decimal
    description: Maximum temperature of the time period or condition in degrees Celsius, rounded to 1 decimal place
    category: measure
    depends_on:
      - weather_by_date.temp_max

  - name: temperature_min
    type: decimal
    description: Minimum temperature of the time period or condition in degrees Celsius, rounded to 1 decimal place
    category: measure
    depends_on:
      - weather_by_date.temp_min

  - name: wind
    type: decimal
    description: Average wind speed of the time period or condition in km/h, rounded to 4 decimal places
    category: measure
    depends_on:
      - weather_by_date.wind

For SQL models that use the ref macro, the depends_on field is optional. However, it is required for Python models, and it is recommended to specify it for SQL models as well.

Similar to federate models, dbview models are also run in real-time and can change behaviour based on parameter selections or authenticated user.The main difference is that dbview models run on an external database (instead of the API server), and can only reference sources that share the same database connection.

Step 9: Development testing

You can build the static data models by running:

sqrl build

Then, activate the API server by running:

sqrl run

Take the time now to explore the project in Squirrels Studio before proceeding.

Use AI to ask questions about your datasets! 🤖While the API server is active, you have an MCP server running at:

http://localhost:4465/mcp

Check out our guide to Connect MCP Clients! This allows you to use your favourite AI agent to interact with your Squirrels project.

Congratulations, you have reached the end of the tutorial!

What’s next?

TBA

Get started

​Step 1: Complete the quickstart

​Step 2: Add the weather database

​Step 3: Define project configurations

​Step 4: Create the dataset parameters

​Step 5: Create the context file

​Step 6: Create sources

​Step 7: Create seeds

​Step 8: Create data models from SQL queries

​Step 9: Development testing

​What’s next?

Step 1: Complete the quickstart

Step 2: Add the weather database

Step 3: Define project configurations

Step 4: Create the dataset parameters

Step 5: Create the context file

Step 6: Create sources

Step 7: Create seeds

Step 8: Create data models from SQL queries

Step 9: Development testing

What’s next?