The Boring Semantic Layer (BSL) is a lightweight semantic layer based on Ibis.
Key Features:
- Lightweight: pip install boring-semantic-layer
- Ibis-powered: Built on top of Ibis, supporting any database engine that Ibis integrates with (DuckDB, Snowflake, BigQuery, PostgreSQL, and more)
- MCP-friendly: Perfect for connecting Large Language Models to structured data sources
This project is a joint effort by xorq-labs and boringdata.
We welcome feedback and contributions!
1. Define your ibis input table
2. Define a semantic model
3. Query it
Example output (dataframe):
| JFK | 3689 |
| LGA | 2941 |
| ... | ... |
We'll use a public flight dataset from the Malloy Samples repository.
Define your data source and create a semantic model that describes your data in terms of dimensions and measures.
- Dimensions are attributes to group or filter by (e.g., origin, destination).
- Measures are aggregations or calculations (e.g., total flights, average distance).
All dimensions and measures are defined as Ibis expressions.
Ibis expressions are Python functions that represent database operations.
They allow you to write database queries using familiar Python syntax while Ibis handles the translation to optimized SQL for your specific database backend (like DuckDB, PostgreSQL, BigQuery, etc.).
For example, in our semantic model:
- lambda t: t.origin is an Ibis expression that references the "origin" column
- lambda t: t.count() is an Ibis expression that counts rows
- lambda t: t.distance.mean() is an Ibis expression that calculates the average distance
The t parameter represents the table, and you can chain operations like t.origin.upper() or t.dep_delay > 0 to create complex expressions. Ibis ensures these expressions are translated to efficient SQL queries.
Use your semantic model to run queries—selecting dimensions, measures, and applying filters or limits.
Example output:
| JFK | 3689 | 1047.71 |
| PHL | 7708 | 1044.97 |
| ... | ... | ... |
The query method can filter data using raw Ibis expressions for full flexibility.
| JFK | 3689 |
A format that's easy to serialize, good for dynamic queries or LLM integration.
Example output (dataframe):
| LGA | 7000 |
| PHL | 7708 |
BSL supports the following operators: =, !=, >, >=, in, not in, like, not like, is null, is not null, AND, OR
BSL has built-in support for flexible time-based analysis.
To use it, define a time_dimension in your SemanticModel that points to a timestamp column.
You can also set smallest_time_grain to prevent incorrect time aggregations.
Example output:
| PHL | 2013-01-01 | 5 |
| CLE | 2013-01-01 | 5 |
| DFW | 2013-01-01 | 7 |
| DFW | 2013-01-02 | 9 |
| DFW | 2013-01-03 | 13 |
BSL allows you to join multiple SemanticModel instances to enrich your data. Joins are defined in the joins parameter of a SemanticModel.
There are four main ways to define joins:
For full control, you can create a Join object directly, specifying the join condition with an on lambda function and the join type with how (e.g., 'inner', 'left').
First, let's define two semantic models: one for flights and one for carriers.
The flight model resulting from a join with the carriers model:
| Delta Air Lines | MDT | 235 |
| Delta Air Lines | ATL | 8419 |
| Comair (Delta Connections) | ATL | 239 |
| American Airlines | DFW | 8742 |
| American Eagle Airlines | JFK | 418 |
For common join patterns, BSL provides helper class methods inspired by Malloy: Join.one, Join.many, and Join.cross.
These simplify joins based on primary/foreign key relationships.
To use them, first define a primary_key on the model you are joining to. The primary key should be one of the model's dimensions.
Now, you can use Join.one in the flights model to link to carriers_pk_sm. The with_ parameter specifies the foreign key on the flights model.
- Join.one(alias, model, with_): Use for one-to-one or many-to-one relationships. It joins where the foreign key specified in with_ matches the primary_key of the joined model.
- Join.many(alias, model, with_): Similar to Join.one, but semantically represents a one-to-many relationship.
- Join.cross(alias, model): Creates a cross product, joining every row from the left model with every row of the right model.
Querying remains the same—just reference the joined fields using the alias.
Example output:
| Delta Air Lines | 10000 |
| American Airlines | 9000 |
| United Airlines | 8500 |
| Southwest Airlines | 8000 |
| JetBlue Airways | 7500 |
| table | Ibis table expression | Yes | Any Ibis table or view |
| dimensions | dict[str, callable] | Yes | Keys: dimension names; Values: functions mapping table → column |
| measures | dict[str, callable] | Yes | Keys: measure names; Values: functions mapping table → aggregation |
| joins | dict[str, Join] | No | Keys: join alias; Values: Join object (see below) |
| primary_key | str | No | Name of the primary key dimension (required for certain join types) |
| name | str | No | Optional model name (inferred from table if omitted) |
| time_dimension | str | No | Name of the column to use as the time dimension |
| smallest_time_grain | str | No | One of: TIME_GRAIN_SECOND, TIME_GRAIN_MINUTE, TIME_GRAIN_HOUR, TIME_GRAIN_DAY, TIME_GRAIN_WEEK, TIME_GRAIN_MONTH, TIME_GRAIN_QUARTER, TIME_GRAIN_YEAR |
- Use Join.one(alias, model, with_) for one-to-one/many-to-one
- Use Join.many(alias, model, with_) for one-to-many
- Use Join.cross(alias, model) for cross join
| dimensions | list[str] | No | List of dimension names (can include joined fields, e.g. "carriers.name") |
| measures | list[str] | No | List of measure names (can include joined fields) |
| filters | list[dict/str/callable] or dict/str/callable | No | See below for filter formats and operators |
| order_by | list[tuple[str, str]] | No | List of (field, direction) tuples, e.g. [("avg_delay", "desc")] |
| limit | int | No | Maximum number of rows to return |
| time_range | dict with start and end (ISO 8601 strings) | No | Example: {'start': '2024-01-01', 'end': '2024-12-31'} |
| time_grain | str | No | One of: TIME_GRAIN_SECOND, TIME_GRAIN_MINUTE, TIME_GRAIN_HOUR, TIME_GRAIN_DAY, TIME_GRAIN_WEEK, TIME_GRAIN_MONTH, TIME_GRAIN_QUARTER, TIME_GRAIN_YEAR |
- Simple filter (dict):
{"field": "origin", "operator": "=", "value": "JFK"}
- Compound filter (dict):
{ "operator": "AND", "conditions": [ {"field": "origin", "operator": "in", "values": ["JFK", "LGA"]}, {"field": "year", "operator": ">", "value": 2010} ] }
- Callable: lambda t: t.origin == 'JFK'
- String: "_.origin == 'JFK'"
Supported operators: =, !=, >, >=, <, <=, in, not in, like, not like, is null, is not null, AND, OR
Example output:
| JFK | 2015 | 350 |
| LGA | 2015 | 300 |
.png)


