# SQLNB Format Specification

Version 0.0.2

## Overview

SQLNB is a SQL notebook format: plain `.sql` files with configuration in comments. Cells are delimited by semicolons (`;`).

- A cell’s *data identity* (the thing other cells can reference) is the cell name (`@name`).
- A cell’s *display title* comes from `@name [Optional Title]` (not from chart directives).

## Cell Structure

A notebook is a sequence of cells.

A **cell** is:

- optional **cell configuration** (comment directives immediately before the cell body, with no blank line), plus
- a **cell body**, which is either:
  - **SQL**, terminated by a semicolon, or
  - **no SQL** (an empty statement) for chart-only cells (see “Chart-only cells”).

End-of-file acts as an implicit final terminator.

```sql
select 1;

select 2;
```

This produces two cells. Blank lines between cells are ignored.

### Semicolons in strings/comments

Semicolons inside SQL string literals or SQL comments do not terminate a cell. Implementations must parse statements accordingly.

## Notebook Configuration

Configuration at the top of the file, before any SQL. A blank line separates notebook configuration from the first cell.

```sql
-- connection: analytics_warehouse
-- cache: 1h

select * from users;
```

| option       | type     | default   | description                      |
| ------------ | -------- | --------- | -------------------------------- |
| `connection` | string   | `default` | Default connection for all cells |
| `cache`      | duration | `0`       | Default cache duration           |

### Configuration syntax

**Single-line:** `-- key: value`

```sql
-- connection: warehouse
-- cache: 1h
```

**Block (YAML-ish):** For complex configuration, use SQL block comments:

```sql
/*
connection: warehouse
cache: 1h
*/
```

## Cell Configuration

Configuration immediately before a cell’s body (no blank line).

```sql
-- connection: replica
select * from logs;
```

| option       | type                      | description                                                           |
| ------------ | ------------------------- | --------------------------------------------------------------------- |
| `connection` | string                    | Override notebook connection                                          |
| `@name`      | -                         | Name this cell (creates a dataset reference)                          |
| `time`       | string \| `auto` \| `none` | Canonical time column for global time filtering (optional; see below) |
| `dataset`    | string                    | Reference an upstream named dataset (chart-only cells; see below)     |

### Naming cells

Format: `-- @name` or `-- @name [Optional Title]`

- The name is used for dataset references.
- The display title defaults to the name transformed to title case with underscores replaced by spaces (`@daily_revenue` → “Daily Revenue”).
- Override display title with brackets: `@daily_revenue [Revenue by Day]`.

### Scope

Connection scope is determined by blank lines:

```sql
-- connection: warehouse

select * from users;

-- connection: replica
select * from logs;
```

The first connection is notebook-level (blank line before SQL). The second is cell-level (no blank line).

### Global time filtering metadata (optional)

`time` declares the canonical date/time column for the cell, for tools that support a global date-range UI.

- `time: <column>` explicitly sets the canonical time column.
- `time: auto` (default) lets implementations infer it (commonly from the chart `x` field if it’s temporal, or the only temporal column in the result).
- `time: none` opts out (never apply global time filtering to this cell).

SQLNB does not standardize how filtering is applied (SQL rewrite vs post-filter vs semantic-layer). It only standardizes how to declare/override/disable the canonical time column.

## Directive syntax

Cell configuration supports directives for visualization.

**Single-line:** `-- directive { key: value, key: value }`

```sql
-- bar { x: "month", y: "revenue" }
select * from orders;
```

- Directive syntax uses comma-separated `key: value` pairs inside braces.
- String values use double quotes.
- Numbers, booleans, and arrays are unquoted.
- Enum-like values (e.g. legend positions) are strings, so they must be quoted in single-line form.

```sql
-- line { x: "date", y: ["revenue", "profit"], legend: "right", stacked: true }
```

### Block directives (YAML-ish)

For multi-line config, use a SQL block comment. The block may include a name directive (`@name`) plus YAML keys.

```sql
/*
@daily_revenue [Revenue by Day]
time: date
bar:
  x: date
  y: revenue
  color: region
*/
select date, region, sum(amount) as revenue
from orders
group by 1, 2;
```

## Visualization

Directives control how cell results render. Without a directive, results render as a table.

### table

```sql
-- table { freeze_columns: 2, freeze_rows: 1 }
select * from employees;
```

| option           | type   | description               |
| ---------------- | ------ | ------------------------- |
| `freeze_columns` | number | Columns to freeze on left |
| `freeze_rows`    | number | Rows to freeze on top     |

### Charts

Types: `bar`, `line`, `area`, `scatter`, `pie`, `donut`

#### Title behavior (important)

Charts do not have a `title` option.

- Use `@name [Optional Title]` to set the cell display title.
- If a cell is unnamed, the chart has no title by default.

#### Example

```sql
/*
@financial_performance [Financial Performance]
line:
  x: date
  y: [revenue, profit]
  legend: right
  xFormat: "%Y-%m-%d"
  yFormat: ",.0f"
*/
select date, revenue, profit
from financial_metrics;
```

#### Options

| option      | type                                                        | description                                                      |
| ----------- | ----------------------------------------------------------- | ---------------------------------------------------------------- |
| `x`         | string                                                      | Field for X axis                                                 |
| `y`         | string \| string[]                                          | Field(s) for Y axis                                              |
| `series`    | ChartSeries[]                                               | Additional series with per-series chart type                     |
| `color`     | string                                                      | Field for color encoding                                         |
| `stacked`   | boolean                                                     | Stack bars/areas (default: true for bar/area)                    |
| `legend`    | `auto` \| `top` \| `bottom` \| `left` \| `right` \| `none`  | Legend position (default: `auto`)                                |
| `xLabel`    | string                                                      | X axis label                                                     |
| `yLabel`    | string                                                      | Y axis label                                                     |
| `xFormat`   | string                                                      | X axis format (d3-time-format)                                   |
| `yFormat`   | string                                                      | Y axis format (d3-format)                                        |
| `sort`      | `none` \| `x-asc` \| `x-desc` \| `value-asc` \| `value-desc` | Sort order (default: x-asc for bar)                              |
| `yScale`    | `linear` \| `log`                                           | Y axis scale (default: linear)                                   |
| `size`      | string                                                      | Field for bubble size (scatter only)                             |
| `opacity`   | string \| number                                            | Field for opacity or fixed value (0-1)                           |
| `labels`    | string                                                      | Field to display as text labels on marks                         |
| `maxSeries` | number                                                      | Max categories/series before grouping into “Other” (default: 12) |
| `showOther` | boolean                                                     | Show “Other” bucket (default: true for stacked/pie)              |

**Legend `auto` semantics:**
- If `color` is set, show legend.
- Else if there are multiple series (via `y: [...]` and/or `series: [...]`), show legend.
- Else no legend.

#### Multi-series sugar

If `y` is an array, it behaves as sugar for multiple series of the directive type:

```yaml
line:
  x: date
  y: [revenue, profit]
```

…is equivalent to two implicit line series.

`series: [...]` can be used to add more series and/or mix types (combo charts) without introducing dual y-axes.

ChartSeries:

| option  | type                                   | description                                  |
| ------- | -------------------------------------- | -------------------------------------------- |
| `y`     | string                                 | Field for this series                        |
| `type`  | `bar` \| `line` \| `area` \| `scatter`  | Chart type for this series                   |
| `label` | string                                 | Display label for this series (legend/tooltip) |

#### View-only transform semantics

The following options affect rendering only and do not change the underlying dataset produced by the cell for downstream references:

- `stacked`
- `sort`
- `maxSeries`
- `showOther`

Format strings:
- Dates: d3-time-format (`%Y-%m-%d`, `%b %d`, `%H:%M`)
- Numbers: d3-format (`,.0f` for thousands, `.2f` for decimals, `$,.2f` for currency)

### Chart-only cells

Reference a named cell's dataset without executing SQL:

```sql
/*
line:
  x: date
  y: revenue
dataset: daily_revenue
*/
;
```

Rules:

- `dataset` references an upstream cell named with `@name` (reference uses the name without `@`).
- The cell must not include SQL when `dataset` is present.
- The cell body must be an empty statement (a lone `;`) unless the file ends immediately after the block.
- No SQL is executed; the cell renders the referenced dataset.
- Chaining is supported: cell C can reference cell B which references cell A.

#### Naming chart-only cells (optional)

A chart-only cell may also be named (e.g. to give it a different display title). This creates a dataset alias that references the same underlying dataset.

```sql
/*
@revenue_trend [30-day Trend]
line:
  x: date
  y: revenue
dataset: daily_revenue
*/
;
```

## Complete Example

```sql
-- connection: analytics_warehouse
-- cache: 1h

/*
@daily_revenue [Daily Revenue]
time: date
bar:
  x: date
  y: revenue
*/
select
  date(order_timestamp) as date,
  sum(amount) as revenue
from orders
where order_timestamp >= current_date - interval '30 days'
group by 1
order by 1;

/*
line:
  x: date
  y: revenue
dataset: daily_revenue
*/
;
```

## Types Reference

### duration

Format: `<number><unit>` where unit is `s` (seconds), `m` (minutes), `h` (hours), or `d` (days).

Examples: `30s`, `5m`, `1h`, `7d`
