Data Definition Language (DDL)
DDL is the set of features that create, modify, or delete schema entities. This is the standard SQL meaning of the term.
Delightql’s DDL encompasses SQL’s DDL and features with no direct SQL mapping (higher-order predicates, ER-context rules). The thematic concern is reusability: definitions, tables, and data that can be authored, loaded, and referenced.
This section covers:
- Relational rules. Views, higher-order views, ER-context rules, and sigma predicates.
- Function rules. Reusable domain functions.
- Facts. Axiomatic ground data.
- Namespaces. Organization and visibility of all the above.
Basics
Assertion mode vs query mode. Delightql, like Prolog, distinguishes two programming modes:
- Query mode. Expressions entered into a REPL that execute immediately. This was the subject of the first book in this reference.
- Assertion mode. Files that contain definitions for later use.
The features in this DDL section are assertion-mode constructs.
Assertion mode has two general syntactic forms: rules and facts.
Rules
The general form of a rule is
<HEAD> <NECK> <BODY>
young_users(*)
:- adults(*), age < 20
In the above example:
- young_users(*) is the HEAD
- :- is the NECK
- adults(*), age < 20 is the BODY

The body may use any DQL feature – the entire previous book applies here.
Facts
Facts are functor forms with grounded data.
parent("Abraham", "Isaac")
This looks like a query. The syntax is context dependent. In assertion mode, i.e. in a file with rules and definitions, this syntax defines extensional data – axiomatic truths with no derivation.
The Two Necks
The neck separates a rule’s head from its body. Delightql has two necks, each defining a different scope and extent:
| Neck | Name |
|---|---|
| : | Shadow neck |
| :- | Rule neck |
The two neck operators
Extent vs Scope
There is a difference between when and where a rule definition is available. These are known, respectively, as extent and scope.
Scope refers to the spatial visibility of a definition. It asks where an abstraction – a name – is available for use.
Extent is temporal lifetime. It asks when an abstraction is available for use. Extent matters more for databases than for ordinary programming languages, because tables and views outlive a process.
The necks in delightql map to different types of scope and extent:
- Query-extent (:) – Exists only for the duration of one query. These are CTEs. Their scope is likewise limited to the query.
- Session-extent (:-) – Exists for the duration of the connection. These are temporary views, tables, or inlinable definitions. Their scope is determined by namespacing – the subject of a later chapter.
- Permanent-extent – Persists after disconnection. These are tables and views.
Shadow Neck :
The shadow neck defines a momentary definition with limited scope. The definition exists only for the single query in which it appears:
young(x) : users(x), age < 30
young(*)
A shadow-neck definition may shadow an existing table or view of the same name for the duration of the query. This scoping behavior is unique to the shadow neck.
Shadow-neck definitions are not DDL – they are reviewed here for syntactic similarity to rules.
Rule Neck (:-)
young_users(*) :- valid_users(*), age < 30
The rule neck (:-) creates a definition that is equivalent to SQL’s CREATE TEMP VIEW. Delightql may choose to create temporary views to implement this abstraction, but may also choose to do expression rewriting.
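The CREATE TEMP VIEW correspondence can be checked with a small sqlite3 session. This is a sketch of the assumed transpilation; the table and columns (valid_users, name, age) are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE valid_users (name TEXT, age INT)")
con.executemany("INSERT INTO valid_users VALUES (?, ?)",
                [("Ada", 25), ("Bob", 42), ("Cy", 19)])

# young_users(*) :- valid_users(*), age < 30
# transpiles (per the text) to a temporary view:
con.execute("""
    CREATE TEMP VIEW young_users AS
    SELECT * FROM valid_users WHERE age < 30
""")

rows = con.execute("SELECT name FROM young_users ORDER BY name").fetchall()
print(rows)  # [('Ada',), ('Cy',)]
```

The view lives for the duration of the connection – session extent – exactly as the table above describes.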
Rules
The syntax of rules obeys the general form:
<HEAD> <NECK> <BODY>
In delightql these rules may be used for any of the following:
- Views
- Tables
- Higher-order views
- ER-contexts
- Sigma predicates
- Functions
All of these use the :- neck. The first five define relations; functions define the special subset of relations that are functions.
Arity and Naming
A rule’s arity is the number of arguments in its head.
Definitions invoked with explicit arguments – functions, sigma predicates, higher-order rules – may share a name with different arities. The call site disambiguates:
add:(x) :- x + 1
add:(x, y) :- x + y
// Invocation is unambiguous
add:(5) // add/1
add:(5, 3) // add/2
Definitions queried with the glob – views, tables, facts – must have fixed arity. The glob presumes a single schema:
employee(id, name) :- ...
employee(*) // expects one schema
If you can write foo(*), all foo definitions must agree on arity. With argumentative heads, agreement is stricter: same arity AND same names at each position. See Head Semantics for the full rules.
Head Semantics
The head of a rule names the entity and declares what it exposes. How the head is written determines the output schema and how multi-clause entities combine.
Two Head Forms
Glob head
young(*)
:- people(*), age > 20
The glob * passes through whatever the body produces. The entity inherits its schema from the body. This is the permissive form – the head makes no claim about column names or count.
Argumentative head
young(name, age)
:- people(*), age > 20
The head declares the entity’s output schema: exactly name and age, in that order. Every clause must satisfy this contract – the body must produce columns with those names (among possibly others).
The body may be wider than the head. The head projects from the body by name, so wide source tables need not be narrowed in the body before the head can name the subset. If the body does not produce a column named in the head, then it is an error.
Ground Terms in the Head
A head position can hold a ground term (a literal) instead of a free variable (a column name).
bracket("old", last_name, first_name)
:- people(*), age > 40
Ground terms in the head inject constants into the output and provide choice semantics for multi-clause disjunctive rules.
Multi-Clause Rules
bracket("old", last_name, first_name)
:- people(*), age > 40
bracket("toddler", last_name, first_name)
:- people(*), age < 4
bracket(category, last_name, first_name)
:- people(*)
Multi-clause rules whose heads share the same name and shape are called disjunctive rules. The disjunctive refers to the logical OR, which manifests in different ways depending on the context:
- sum types in algebraic data types
- tagged unions and/or variants in many programming languages
- union in set theory (and therefore SQL)
- choice in grammar rules
- OR in logic
The meaning of multi-headed clauses is exactly UNION ALL.
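The UNION ALL reading can be checked directly in sqlite3. This sketch shows what the three bracket clauses might compile to; the people schema and data are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (first_name TEXT, last_name TEXT, age INT, category TEXT)")
con.executemany("INSERT INTO people VALUES (?, ?, ?, ?)",
                [("Ann", "Lee", 50, "adult"), ("Tim", "Ng", 3, "kid")])

# Each clause becomes one arm of a UNION ALL; ground terms become constants.
rows = con.execute("""
    SELECT 'old' AS category, last_name, first_name FROM people WHERE age > 40
    UNION ALL
    SELECT 'toddler', last_name, first_name FROM people WHERE age < 4
    UNION ALL
    SELECT category, last_name, first_name FROM people
""").fetchall()
print(len(rows))  # 1 + 1 + 2 = 4 rows; duplicates across clauses are preserved
```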
Disjunctive Rules: Consistency of Head Forms
All clauses of the same entity must use the same head form – either all glob or all argumentative. Mixing is an error:
-- OK: all glob
data(*) :- source_a(*)
data(*) :- source_b(*)
-- OK: all argumentative
data(x, y) :- source_a(*)
data(x, y) :- source_b(*)
-- ERROR: mixed head forms
data(*) :- source_a(*)
data(x, y) :- source_b(*)
Disjunctive Rules: Union Semantics by head form
Glob heads and argumentative heads use different union strategies:
| Strategy | Alignment |
|---|---|
| Glob head (smart union corresponding) | by name, NULLs for gaps |
| Argumentative head | positional AND by name |
Glob heads use smart union corresponding: columns align by name across clauses, and columns missing from a clause are filled with NULL. This permits clauses with different source tables and column sets.
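Smart union corresponding can be emulated in plain SQL by projecting the union of all column names and NULL-padding the gaps. A hypothetical transpilation of two glob clauses over tables with different column sets:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source_a (x INT, y INT)")
con.execute("CREATE TABLE source_b (y INT, z INT)")
con.execute("INSERT INTO source_a VALUES (1, 2)")
con.execute("INSERT INTO source_b VALUES (3, 4)")

# data(*) :- source_a(*)
# data(*) :- source_b(*)
# Columns align by name; columns missing from a clause are NULL-filled.
rows = con.execute("""
    SELECT x, y, NULL AS z FROM source_a
    UNION ALL
    SELECT NULL, y, z FROM source_b
""").fetchall()
print(rows)  # one row per clause: (1, 2, None) and (None, 3, 4)
```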
Column naming with ground terms
Argumentative heads require strict agreement: every clause must have the same number of positions, and free variables at each position must use the same name across all clauses. If clause 1 has (name, age) and clause 2 has (age, name) then this is an error.
Ground terms (constants) are nameless – they do not contribute a column name for a position. Free variables provide names. The rules:
- If a position has free variables in one or more clauses, all those free variables must use the same name. Constants in other clauses are compatible (they provide a value but no name).
- Conflicting names at the same position is an error.
-- OK: "category" names position 1; constants in other clauses are compatible
bracket("old", last_name, first_name)
:- people(*), age > 40
bracket("toddler", last_name, first_name)
:- people(*), age < 4
bracket(category, last_name, first_name)
:- people(*)
-- ERROR: position 1 named "motto" in one clause, "city" in another
bracket("old", last_name, first_name)
:- people(*), age > 40
bracket(motto, last_name, first_name)
:- people(*)
bracket(city, last_name, first_name)
:- people(*)
Higher-Order Rules
Higher-order rules accept tables or scalars as parameters. SQL calls these table-valued functions; Prolog would call them higher-order predicates.
Syntax
A higher-order rule is easy to recognize: it has two sets of parentheses, the first for parameters (input values), the second for output values.
department_employee_count(E(*), D(*))(department, employee_count) :-
E(*), D(*.(DepartmentId))
|> %(D.department ~> count:(*) as employee_count)
In the above example, the parameters E(*) and D(*) are inner glob functors – they accept whatever tables are passed at invocation. The (*) signals that the body will reference these tables’ columns by name.
Direct Invocation
As discussed in the section on DQL, tables can be passed in directly:
department_employee_count(employee_2019(*), department_2019(*))(*)
The call site mirrors the definition head: each table parameter is a full functor expression. This makes the call self-documenting – you can see which arguments are tables and which are scalars without looking up the definition.
Because call-site arguments are relation expressions, they can compose:
department_employee_count(
employee_2019(*, Salary > 50000),
department_2019(*)
)(*)
Here the first argument is a filtered relation – only high earners are counted.
Piped Invocation
Pipes can be used on any higher-order predicate that takes a table-valued parameter:
clean_employees(T(*))(*) :-
T(*) as t
|> $(trim:())(t.LastName, t.FirstName)
|> $(to_iso:())(t.BirthDate, t.HireDate)
employee_2019(*)
|> clean_employees(*)
The piped relation fills the first parameter. The (*) after the rule name is the output schema, not an input functor.
Chaining is possible:
mask_ssn(T(*), mask_value)(*) :-
T(*) |> $$(mask_value as ssn)
employee_2019(*)
|;| employee_2018(*)
|;| employee_2017(*)
|> clean_employees(*)
|> mask_ssn("***-**-****")(*)
Note. As with function pipes, the relation is piped into the first parameter of the higher-order predicate. If the higher-order predicate has multiple parameters, the other values must be set.
Multi-parameter piped invocation. When the piped relation is not the first parameter, use @ (the f-param placeholder) to mark where it goes – the same syntax as function pipes:
-- Definition: scalar first, table second
tagged(label, T(*))(*) :- T(*), ...
-- Direct invocation (always works):
tagged("young", users(*))(*)
-- Piped invocation with @:
users(*) |> tagged("young", @)(*)
The @ tells the compiler which parameter receives the piped relation. Without @, the piped relation fills the first parameter by default – which fails when the first parameter is a scalar.
Scalar Parameters
A bare identifier without parentheses is a scalar parameter. It binds a single value used directly in body expressions:
high_earners(T(*), salary_floor, min_count)(*) :-
T(*), Salary > salary_floor,
department(*.(DepartmentId))
|> %(department ~> count:(*) as employee_count),
employee_count > min_count
high_earners(employee(*), 50000, 10)(*)
Scalar parameters accept single values only – not tables, not pipes, not multi-row inline data. Functor expressions are visually distinct from scalar literals, so comma separation is unambiguous.
Inner Functors
Higher-order parameters come in four flavors, distinguished by syntax in the definition head:
| Form | Kind | Name |
|---|---|---|
| T(*) | table, structurally typed | inner glob functor |
| T(a, b) | table, positionally typed | inner argumentative functor |
| n | scalar value | scalar parameter |
| f:() | function value | function parameter |
The syntax alone tells the language what each parameter accepts. Capitalization is conventional – programmers should uppercase table parameters and lowercase scalars for readability, but the language does not require it.
Inner Glob Functors
An inner glob functor T(*) is structurally typed: the body references columns by name, and any table that has those columns is accepted regardless of width.
clean_employees(T(*))(*) :-
T(*) as t
|> $(trim:())(t.LastName, t.FirstName)
|> $(to_iso:())(t.BirthDate, t.HireDate)
The parameter T(*) accepts any table with LastName, FirstName, BirthDate, and HireDate columns – it may have other columns too. This is duck typing: if it has the right columns, it fits.
Inner Argumentative Functors
An inner argumentative functor T(a, b) is positionally typed: the input must have exactly two columns, and they are renamed to a and b inside the body. The caller’s original column names are overwritten.
foo(T(label, value))(*) :-
T(*), value > 10 |> (label)
The names label and value are column aliases available in the body. The definition simultaneously declares the arity (two columns) and provides names for positional access.
The advantage of the argumentative functor is in the calling convention, called scalar lifting. Because the definition declares a positional contract, a call site may pass bare scalars instead of a table:
foo("first", 2)(*)
The scalars are positionally matched to the declared columns label and value and lifted into a one-row table. This cascades to stacked notation:
foo("first", 2; "second", 20)(*)
which sugars explicit anonymous tables:
foo(_("first", 2; "second", 20))(*)
but still allows pipe invocation:
two_column_table(*)
|> foo(*)
Or explicit functor invocation:
foo(two_column_table(*))(*)
Scalar lifting requires a positional contract – an inner glob functor cannot accept inline scalars because there is no declared arity to match against.
The & Rule
& is required only when using scalar lifting with an argumentative functor alongside other parameters.
When every table argument is passed as a functor expression, the parentheses disambiguate each argument. Commas separate parameters as usual – no & needed. & is the cost of the scalar-lifting shorthand: when bare scalars fill an argumentative functor, & marks where one argument ends and the next begins.
| Definition | Functor call site | &? |
|---|---|---|
| f(T(*), V(*)) | f(users(*), orders(*))(*) | no |
| f(T(*), V(*)) | users(*) \|> f(orders(*))(*) | no |
| f(T(*), n) | f(users(*), 10)(*) | no |
| f(T(*), V(*), n) | f(users(*), orders(*), 10)(*) | no |
| f(T(x, y)) | f(data(col1, col2))(*) | no |
| f(T(x, y)) | f(_(1, 2; 10, 20))(*) | no |
| f(T(x, y)) | _(1, 2; 10, 20) \|> f(*) | no |
| f(T(*), V(x, y)) | f(users(*), _(1, 2))(*) | no |
| f(T(*), V(x, y)) | f(users(*), data(col1, col2))(*) | no |
| f(T(*), V(x, y)) | users(*) \|> f(_(1, 2))(*) | no |
| f(T(*), V(x, y), n) | f(users(*), _(1, 2), 10)(*) | no |
| f(T(*), V(x, y), n) | f(users(*), _(1, 2; 10, 20), 10)(*) | no |
| f(::ns, n, V(x, y)) | f(data::prod, "t", _(1, 2))(*) | no |
| Scalar lifting (shorthand) | | |
| f(T(x, y)) (single) | f(1, 2)(*) | no |
| f(T(x, y)) | f(1, 2; 10, 20)(*) | no |
| f(T(*), V(x, y)) | f(users(*) & 1, 2)(*) | yes |
| f(T(*), V(x, y)) | f(users(*) & 1, 2; 10, 20)(*) | yes |
| f(T(x, y), n) | f("a", "b" & 10)(*) | yes |
| f(T(x, y), V(a, b)) | f(1, 2 & 3, 4)(*) | yes |
| f(::ns, n, V(x, y)) | f(data::prod & "t" & 1, 2)(*) | yes |
The table is divided into two regions. In the top region, every table argument uses functor syntax – & is never needed. In the bottom region (scalar lifting), bare scalars fill argumentative functors and & marks the boundaries.
Functor syntax is always available and always unambiguous. Scalar lifting is an optional shorthand for inline data – use & when you use it, or wrap in _() to avoid it.
Parameter Grounding and Multi-Clause HO Entities
The two parentheses partition a single relation into “positions the caller sets” and “columns the body produces.” Both sets follow the same head semantics (see Head Semantics), applied independently.
First parentheses: always argumentative. The parameter parentheses must explicitly declare parameters.
Second parentheses: any head form. The output parentheses follow all standard head rules: glob or argumentative, consistent across clauses, smart union corresponding for globs, strict positional+name agreement for argumentative.
Parameter modes
Table parameters are input-moded – they must be grounded because the body computes over them. There is no way to enumerate “all tables.” Scalar parameters are bidirectional – they can be grounded (filter to matching clauses) or left free (project as a column).
| Parameter kind | Mode |
|---|---|
| Scalar | bidirectional |
| Table | input-only |
A scalar parameter can be left free (unbound at the call site) only when every clause has a ground term at that position.
schema("employees")(name, type) :-
_(name, type ---- "id", "INT"; "name", "TEXT")
schema("departments")(name, type) :-
_(name, type ---- "dept_id", "INT"; "dept_name", "TEXT")
The ground constants define the enumeration domain: schema(entity)(*) enumerates "employees" and "departments" at the parameter position because those are the constants in the clauses.
If any clause has a free variable at that position, the caller must provide a concrete value – either a literal or a variable bound from context (a pipe, CTE, join, etc.).
Scalar parameters as discriminators
When a higher-order entity has multiple clauses with different ground scalar values in a parameter position, invocation with a matching ground term selects the relevant clauses. This is clause selection via equality – the same mechanism as argumentative grounding.
schema("employees")(name, type) :-
_(name, type ---- "id", "INT"; "name", "TEXT")
schema("departments")(name, type) :-
_(name, type ---- "dept_id", "INT"; "dept_name", "TEXT")
Grounded query – filter:
schema("employees")(*)
The ground term "employees" selects the matching clause. Output: (name, type) – two columns. The ground parameter filters but does not project.
Free query – project:
schema(entity)(*)
The first position is free. All clauses contribute (UNION ALL). The free parameter appears as a column:
entity | name | type
------------|----------|--------
employees | id | INT
employees | name | TEXT
departments | dept_id | INT
departments | dept_name| TEXT
This is the same behavior as argumentative grounding on regular relations where stock_ownership(1, stock_id, ...) filters on position 1 and drops it from the output.
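The filter-vs-project behavior mirrors plain SQL. Below is a sketch of the two assumed compilations of the schema example – the free query injects each clause's ground constant as a column, the grounded query selects one clause and drops the parameter:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# schema(entity)(*): every clause contributes; the ground constant
# at the parameter position becomes the 'entity' column.
free = con.execute("""
    SELECT 'employees' AS entity, name, type FROM (
        SELECT 'id' AS name, 'INT' AS type UNION ALL SELECT 'name', 'TEXT')
    UNION ALL
    SELECT 'departments', name, type FROM (
        SELECT 'dept_id' AS name, 'INT' AS type UNION ALL SELECT 'dept_name', 'TEXT')
""").fetchall()

# schema("employees")(*): the ground term selects one clause;
# the parameter filters but does not project.
ground = con.execute("""
    SELECT 'id' AS name, 'INT' AS type UNION ALL SELECT 'name', 'TEXT'
""").fetchall()

print(len(free), len(ground))  # 4 2
```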
Mixed ground/free across clauses
The same scalar position can be ground in some clauses and free in others. This follows standard relational semantics: every clause that matches contributes rows.
foo("a", y)(*) :- ... -- pos 1 ground, pos 2 free (parameter)
foo("b", y)(*) :- ... -- pos 1 ground, pos 2 free (parameter)
foo(x, "c")(*) :- ... -- pos 1 free (parameter), pos 2 ground
Invocation:
_(z ---- "c"; "d"; "e") |> foo("a", z)(*)
For each row, both positions are concrete. Each clause either matches or doesn’t:
- Clause 1: "a" matches "a", y binds to z. Selected.
- Clause 2: "a" doesn’t match "b". Excluded.
- Clause 3: x binds to "a", z must equal "c". Selected for that row only.
There is no ambiguous dispatch. The relational world has no “which clause wins?” conflict – every matching clause contributes rows.
Sigma Rules
Sigma rules encapsulate reusable boolean logic:
is_high_value(amount) :- amount > 1000
orders(*),
+is_high_value(total),
+like(description, '%ipod')
Disjunctive Clauses
Multiple clauses with the same head are OR-ed together:
no_data("NA"; "N/A"; "UNKNOWN")
empty(column) :- null = column
empty(column) :- trim:(column) = ""
empty(column) :- +no_data(upper:(column))
employee(*),
+empty(LastName),
+empty(FirstName)
SELECT *
FROM employee
WHERE (LastName IS NULL
OR trim(LastName) = ''
OR upper(LastName)
IN ('NA', 'N/A', 'UNKNOWN'))
AND (FirstName IS NULL
OR trim(FirstName) = ''
OR upper(FirstName)
IN ('NA', 'N/A', 'UNKNOWN'));
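The SQL expansion above is runnable as-is; here is a sqlite3 check with a tiny employee table (data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (LastName TEXT, FirstName TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)", [
    (None, "Ann"),   # empty last name but real first name -> excluded (AND)
    ("  ", "N/A"),   # both "empty" by some clause -> kept
    ("Lee", "Ann"),  # neither empty -> excluded
])

rows = con.execute("""
    SELECT * FROM employee
    WHERE (LastName IS NULL OR trim(LastName) = ''
           OR upper(LastName) IN ('NA', 'N/A', 'UNKNOWN'))
      AND (FirstName IS NULL OR trim(FirstName) = ''
           OR upper(FirstName) IN ('NA', 'N/A', 'UNKNOWN'))
""").fetchall()
print(rows)  # [('  ', 'N/A')]
```

Each disjunctive clause of empty becomes one OR-ed disjunct; the conjunction of two +empty calls becomes the AND.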
Requirements. To create a sigma rule:
- The head is a relational functor with arguments
- The neck is :-
- The body consists of conjoined sigma predicates
- Each parameter must appear at least once in the body
- In disjunctive form, each clause must reference at least one parameter

Sigma predicates include:
- Infix comparisons: age < 20, LastName = 'Johnson'
- Functor predicates: +like(description, 'ipod%'), +between(Salary, 50000, 100000)
- in statements: state in ("MA"; "TX"; "CA")
- Existence tests: +other_table(...), \+other_table(...)
ER-Context Rules
ER-context rules are an answer to the question: what if we could define entity relationships that inform queries?
Normalized schemas encode relationships with foreign keys but query expressions do not take advantage of this. SQL requires repeating join conditions in every query – unaware of DDL constraints.
ER-context rules define these relationships once. The & and && operators reference them concisely.
Defining Relationships
An ER-rule declares how two tables join. The head uses & between table names; the body is the join expression:
users&orders(*) within normal :-
users(*), orders(*), users.id = orders.user_id
orders&items(*) within normal :-
orders(*), items(*), orders.id = items.order_id
items&products(*) within normal :-
items(*), products(*), items.product_id = products.id
The within clause assigns the rule to a named context.
Multiple Contexts
The same table pair can have different join semantics in different contexts:
users&orders(*) within normal :-
users(*), orders(*), users.id = orders.user_id
users&orders(*) within audit :-
users(*), orders(*), users.id = orders.created_by
orders&audit_log(*) within audit :-
orders(*), audit_log(*), orders.id = audit_log.order_id
The context name is any valid identifier.
Using Contexts
The under directive activates a context. It must be the first token in the query:
under normal: users(*) & orders(*)
under audit: users(*) & orders(*)
The directive applies to the entire query scope. Mixing contexts in one query is not permitted.
Direct Join (&)
The & operator performs a direct lookup in the current context:
under normal: users(*) & orders(*)
Equivalent to:
users(*), orders(*), users.id = orders.user_id
Multiple & operators chain left-to-right. Each consecutive pair must have a defined ER-rule:
under normal: users(*) & orders(*) & items(*)
Compiles to:
users(*), orders(*), items(*),
users.id = orders.user_id,
orders.id = items.order_id
Transitive Join (&&)
The && operator finds a path through the ER-graph:
under normal: users(*) && products(*)
No direct users&products rule exists, but the path does: users -> orders -> items -> products.
Ambiguity is an error. If multiple paths exist, the query fails:
users&orders(*) within normal :- ...
orders&items(*) within normal :- ...
users&items(*) within normal :- ... // creates a second path
under normal: users(*) && items(*)
// Error: Ambiguous join path from 'users' to 'items':
// Path 1: users -> orders -> items
// Path 2: users -> items (direct)
Recursion in Rules
Recursion in delightql emerges from self-reference. When a predicate’s definition includes a clause that references the predicate itself, the definition is recursive. When a common table expression includes a clause that references the CTE itself, the CTE is recursive. Both transpile to SQL’s WITH RECURSIVE construct.
This chapter covers the semantics of recursion in delightql, how it maps to SQL’s execution model, and the constraints that model imposes.
Two Forms of Recursion
Delightql supports recursion in two contexts:
Recursive rules are defined in assertion mode and persist as reusable predicates:
ancestor(person, anc) :-
parent(person, anc)
ancestor(person, anc) :-
parent(person, p), ancestor(p, anc)
Recursive CTEs are defined inline in query mode, scoped to a single query:
ancestor(*) : parent(*) |> (person, parent as anc)
ancestor(*) : parent(*) as p, ancestor(*) as a, p.parent = a.person
|> (p.person, a.anc)
ancestor(*)
Both forms transpile to WITH RECURSIVE. The choice depends on whether the recursive logic is reusable (rule) or ad-hoc (CTE).
The Anatomy of Recursion
Every recursive definition has two components:
Base clauses provide initial rows without self-reference. These are SQL’s “anchor members”:
_(n @ 1) : counter // literal base case
org(*), title = "CEO" : mgmt // filtered base case
edge(*) |> (origin, dest) : reachable // projected base case
Recursive clauses reference the predicate or CTE being defined. These are SQL’s “recursive members”:
counter(*), n < 100 |> (n + 1 as n) : counter
mgmt(*) as m, org(*) as o, o.boss = m.name : mgmt
reachable(*) as r, edge(*) as e, r.dest = e.origin
|> (r.origin, e.dest) : reachable
Delightql does not require base clauses to precede recursive clauses in source order. The transpiler identifies which clauses are recursive (they reference the target name) and emits them in the order SQL requires.
Evaluation Model
SQL’s recursive CTEs evaluate using a working table algorithm:
- Execute all base clauses; their results form the initial working table
- Execute the recursive clause with the working table as input
- The output becomes the new working table
- Repeat until the working table is empty
- Return the union of all iterations
This is bottom-up or co-recursive evaluation: starting from known facts, derive new facts, repeat until fixed point. It resembles dynamic programming more than classical recursion.
The critical implication: the recursive clause sees only the previous iteration’s rows, not the full accumulated result. This is why certain operations are prohibited – they would require access to rows that haven’t been computed yet or have already been consumed.
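The working-table loop can be modeled in a few lines of Python. This is a schematic of the evaluation order, not of any actual transpiler:

```python
def recursive_cte(base_rows, step):
    """Evaluate a recursive CTE bottom-up.

    base_rows: rows produced by the base (anchor) clauses.
    step: maps the previous iteration's rows (the working table)
          to the next iteration's rows.
    Returns the UNION ALL of every iteration.
    """
    result = list(base_rows)
    working = list(base_rows)
    while working:
        working = list(step(working))  # sees ONLY the previous iteration
        result.extend(working)
    return result

# counter example: base clause _(n @ 1), recursive clause n < 5 |> (n + 1 as n)
rows = recursive_cte([(1,)], lambda prev: [(n + 1,) for (n,) in prev if n < 5])
print(rows)  # [(1,), (2,), (3,), (4,), (5,)]
```

Note that step never receives result – which is precisely why aggregation over the accumulated rows is impossible inside the recursion.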
What Recursion Can Express
SQL’s recursive model handles a well-defined class of problems:
Hierarchical traversal – org charts, bill of materials, folder structures:
folders(*), parent_id = null |> (id, name, 0 as depth) : tree
folders(*) as f, tree(*) as t, f.parent_id = t.id
|> (f.id, f.name, t.depth + 1) : tree
tree(*)
Transitive closure – reachability, ancestry, dependency graphs:
edge(*) |> (origin, dest) : reachable
reachable(*) as r, edge(*) as e, r.dest = e.origin
|> (r.origin, e.dest) : reachable
reachable(*) |> %(*) // deduplicate
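The reachable rule corresponds to a WITH RECURSIVE along these lines (a sketch of the assumed transpilation; edge data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edge (origin TEXT, dest TEXT)")
con.executemany("INSERT INTO edge VALUES (?, ?)",
                [("a", "b"), ("b", "c"), ("c", "d")])

rows = con.execute("""
    WITH RECURSIVE reachable(origin, dest) AS (
        SELECT origin, dest FROM edge
        UNION ALL
        SELECT r.origin, e.dest
        FROM reachable r JOIN edge e ON r.dest = e.origin
    )
    SELECT DISTINCT origin, dest FROM reachable  -- the final %(*) dedup
    ORDER BY origin, dest
""").fetchall()
print(rows)  # full transitive closure of a -> b -> c -> d: 6 pairs
```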
Sequence generation – numeric ranges, date series, iteration:
_(d @ date:(2024, 1, 1)) : dates
dates(*), d < date:(2024, 12, 31)
|> (d + interval:(1, 'day') as d) : dates
dates(*)
Iterative computation – any algorithm expressible as “given previous state, compute next state”:
_(iter @ 0, x @ 1.0, target @ 2.0) : newton
newton(*), abs:(x*x - target) > 0.0001, iter < 100
|> (iter + 1, (x + target/x) / 2.0 as x, target) : newton
newton(*) |> %(target ~> max:(x) as sqrt)
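The Newton iteration maps to a recursive CTE like the following sqlite3 sketch (the final SELECT takes the last iterate rather than the document's aggregation, to extract the converged value):

```python
import sqlite3

con = sqlite3.connect(":memory:")
(sqrt2,) = con.execute("""
    WITH RECURSIVE newton(iter, x, target) AS (
        SELECT 0, 1.0, 2.0
        UNION ALL
        SELECT iter + 1, (x + target / x) / 2.0, target
        FROM newton
        WHERE abs(x * x - target) > 0.0001 AND iter < 100
    )
    SELECT x FROM newton ORDER BY iter DESC LIMIT 1
""").fetchone()
print(round(sqrt2, 4))  # 1.4142
```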
What Recursion Cannot Express
The working-table model imposes fundamental limitations. These are not arbitrary restrictions – they follow from the evaluation semantics.
No Aggregation in Recursive Clauses
Aggregation requires access to multiple rows. The recursive clause sees only the working table (previous iteration), not the full accumulated result.
// INVALID -- cannot aggregate within recursion
subtree(*) as s, node(*) as n, n.parent = s.id
|> (n.id, s.total + n.value as total) : subtree // seems ok?
// But this fails:
subtree(*) as s, node(*) as n, n.parent = s.id
~> (s.id, sum:(n.value) as total) : subtree // aggregation -- NOT ALLOWED
No Subqueries Referencing the Recursive Target
A subquery inside the recursive clause cannot reference the CTE being defined:
// INVALID -- subquery references 'paths'
edge(*) as e, paths(*) as p, e.origin = p.dest,
\+ paths(*, e.dest = dest) // "dest not already reached" -- NOT ALLOWED
|> (p.origin, e.dest) : paths
The subquery paths(*, ...) would need to see all accumulated rows, which aren’t available.
No Mutual Recursion
Two predicates cannot reference each other:
// INVALID -- mutual recursion
even(0)
even(n) :- odd(m), n = m + 1
odd(n) :- even(m), n = m + 1
SQL’s WITH RECURSIVE processes one CTE at a time. There’s no mechanism for two CTEs to co-evolve.
Single Self-Reference
The recursive clause may reference the target exactly once:
// INVALID -- two self-references
paths(*) as p1, paths(*) as p2, p1.dest = p2.origin
|> (p1.origin, p2.dest) : paths
This would require joining the working table against itself, which SQL doesn’t support in recursive CTEs.
Termination
Recursive CTEs terminate when the recursive clause produces no new rows. This happens when:
- A WHERE condition filters out all candidates
- A join finds no matches
- The depth limit (#) is reached
- The data is exhausted (finite traversal)
Ensuring termination:
For sequence generation, always include a bound:
nums(*), n < 1000 |> (n + 1 as n) : nums // terminates at 1000
For graph traversal over potentially cyclic data, track visited nodes:
edge(*) |> (origin, dest, ',' || origin || ',' as visited) : paths
edge(*) as e, paths(*) as p, p.dest = e.origin,
p.visited not like '%,' || e.dest || ',%'
|> (p.origin, e.dest, p.visited || e.dest || ',') : paths
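Over cyclic data this visited-string trick is what keeps the recursion finite. A sqlite3 sketch with a three-node cycle (data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edge (origin TEXT, dest TEXT)")
con.executemany("INSERT INTO edge VALUES (?, ?)",
                [("a", "b"), ("b", "c"), ("c", "a")])  # a cycle

rows = con.execute("""
    WITH RECURSIVE paths(origin, dest, visited) AS (
        SELECT origin, dest, ',' || origin || ',' FROM edge
        UNION ALL
        SELECT p.origin, e.dest, p.visited || e.dest || ','
        FROM paths p JOIN edge e ON p.dest = e.origin
        WHERE p.visited NOT LIKE '%,' || e.dest || ',%'
    )
    SELECT origin, dest FROM paths ORDER BY origin, dest
""").fetchall()
print(rows)  # terminates despite the cycle: 6 (origin, dest) pairs
```

Without the NOT LIKE guard, the same query loops forever walking the cycle.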
For unknown depth, use # as a safety limit:
tree(*) as t, node(*) as n, n.parent = t.id, # < 100
|> (...) : tree
UNION vs UNION ALL
By default, delightql emits UNION ALL – duplicates across iterations are preserved. This is efficient and correct for most traversals.
For graph traversal where the same node may be reached via multiple paths, duplicates accumulate. To deduplicate the final result:
edge(*) |> (origin, dest) : reachable
edge(*) as e, reachable(*) as r, r.dest = e.origin
|> (r.origin, e.dest) : reachable
reachable(*) |> %(*) // deduplicate at the end
Higher-Order Recursive Predicates
Recursive rules can be parameterized, deferring the base case:
reports_to(boss)(name) :- employee(*), name = boss
reports_to(boss)(name) :-
employee(*) as e,
reports_to(boss)(*) as r,
e.manager = r.name
|> (e.name)
Each invocation monomorphizes to a concrete WITH RECURSIVE:
reports_to("Alice")(*) // who reports to Alice?
reports_to("Bob")(*) // who reports to Bob?
The higher-order parameter boss is inlined into the anchor clause at query time. The recursive structure itself doesn’t change – only the starting point.
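Monomorphization can be pictured as substituting the parameter into the anchor clause only. A sqlite3 sketch with an invented org table; the boss parameter becomes a bound placeholder in the anchor:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, manager TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)", [
    ("Alice", None), ("Bob", "Alice"), ("Cara", "Bob"), ("Dan", "Eve"),
])

def reports_to(boss):
    # Only the anchor changes per invocation; the recursive member is fixed.
    return con.execute("""
        WITH RECURSIVE r(name) AS (
            SELECT name FROM employee WHERE name = ?
            UNION ALL
            SELECT e.name FROM employee e JOIN r ON e.manager = r.name
        )
        SELECT name FROM r
    """, (boss,)).fetchall()

print(reports_to("Alice"))  # Alice's whole reporting tree, Alice included
```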
Example: Mandelbrot Set
This example demonstrates sequence generation, computational iteration, and post-recursion aggregation working together:
_(x @ -2.0) : xaxis
xaxis(*), x < 1.2
|> (x + 0.05 as x) : xaxis
_(y @ -1.0) : yaxis
yaxis(*), y < 1.0
|> (y + 0.1 as y) : yaxis
sq:(x):
x * x
xaxis(*), yaxis(*)
|> (0 as iter,
x as cx,
y as cy,
0.0 as x,
0.0 as y) : m
m(*), (sq:(x) + sq:(y)) < 4.0,
iter < 28
|> (iter + 1 as iter,
cx as cx,
cy as cy,
(sq:(x) - sq:(y)) + cx as x,
((2.0 * x) * y) + cy as y) : m
m(*)
|> %(cx,cy ~> max:(iter) as iter ) : m2
m2(*)
|> %(cy
~>
group_concat:(substr:(" .+*#", 1+min:(iter/7,4), 1), "") as t)
: a
a(*)
~> group_concat:(rtrim:(t),char:(0x0a))
The query generates a coordinate grid, runs the escape-time algorithm via recursive iteration, then aggregates the results into ASCII art – all in a single delightql expression.
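For comparison, the same computation runs directly against SQLite – this is the well-known SQLite recursive-CTE Mandelbrot that the delightql version mirrors, with matching column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
(art,) = con.execute("""
WITH RECURSIVE
  xaxis(x) AS (VALUES(-2.0) UNION ALL SELECT x + 0.05 FROM xaxis WHERE x < 1.2),
  yaxis(y) AS (VALUES(-1.0) UNION ALL SELECT y + 0.1  FROM yaxis WHERE y < 1.0),
  m(iter, cx, cy, x, y) AS (
    SELECT 0, x, y, 0.0, 0.0 FROM xaxis, yaxis
    UNION ALL
    SELECT iter + 1, cx, cy, x*x - y*y + cx, 2.0*x*y + cy
    FROM m WHERE x*x + y*y < 4.0 AND iter < 28
  ),
  m2(iter, cx, cy) AS (SELECT max(iter), cx, cy FROM m GROUP BY cx, cy),
  a(t) AS (
    SELECT group_concat(substr(' .+*#', 1 + min(iter/7, 4), 1), '')
    FROM m2 GROUP BY cy
  )
SELECT group_concat(rtrim(t), char(10)) FROM a
""").fetchone()
print(art)  # ASCII-art Mandelbrot set
```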
Delightql Recursive Apology
A true fixed-point engine – like those in Datalog systems – would maintain the full set of derived facts and allow each iteration to query against it. SQL chose a simpler model. The restrictions on aggregation, subqueries, and mutual recursion all follow from this choice.
Delightql inherits these limitations because it transpiles to SQL and the semantics remain bound by the target. Where SQL’s recursive CTEs fall short – self-similar tree construction, recursive aggregation, shortest-path computation – delightql falls short as well.
Function Rules
Functions are relations with a functional dependency between input and output columns. Where a relation may have many outputs for a given input, a function has exactly one. Delightql supports several syntactic forms for defining functions, each suited to different use cases.
Stacked Notation (Named Case)
The stacked form defines functions as lookup tables with explicit input-output mappings:
department_kind(
department -> kind
------------------
"engineering" -> "tech";
"data science" -> "tech";
_ -> "other"
)
The -> separates inputs (left) from outputs (right). The header row names the columns; subsequent rows provide the mappings. The _ matches any input not explicitly listed.
Despite the visual similarity to anonymous table stacked notation, this is an assertion-mode construct – it defines a reusable function, not inline data.
Invocation:
employee(*) |> +(department_kind:(Department) as kind)
SELECT *,
CASE Department
WHEN 'engineering' THEN 'tech'
WHEN 'data science' THEN 'tech'
ELSE 'other'
END AS kind
FROM employee;
Multi-column inputs:
tax_rate(
state, category -> rate
--------------------------
"CA", "food" -> 0.0;
"CA", "electronics" -> 0.0825;
"TX", "food" -> 0.0;
"TX", "electronics" -> 0.0625;
_, _ -> 0.05
)
products(*) |> +(tax_rate:(state, category) as tax)
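The lookup-table semantics sketch cleanly as dictionaries with a default – an illustrative model only, since the actual transpilation is a CASE expression. Multi-column inputs become tuple keys; the names below are hypothetical.

```python
# Stacked-notation functions as dictionaries with a wildcard fallback.
DEPARTMENT_KIND = {"engineering": "tech", "data science": "tech"}

TAX_RATE = {
    ("CA", "food"): 0.0,
    ("CA", "electronics"): 0.0825,
    ("TX", "food"): 0.0,
    ("TX", "electronics"): 0.0625,
}

def department_kind(department):
    return DEPARTMENT_KIND.get(department, "other")  # `_ -> "other"`

def tax_rate(state, category):
    return TAX_RATE.get((state, category), 0.05)     # `_, _ -> 0.05`
```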
Rule Form
For computed functions, use the rule form:
plus_two:(x) :- x + 2
numbers(*) |> +(plus_two:(value) as incremented)
SELECT *, value + 2 AS incremented FROM numbers;
The body is any domain expression. The function returns its evaluation.
Disjunctive Clauses
Multiple clauses create conditional functions. Clauses are evaluated top-to-bottom; first match wins:
fizzbuzz:(n | n % 15 = 0) :- "fizzbuzz"
fizzbuzz:(n | n % 3 = 0) :- "fizz"
fizzbuzz:(n | n % 5 = 0) :- "buzz"
fizzbuzz:(n) :- n
The guard condition follows | in the head. If the guard fails, the next clause is tried.
generate_series(1, 100)(*) |> (fizzbuzz:(value) as result)
SELECT
CASE
WHEN value % 15 = 0 THEN 'fizzbuzz'
WHEN value % 3 = 0 THEN 'fizz'
WHEN value % 5 = 0 THEN 'buzz'
ELSE CAST(value AS TEXT)
END AS result
FROM generate_series(1, 100);
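The clause-selection discipline – guards tried top to bottom, first match wins – can be modeled as an ordered list of (guard, body) pairs. A sketch of the semantics, not the implementation:

```python
# Disjunctive clauses as an ordered (guard, body) table.
FIZZBUZZ_CLAUSES = [
    (lambda n: n % 15 == 0, lambda n: "fizzbuzz"),
    (lambda n: n % 3 == 0,  lambda n: "fizz"),
    (lambda n: n % 5 == 0,  lambda n: "buzz"),
    (lambda n: True,        lambda n: str(n)),  # unguarded final clause
]

def fizzbuzz(n):
    # Textual order determines evaluation order; first passing guard wins.
    for guard, body in FIZZBUZZ_CLAUSES:
        if guard(n):
            return body(n)
```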
Hailstone sequence example:
next_hailstone:(x | x % 2 = 0) :- x / 2
next_hailstone:(x) :- (x * 3) + 1
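The same first-match discipline applies here: the guarded even clause is tried before the unguarded fallback. A Python sketch (helper name hypothetical):

```python
def next_hailstone(x):
    # Guarded clause (x % 2 = 0) first, unguarded clause as fallback.
    return x // 2 if x % 2 == 0 else (x * 3) + 1

def hailstone_from(n):
    # Repeatedly apply next_hailstone until the sequence reaches 1.
    seq = [n]
    while n != 1:
        n = next_hailstone(n)
        seq.append(n)
    return seq
```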
Composition Notation
Point-free function composition uses the F-PIPE sigil:
clean:(@) :- trim:() /-> upper:()
Equivalent to:
clean:(x) :- upper:(trim:(x))
The piped form reads left-to-right, matching data flow.
With placeholder:
birth_year:(@) :- strftime:("%Y", @) /-> cast:(@ as int)
The @ marks where the piped value is inserted when the function takes multiple arguments.
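Left-to-right composition is a fold over one-argument stages. A minimal sketch of the F-PIPE's behavior (the f_pipe name is hypothetical):

```python
# Point-free composition reading left to right, like trim:() /-> upper:().
def f_pipe(*stages):
    def composed(x):
        for stage in stages:
            x = stage(x)  # each stage receives the previous stage's output
        return x
    return composed

clean = f_pipe(str.strip, str.upper)  # clean:(@) :- trim:() /-> upper:()
```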
Higher-Order Functions
Functions are inherently higher-order: any function can accept other functions as parameters. Mark function parameters with colon-functor syntax f:() in the signature to distinguish them from scalar parameters:
apply:(f:(), x) :- f:(x)
The f:() declares that the first parameter is a function. The body calls whatever function was passed in. Scalar parameters are bare names as usual.
Invocation:
users(*) |> (apply:(upper:(), first_name) as formatted)
The call site passes upper:() (a curried function) and first_name (a column) as two arguments. Arity matching works the same as regular functions: apply has arity 2, and the call provides 2 arguments.
Multiple function parameters:
chain:(f:(), g:(), x) :- x /-> f:() /-> g:()
users(*) |> (chain:(upper:(), trim:(), first_name) as cleaned)
Lambda as function argument:
apply_twice:(f:(), x) :- x /-> f:() /-> f:()
users(*) |> (apply_twice:(:(@ * 2), age) as quadrupled)
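Because Python functions are first-class values, the higher-order forms above sketch directly – function parameters are just ordinary arguments:

```python
# apply:(f:(), x) :- f:(x)
def apply(f, x):
    return f(x)

# chain:(f:(), g:(), x) :- x /-> f:() /-> g:()
def chain(f, g, x):
    return g(f(x))

# apply_twice:(f:(), x) :- x /-> f:() /-> f:()
def apply_twice(f, x):
    return f(f(x))
```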
Mixed function and scalar parameters:
transform_and_compute:(f:(), g:(), value, multiplier) :-
f:(value) /-> g:() /-> :(@ * multiplier)
With conditional logic:
apply_if_long:(f:(), value) :-
_:(length:(value) > 5 -> f:(value); _ -> value)
No double parentheses. Unlike higher-order views, higher-order functions use a single set of parentheses. Views need double parens because they operate on two modal categories – input-only parameters (tables) and bidirectional columns. Functions have no such distinction: everything is a value in, scalar out. See Higher-Order Rules for the full rationale.
Contextual Functions
The .. sigil indicates a function that captures variables from its invocation context:
excess_index:(..) :-
(1 + total - (interest_rate / 252))
/-> greatest:(0.01)
/-> ln:()
/-> :(@ * 2)
/-> sum:(<~ #(date))
/-> exp:()
prices(*) |> (excess_index:(..) as idx)
The function analyzes its body for free variables (total, interest_rate, date) and expects them from the calling relation. This is structural typing for functions – any relation with those columns can use the function.
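Context capture can be sketched as a function that pulls its free variables from the calling row. This models only the per-row stages of excess_index – the windowed sum over date and the final exp are omitted – and the function name is hypothetical:

```python
import math

# Free variables 'total' and 'interest_rate' are supplied by the row;
# any row shaped with those columns can use the function (structural typing).
def excess_index_step(row):
    v = 1 + row["total"] - (row["interest_rate"] / 252)
    v = max(v, 0.01)         # greatest:(0.01)
    return math.log(v) * 2   # ln:() then :(@ * 2)
```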
Mixed parameters:
Combine context capture with explicit arguments:
scaled_index:(.., scale_factor) :-
(1 + total - (interest_rate / 252))
/-> greatest:(0.01)
/-> ln:()
/-> :(@ * scale_factor)
/-> exp:()
prices(*) |> (
scaled_index:(.., 2) as double_scaled,
scaled_index:(.., 0.5) as half_scaled
)
Named context:
Explicitly declare captured variables:
scaled_index:(..{total, interest_rate}, scale_factor) :-
(1 + total - (interest_rate / 252))
/-> greatest:(0.01)
/-> :(@ * scale_factor)
/-> exp:()
This makes dependencies visible in the signature and allows overriding context with explicit values:
prices(*) |> (
scaled_index:(.., 2) as from_context,
scaled_index:(manual_total, manual_rate, 2) as explicit
)
Fact Form
Individual facts define point mappings:
department_kind:("engineering" -> "tech")
department_kind:("data science" -> "tech")
department_kind:(_ -> "other")
This is equivalent to the stacked form but spread across statements. Use it when mappings are added incrementally or loaded from external sources.
Restrictions
- Stacked notation and rule form cannot be mixed for the same function
- Stacked notation and individual facts cannot be mixed for the same function
- Disjunctive clauses (multiple rules with the same head) must be co-located in the source
- Textual order determines evaluation order for disjunctive clauses
Facts
Facts are body-less rules that define ground data. In Prolog terms, they represent the extensional (axiomatic) portion of a program – truths asserted without derivation.
Standard Facts
The notation matches Prolog (minus the terminating period):
person(0, "Gusti", "Parlor", "[email protected]")
person(1, "Diane-marie", "McHenry", "[email protected]")
person(2, "Ced", "Mainds", "[email protected]")
person(3, "Bren", "Berndsen", "[email protected]")
Standard facts sharing the same functor name must be co-located – no other definitions may appear between them.
Stacked Facts
Define tabular data with headers:
employee(
EmployeeId , FirstName , LastName
-------------------
0 , "Gusti" , "Parlor" ;
1 , "Diane-marie" , "McHenry" ;
2 , "Ced" , "Mainds"
)
The syntax mirrors anonymous tables, but anonymous tables are query-mode constructs (inline data) while stacked facts are assertion-mode constructs.
// Anonymous table (query mode)
_(first_name, last_name
--------------
"Gusti" , "Parlor" ;
"Diane-marie" , "McHenry"
)
// Stacked fact (assertion mode)
names(first_name, last_name
--------------
"Gusti" , "Parlor" ;
"Diane-marie" , "McHenry"
)
Default Implementation as Views
Delightql implements facts as views by default, not tables:
employee(
EmployeeId , FirstName , LastName
-------------------
0 , "Gusti" , "Parlor" ;
1 , "Diane-marie" , "McHenry" ;
2 , "Ced" , "Mainds"
)
CREATE TEMP VIEW employee AS
SELECT 0 AS EmployeeId, 'Gusti' AS FirstName, 'Parlor' AS LastName
UNION ALL SELECT 1, 'Diane-marie', 'McHenry'
UNION ALL SELECT 2, 'Ced', 'Mainds';
This seems counterintuitive – facts are data, so why not tables? The justification: typical delightql files contain only a handful of facts (test fixtures, configuration, lookup tables). For small datasets, the difference between a view over literal values and a table with inserted rows is negligible. Views avoid the overhead of table creation and cleanup.
Sparse Stacked Facts
Stacked facts support the same sparse column syntax as anonymous tables. Mark optional columns with ? in the header, then supply values in data rows with _(col @ val) fills:
config(
key, value, description?, deprecated?
--------------------------------------
"timeout", 30 ;
"retries", 3, _(description @ "max retry count") ;
"legacy", 1, _(deprecated @ "true")
)
This is equivalent to the fully-expanded form:
config(
key, value, description, deprecated
------------------------------------
"timeout", 30, null, null ;
"retries", 3, "max retry count", null ;
"legacy", 1, null, "true"
)
Sparse columns reduce noise in metadata definitions and configuration facts where most rows only set a few optional properties.
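The expansion rule can be sketched as a row constructor where unmentioned optional columns default to null. The names below are hypothetical:

```python
# Sparse-row expansion: optional columns become None unless a
# _(col @ val) fill names them.
OPTIONAL = ("description", "deprecated")

def expand_row(key, value, **fills):
    row = {"key": key, "value": value}
    for col in OPTIONAL:
        row[col] = fills.get(col)  # unmentioned optional columns -> null
    return row
```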
Namespaces
Namespaces organize definitions and data. They provide isolation, qualification, and a mechanism for code reuse across databases.
What Namespaces Are
A namespace is a container for entities – tables, views, rules, functions. Every entity lives in exactly one namespace.
Namespaces are hierarchical. Parts of a namespace path are separated by the :: operator:
data::production
lib::analytics
scripts::etl
Entities within a namespace are accessed with the . operator:
data::production.users(*)
lib::analytics.clean_name:(text)
scripts::etl.daily_load!(*)
The :: separates namespace parts. The . separates namespace from entity.
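Splitting a qualified name therefore means cutting at the last period, then splitting the path on ::. A sketch (function name hypothetical):

```python
# Parse "data::production.users" into (("data", "production"), "users").
def parse_qualified(name):
    path, _, entity = name.rpartition(".")  # entity is after the last '.'
    parts = tuple(path.split("::")) if path else ()
    return (parts, entity)
```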
Namespace Types
Namespaces fall into four categories based on what they contain and how they’re used.
Pure Rules Namespaces
Contain functions, sigma predicates, transpilation rules, and higher-order views with no external references. Portable – they don’t depend on any database – but may depend on other pure namespaces.
// In lib::string
clean_name:(text) :- text /-> trim:() /-> upper:()
format_email:(name, domain) :- name ++ "@" ++ domain
Pure namespaces can be used anywhere. They have no data dependencies to resolve.
Derived Rules Namespaces
Contain rules that reference external tables. These namespaces come in two forms:
Groundable – has free variables (unqualified table references):
// In lib::analytics (groundable)
young_users(*) :- users(*), age < 30 // 'users' is a free variable
The reference to users must be resolved before use. See Grounding.
Pre-grounded – all references are qualified:
// In lib::analytics (pre-grounded)
young_users(*) :- data::production.users(*), age < 30
No free variables. Ready to use immediately, but tied to a specific data namespace.
Data Namespaces
Map to physical database connections. Contain tables and views introspected from the database.
mount!("sales.db", "data::sales")
// Now: data::sales.orders(*), data::sales.customers(*)
Data namespaces are the ground truth – they hold actual data.
Namespace Classification
A namespace’s type is determined by its contents, not its path:
| If it contains… | It’s classified as… |
|---|---|
| Only functions, sigma predicates, pure HO-views | Pure |
| Any rule referencing external tables | Derived |
Namespace classification by contents
Data namespaces are separate – they’re created by mount!() and contain database tables, not rules. They’re the target of grounding, not the subject.
Conventional Prefixes
By convention, namespace paths indicate their type:
| Prefix | Intended for | Typically created by |
|---|---|---|
| data:: | Database connections | mount!() |
| lib:: | Pure and derived rules | consult!() |
| main | Default working namespace | Implicit |
Conventional namespace path prefixes
These are conventions only, not constraints. The system determines namespace type by analyzing contents, independent of path.
Built-in Namespaces
Several namespaces exist automatically.
main
The default working namespace for the REPL. When you use the REPL interactively, you’re operating in main:
active_users(*) :- users(*), status = "active"
// Equivalent to: main.active_users(*)
When you engage!() a namespace in the REPL, you’re making its entities available in main without qualification.
More generally, every execution context has a working namespace. In the REPL, it’s main. During consult!("file.dql", "lib::foo"), the working context is lib::foo – definitions in that file go into lib::foo. During run!("file.dql"), the working context inherits from the caller.
lib::std::prelude
Core pseudo-predicates, universally available. This is a partial list – see [Standard Library Reference] for the complete set.
| Pseudo-predicate | Purpose |
|---|---|
| mount!() | Load database connection |
| consult!() | Load DQL rules file |
| engage!() | Enable unqualified access |
| part!() | Remove engaged namespace |
| run!() | Execute query file |
Core pseudo-predicates in lib::std::prelude
The DML pseudo-predicates (insert!(), update!(), delete!()) are covered in [DML] and [Scripted Mutations].
No explicit engage needed – these are available everywhere.
The pseudo-predicates that load and inspect namespaces are themselves defined in a namespace. This circularity is intentional – the system is self-describing. You can query sys::entities.entity(*) to see all built-in entities, including these pseudo-predicates.
lib::std::predicates
Built-in sigma predicates, universally available:
users(*), +like(name, "A%"), +between(age, 18, 65)
See [Standard Library Reference] for the complete list.
sys::* (Introspection)
Metadata namespaces for system introspection:
| Namespace | Contains |
|---|---|
| sys::ns | Namespaces, engaged relationships, activated entities |
| sys::entities | Entities, types, references, resolutions |
| sys::cartridges | Cartridges, source types, connections |
System introspection namespaces
Not auto-engaged. Query explicitly when needed:
sys::ns.namespace(*)
sys::entities.entity(*), type = 10 // database tables
sys::cartridges.cartridge(*)
Pseudo-predicates and “attaching” context
There are several functor forms ending with an exclamation point that are used to bring rules, facts, and data into scope and within a namespace.
mount!() – Database Connections
Opens a database and introspects its tables:
mount!("sales.db", "data::sales")
Side effects:
- Creates the namespace data::sales
- Introspects tables and views
- Registers entities in the namespace
After mounting, tables are accessible:
data::sales.orders(*)
data::sales.customers(*)
consult!() – DQL Rules
Loads a .dql file containing rules:
consult!("analytics.dql", "lib::analytics")
The file contains rule definitions:
// analytics.dql
young_users(*) :- users(*), age < 30
high_value(*) :- orders(*), total > 1000
Side effects:
- Creates the namespace lib::analytics
- Parses the file
- Creates session views/functions
- Registers entities in the namespace
Rules are now accessible (qualified or via engage):
lib::analytics.young_users(*)
engage!() – Unqualified Access
Makes a namespace’s entities available without qualification:
engage!("lib::analytics")
// Now can write:
young_users(*)
// Instead of:
lib::analytics.young_users(*)
Engaging doesn’t load anything – the namespace must already exist.
part!() – Remove Engaged Namespace
Removes a namespace from engaged scope:
part!("lib::analytics")
young_users(*) // Error: not found
lib::analytics.young_users(*) // Still works (qualified)
Grounding
Groundable namespaces have free variables – references to tables that aren’t defined in the namespace. Grounding binds those variables to a data namespace.
Formal Rule
In the expression F^S.e(*):
- Only entities of S are visible. The entity e must be defined in S.
- F is never directly accessible. It supplies bindings for free variables inside S's entity bodies.
- Grounding does not grant access to S's other entities (e.g., functions defined in S are not made available in pipe expressions). Functions must be accessed via qualification (S.func:(x)) or engage!("S").
Put differently: F^S.e(*) means “from S, give me e, and when e’s body references tables, find them in F.” It does not mean “merge F and S together.”
The Problem Again
// In lib::analytics (groundable)
young_users(*) :- users(*), age < 30
users is referenced but not defined. This namespace can’t be used until users is bound to an actual table.
Query-Time Grounding
Use ^ to ground at the point of use:
data::production^lib::analytics.young_users(*)
This binds users to data::production.users for this query.
Query-time grounding uses lazy validation – only the accessed entity and its dependencies are checked. Other entities in the namespace may have unresolved references; they won’t cause failure unless you use them.
// lib::analytics has:
// young_users(*) :- users(*), age < 30 // OK; users exists in production
// revenue_report(*) :- sales(*), amount > 0 // FAIL; sales doesn't exist
data::production^lib::analytics.young_users(*) // OK
data::production^lib::analytics.revenue_report(*) // FAIL: sales not found
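Lazy validation can be sketched as resolving only the requested entity's free table references against the data namespace. The data structures and function name below are hypothetical illustrations:

```python
# Data namespace: table name -> rows.
DATA_PRODUCTION = {"users": [{"age": 25}, {"age": 40}]}

# Groundable namespace: entity -> the bare table names its body references.
LIB_ANALYTICS = {
    "young_users":    {"refs": ["users"]},
    "revenue_report": {"refs": ["sales"]},  # unresolved: no 'sales' table
}

def ground_entity(data_ns, lib_ns, entity):
    # Only this entity's references are checked (lazy validation);
    # other entities may remain unresolved without causing failure.
    for ref in lib_ns[entity]["refs"]:
        if ref not in data_ns:
            raise LookupError(f"{ref} not found")
    return {ref: data_ns[ref] for ref in lib_ns[entity]["refs"]}
```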
Permanent Grounding
Query-time and engage-time grounding are temporary. For a permanent binding, use ground!():
ground!(data::production, lib::analytics, "lib::analytics_prod")
All three arguments are required. The first two are namespace paths; the third is a string literal naming the new namespace.
This:
- Validates all entities in lib::analytics against data::production (strict validation). If any entity has an unresolved table reference, the entire operation fails – nothing is created.
- Creates a new namespace, lib::analytics_prod
- Copies all entities with free variables bound to data::production
- The new namespace is pre-grounded – no ^ operator needed
The result is a new namespace, not a mutation of the original. This prevents stateful bugs and makes the operation idempotent.
Chained Grounding
Ground through multiple layers:
data::production^lib::base^lib::extended.final_view(*)
Each ^ binds the namespace to its right against the accumulated context to its left.
Grounding as Inverse Engage
Another way to think about grounding: engage!() brings a namespace’s entities into your scope; grounding injects bindings into a namespace’s scope.
// Engage: bring lib::analytics INTO main
engage!("lib::analytics")
// Ground: inject data::production INTO lib::analytics
data::production^lib::analytics
Grounding reaches into the groundable namespace and says “when you reference users, you mean data::production.users.”
Validation Summary
| Operation | Validation | Persistence |
|---|---|---|
| data::ns^lib::ns.entity(*) | Lazy (just this entity) | Query only |
| engage!(data::ns^lib::ns) | Strict (whole namespace) | Engage scope |
| ground!(data, lib, "new") | Strict (whole namespace) | Permanent (new namespace) |
Grounding validation summary
Constraints
No intersection. The ground namespace and groundable namespace cannot share entity names. If both define users, grounding is ambiguous and fails.
Same database technology. Cross-database grounding (e.g., SQLite namespace against PostgreSQL namespace) is not supported.
Imprinting
Session-scoped entities (created with :-) disappear when the session ends. Imprinting makes them permanent.
Unlike most of the constructs discussed so far, imprinting is where actual SQL DDL is generated.
The Problem
consult!("schema.dql", "lib::schema")
// Creates session views
// Session ends... views are gone
The Solution
lib::schema(*) |> imprint!(data::production)
Imprinting:
- Validates that all entities can resolve against the target (strict validation)
- Generates DDL (CREATE VIEW, CREATE TABLE)
- Executes DDL on the target database
- Entities now exist permanently in the data namespace
Grounding and Imprinting
Grounding and imprinting are highly related: where grounding proves compatibility, imprinting makes it permanent.
If data::production^lib::analytics is valid grounding, then lib::analytics(*) |> imprint!(data::production) is valid imprinting. The grounding operation proves that the derived namespace can bind against the data namespace. Imprinting persists that binding.
// 1. Load database
mount!("prod.db", "data::production")
// 2. Load groundable rules
consult!("analytics.dql", "lib::analytics")
// 3. Ground and test (lazy validation)
data::production^lib::analytics.young_users(*) |> count:(*)
// 4. Commit to grounding (strict validation)
engage!(data::production^lib::analytics) as analytics
// 5. Work confidently
analytics.young_users(*)
analytics.revenue_report(*)
// 6. Persist (strict validation)
lib::analytics(*) |> imprint!(data::production)
Steps 4 and 6 both perform strict validation. If engage succeeds, imprint will succeed (assuming no concurrent changes).