
Validation & diagnostics

This is the engine companion to How Keystone checks your book. That page is the author's mental model; this one is the mechanism — how the checks are implemented, and how a core forker adds their own. If you're only writing a book, you want the author page.

One path for filter diagnostics

Every diagnostic from Keystone's Lua filters and handlers goes through one library, .pandoc/filters/lib/errors.lua — no filter writes to stderr or exits the process on its own. (Shell resolvers and Pandoc emit their own; see strict mode.) That single owner is why the Lua messages are uniform: the WARN:/ERROR: prefix, the element-context suffix, and clean-fatal behavior are decided in one place.

Two functions, two severities:

  • warn(msg) — print, notify the strict-mode sink, and return. The caller continues with a fallback.
  • fatal(msg) — print, then exit the process cleanly. No Lua traceback into filter source reaches the author; fatal calls os.exit before Pandoc's own error handler can append one.

fatal is for author- and config-facing failures, such as a bad value or an undeclared mark. A bare error() is reserved for internal-invariant bugs (a handler that returned the wrong shape), where a traceback is the right tool. Handlers are a pure author-content surface: they always report through the bound report handle and never call error() or write to io.stderr directly, so every message stays uniform and the fatal path stays traceback-free.
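The two-severity contract can be sketched as follows. This is an illustration only, not the contents of errors.lua: the constructor new_errors and its injected out, sink, and exit hooks are hypothetical names, added so the sketch runs outside Pandoc, and the exit status of 1 is an assumption.

```lua
-- Hypothetical sketch of the warn/fatal contract; not the real errors.lua.
-- `out`, `sink`, and `exit` are injected so the sketch runs outside Pandoc.
local function new_errors(opts)
  opts = opts or {}
  local out  = opts.out  or function(line) io.stderr:write(line, "\n") end
  local sink = opts.sink or function(_) end            -- strict-mode sink hook (assumed)
  local exit = opts.exit or function(code) os.exit(code) end

  local E = {}

  -- warn: print, notify the sink, then return so the caller can fall back.
  function E.warn(msg)
    local line = "WARN: " .. msg
    out(line)
    sink(line)
  end

  -- fatal: print, then exit. No error() is raised, so no traceback is produced.
  function E.fatal(msg)
    out("ERROR: " .. msg)
    exit(1)  -- assumed exit status
  end

  return E
end
```

Because fatal never raises a Lua error, nothing unwinds through the filter source, which is what keeps the author-facing output free of tracebacks.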

Element context: quoting the element, not a line

A diagnostic locates the offending element by quoting it, not by line number:

WARN: aside: unknown type 'todo' (in .aside "A callout whose type is not one of the d…")

That trailing (in …) suffix is built by kast.inspect.describe(el), which snapshots the element's classes, identifier, and a leading-text snippet (capped, cut on a UTF-8 boundary). The errors lib renders that as a CSS-like selector plus the snippet.
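As a rough sketch of how such a suffix could be produced, assuming Lua 5.3 or later for the utf8 library: the field names on the snapshot table, the helper names truncate and render, and the 40-character cap are all assumptions for illustration, not Keystone's actual values.

```lua
-- Hypothetical sketch of the "(in …)" suffix; names and the cap are assumed.
local SNIPPET_CAP = 40

-- Keep at most `cap` characters, cutting only on a UTF-8 character boundary.
local function truncate(text, cap)
  local cut = utf8.offset(text, cap + 1)  -- byte index where character cap+1 starts
  if cut == nil or cut > #text then
    return text                           -- already within the cap
  end
  return text:sub(1, cut - 1) .. "…"
end

-- Render a snapshot { classes, identifier, snippet } as selector plus quote.
local function render(ctx)
  local parts = {}
  for _, class in ipairs(ctx.classes or {}) do
    parts[#parts + 1] = "." .. class
  end
  if ctx.identifier and ctx.identifier ~= "" then
    parts[#parts + 1] = "#" .. ctx.identifier
  end
  local pieces = {}
  local selector = table.concat(parts)
  if selector ~= "" then pieces[#pieces + 1] = selector end
  if ctx.snippet and ctx.snippet ~= "" then
    pieces[#pieces + 1] = '"' .. ctx.snippet .. '"'
  end
  return "(in " .. table.concat(pieces, " ") .. ")"
end
```

Cutting by character offset rather than byte count is what prevents a multi-byte character from being split at the cap.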

Why quote the element instead of pointing at file:line? Source positions reach the document tree only through Pandoc's sourcepos extension, which the commonmark/gfm readers support but the markdown reader — the one the whole fenced-div and shortcut system rests on — does not. Quoting the element is 100% correct with no reader switch, and it covers every handler. describe is also the seam where source positions would attach: if the markdown reader ever gains sourcepos, a pos field added here surfaces real file:line with the call sites untouched. (The one place a real line number appears today is Pandoc's own unclosed-div warning, which Keystone passes through untouched — see Finding the offender.)

The context is bound lazily: the describe walk runs only if a diagnostic actually fires, so the happy path pays nothing.

-- in the dispatcher, per element
local report = errors.bind(function() return kast.inspect.describe(el) end)
-- a handler then calls report.warn("…") / report.fatal("…")
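
The lazy binding itself might look like the sketch below. It is not the real errors.bind: the emit parameter is a hypothetical stand-in for printing, and the process exit on fatal is left out so the example stays runnable.

```lua
-- Hypothetical sketch of lazy context binding; not the real errors.bind.
-- `describe` is a thunk; `emit` is an assumed stand-in for printing.
local function bind(describe, emit)
  local cached, done = nil, false

  local function context()
    if not done then
      cached, done = describe(), true  -- the describe walk runs at most once
    end
    return cached
  end

  return {
    warn  = function(msg) emit("WARN: "  .. msg .. " " .. context()) end,
    -- The real fatal would also exit the process; omitted in this sketch.
    fatal = function(msg) emit("ERROR: " .. msg .. " " .. context()) end,
  }
end
```

If no diagnostic fires, describe is never called, so an element that validates cleanly costs only the closure allocation.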

The closed vocabulary

Keystone's class set is closed: there is no author CSS channel, so a class that resolves to nothing is a typo, not an extension point. shortcuts.lua holds the full vocabulary, so it owns the check — every div/span class must resolve against one of:

  • a handler class (bare or ks--prefixed),
  • a shortcut name (system or user), or
  • a short Pandoc-native allowlist (smallcaps, underline, ul, mark, unlisted).

Any class matching none of them warns, naming the class. A close match from the combined vocabulary — Levenshtein distance ≤ 2, ties broken lexicographically — is offered as a "Did you mean?" hint. The name is canonicalized (its ks- prefix stripped) before matching, so a broken private class steers to the public name, which is the stable surface.
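
The matching rule can be illustrated with a small sketch. The function names are hypothetical, and canonicalizing the candidates as well as the typed name is an assumption; the distance threshold and the lexicographic tie-break follow the description above.

```lua
-- Two-row Levenshtein distance over bytes (adequate for ASCII class names).
local function levenshtein(a, b)
  local prev = {}
  for j = 0, #b do prev[j] = j end
  for i = 1, #a do
    local cur = { [0] = i }
    for j = 1, #b do
      local cost = (a:byte(i) == b:byte(j)) and 0 or 1
      cur[j] = math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
    end
    prev = cur
  end
  return prev[#b]
end

-- Strip a leading "ks-" so private names steer to the public one.
local function canonical(name)
  return (name:gsub("^ks%-", ""))
end

-- Return the closest vocabulary name within distance 2, or nil.
-- Ties on distance are broken by lexicographic order.
local function suggest(class, vocabulary)
  local target = canonical(class)
  local best, best_dist
  for _, name in ipairs(vocabulary) do
    local cand = canonical(name)
    local d = levenshtein(target, cand)
    if d <= 2 and (best == nil or d < best_dist
        or (d == best_dist and cand < best)) then
      best, best_dist = cand, d
    end
  end
  return best
end
```

For example, suggest("asdie", { "aside", "vspace" }) yields "aside", since swapping two adjacent letters counts as two edits under plain Levenshtein distance.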

Classless elements (an id-only ::: {#refs}) are skipped — citeproc fills those with csl-* classes after the filters run, so flagging them would be wrong.
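
Taken together, the resolution rules and the classless skip amount to something like the following sketch. The function name and the set-shaped handlers and shortcuts tables are assumptions; the real check lives in shortcuts.lua and is not reproduced here.

```lua
-- Pandoc-native classes allowed through, as listed above.
local NATIVE = {
  smallcaps = true, underline = true, ul = true, mark = true, unlisted = true,
}

-- Return the classes on one element that resolve to nothing.
-- `handlers` and `shortcuts` are assumed to be sets keyed by name.
local function unknown_classes(classes, handlers, shortcuts)
  local unknown = {}
  if #classes == 0 then
    return unknown  -- id-only element such as {#refs}: skipped
  end
  for _, class in ipairs(classes) do
    local bare = class:gsub("^ks%-", "")  -- handler classes match bare or ks-prefixed
    if not (handlers[bare] or shortcuts[class] or NATIVE[class]) then
      unknown[#unknown + 1] = class
    end
  end
  return unknown
end
```

Each name returned here would become one warning, with a "Did you mean?" hint attached when a close match exists.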

Required fields

Whether a field is required is declared in the shortcut interface, not in handler code:

  • An interface entry may declare required: true. At expansion, a required field with no author value and no default is fatal — the field is the whole point of the construct (vspace.size, set.mark).
  • Each handler declares the attributes it cannot run without on its returned table: required_attributes = { "size" }. This is the handler's contract.
  • At load time, shortcuts.lua checks that every shortcut routing to a handler guarantees each required attribute — via a default or required: true. A shortcut that would let a required attribute through unset is a fatal misconfiguration, caught before any book builds. The guarantee extends transitively to shortcut bodies; only a bare ks-* handler used directly in the manuscript bypasses it (the documented escape hatch).
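
The load-time check in the last bullet can be sketched as below. The table shapes (a handler field and an interface map on each shortcut), the function name, and the message wording are all hypothetical, and the transitive walk through shortcut bodies is left out for brevity.

```lua
-- Hypothetical sketch of the load-time guarantee check.
-- Assumed shapes: shortcuts[name] = { handler = "...", interface = { attr = {...} } }
--                 handlers[name]  = { required_attributes = { "attr", ... } }
local function check_shortcuts(shortcuts, handlers)
  local problems = {}
  for name, sc in pairs(shortcuts) do
    local handler = handlers[sc.handler]
    local required = handler and handler.required_attributes or {}
    for _, attr in ipairs(required) do
      local field = sc.interface and sc.interface[attr]
      -- Guaranteed means the interface supplies a default or demands a value.
      local guaranteed = field ~= nil
        and (field.default ~= nil or field.required == true)
      if not guaranteed then
        problems[#problems + 1] = name .. ": handler '" .. sc.handler
          .. "' requires '" .. attr .. "' but the shortcut does not guarantee it"
      end
    end
  end
  table.sort(problems)  -- stable output regardless of pairs() order
  return problems
end
```

A non-empty result would be reported as a fatal misconfiguration before any manuscript is read.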

The split keeps handlers free of fatal presence checks: they assume the shortcut layer supplied what's required and validate only the values themselves (aside rejects an unknown type). The required-field guarantee lives in the interface, where aside.type defaults to note. A handler keeps only a soft guard for the bare-ks-* bypass path: aside warns "no type given" rather than crashing.

Strict mode

KEYSTONE_WARNINGS_AS_ERRORS (accepted truthy: true, 1, yes, on) turns every warning fatal. Three worlds emit warnings, and each fails differently:

  • Shell resolvers (diagnostics.sh) fail fast — under strict mode the first warn stops the build immediately.
  • Pandoc runs with --fail-if-warnings, so its own reader warnings fail the pass they occur in.
  • Lua filters are the one world that aggregates. They run as separate Lua states — the main pass, the EPUB pre-scan, a standalone font-path invocation — so an in-process counter can't see them all. Instead publish.sh exports a sink file path, each state's errors lib appends its warnings there, and publish.sh checks the sink after the build. That's report-all-then-fail: one run surfaces every filter warning at once rather than dying on the first.

Whichever world trips, artifacts are promoted on success only — the build writes to a staging path and moves into artifacts/ only after a clean build with an empty sink, so a strict failure never leaves a half-built file behind.
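
On the Lua side, the strict-mode pieces could look roughly like this. Matching the truthy values exactly (case-sensitively) is an assumption, as are the function names; the sink path is taken as a parameter because the variable publish.sh uses to export it is not named here.

```lua
-- Accepted truthy spellings, as listed above (exact match is an assumption).
local TRUTHY = { ["true"] = true, ["1"] = true, yes = true, on = true }

local function strict_enabled(value)
  return value ~= nil and TRUTHY[value] == true
end

-- Append one warning line to the sink file, if a path was provided.
-- Each Lua state appends independently, so the file aggregates all of them.
local function append_to_sink(path, line)
  if not path or path == "" then
    return false
  end
  local f = assert(io.open(path, "a"))
  f:write(line, "\n")
  f:close()
  return true
end
```

A filter would call strict_enabled(os.getenv("KEYSTONE_WARNINGS_AS_ERRORS")) once at load. Appending to a shared file rather than counting in memory is what lets separate Lua states contribute to one report.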

EPUB diagnostics can print twice

EPUB builds run a pre-scan pass before the main build, and the filters run in both. A filter-level warning therefore prints once per pass. The sink de-duplicates before reporting, so strict mode lists each distinct warning once.
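
The de-duplication step is simple enough to show in a few lines. The real check runs in publish.sh, whose code is not shown here; the logic is sketched in Lua only to stay consistent with the other examples, and the function name is hypothetical.

```lua
-- Keep the first occurrence of each line, preserving order.
local function distinct_lines(lines)
  local seen, out = {}, {}
  for _, line in ipairs(lines) do
    if line ~= "" and not seen[line] then
      seen[line] = true
      out[#out + 1] = line
    end
  end
  return out
end
```

Given the same warning written once by the pre-scan and once by the main pass, the result lists it a single time.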

Adding a diagnostic to your handler

When you add a handler, it receives the element and a bound report handle:

local function my_div(el, report)
  local width = el.attributes["width"]
  -- `valid` stands for your handler's own validator (not shown)
  if width and not valid(width) then
    report.warn("mydiv: unknown width '" .. width .. "'")
    -- fall through to a safe default
  end
  -- …emit output…
end

  • Use report.warn when you have a fallback and the book can still build; use report.fatal when proceeding would produce broken output.
  • The element context is attached automatically — don't repeat the class or a location in your message. State the problem and the offending value.
  • Declare anything you can't run without in required_attributes on the returned table, rather than hand-rolling a presence check — the interface then guarantees it, and the message is consistent with every other required field.

Match the wording of the existing handlers: lower-case handler name, the bad value quoted, and the accepted set named when it's small (invalid cols '9' (must be an integer from 2 to 4)).
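
A value check written in that style might look like the sketch below. The grid handler name, the check_cols helper, and the fallback of 2 columns are hypothetical; only the message shape follows the convention described above.

```lua
-- Hypothetical value check for a `cols` attribute on an assumed "grid" handler.
-- Returns a usable column count, warning and falling back when the value is bad.
local function check_cols(raw, report)
  local n = tonumber(raw)
  if n == nil or n % 1 ~= 0 or n < 2 or n > 4 then
    report.warn("grid: invalid cols '" .. tostring(raw)
      .. "' (must be an integer from 2 to 4)")
    return 2  -- assumed safe default
  end
  return math.floor(n)
end
```

The message names the handler, quotes the offending value, and spells out the accepted range; the element context is left for the bound report handle to append.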