Provenance — every claim traces to raw bytes

If you ask the agent “where did you get this?” it walks the chain back to the bytes. This page documents that chain.

The chain

Final story ──► final-story.md
    │ cites story:42
Story       ──► story.md
    │ cites note_paths[postit:12345, postit:12346]
Post-it     ──► agent_postits row
    │ source_row_ids=[emails:7891, conversation_turns:55432]
Evidence    ──► incidents:7891 → file:///incidents/raw/2026-05-18_INC-2026-0142.jsonl
                conversation_turns:55432 → session.jsonl:421:198432
Raw bytes   ──► absolute ground truth, never moves

Five layers. Each one cites the layer below. The bottom layer (raw bytes) is append-only and immutable: original files are never rewritten, only superseded.
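To make the walk-back concrete, here is a minimal Python sketch of following citations from a top-level artifact down to raw bytes. The `Node` type, the `walk_to_raw` function, and the `"final-story"` key are hypothetical names chosen for illustration, and the in-memory dictionary is only a stand-in for the real index. The reference strings are taken from the diagram above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One artifact in the chain (hypothetical shape, for illustration only)."""
    ref: str                                         # e.g. "story:42"
    cites: List[str] = field(default_factory=list)   # refs in the layer below


def walk_to_raw(ref: str, index: Dict[str, Node]) -> List[List[str]]:
    """Return every citation path from `ref` down to a node that cites nothing."""
    node = index[ref]
    if not node.cites:
        # Bottom of the chain: a raw-bytes location.
        return [[ref]]
    paths: List[List[str]] = []
    for child in node.cites:
        for tail in walk_to_raw(child, index):
            paths.append([ref] + tail)
    return paths


# Illustrative index using identifiers from the diagram.
example_index = {
    "final-story": Node("final-story", ["story:42"]),
    "story:42": Node("story:42", ["postit:12345"]),
    "postit:12345": Node("postit:12345", ["conversation_turns:55432"]),
    "conversation_turns:55432": Node(
        "conversation_turns:55432", ["session.jsonl:421:198432"]
    ),
    "session.jsonl:421:198432": Node("session.jsonl:421:198432"),
}

for path in walk_to_raw("final-story", example_index):
    print(" -> ".join(path))
```

A claim that rests on several sources yields several paths, one per raw location, which is what lets each sentence be traced separately.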

Why files are canonical (not DB columns)

Stories, post-its, and final stories live on disk as .md files first. The DB holds:

  • An index for fast retrieval
  • A hash of the file contents for tamper-detection
  • Cross-references between layers

But the canonical truth is the file. Eight reasons:

  1. Human-readable without DB access
  2. Git-able (every story has a history)
  3. Survives DB corruption
  4. Greppable
  5. Easy to back up
  6. Easy to inspect during incidents
  7. No ORM impedance
  8. Same shape across cloud and local modes
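Since the file is canonical and the DB only stores a hash of it, tamper detection reduces to recomputing the hash at read time. The sketch below assumes SHA-256 and UTF-8 text; the page does not say which hash function is used, and `file_sha256` and `verify_on_read` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex digest of the file's current bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_on_read(path: Path, stored_hash: str) -> str:
    """Return the file text only if it still matches the hash held in the index."""
    data = path.read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    if actual != stored_hash:
        raise ValueError(
            f"hash mismatch for {path}: index has {stored_hash[:12]}, "
            f"file has {actual[:12]}"
        )
    return data.decode("utf-8")
```

On a mismatch the file is still treated as the source of truth; the error only signals that the index and the file have diverged and need to be reconciled.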

What this enables

| Need | What provenance gives you |
| --- | --- |
| “Why did the agent say this?” | Walk back to source: see exactly which post-it, which email, which sentence |
| “Has this been tampered with?” | Hash check: DB hash vs file hash on every read |
| “What did the system know on date X?” | Archive layer: every story has dated archives |
| “Should I trust this claim?” | Importance scale on each post-it + source row depth |
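The "what did the system know on date X?" lookup can be sketched as picking the newest archive dated on or before X. How archives are actually named and stored is not described here, so the mapping from date to archive path and the `archive_as_of` helper below are assumptions made purely for illustration.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, Optional


def archive_as_of(archives: Dict[date, str], as_of: date) -> Optional[str]:
    """Return the archive in effect on `as_of`, or None if none existed yet.

    `archives` maps each archive's date to its path (hypothetical layout).
    """
    days = sorted(archives)
    # Index of the first archive date strictly after `as_of`.
    i = bisect_right(days, as_of)
    return archives[days[i - 1]] if i else None
```

Because prior versions are archived rather than overwritten, this lookup never has to reconstruct state; it only has to select the right file.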

What it costs

| Layer | Cost characteristic |
| --- | --- |
| Final story | One file, daily rewrite, hash stored in DB |
| Story | One file per casefile, rebuilt on new post-it cluster, prior versions archived |
| Post-it | One row per analyzer-lens per source, immutable once written |
| Evidence | One row per ingested source (email, conversation turn, photo, ERP row) |
| Raw | Append-only: original files stay in ~/inbox/, ~/library/raw/ |

Storage scales linearly with ingestion. At ~100 emails/day across an office, total storage grows by ~10 MB/day, so a year of full provenance (about 3.65 GB) fits in 4 GB.
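The yearly figure follows directly from the daily rate. A small sketch of the arithmetic, assuming decimal units (1 GB = 1000 MB) and a 365-day year; the function names are illustrative only.

```python
def yearly_storage_gb(mb_per_day: float, days: int = 365) -> float:
    """Linear growth: daily volume times number of days, converted to GB."""
    return mb_per_day * days / 1000


def implied_kb_per_item(mb_per_day: float, items_per_day: int) -> float:
    """Average footprint per ingested item across all layers, in KB."""
    return mb_per_day * 1000 / items_per_day


print(yearly_storage_gb(10))         # about 3.65 GB per year
print(implied_kb_per_item(10, 100))  # about 100 KB per email, all layers included
```

Because growth is linear, doubling the ingestion rate doubles the yearly footprint, so the same function gives a quick estimate for a larger office.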