AI incident dataset

The Incident Collector (Agent #1 of the AI Guardrail Lab) needs raw material. We curate a set of ten or more incidents drawn from four canonical sources.

Sources

| Source | What it gives |
| --- | --- |
| OECD AI Incidents and Hazards Monitor (AIM) | Government-tier catalogued incidents with structured fields |
| AIAAIC Repository | Independent journalism-style incident archive |
| Stanford AI Index (AI-related incidents) | Academic-tier catalogued cases |
| Damien Charlotin’s tracker | Practitioner-curated, court-decision focus |

What an incident record looks like

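Each curated incident is one JSONL row. The record is pretty-printed here for readability; in the file the whole object sits on a single line: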

{
  "id": "INC-2024-0142",
  "title": "...",
  "date_occurred": "2024-08-15",
  "system_type": "LLM chatbot",
  "deployment_context": "customer support",
  "harm_type": ["misinformation", "financial"],
  "severity": 4,
  "description": "...",
  "sources": ["...", "..."],
  "lessons": "...",
  "related_incidents": ["INC-2024-0089"]
}
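A minimal sketch of how a record of this shape could be checked before it enters the pipeline. The field names and types are taken from the example above; treating every field as required, and the function name `validate_incident`, are assumptions made for illustration rather than documented behaviour of the Incident Collector.

```python
import json
from datetime import date

# Field names and types mirror the example record; treating all of them
# as required is an assumption, not a documented rule.
REQUIRED_FIELDS = {
    "id": str,
    "title": str,
    "date_occurred": str,
    "system_type": str,
    "deployment_context": str,
    "harm_type": list,
    "severity": int,
    "description": str,
    "sources": list,
    "lessons": str,
    "related_incidents": list,
}


def validate_incident(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    # date_occurred is written as YYYY-MM-DD in the example record.
    if isinstance(record.get("date_occurred"), str):
        try:
            date.fromisoformat(record["date_occurred"])
        except ValueError:
            problems.append("date_occurred is not an ISO date")
    return problems


# A deliberately incomplete row, to show what the report looks like.
line = '{"id": "INC-2024-0142", "severity": 4}'
for problem in validate_incident(json.loads(line)):
    print(problem)
```

Returning a list of problems instead of raising on the first one lets a curator fix a whole record in one pass.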

Why JSONL

The Incident Collector reads one record per line, processes it, and writes a .md sidecar (Stage 2). The Stage 3 analyzers then fan out: Root Cause, Threat Modeling, and Guardrail Designer all read the same record through different lenses.

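JSONL keeps the pipeline streaming-friendly and grep-able: each record sits on its own line, so a reader can process the file one row at a time, and a plain text search for an id or harm type returns the complete record.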
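The read-then-fan-out flow described above can be sketched roughly as follows. The three analyzer functions are hypothetical one-line stand-ins, and the names `read_incidents` and `fan_out` are invented for this example; the real Stage 3 agents are not specified here.

```python
import io
import json
from typing import Callable, Iterable, Iterator, TextIO


def read_incidents(stream: TextIO) -> Iterator[dict]:
    """Yield one incident per non-empty line without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def fan_out(
    records: Iterable[dict],
    analyzers: dict[str, Callable[[dict], str]],
) -> Iterator[tuple[str, str, str]]:
    """Hand every record to every analyzer; yield (incident id, analyzer, result)."""
    for record in records:
        for name, analyze in analyzers.items():
            yield record["id"], name, analyze(record)


# Hypothetical stand-ins for the Stage 3 analyzers: each reads the same
# record but looks at a different field.
ANALYZERS = {
    "root_cause": lambda r: f"context: {r['deployment_context']}",
    "threat_modeling": lambda r: f"harms: {', '.join(r['harm_type'])}",
    "guardrail_designer": lambda r: f"severity: {r['severity']}",
}

# An in-memory stream stands in for the JSONL file.
sample = io.StringIO(
    '{"id": "INC-2024-0142", "deployment_context": "customer support", '
    '"harm_type": ["misinformation", "financial"], "severity": 4}\n'
)
for row in fan_out(read_incidents(sample), ANALYZERS):
    print(row)
```

Because `read_incidents` is a generator, only one record is held in memory at a time, which is the streaming property the format is chosen for.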

Download

| Format | Link |
| --- | --- |
| JSONL (canonical) | link added on publication |
| CSV (Excel-friendly) | auto-generated from JSONL |
| Markdown (human-readable) | auto-generated, one file per incident |
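As an illustration of how the CSV export might be derived from the canonical JSONL, here is a small sketch. The column order follows the example record, and joining list fields with "; " is an assumption of this example; the actual generator may flatten lists differently.

```python
import csv
import io
import json

# Column order follows the example record.
COLUMNS = [
    "id", "title", "date_occurred", "system_type", "deployment_context",
    "harm_type", "severity", "description", "sources", "lessons",
    "related_incidents",
]


def jsonl_to_csv(jsonl_text: str) -> str:
    """Convert JSONL text to CSV text, one CSV row per incident."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # List fields are flattened to a single cell (an assumed convention).
        row = {
            key: "; ".join(map(str, value)) if isinstance(value, list) else value
            for key, value in record.items()
        }
        writer.writerow(row)  # fields absent from the record become empty cells
    return out.getvalue()


sample = '{"id": "INC-2024-0142", "harm_type": ["misinformation", "financial"], "severity": 4}\n'
print(jsonl_to_csv(sample))
```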

How to extend

NBS engineers can add their own incidents: internal post-mortems, near-misses, observability anomalies. The schema is open; the Incident Collector will pick up any new record that lands in the configured inbox.
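A rough sketch of what submitting an internal incident could look like. It assumes the inbox is a directory that accepts one `.jsonl` file per incident; the inbox layout, the file naming, the function name `submit_incident`, and the id `INC-INTERNAL-0001` are all hypothetical, so check the actual inbox configuration before relying on any of them.

```python
import json
import tempfile
from pathlib import Path


def submit_incident(record: dict, inbox: Path) -> Path:
    """Write one incident as a single-line JSONL file in the inbox directory."""
    inbox.mkdir(parents=True, exist_ok=True)
    target = inbox / f"{record['id']}.jsonl"
    # json.dumps without indent escapes newlines inside strings, so the
    # record always stays on one line.
    target.write_text(
        json.dumps(record, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return target


# A temporary directory stands in for the configured inbox.
with tempfile.TemporaryDirectory() as tmp:
    incident = {
        "id": "INC-INTERNAL-0001",  # hypothetical id scheme
        "title": "Near-miss: stale answer served from cache",
        "date_occurred": "2024-08-15",
        "system_type": "LLM chatbot",
        "deployment_context": "customer support",
        "harm_type": ["misinformation"],
        "severity": 2,
        "description": "Multi-line notes\nare still stored on one line.",
        "sources": [],
        "lessons": "...",
        "related_incidents": [],
    }
    path = submit_incident(incident, Path(tmp) / "inbox")
    print(path.name)
```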