コンテンツにスキップ

Pipeline overview

このコンテンツはまだ日本語訳がありません。

The Metaweave pipeline is a Python ETL that consumes the form’s emails, decrypts them, and upserts to PostgreSQL. It runs as a single CLI — python -m src.main — with no daemon, no queue, no message broker. Designed to be triggered on a schedule.

Data flow

Vessel-side form (HTML)
▼ Submit → AES-128-CBC encrypted JSON in email body
Microsoft Outlook ─── shared mailbox ────►
▼ fetch unread "Metaweave Forms:" emails
src/fetcher.py ← Microsoft Graph API + MSAL OAuth2
▼ body text + parsed subject (vessel, type, date)
src/parser.py ← extract BEGIN/END markers, AES decrypt → JSON dict
▼ ParseResult(form_data, report_type_raw, form_version)
src/mapper.py ← 92 scalars + 11 arrays → SQLAlchemy model objects
▼ dict { vessel_info, voyage_no, report, events, bunker_rob, … }
src/writer.py ← upsert Vessel, upsert Voyage, delete-then-insert Report + children
▼ CASCADE delete on report replacement
PostgreSQL (Cloud SQL)
17 metaweave_* tables: Vessel → Voyage → Report → 11 child tables

Stages

Designed for

  • Scheduled invocation — typically every 5–15 minutes via cron or a serverless scheduler
  • Idempotent reads — marks emails as read after fetch, so a re-run won’t re-process
  • Safe re-submissions — corrections replace previous rows by design (CASCADE delete + insert)
  • Single mailbox per fleet — one shared Outlook mailbox is the input; the script loops every unread email

Not designed for

  • High volume (>10,000 emails/run) — sequential processing, single thread
  • Multi-tenant — one Azure app registration → one mailbox → one Postgres database
  • Real-time delivery — there’s a polling delay between Submit and ingest (driven by your cron cadence)

See also