Agent frameworks

Turn a brain into a team.

A model alone can think. An agent framework gives it tools, memory, and the ability to delegate — turning a single brain into a whole workforce.

Claude Code · SDK | Pi | Codex | Google ADK · Jules

Framework comparison

| Framework | Interaction | License | Multi-model | Agent teams | Free tier |
|---|---|---|---|---|---|
| Claude Code · SDK | Interactive CLI + API | Proprietary | Claude only | Subagents + hooks | No |
| Pi | Interactive terminal | MIT | 10+ providers | Via extensions | Own API key |
| Codex | Async cloud | Proprietary | GPT-5 only | Parallel sandboxes | Limited |
| Gemini CLI | Interactive terminal | Apache 2.0 | Gemini only | Limited | 1,000 req/day |
| Google ADK | Build-your-own | Apache 2.0 | Any model | Graph orchestration | Framework free |
Bold values indicate a differentiating advantage. Sources: Anthropic · Pi (GitHub) · OpenAI · Google ADK — May 2026.

SWE-bench performance

Agentic task resolution (SWE-bench Verified): 87.6 · 88.7 · 76.2
Terminal task completion (Terminal-Bench 2.0): 65.4 · 64.7 · 56.2
System prompt footprint (tokens; lower = leaner): ~10K · <1K · ~8K

SWE-bench and Terminal-Bench scores reflect the underlying model (Claude Opus 4.7, GPT-5.5, Gemini 2.5 Pro). Pi’s score is model-dependent because it runs on any provider. System prompt sizes: Pi <1,000 tokens per mariozechner.at (Nov 2025); Claude Code ~10,000 tokens per the same source. Sources: Anthropic system card · Pi blog post.

Pick the right framework

01

Agent teams that need compliance, audit trails, and subagent control

Use Claude Code + Agent SDK — first-class subagent support with permissioned tool sets, PreToolUse/PostToolUse hooks for change-control logging, plan mode (read-only), and session branching. The only framework with a programmatic API that mirrors the interactive CLI exactly. Best for regulated, multi-agent workflows where you need full observability.
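
The hook-and-permission idea can be illustrated with a small, framework-neutral sketch. This is not the Claude Agent SDK API: `make_audited_tool`, `read_file_stub`, and the log format are hypothetical names chosen for illustration. It only shows the general pattern of a pre-tool check against a permitted tool set plus pre and post entries in an audit log.

```python
import json
import time
from typing import Any, Callable

# Hypothetical names throughout: this is NOT the Claude Agent SDK API,
# only an illustration of the pre/post hook pattern.
AuditLog = list[dict[str, Any]]


def make_audited_tool(
    name: str,
    tool: Callable[..., Any],
    allowed_tools: set[str],
    log: AuditLog,
) -> Callable[..., Any]:
    """Wrap a tool so every call is permission-checked and logged."""

    def wrapped(**kwargs: Any) -> Any:
        # Pre-tool hook: record the attempt and enforce the permission set.
        allowed = name in allowed_tools
        log.append({"event": "pre_tool_use", "tool": name,
                    "args": kwargs, "allowed": allowed, "ts": time.time()})
        if not allowed:
            raise PermissionError(f"tool {name!r} is not permitted for this agent")
        result = tool(**kwargs)
        # Post-tool hook: record the outcome for the audit trail.
        log.append({"event": "post_tool_use", "tool": name,
                    "result": repr(result), "ts": time.time()})
        return result

    return wrapped


def read_file_stub(path: str) -> str:
    """Stand-in tool; a real one would touch the filesystem."""
    return f"<contents of {path}>"


audit_log: AuditLog = []
permitted = {"read_file"}  # this agent may read but not write
read_file = make_audited_tool("read_file", read_file_stub, permitted, audit_log)
write_file = make_audited_tool("write_file", read_file_stub, permitted, audit_log)

read_file(path="README.md")
try:
    write_file(path="README.md")
except PermissionError as exc:
    print("blocked:", exc)

print(json.dumps([entry["event"] for entry in audit_log]))
```

The blocked call still leaves a pre-tool entry in the log, which is the property a change-control audit usually cares about: denied attempts are recorded as well as successful ones.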

02

Background task queues — fire tasks, collect PRs

Use Codex — the only framework built for async, parallel cloud execution. Submit 20 tasks simultaneously across different repos; each runs in an isolated sandbox and returns a PR. RL-trained on real software engineering tasks to produce clean, human-style diffs. Best when you want agents working in the background while the team does other things.
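
The fire-and-collect pattern described here can be sketched with Python's `asyncio`. This is not the Codex API: `run_in_sandbox`, `TaskResult`, and the repository names are hypothetical stand-ins, and the "sandbox" is simulated with a short sleep. The sketch only shows the shape of submitting 20 tasks at once and collecting one result per task.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TaskResult:
    repo: str
    pr_title: str


async def run_in_sandbox(repo: str, instruction: str) -> TaskResult:
    """Hypothetical stand-in for one isolated background task.

    A real system would call a cloud agent here; this only simulates latency.
    """
    await asyncio.sleep(0.01)
    return TaskResult(repo=repo, pr_title=f"{instruction} ({repo})")


async def fan_out(tasks: list[tuple[str, str]]) -> list[TaskResult]:
    # Start every task at once; gather returns results in submission order.
    return await asyncio.gather(*(run_in_sandbox(r, i) for r, i in tasks))


tasks = [(f"org/repo-{n}", "Bump lint config") for n in range(20)]
results = asyncio.run(fan_out(tasks))
print(len(results), results[0].pr_title)
```

Because the tasks run concurrently, total wall-clock time in this sketch is close to that of a single task rather than 20 times it, which is the appeal of the async model.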

03

Cost-sensitive, local, or air-gapped deployments

Use Pi: MIT-licensed and model-agnostic (Anthropic, OpenAI, Gemini, Mistral, Ollama, and more), with a system prompt under 1,000 tokens, roughly 10× less system-prompt overhead than Claude Code's ~10,000. Fork it, embed it, or run it fully offline with Ollama. Best when data sovereignty matters, budget is tight, or you need full control over the framework itself.
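
The overhead claim can be checked with simple arithmetic using the prompt sizes quoted on this page. The sketch assumes the system prompt is resent in full on every request with no prompt caching, and the 50-request session length is a hypothetical figure chosen only for illustration.

```python
# Approximate system prompt sizes in tokens, as quoted in the comparison.
PI_PROMPT_TOKENS = 1_000            # upper bound; the source says "<1,000"
CLAUDE_CODE_PROMPT_TOKENS = 10_000  # approximate; the source says "~10,000"


def prompt_overhead(prompt_tokens: int, requests: int) -> int:
    """Tokens spent on the system prompt if it is resent on every request.

    Assumption: no prompt caching, so the full prompt counts each time.
    """
    return prompt_tokens * requests


ratio = CLAUDE_CODE_PROMPT_TOKENS / PI_PROMPT_TOKENS
requests = 50  # hypothetical session length
saved = (prompt_overhead(CLAUDE_CODE_PROMPT_TOKENS, requests)
         - prompt_overhead(PI_PROMPT_TOKENS, requests))
print(f"ratio: about {ratio:.0f}x; tokens saved over {requests} requests: {saved:,}")
```

With prompt caching enabled the absolute savings would shrink, so the figure is best read as an upper bound on the difference rather than a measured cost.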

04

Custom multi-agent systems on Google Cloud

Use Google ADK — a full graph-based agent orchestration framework (Python, TypeScript, Go, Java) with 100+ enterprise connectors (SAP, Salesforce, Workday, BigQuery), an A2A (agent-to-agent) communication protocol, and native Vertex AI deployment. Best when building bespoke agent workflows on GCP with existing enterprise system integrations.
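
Graph-based orchestration, in the general sense, means running agents in dependency order and passing each one the outputs of its upstream nodes. The sketch below uses only Python's standard `graphlib` module and is not the Google ADK API; the agent names and the `run_graph` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical agents: each receives the outputs of its upstream nodes.
Agent = Callable[[dict[str, str]], str]

agents: dict[str, Agent] = {
    "fetch_orders": lambda inputs: "orders",
    "fetch_accounts": lambda inputs: "accounts",
    "reconcile": lambda inputs: (
        f"reconciled({inputs['fetch_orders']}, {inputs['fetch_accounts']})"
    ),
    "report": lambda inputs: f"report[{inputs['reconcile']}]",
}

# Each node maps to the set of nodes it depends on.
graph: dict[str, set[str]] = {
    "fetch_orders": set(),
    "fetch_accounts": set(),
    "reconcile": {"fetch_orders", "fetch_accounts"},
    "report": {"reconcile"},
}


def run_graph(graph: dict[str, set[str]], agents: dict[str, Agent]) -> dict[str, str]:
    """Run every agent after its dependencies and collect all outputs."""
    outputs: dict[str, str] = {}
    for node in TopologicalSorter(graph).static_order():
        upstream = {dep: outputs[dep] for dep in graph[node]}
        outputs[node] = agents[node](upstream)
    return outputs


outputs = run_graph(graph, agents)
print(outputs["report"])
```

A production orchestrator would add parallel execution of independent nodes, retries, and state persistence; the sketch keeps only the ordering logic.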

Going deeper