How Anthropic Built Multi-Agent Deep Research

May 23

15x the tokens, a 90.2% gain over single-agent on Anthropic's internal research eval, and the three decisions that make it work.

4 Comments

This really landed for me — "multi-agent works mainly because it spends enough tokens to solve the problem." I built a little personal deep-research setup on basically that premise, and the caveats you list are the exact walls I hit.

The runaway-cost one especially. I ended up just not letting subagents spawn their own subagents — enforced in the orchestration layer, not begged for in a prompt. Doesn't give me a real per-run cost cap (still missing that one too), but it kills the scariest multiplier by default.
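
To make the orchestration-layer guard concrete, here is a minimal Python sketch. It is not the commenter's code or Anthropic's. It assumes a hypothetical `Orchestrator` class that tracks delegation depth and refuses any spawn past `max_depth`; every name in it is invented for illustration.

```python
from dataclasses import dataclass, field


class SpawnNotAllowed(RuntimeError):
    """Raised when an agent tries to delegate past the allowed depth."""


@dataclass
class Orchestrator:
    # Hypothetical setting: 1 means the lead agent (depth 0) may spawn
    # subagents (depth 1), but those subagents may not spawn anything.
    max_depth: int = 1
    spawned: list = field(default_factory=list)

    def spawn(self, parent_depth: int, task: str) -> int:
        """Register a subagent for `task` and return the depth it runs at."""
        child_depth = parent_depth + 1
        if child_depth > self.max_depth:
            # Enforced in code, so no prompt wording can override it.
            raise SpawnNotAllowed(
                f"depth {child_depth} exceeds max_depth {self.max_depth}"
            )
        self.spawned.append((child_depth, task))
        return child_depth
```

With `max_depth=1` the lead agent can delegate, but a subagent's own spawn call raises instead of quietly fanning out. This limits the depth multiplier only; it is still not a per-run cost cap.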

Same instinct on verification — a separate verifier that grep-checks citations against the source, instead of an LLM grading its own homework.
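
A grep-style citation check along these lines could be sketched as follows. This is an assumption about the general shape, not the commenter's actual verifier: it supposes each citation is a hypothetical `(source_id, quote)` pair and that the fetched source text is available as a plain string. It only does a whitespace- and case-insensitive substring match.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase so line wraps do not break matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_citations(citations, sources):
    """Return a list of (source_id, quote, reason) for citations that fail.

    citations: iterable of (source_id, quote) pairs (hypothetical shape).
    sources:   dict mapping source_id to the full source text.
    """
    failures = []
    for source_id, quote in citations:
        source_text = sources.get(source_id)
        if source_text is None:
            failures.append((source_id, quote, "unknown source"))
        elif normalize(quote) not in normalize(source_text):
            failures.append((source_id, quote, "quote not found"))
    return failures
```

The check is deterministic and involves no model call, which is the point: it catches fabricated or misattributed quotes, though it cannot tell whether a paraphrase is faithful.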

The thing I keep chewing on: did Anthropic hard-ban recursion internally, or let agents spawn and rein it in some other way? Curious how they landed on it.

Paolo Perrone

Blocking subagent spawning at the orchestration layer is the right call. Way more reliable than prompting for it. On the recursion question, the article covers how Anthropic handles the orchestrator-subagent pattern. They keep it flat, one level of delegation. What's your verifier catching that the LLMs miss most often?

Pradeep

The parallelization insight here is what separates shallow agentic implementations from ones that actually scale. The 15x token cost sounds alarming, but the 90.2% answer quality improvement tells the real story: for knowledge-intensive enterprise use cases, the ROI math flips quickly when you factor in analyst time saved. The orchestrator-subagent pattern Anthropic settled on is also the one I've seen work best in production deployments. Really useful breakdown of the design decisions.

Paolo Perrone

Hey Pradeep, great to see you here! Are you seeing teams actually measure the analyst time saved, or is everyone still just looking at the token bill?