Discussion about this post

Butilca Bogdan

This really landed for me — "multi-agent works mainly because it spends enough tokens to solve the problem." I built a little personal deep-research setup on basically that premise, and the caveats you list are the exact walls I hit.

The runaway-cost one especially. I ended up just not letting subagents spawn their own subagents — enforced in the orchestration layer, not begged for in a prompt. Doesn't give me a real per-run cost cap (still missing that one too), but it kills the scariest multiplier by default.

Same instinct on verification — a separate verifier that grep-checks citations against the source, instead of an LLM grading its own homework.

The thing I keep chewing on: did Anthropic hard-ban recursion internally, or let agents spawn and rein it in some other way? Curious how they landed on it.

Pradeep

The parallelization insight here is what separates shallow agentic implementations from ones that actually scale. The 15x token cost sounds alarming, but the reported 90.2% improvement over a single agent on Anthropic's internal research eval tells the real story: for knowledge-intensive enterprise use cases, the ROI math flips quickly when you factor in analyst time saved. The orchestrator-subagent pattern Anthropic settled on is also the one I've seen work best in production deployments. Really useful breakdown of the design decisions.
