9 Comments
ToxSec:

“The agent stack is not the LLM stack. A chatbot needs inference and maybe RAG. An agent needs state management across multi-step execution, tool access governed by protocols, memory that persists across sessions, autonomous reasoning loops, and guardrails that constrain behavior in real time.”

I really like this post because you do a great job of explaining all of this while clearing up some very common misconceptions. This was a great read!

Pawel Jozefiak:

The 37-point eval gap is the most honest number in the whole piece. 89% built observability; 52% built formal evals. That's teams that can see the logs but have no idea whether it's working.

But that gap is downstream of a harder problem: most teams haven't defined success at the task level. You can't build evals without a spec. The tooling exists. The bottleneck is articulating what "done" looks like.

I've been running a self-improving agent for months (nightshift planning, dayshift execution, feedback loops). The stack converged fast: provider SDK, Postgres, MCP. Done in weeks. What took months was boundary design: where human judgment ends, what failure modes actually matter, what counts as a good output. The infrastructure is a solved problem.

The hard part is still human.
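The "nightshift planning, dayshift execution, feedback loops" shape described above can be sketched roughly like this. Everything here is hypothetical: `plan`, `execute`, and the `runs` table stand in for the commenter's provider-SDK calls and Postgres schema, and SQLite substitutes for Postgres to keep the sketch self-contained.

```python
import sqlite3  # stand-in for the Postgres store the commenter mentions

def plan(feedback):
    """Night shift (hypothetical): turn prior run scores into today's task list.
    Tasks that scored poorly last cycle are dropped from the plan."""
    return [f"task-{i}" for i in range(3) if feedback.get(f"task-{i}", 1.0) > 0.5]

def execute(task):
    """Day shift (hypothetical): run one task via the provider SDK.
    Stubbed here to return a fixed quality score."""
    return 0.9

def run_cycle(db):
    """One full loop: read feedback, plan, execute, persist new feedback."""
    feedback = {t: s for t, s in db.execute("SELECT task, score FROM runs")}
    tasks = plan(feedback)
    for task in tasks:
        score = execute(task)
        db.execute("INSERT INTO runs VALUES (?, ?)", (task, score))
    return tasks

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runs (task TEXT, score REAL)")
tasks = run_cycle(db)
```

The point of the sketch is how small the infrastructure is: the loop itself is a few dozen lines, which is consistent with the claim that the stack converges in weeks while the scoring function (what `execute` should return, what threshold `plan` should use) is the part that takes months to get right.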

Violeta Klein, CISSP, CEFA:

Really useful map. The layer most teams don't have on their radar is the regulatory one sitting right on top of Layer 6. The EU AI Act already attaches enforceable obligations to guardrails, logging, human oversight, and incident reporting for high-risk systems. Most teams building agents don't see that layer because it doesn't show up in the engineering stack. It shows up in the regulator's.

Paolo Perrone:

Layer 7 being regulatory is a take more people need to hear. Are you seeing any teams actually building compliance into the stack from day one, or is it always retrofit?

Violeta Klein, CISSP, CEFA:

Almost always retrofit. The teams that are building it in from day one tend to be in regulated industries that already have compliance muscle - financial services, healthcare, insurance. They're not starting from scratch, they're extending what they already do for DORA or NIS2. Everyone else discovers the regulatory layer the hard way: after deployment, after an incident, or after someone asks them to prove their agent is governed. By then the retrofit cost is an order of magnitude higher than building it in would have been.

Paolo Perrone:

Makes sense that regulated industries have a head start. Are you seeing any of those teams open-sourcing their compliance patterns, or is everyone building it proprietary behind closed doors?

Violeta Klein, CISSP, CEFA:

Mostly closed doors. The compliance patterns are treated as competitive advantage in regulated industries - nobody wants to show their hand before enforcement starts. The closest thing to open-source compliance infrastructure is ForHumanity's certification schemes, which are published under Creative Commons. On the standards side, the harmonized standards that would give everyone a shared baseline are still in development and won't land before enforcement deadlines hit. So right now every organization is building in isolation against the same regulation, solving the same problems independently. That's the gap your stack could help close - if Layer 7 had a default the way Layer 2 has MCP.

Tiffany Teasley:

Start simple. Most teams are jumping to complex AI stacks before they have a real problem that requires it.

Paolo Perrone:

100%. What's the simplest agent setup you've seen actually work in production?