6 Comments
JP

Solid comparison. The bit about CLAUDE.md acting as institutional memory is the real unlock that most people gloss over. I've written extensively about configuring it, and once you've got your project conventions, architecture decisions, and preferred patterns dialled in there, the agent behaves completely differently. It stops guessing and starts following your actual codebase norms. That single file is what turns Claude Code from a clever autocomplete into something that genuinely understands your project.

Pawel Jozefiak

The METR study finding keeps haunting me - 19% slower despite feeling faster. I tested a similar split but with Codex instead of Cursor. The head-to-head on a real codebase audit revealed something the benchmarks miss: Codex reads holistically across files while Claude Code stays laser-focused on what you point it at.

When fixing Discord error handling in my agent codebase, Codex rewrote four related scripts I hadn't even mentioned. Claude Code fixed only the function I asked about. For autonomous overnight execution though, Claude Code has no competition - it handles plan-execute-deploy cycles for hours without human intervention.

Full comparison with specific examples: https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026

Paolo Perrone

The holistic vs. focused distinction is real. Codex rewriting files you didn't mention sounds great until it touches something it shouldn't. How often does that broad reach create problems versus catching things you missed?

Pawel Jozefiak

I would say it catches problems more often than it creates them. Codex is really good, and the app is fantastic. For some reason I am still going back to Opus…

Paolo Perrone

That says a lot about Opus. What's the thing that keeps pulling you back, the reasoning quality or something else about the workflow?

Pawel Jozefiak

I would say "the vibes" :D

It is really smart + has this thing that makes me enjoy chatting with it.