Solid comparison. The bit about CLAUDE.md acting as institutional memory is the real unlock that most people gloss over. I've written extensively about configuring it and once you've got your project conventions, architecture decisions, and preferred patterns dialled in there, the agent behaves completely differently. It stops guessing and starts following your actual codebase norms. That single file is what turns Claude Code from a clever autocomplete into something that genuinely understands your project.
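For anyone who hasn't set one up: a minimal sketch of the kind of CLAUDE.md I mean. The project details below are invented for illustration, but the shape — conventions, architecture decisions, preferred patterns — is what matters:

```markdown
# Project conventions

- TypeScript strict mode; no `any` without a comment explaining why.
- All API calls go through `src/lib/client.ts`, never raw fetch.
- Tests live next to source files as `*.test.ts`; run with `npm test`.

# Architecture decisions

- Background jobs go through the queue in `src/jobs/`; no ad-hoc setInterval loops.
- Errors are surfaced via the `AppError` class, never thrown as strings.
```

Once rules like these are written down, the agent stops re-deriving them from scratch on every task.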
The METR study finding keeps haunting me - 19% slower despite feeling faster. I tested a similar split but with Codex instead of Cursor. The head-to-head on a real codebase audit revealed something the benchmarks miss: Codex reads holistically across files while Claude Code stays laser-focused on what you point it at.
When fixing Discord error handling in my agent codebase, Codex rewrote four related scripts I hadn't even mentioned. Claude Code fixed only the function I asked about. For autonomous overnight execution though, Claude Code has no competition - it handles plan-execute-deploy cycles for hours without human intervention.
Full comparison with specific examples: https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026
The holistic vs focused distinction is real. Codex rewriting files you didn't mention sounds great until it touches something it shouldn't. How often does that broad reach create problems vs catching things you missed?
I would say it catches problems more. Codex is really good. The app is fantastic. For some reason I am still going back to Opus…
That says a lot about Opus. What's the thing that keeps pulling you back, the reasoning quality or something else about the workflow?
I would say "the vibes" :D
It is really smart + has this thing where I enjoy chatting with it.