Discussion about this post

JP

Solid comparison. The bit about CLAUDE.md acting as institutional memory is the real unlock that most people gloss over. I've written extensively about configuring it and once you've got your project conventions, architecture decisions, and preferred patterns dialled in there, the agent behaves completely differently. It stops guessing and starts following your actual codebase norms. That single file is what turns Claude Code from a clever autocomplete into something that genuinely understands your project.
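As a rough illustration of the kind of file the comment describes, a minimal CLAUDE.md might cover conventions, architecture decisions, and preferred patterns like this (the project details below are invented, not from any real codebase):

```markdown
# CLAUDE.md

## Project conventions
- TypeScript strict mode; no `any` without a justifying comment
- One file per route handler under `src/handlers/`

## Architecture decisions
- Postgres is the source of truth; Redis is cache-only and may be flushed
- Background work goes through the job queue, never inline in request handlers

## Preferred patterns
- Early returns over nested conditionals
- Wrap errors with context before rethrowing; never swallow them silently
```

The point is that each line encodes a norm the agent would otherwise have to guess at from reading the code.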

Pawel Jozefiak

The METR study finding keeps haunting me - developers were 19% slower with AI assistance despite feeling faster. I tested a similar split but with Codex instead of Cursor. The head-to-head on a real codebase audit revealed something the benchmarks miss: Codex reads holistically across files, while Claude Code stays laser-focused on what you point it at.

When fixing Discord error handling in my agent codebase, Codex rewrote four related scripts I hadn't even mentioned. Claude Code fixed only the function I asked about. For autonomous overnight execution though, Claude Code has no competition - it handles plan-execute-deploy cycles for hours without human intervention.

Full comparison with specific examples: https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026

