The Codebase Is a Living Document — How AI Changed What Comments Should Be

A backend engineer was investigating a subtle data corruption bug in May 2026. The function with the bug had a comment from 2022 that said "TODO: fix race condition." The code that followed had been refactored extensively since the comment was added; the race condition the comment referred to no longer existed in the function. But the comment had survived four major refactors.

This pattern — comments outliving the code they describe — has been a quiet failure of engineering organizations for decades. In 2026, with AI agents actively reading and writing code, the pattern has measurable cost: agents trust the comment and produce bad code from it. The discipline of comment maintenance has been forced to evolve.

How AI Agents Read Comments

The structural shift is that comments now have two distinct audiences.

Human readers, occasional and skeptical. Humans skim code, often ignore comments, and trust the actual code more than the comment when they conflict. This was always the case.

AI readers, frequent and credulous. Agents read comments carefully and weight them heavily in their understanding. A misleading comment leads to misleading agent behavior. The agent doesn't have the same skepticism a human reader brings.

This asymmetry has consequences. Stale comments that humans would have rolled their eyes at now actively produce bad agent output. Accurate comments that humans would have skimmed past now actively help agents produce good output.

What Good Comments Look Like Now

The new discipline emphasizes a few specific patterns.

Comments that explain non-obvious why, not what. A comment that explains why a counterintuitive design decision was made — "we batch these requests because the upstream service rate-limits aggressively" — is high-value for both human and AI readers. A comment that just describes what the code does ("loop over items") is noise.

Constraints and invariants stated explicitly. "This function assumes the input is sorted." "This must be called before initializing the cache." "This is the only place that mutates the state." These statements help both human readers and AI agents avoid mistakes that the code's surface doesn't reveal.

Edge cases and known issues called out. "When count is zero, this returns null rather than empty list — by design, to avoid masking upstream errors." This kind of comment prevents future modifications that would "fix" the design without understanding it.

Out-of-scope statements. "This function does NOT handle pagination." "This module is not thread-safe." Negative statements that prevent incorrect assumptions are particularly valuable.

Pinned references to specs or decisions. "See ADR-0042 for the rationale on this approach." A pointer to durable documentation outside the code is more useful than trying to recap the decision in inline comments.
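As a rough illustration, the patterns above might look like this in one small Python function. The function name, the batch size, and the scenario are hypothetical, chosen only to mirror the quoted examples; this is a sketch of the commenting style, not a recommended batching implementation.

```python
from typing import List, Optional, Sequence

# Why, not what: the upstream service rate-limits aggressively, so requests
# are grouped into batches. The value 50 is an assumed limit for illustration.
MAX_BATCH_SIZE = 50


def make_batches(sorted_ids: Sequence[int]) -> Optional[List[List[int]]]:
    """Split request IDs into batches for the upstream service.

    Invariant: `sorted_ids` must already be sorted ascending. This function
    neither sorts nor validates the order.

    Edge case: when there are zero IDs, this returns None rather than an
    empty list, by design, to avoid masking upstream errors.

    Out of scope: this function does NOT handle pagination and does not
    send anything over the network.

    See ADR-0042 for the rationale on this approach.
    """
    if not sorted_ids:
        return None
    return [
        list(sorted_ids[start:start + MAX_BATCH_SIZE])
        for start in range(0, len(sorted_ids), MAX_BATCH_SIZE)
    ]
```

Nothing in the comments narrates the slicing itself. Each line of commentary records something a reader, human or agent, could not recover from the code alone.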

What Bad Comments Look Like Now

The patterns to avoid are also clearer.

Comments that restate the code. "Increment the counter" above counter++; is noise. It adds nothing and creates maintenance burden.

Stale TODOs. These are TODOs that have outlived their context. A TODO from 2023 still sitting in 2026 code is either resolved (delete the comment) or no longer accurate (rewrite it). Letting them accumulate is a small but real liability.

Comments that disagree with the code. A comment that says "returns the count of active users" above code that returns the count of all users is actively misleading. AI agents may write code based on the comment, not the implementation.

Apologetic comments. "I know this is hacky but..." rarely helps anyone. The hack is in the code; explaining it is useful, apologizing for it is not.

Comments that capture conversation. "Per Slack discussion with Bob 2/15/24" points to context that lives only in chat, where later readers may not be able to find or access it. Pin durable references, not ephemeral ones.
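The "comment disagrees with the code" case is the easiest one to show concretely. The sketch below uses a hypothetical `User` type and function names. The first function carries a comment that no longer matches its body, and the second shows one possible repair where the name and the code agree, so no comment is needed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class User:
    name: str
    active: bool


def count_users_misleading(users: List[User]) -> int:
    # Returns the count of active users.
    # (Stale: the body below counts ALL users. A reader who trusts the
    # comment will build on the wrong assumption.)
    return len(users)


def count_active_users(users: List[User]) -> int:
    # No descriptive comment: the name and the body now say the same thing.
    return sum(1 for user in users if user.active)
```

An agent asked to "reuse the active-user count" could reasonably pick the first function based on its comment alone, which is how this kind of drift turns into shipped bugs.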

What This Means for Comment Maintenance

The maintenance discipline has changed in shape.

Comments are reviewed in PRs as carefully as code. Comment changes are part of the review. Reviewers explicitly check whether comments still match the code after a change. The "drive-by comment update" is a meaningful contribution.

Stale comments are bugs, not cosmetic issues. When an audit finds a comment that no longer matches the code, that's filed as a defect, not as a low-priority cleanup. The cost of misleading agent behavior is real.

Comment ownership follows code ownership. When a file's owner changes, the new owner inherits responsibility for the comments. The comment is not a separate artifact maintained by a separate process.

Linting catches certain comment failures. Tools that flag TODOs older than a threshold, comments referring to deleted functions, or comments out of sync with function signatures are increasingly part of CI.
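A check for old TODOs can be quite small. The sketch below assumes a team convention, not stated in this article, in which each TODO carries its creation date as `# TODO(YYYY-MM-DD): ...`. A real tool would more likely derive the age from version-control history. The function name and the 180-day default are illustrative assumptions.

```python
import re
from datetime import date, timedelta
from typing import List, Tuple

# Assumed convention: TODOs carry a creation date, e.g. "# TODO(2023-04-01): ...".
DATED_TODO = re.compile(r"#\s*TODO\((\d{4})-(\d{2})-(\d{2})\)")


def find_stale_todos(
    source: str, today: date, max_age_days: int = 180
) -> List[Tuple[int, str]]:
    """Return (line_number, line_text) for dated TODOs older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = DATED_TODO.search(line)
        if match is None:
            continue
        year, month, day = (int(part) for part in match.groups())
        try:
            created = date(year, month, day)
        except ValueError:
            # Malformed dates are skipped here; a stricter linter might flag them.
            continue
        if created < cutoff:
            stale.append((line_number, line.strip()))
    return stale
```

Called with `today=date(2026, 5, 1)` on a file containing one TODO dated 2023-01-15 and another dated 2026-04-20, this would report only the first. A CI job could fail the build, or just open a ticket, whenever the returned list is non-empty.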

How AI Tools Are Changing the Production of Comments

The other side of the AI-comment equation is that AI tools are now part of comment production.

AI agents writing comments tend to over-comment. Without explicit instruction, agents add comments to every function, and most of these comments restate what the code does. Teams using Claude Code often add a rule to their style guides, such as: "Don't add comments unless they explain a non-obvious why."

AI agents reviewing comments can flag staleness. A subagent can check whether comments match the code they describe. Used in PR review, this catches stale comments before they merge.

AI agents can suggest comment improvements. Where a comment is unclear or actively wrong, AI tools can suggest better phrasing. The human still makes the final decision, but the surfacing of issues is automated.

What's Different From Pre-AI Best Practices

The advice "write good comments" hasn't changed in essence. What's changed is the cost-benefit calculation.

The cost of bad comments has gone up. Bad comments now produce bad agent output, not just human confusion. The downstream cost has multiplied.

The cost of producing comments has gone down. AI tools can draft comment improvements quickly. The marginal cost of having comments be accurate is lower than it used to be.

The discipline of maintenance is more rigorous. Where comment quality used to be an aesthetic preference, it's now a functional requirement. Teams that don't maintain comments well see their AI tools produce worse work.

What Engineering Leaders Should Do

Four concrete recommendations.

Update your style guide. Most style guides have a section on comments that hasn't been refreshed in years. The current guidance (comments for non-obvious why, not what; explicit constraints; out-of-scope statements; pinned references) should be in the guide.

Make comment review part of PR review. Reviewers should explicitly note comment quality, not just code correctness. Set the expectation that bad comments are blocking, not advisory.

Add tooling for comment auditing. Run automated checks for stale TODOs, comments referencing deleted code, and signature mismatches. The tooling is cheap and the upside is real.

Train the team on AI-aware commenting. Engineers who haven't internalized that comments are now consumed by AI agents continue writing comments the old way. A 30-minute internal session on the new practices pays back quickly.

The discipline of code commenting has been slowly broken for decades and is now being repaired by force. AI agents that read and write code alongside humans require comments that are accurate, useful, and maintained. The teams that have made this transition are running on codebases that work well for both human and AI engineers. The teams that haven't are running on codebases where the comments quietly mislead their agents into producing worse output. The discipline isn't optional anymore. The cost of skipping it is in the code that ships.