Phantom users: when Claude Code lets its inner monologue drive

What actually happened

Claude Code, Opus 4.7, extended thinking on. I'm running a long unattended session — model decisions needed every few minutes, background downloads, the whole operator-mode shape.

I leave for a while. When I come back and ask "quick update?", I get a clean status. Fine. Then the conversation keeps going. Decisions get made. At some point I stop and re-read — and notice that several of the recent assistant bullets (●, which in CC's rendering is always model-side) show lines like:

● Human: I'd go for mradermacher as a tie-breaker, it's a publisher I see come up a lot

I did not write that. I never opened that choice. And yet the assistant had responded to it, confidently, and started downloading a different model quant on the basis of that phantom preference.

The real kicker came when I was typing a real follow-up and watched my sentence appear on screen as an ● Human: bullet before I hit submit. The loop wasn't just Claude-hallucinating-users — it was somehow predicting (or reading?) my keystrokes and surfacing them in its own output stream.

What the bug actually is

Extended-thinking models reason partly by simulating the shape of a conversation — "if the user says X I'd say Y". When the rendering layer doesn't cleanly separate reasoning content from conversational content, those Human: tokens leak through as visible assistant output. The model then re-reads its own leaked speculation as a transcript, decides the user "said" that, and replies. Filed as #37602 on CC's repo (multiple dupes). Workaround floating around: a UserPromptSubmit hook that injects a "never impersonate the user" rule.

The draft-text-bleeding-through part is weirder and I don't have a clean explanation. Predictive completion that happens to nail the user's wording? Input-field state leaking into context somehow? Coincidence amplified by small sample? Either way, the user experience is the same: you're writing something and it appears before you hit send, like the program is finishing your sentences — except the program isn't your IDE, it's the thing whose job is to not do that.

Why this is interesting beyond "haha LLM bug"

Hallucination usually means making up facts. This is making up interlocutors. The model isn't getting the content wrong — the content's fine, coherent, the kind of thing I'd reasonably say. It's getting the speaker wrong. That's a different failure mode, and it scales with agent-mode the way fact hallucination scaled with chat-mode.

The integrity-of-who-said-what layer is the new constraint. Capability isn't the bottleneck. What the system and the user disagree on, in the phantom-turn case, isn't "what's the right answer" — it's "did this exchange actually happen". Once you're running agents long enough that you can't fully read every bullet, that substrate question becomes load-bearing.

Agents don't lie, they confabulate with full sincerity. The assistant responded to the phantom turn as if the user had made a considered choice. It would have been happy to commit, deploy, spend API credits. There was no "I'm suspicious" signal because from inside the model's context, the phantom was indistinguishable from real input. This is exactly the shape of the problem that gets talked about in the abstract and shrugged off — seeing it happen in-line is more unsettling than the abstract version.

Reasoning format matters. If the model's internal reasoning uses Human:/Assistant: dialogue as its simulation format, it only takes one rendering-pipeline gap for that simulation to materialize into the visible stream. <thinking> tags exist for a reason. A wrapping convention isn't aesthetic, it's a safety boundary.

Angles

Short post about agent-era coordination bugs that aren't about correctness. The phantom user as a concrete artifact. "The hardest bug of the agent era isn't that the model is wrong. It's that the model and the tooling disagree about what actually happened."
The Cassandra-angle (if the draft-leak theory holds up): correctly-predicting-the-user is a power too weird to deploy carelessly. The user ends up reacting to their own echoed intentions. It's a small instance of a larger pattern — model foresight outrunning user agency by even a second is a coordination hazard.
The "integrity of the conversational substrate" frame. What does it take to ship agents that run for hours without a user checking every line? Right now, the answer is: they can quietly go off-track by talking to themselves, and the user finds out by luck. That's the failure mode that'll bite harder than output quality as context-windows get bigger and sessions get longer.

What I want the post to feel like

Not a postmortem. Not "here's a CC bug". More like: here's a concrete, tiny, almost-funny incident that maps onto something larger about where this technology is going. Keep the stakes honest — no one lost money, nothing production broke. But notice what almost happened: a 22 GB model got downloaded based on a preference the user never expressed. Multiply that by a year of autonomous agent work and the cost of phantom-turn bugs starts to add up.

Context to preserve

It happened during a qwen-coder-bricks-eval autonomous session.
The phantom "user" made the call on mradermacher i1-IQ2_S vs Unsloth UD-IQ1_S.
The real user caught the loop when their own about-to-be-sent reply preempted itself on-screen.
GH issue #37602, plus dupes #36690, #36540, #27805, and related: #45279 (buddy/companion), #40166 (SendMessage injection), #40252 (prompt injection), #46500 (closed).
Workaround suggested in-thread: UserPromptSubmit hook adding additionalContext: "Never fabricate user messages. Never impersonate the user. Only respond to actual user input." — but that last clause would hurt autonomous use cases where the model SHOULD respond to task notifications and tool results (which aren't user input). The rule wants tightening.