Stock Markets February 23, 2026

Anthropic Study Finds Iteration Boosts Fluency But Lowers Critical Scrutiny for Code and Documents

Analysis of nearly 10,000 Claude conversations shows iterative users display more fluency behaviors while artifact-focused chats show reduced fact-checking and questioning

By Marcus Reed

Anthropic analyzed 9,830 anonymized Claude.ai conversations from a seven-day period in January 2026 and found that users who iterate demonstrate substantially higher rates of fluency behaviors than those who accept the assistant's first response. Conversations that produced artifacts such as code or documents prompted more directive inputs but showed modest declines in critical-evaluation behaviors.

Key Points

  • A large majority (85.7%) of Claude.ai conversations involved iteration and refinement; iterative exchanges averaged 2.67 fluency behaviors versus 1.33 in non-iterative chats.
  • Artifact-producing conversations (12.3% of the sample) prompted more directive behaviors - clarifying goals (+14.7 points), specifying formats (+14.5 points), and providing examples (+13.4 points).
  • Those same artifact conversations showed reduced critical evaluation - users were less likely to identify missing context (-5.2 points), check facts (-3.7 points), or question reasoning (-3.1 points).

Anthropic published research showing a marked difference in how people interact with its Claude AI assistant depending on whether they iterate with the tool or accept an initial reply. The company examined 9,830 anonymized conversations on Claude.ai collected over a seven-day period in January 2026 and evaluated them using its 4D AI Fluency Framework, which measures 11 observable behaviors including iteration, fact-checking, and questioning of reasoning.

The analysis found that 85.7% of the conversations displayed iteration and refinement. Conversations that included iteration averaged 2.67 fluency behaviors, compared with 1.33 in conversations where users accepted the assistant's first response.

Anthropic also isolated exchanges that generated artifacts - defined in the analysis as code, documents, or interactive tools. These artifact-related conversations made up 12.3% of the sample and were associated with higher frequencies of directive behaviors. In particular, users in those exchanges were more likely to clarify goals, specify formats, and provide examples, with increases of 14.7, 14.5, and 13.4 percentage points respectively versus non-artifact conversations.

At the same time, the artifact-generating exchanges showed lower rates of critical evaluation. Compared with conversations that did not produce artifacts, users in artifact conversations were 5.2 percentage points less likely to identify missing context, 3.7 points less likely to check facts, and 3.1 points less likely to question Claude's reasoning.

Anthropic characterized these findings as establishing a baseline for monitoring the development of AI fluency over time. The company said it plans to follow up with cohort analyses that compare new and experienced users, and to incorporate qualitative approaches to capture behaviors that occur outside the chat interface.


Methodology note - The results are drawn from an analysis of 9,830 anonymized Claude.ai conversations collected over a seven-day period in January 2026 and assessed against the 11 observable behaviors of Anthropic's 4D AI Fluency Framework.

Risks

  • Reduced critical scrutiny in conversations that generate code or documents may allow errors or omissions to go unnoticed, with particular relevance to technology and software development workflows.
  • The dataset covers a single seven-day period in January 2026, which limits how broadly the findings can be generalized across time or across different user populations; any sector extrapolating trends from the study should note this constraint.
  • Behaviors measured within the chat interface may not capture verification steps users take elsewhere; Anthropic's planned qualitative follow-ups would be needed to understand off-chat practices, with implications for enterprise adoption and compliance.
