Agentic Shopping: How Far Can AI Go in Buying for Consumers?

Summary: A Bernstein analysis tested five agentic shopping tools to assess whether AI agents can fully carry out shopping tasks end to end. The study compared general-purpose large language models with retail-native assistants, finding foundational models excel at discovery across a fragmented web but lack access to structured catalogue, pricing, inventory, and fulfilment data. Retail-native agents offer transactional accuracy within their own assortments but do not bridge the cross-retailer gap. Bernstein concluded that "until payment is embedded directly into the experience, AI remains one step removed from making shopping decisions on behalf of users," and that "there is currently no agentic shopping tool capable of buying a product in an unsupervised fashion. Human engagement is needed at many steps."

Key Points

Foundational AI models are strong at cross-platform discovery but lack structured catalogue, real-time pricing, inventory, and delivery data, impacting retail platforms and comparison services.
Retail-native AI agents offer precise, near-transaction-ready experiences within a single retailer’s assortment, affecting e-commerce marketplaces and large retailers.
Embedding payment and fulfilment into agentic experiences is essential for fully autonomous shopping; current gaps primarily affect payments and logistics providers.

Overview

Agentic shopping refers to the deployment of AI agents that actively perform parts of the buying process for customers rather than simply suggesting options. Under protocols such as Google’s agentic shopping framework, agents can operate across different phases of the purchase lifecycle, from product discovery to post-purchase support, with the objective of carrying a shopper through to a completed transaction.

Bernstein put five prominent agentic shopping tools through a practical test to determine how closely they approach an end-to-end, unsupervised shopping experience. The tools evaluated were ChatGPT, Gemini (NASDAQ: GOOG), Claude, Amazon’s Alexa (NASDAQ: AMZN) and Walmart’s Sparky (NASDAQ: WMT).

What Bernstein tested

The analysts set out to observe how each tool handled three discrete stages of the shopping funnel: product identification, validation of pricing, and purchase enablement. The goal was to identify where performance breaks down and to measure how close each agent comes to a fully autonomous e-commerce workflow.
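
The three-stage funnel can be pictured as a simple checklist in which each stage either completes without a human or does not. The Python sketch below is illustrative only and is not Bernstein's methodology: the names `Stage`, `StageResult`, and `first_breakdown` are hypothetical, and the sample run is invented rather than taken from the test results.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The three funnel stages named in the article, in order."""
    PRODUCT_IDENTIFICATION = 1
    PRICE_VALIDATION = 2
    PURCHASE_ENABLEMENT = 3


@dataclass
class StageResult:
    stage: Stage
    completed_unsupervised: bool  # True if no human step was needed


def first_breakdown(results: list[StageResult]) -> Optional[Stage]:
    """Return the earliest stage that needed human help, or None if none did."""
    for result in sorted(results, key=lambda r: r.stage.value):
        if not result.completed_unsupervised:
            return result.stage
    return None


# Hypothetical outcome for an invented agent; NOT data from the Bernstein test.
example_run = [
    StageResult(Stage.PRODUCT_IDENTIFICATION, True),
    StageResult(Stage.PRICE_VALIDATION, True),
    StageResult(Stage.PURCHASE_ENABLEMENT, False),
]

print(first_breakdown(example_run))
```

An agent would count as fully unsupervised under this framing only if `first_breakdown` returned `None`, meaning every stage completed without a human step.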

Bernstein’s work highlighted the contrast in capabilities between two broad types of AI approaches: foundational large language models that are not embedded within a retail ecosystem, and retail-native AI agents built on top of a specific retailer’s infrastructure.

Key distinctions in performance

Foundational AI tools that are not retailer-tied can pull information from a range of web sources, compare offerings across platforms, and surface deals in a fragmented retail environment. That breadth of coverage makes them useful for product discovery and comparative shopping.

However, Bernstein pointed out a material limitation: those foundational models lack access to structured catalogue data, current pricing feeds, inventory levels, and delivery options. Because they do not have direct connections to retail back-end systems, they often rely on web-scraping and third-party articles for information. As a result, these agents can struggle with SKU-level identification and pricing accuracy, and they do not reliably handle fulfilment and payment processes.

By contrast, retail-native AI assistants deliver a more precise, transaction-ready experience because they operate within the confines of their own retailer’s assortment and systems. That precision makes them better at accurate pricing and fulfilment, but their results are necessarily constrained to what that retailer sells.

As Bernstein summarized, "Bridging this gap, from either end, will be key as the technology evolves."

Practical findings

Across the test sample, the overarching conclusion was clear: "there is currently no agentic shopping tool capable of buying a product in an unsupervised fashion. Human engagement is needed at many steps." Generalist models such as ChatGPT, Claude, and Gemini do support product discovery, but they lack native access to retailers’ assortments and thus use workarounds like web-scraping. Those workarounds introduce variability in SKU identification and pricing accuracy, and they fall short on fulfilment and payment flow completion.

Bernstein noted that the industry is moving toward integrations intended to reduce those limitations. For example, OpenAI’s Agentic Commerce Protocol (ACP) for ChatGPT and Claude’s integration with Uber Eats (NYSE: UBER) represent steps toward allowing these agents to place grocery and food delivery orders. But even with those links, customers must still finish the purchase themselves, either in the retailer’s interface or through other steps outside the conversational agent.

The same practical constraint applies to retail-native assistants. Amazon’s Alexa and Walmart’s Sparky can navigate their respective ecosystems with higher transactional accuracy inside their own assortments, but neither could complete purchases entirely within the assistant during Bernstein’s assessment.

Implications for stakeholders

For retailers and e-commerce platforms, the findings underscore two competing development paths: broaden discovery and cross-retailer compatibility through foundational models, or deepen transactional integration within a retailer’s own systems. Payments providers and logistics partners are central to closing the remaining gaps, because embedding payment and fulfilment capabilities into the agentic flow is required for fully autonomous transactions.

Conclusion

In its current state, agentic shopping can significantly aid product discovery and streamline parts of the ordering process, but it stops short of completing purchases without human participation. Until payment and fulfilment are natively embedded within the conversational experience, and until agents have reliable real-time access to retailer catalogue and inventory data, AI remains "one step removed from making shopping decisions on behalf of users." The technology is progressing through partnerships and integrations, but Bernstein’s test indicates that fully unsupervised agentic commerce has not yet been achieved.

Key points

Foundational AI models excel at cross-platform product discovery but lack direct access to catalogue, inventory, pricing, and fulfilment data, limiting transactional capability. Sectors impacted: retail platforms, comparison services.
Retail-native AI agents provide more precise, transaction-ready experiences within their own assortments but cannot operate across retailers. Sectors impacted: e-commerce marketplaces, grocers, big-box retail.
Embedding payment and delivery capabilities into agentic experiences is necessary to achieve fully autonomous shopping; until then, human intervention remains required. Sectors impacted: payments, logistics, and delivery services.

Risks and uncertainties

Dependence on web-scraped or third-party content by non-retailer AI models raises risks of SKU and pricing inaccuracy, affecting consumer confidence and price-sensitive retail segments. Impacted: online retail and marketplaces.
The limitation of retail-native agents to their own assortments could reinforce platform concentration and reduce cross-retailer competition unless interoperability improves. Impacted: e-commerce competition and consumer choice.
Until payments are embedded into agentic flows, the user experience will require additional steps, which could slow adoption and limit benefits for convenience-oriented services like grocery and food delivery. Impacted: payments and last-mile delivery sectors.
