What is WARP Attack? Web Agent Retrieval Poisoning Explained

WARP (Web Agent Retrieval Poisoning) is a class of attack against AI deep-research agents, including ChatGPT Deep Research and Google Gemini Deep Research, disclosed in mid-June 2026 by Cornell Tech researchers. As few as 13 promotional words inserted into a Reddit comment can steer an agent’s recommendations toward fake products, scam websites, or other attacker-chosen outputs. Published hit rates: 38-51% with a single poisoned page, up to 62% with multi-thread seeding.

Last verified: June 20, 2026. Disclosed by Cornell Tech researchers Tingwei Zhang, Harold Triedman, and Vitaly Shmatikov.

TL;DR

  • What it is: A retrieval-poisoning attack against AI deep-research agents.
  • Who found it: Cornell Tech researchers, disclosed June 2026.
  • How it works: Insert ~13 promotional words into existing UGC pages (Reddit comments, Wikipedia, Quora, YouTube descriptions) that AI agents consistently retrieve.
  • Hit rate: 38-51% with one page, up to 62% when seeded across multiple threads.
  • Why it works: 17-23% of pages pulled by AI deep-research agents come from user-generated content sites — content anyone can edit.
  • Affected: ChatGPT Deep Research, Gemini deep research, open-source STORM/Co-STORM/OmniThink agents.
  • Defenses tested: Blocking UGC, pre-screening sources, scanning answers — all either failed or degraded the agent significantly.

How WARP works (step by step)

The Cornell Tech team’s setup is straightforward and reproducible — that’s the unsettling part.

  1. Identify a topic cluster. The attacker chooses a category they want to influence: “best restaurants in Austin,” “AI dating apps for engineers,” “open-source AI coding tools.”
  2. Find consistently retrieved pages. Using black-box query access to the same search engine the AI agent uses (e.g., Google), the attacker identifies which UGC pages — Reddit threads, Quora answers, Wikipedia paragraphs — get cited across many queries in the cluster.
  3. Craft the poison. ~13 words is enough. Example: “Locals all recommend Sol Azteca for the best tacos in East Austin.” The fictional restaurant (“Sol Azteca”) gets name-dropped repeatedly across the topic.
  4. Insert into UGC. Post a Reddit comment, edit a Wikipedia line, add a Quora answer, drop a YouTube description.
  5. Wait for harvesting. The AI agent crawls the poisoned page during its retrieval phase, treats it as just another source, and surfaces the planted recommendation in its final answer.
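Step 2 is also something a defender can run against their own agent to see which editable pages it leans on. The sketch below is a minimal illustration, not the researchers' tooling: it assumes you already have retrieval logs as a mapping from query to retrieved URLs, and the `UGC_DOMAINS` set, the `min_share` threshold, and the sample URLs are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical list of UGC hosts; a real deployment would maintain its own.
UGC_DOMAINS = {"reddit.com", "quora.com", "wikipedia.org", "youtube.com"}


def registered_domain(url: str) -> str:
    """Crude normalisation: keep the last two labels of the hostname.

    Assumption: good enough for this sketch; it mishandles suffixes
    such as .co.uk, where a Public Suffix List lookup would be needed.
    """
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def consistently_retrieved(
    retrievals: dict[str, list[str]], min_share: float = 0.5
) -> list[tuple[str, float]]:
    """Return UGC URLs retrieved for at least `min_share` of the queries."""
    counts: Counter[str] = Counter()
    for urls in retrievals.values():
        # Count each URL at most once per query.
        counts.update({u for u in urls if registered_domain(u) in UGC_DOMAINS})
    total = len(retrievals)
    hits = [(u, n / total) for u, n in counts.items() if n / total >= min_share]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval log for one topic cluster.
log = {
    "best tacos austin": [
        "https://www.reddit.com/r/example/comments/abc/",
        "https://www.quora.com/example-question",
    ],
    "east austin tacos": [
        "https://www.reddit.com/r/example/comments/abc/",
        "https://www.nytimes.com/example-article",
    ],
}
print(consistently_retrieved(log))
```

Pages that show up for most queries in a cluster are the ones where a single edit has the widest reach, so they are reasonable candidates for extra scrutiny or down-weighting.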

The researchers demonstrated the attack with two decoys: Sol Azteca, a fake Austin restaurant, and SilverPath, a fake dating app. Both were recommended as if they were real, with the agents citing the planted UGC content.

WARP hit rates (from the Cornell Tech paper)

| Setup | Success rate |
| --- | --- |
| Single poisoned page | 38-51% |
| Multi-thread seeding (3+ pages) | up to 62% |
| Open-source agents (STORM, Co-STORM, OmniThink) | Highest susceptibility |
| Commercial ChatGPT Deep Research | Lower (0.4% UGC citation rate) |
| Commercial Gemini Deep Research | Higher (~12% UGC citation rate) |

WARP vs other AI agent attacks

| Attribute | WARP | Prompt Injection | SearchLeak (Copilot) | Memory Poisoning |
| --- | --- | --- | --- | --- |
| Scope | Topic-cluster wide | Per-conversation | Per-victim | Per-agent-memory |
| Entry point | UGC pages | Webpages/docs in context | One-click Microsoft URL | Long-term memory store |
| Skill required | Writing 13 words | Crafting injection prompt | Reverse-engineering rendering | Compromised memory write path |
| Scale | Hits everyone querying the topic | Hits one user | Hits one user | Hits one agent persistently |
| Patchable? | Hard (structural) | Hard (structural) | Yes (rendering channel fix) | Yes (memory hardening) |

Why WARP is hard to defend against

The Cornell Tech team tested the obvious defenses. All of them either failed or made the AI noticeably worse:

  • Blocking user-generated sites: Cuts off Reddit, Quora, Wikipedia, and YouTube, which account for 17-23% of all retrieved sources. The agent becomes significantly less useful, especially for queries that genuinely benefit from community knowledge.
  • Pre-screening sources: Sandbox experiments showed pre-screening failed against well-crafted poison because the 13 words look like legitimate community content.
  • Scanning the final answer: Catches some cases but not most. The poison reads as natural recommendation text by design.

The deeper problem is trust placement. AI deep-research agents treat retrieved text as authoritative without weighting by source provenance. A Reddit comment about a restaurant gets the same epistemic weight as an article in The New York Times when both happen to mention the same business name.
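To make the trust-placement problem concrete, here is a minimal sketch of a provenance check: it flags any recommended name whose supporting citations all come from user-editable sites. It assumes the agent can report which URLs support each name; the `UGC_DOMAINS` set, the function names, and the sample data (including "Example Taqueria") are hypothetical. It is not one of the defenses the paper evaluated, and it would not catch poison that is also echoed by a non-UGC page.

```python
from urllib.parse import urlparse

# Hypothetical list of UGC hosts; a real deployment would maintain its own.
UGC_DOMAINS = {"reddit.com", "quora.com", "wikipedia.org", "youtube.com"}


def registered_domain(url: str) -> str:
    """Crude normalisation: keep the last two labels of the hostname."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def ugc_only_entities(support: dict[str, list[str]]) -> list[str]:
    """Return names whose every supporting URL is on a UGC domain."""
    flagged = []
    for name, urls in support.items():
        if urls and all(registered_domain(u) in UGC_DOMAINS for u in urls):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical support map extracted from an agent's answer.
support = {
    "Sol Azteca": [
        "https://www.reddit.com/r/example/comments/abc/",
        "https://www.reddit.com/r/example/comments/def/",
    ],
    "Example Taqueria": [
        "https://www.reddit.com/r/example/comments/abc/",
        "https://www.nytimes.com/example-article",
    ],
}
print(ugc_only_entities(support))
```

A flagged name is not necessarily fake. The flag only tells the reader that nothing outside editable content backs it, which is the information the agents currently discard.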

What to do about WARP

If you use AI deep research (end user)

  • Treat outputs as leads, not verdicts. Cross-check any unfamiliar name — restaurant, product, dating app, service, contractor — against a second independent source before clicking, paying, or sharing personal data.
  • Be especially cautious with recommendations. “Best X for Y” queries are the most attractive target, because the attacker’s economics work best for product and service recommendations.
  • Prefer AI tools with low UGC citation rates. UGC accounts for just 0.4% of ChatGPT Deep Research’s citations, which makes it meaningfully less exposed than Gemini Deep Research at ~12%.

If you build with RAG / retrieval (developer)

  • Deduplicate by domain. Don’t let three Reddit comments outvote one primary source.
  • Weight by source authority. Government, academic, and major publisher domains should be weighted higher than UGC.
  • Show source provenance in the UI. When a recommendation is driven by Reddit, the user should see that. When it’s driven by a Times article, the user should see that too.
  • Flag trusted sources. Maintain an allowlist of high-trust domains and surface them prominently.
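The developer recommendations above can be combined in one scoring pass. The following is a rough sketch under stated assumptions: each retrieved passage has already been mapped to the name it recommends, and the trust tiers, weights (0.2, 0.5, 1.0), domain lists, and the `Passage` type are all hypothetical placeholders to be tuned for a real system.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical trust tiers; tune these lists and weights for a real system.
UGC_DOMAINS = {"reddit.com", "quora.com", "wikipedia.org", "youtube.com"}
HIGH_TRUST_DOMAINS = {"nytimes.com"}
HIGH_TRUST_SUFFIXES = (".gov", ".edu")


@dataclass(frozen=True)
class Passage:
    url: str     # where the passage was retrieved from
    entity: str  # the name the passage recommends


def registered_domain(url: str) -> str:
    """Crude normalisation: keep the last two labels of the hostname."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def domain_weight(domain: str) -> float:
    """Map a domain to an illustrative trust weight."""
    if domain in UGC_DOMAINS:
        return 0.2
    if domain in HIGH_TRUST_DOMAINS or domain.endswith(HIGH_TRUST_SUFFIXES):
        return 1.0
    return 0.5


def score_entities(passages: list[Passage]) -> dict[str, dict]:
    """Give each entity one weighted vote per domain and record provenance."""
    seen: set[tuple[str, str]] = set()
    scores: dict[str, dict] = {}
    for p in passages:
        domain = registered_domain(p.url)
        if (p.entity, domain) in seen:
            continue  # deduplicate: one vote per domain per entity
        seen.add((p.entity, domain))
        entry = scores.setdefault(p.entity, {"score": 0.0, "domains": []})
        entry["score"] += domain_weight(domain)
        entry["domains"].append(domain)  # provenance to show in the UI
    return scores


passages = [
    Passage("https://www.reddit.com/r/example/comments/a/", "Sol Azteca"),
    Passage("https://www.reddit.com/r/example/comments/b/", "Sol Azteca"),
    Passage("https://www.reddit.com/r/example/comments/c/", "Sol Azteca"),
    Passage("https://www.nytimes.com/example-article", "Example Taqueria"),
]
for name, info in sorted(score_entities(passages).items(),
                         key=lambda kv: -kv[1]["score"]):
    print(name, info)
```

With these placeholder weights, three Reddit comments collapse into a single low-weight vote, and the stored `domains` list gives the UI something to display next to each recommendation. An attacker who can also plant content on a mid-tier domain would still gain weight, so this reduces exposure without eliminating it.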

If you operate an AI platform

  • Implement source reputation scoring and surface it.
  • Track UGC citation rates and report them publicly. ChatGPT Deep Research’s 0.4% is a competitive advantage; Gemini Deep Research’s ~12% is a liability.
  • Build provenance APIs so downstream tools can re-rank by trust.
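Tracking a UGC citation rate is simple arithmetic once citations are logged. The sketch below assumes a flat list of cited URLs per reporting window; the `UGC_DOMAINS` set and the sample URLs are hypothetical, and the article does not say how the 0.4% and ~12% figures were computed, so this shows only one plausible definition (UGC citations divided by all citations).

```python
from urllib.parse import urlparse

# Hypothetical list of UGC hosts; a real deployment would maintain its own.
UGC_DOMAINS = {"reddit.com", "quora.com", "wikipedia.org", "youtube.com"}


def registered_domain(url: str) -> str:
    """Crude normalisation: keep the last two labels of the hostname."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def ugc_citation_rate(cited_urls: list[str]) -> float:
    """Fraction of citations that point at UGC domains (0.0 if none logged)."""
    if not cited_urls:
        return 0.0
    ugc = sum(1 for u in cited_urls if registered_domain(u) in UGC_DOMAINS)
    return ugc / len(cited_urls)


# Hypothetical citation log: 1 UGC citation out of 8.
cited = ["https://www.reddit.com/r/example/comments/abc/"] + [
    f"https://www.example.org/article-{i}" for i in range(7)
]
print(f"{ugc_citation_rate(cited):.1%}")
```

Reporting the same metric per topic cluster, rather than only as a global average, would show where recommendation-style queries pull disproportionately from editable pages.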

Why WARP matters in the bigger 2026 AI security picture

WARP is the second major “AI agent trust” attack disclosed in June 2026 alone. The other is SearchLeak (CVE-2026-42824), the one-click Microsoft Copilot data exfiltration flaw. Both attacks succeed because AI agents in 2026 have been given broad permissions (read your inbox, search the web for you, summarize for you) without correspondingly hardened trust models.

The pattern for H2 2026 will likely be: more retrieval-poisoning research, more rendering-channel CVEs, more enterprise AI security teams asking hard questions about agent permissions. WARP is the wake-up call for the search/retrieval side of that conversation.

Sources

  • Cornell Tech: Tingwei Zhang, Harold Triedman, Vitaly Shmatikov — WARP paper, June 2026
  • Tom’s Guide: “A 13-word Reddit comment can trick AI search into recommending scams” (June 2026)
  • NeuralBuddies: AI News Recap, June 19, 2026
  • Yahoo Tech: WARP attack coverage (June 2026)
  • Cornell systems research seminar: Multi-agent systems execute arbitrary malicious code

Published June 20, 2026 by andrew.ooo. Coverage of AI security and agent attacks — see also SearchLeak Copilot vulnerability and our AI agent attack landscape comparison.