Technology
🇺🇸 USA
Beyond Semantic Similarity
arXiv:2605.05242 (cs)
[Submitted on 3 May 2026]
Title: Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
Authors: Zhuofeng Li, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie, Yi Lu, Yuyang Bai, Shangbin Feng, Hangxiao Zhu, Ming Zhong, Yuyu Zhang, Jianwen Xie, Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang
[v1] Sun, 3 May 2026 19:13:11 UTC (5,193 KB)
Abstract: Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before any reasoning takes place. This abstraction is efficient, but for agentic search it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, local context checks, and multi-step hypothesis refinement are difficult to express through calls to a conventional off-the-shelf retriever, and evidence filtered out early cannot be recovered by stronger downstream reasoning. Agentic tasks exacerbate this limitation because they require agents to orchestrate multiple steps, including discovering intermediate entities, combining weak clues, and revising the plan after observing partial evidence. To address this limitation, we study direct corpus interaction (DCI), in which an agent searches the raw corpus directly with general-purpose terminal tools (e.g., grep, file reads, shell commands, lightweight scripts), without any embedding model, vector index, or retrieval API. This approach requires no offline indexing and adapts naturally to evolving local corpora. Across IR benchmarks and end-to-end agentic search tasks, this simple setup substantially outperforms strong sparse, dense, and reranking baselines on several BRIGHT and BEIR datasets, and attains strong accuracy on BrowseComp-Plus and multi-hop QA without relying on any conventional semantic retriever. Our results indicate that, as language agents become stronger, retrieval quality depends not only on reasoning ability but also on the resolution of the interface through which the model interacts with the corpus; DCI opens a broader interface-design space for agentic search.
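The abstract names the tools an agent uses under DCI (grep, file reads, shell commands, lightweight scripts) but not how they are composed. As a rough illustration only, the Python sketch below mimics three of the access patterns the abstract lists: an exact lexical constraint, a conjunction of sparse clues, and a local context check, all computed by scanning raw files with no index. The helper names `grep_corpus`, `files_matching_all`, and `context`, the `*.txt` corpus layout, and the toy documents are hypothetical; this is not the authors' implementation, and an agent in the paper's setting would more likely issue `grep` and similar shell commands directly.

```python
import re
import tempfile
from pathlib import Path


def grep_corpus(root, pattern, flags=0):
    """Exact lexical constraint: yield (path, line_no, line) for each matching line.

    Hypothetical helper: scans every *.txt file under `root`; nothing is indexed.
    """
    rx = re.compile(pattern, flags)
    for path in sorted(Path(root).rglob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line_no, line in enumerate(fh, start=1):
                if rx.search(line):
                    yield path, line_no, line.rstrip("\n")


def files_matching_all(root, patterns, flags=0):
    """Sparse clue conjunction: files in which every pattern occurs at least once."""
    candidates = None
    for pattern in patterns:
        hits = {path for path, _, _ in grep_corpus(root, pattern, flags)}
        candidates = hits if candidates is None else candidates & hits
        if not candidates:  # no file can satisfy the remaining clues
            break
    return sorted(candidates or [])


def context(path, line_no, radius=2):
    """Local context check: the lines within `radius` of the 1-based `line_no`."""
    lines = Path(path).read_text(encoding="utf-8", errors="replace").splitlines()
    start = max(0, line_no - 1 - radius)
    end = min(len(lines), line_no + radius)
    return lines[start:end]


if __name__ == "__main__":
    # Toy two-document corpus, invented for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text(
            "The bridge opened in 1932.\nIts architect was born in Lyon.\n", encoding="utf-8"
        )
        (root / "b.txt").write_text("A bridge in Lyon.\nNo date given.\n", encoding="utf-8")
        # Only a.txt contains both clues.
        print([p.name for p in files_matching_all(root, [r"\b1932\b", r"Lyon"])])  # ['a.txt']
        for path, line_no, _ in grep_corpus(root, r"architect"):
            print(path.name, line_no, context(path, line_no, radius=1))
```

Because the sketch rescans the files on every call, the corpus can change between calls without any re-indexing, which mirrors the "no offline indexing" property the abstract claims; the trade-off in this toy version is that each query costs time linear in corpus size.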
| Subjects: | Information Retrieval (cs.IR); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.05242 [cs.IR] |
| | (or arXiv:2605.05242v1 [cs.IR] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.05242 (arXiv-issued DOI via DataCite) |
Submission history
From: Zhuofeng Li
[v1] Sun, 3 May 2026 19:13:11 UTC (5,193 KB)
Related news
Technology
Investors say they want Trump and Xi to stay out of AI’s way
Japan Times
Technology
OpenAI boss Sam Altman says Musk wanted control of company
DW (Deutsche Welle)
Technology
OpenAI boss Sam Altman says Musk wanted control of company
DW Society
'Indians hiring Indians': Former Google contractor says he was asked to train his replacement
Times of India — World