SHARED INTEL Q&A: AI retrieval systems can still hallucinate; deterministic logic offers a fix

By Byron V. Acohido

AI hallucination is still the deal-breaker.

Related: Retrieval Augmented Generation (RAG) strategies

As companies rush AI into production, executives face a basic constraint: you cannot automate a workflow if you cannot trust the output. A model that fabricates facts becomes a risk exposure. CISOs now have to explain this clearly to boards, who expect assurances that fabrication risk will be controlled, not hoped away.

Earlier this year, retrieval-augmented generation, or RAG, gained attention as a practical check on hallucination. The idea was straightforward: before answering, the model retrieves grounding material from a trusted source and uses that to shape its response. This improved reliability in many early use cases.
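For technically minded readers, here is a minimal sketch of that first-generation pattern in Python. Everything in it is an illustrative stand-in: the two-item document store, the word-overlap scoring, and the ask_llm() stub are hypothetical, not any particular vendor's API.

    # Minimal first-generation RAG loop: retrieve grounding text,
    # then fold it into the prompt. All names here are illustrative.
    DOCS = [
        "Policy 4.2: Customer data at rest must be encrypted with AES-256.",
        "Policy 7.1: Access reviews must be completed quarterly.",
    ]

    def score(query: str, doc: str) -> int:
        # Toy relevance score: shared-word count (real systems use embeddings).
        return len(set(query.lower().split()) & set(doc.lower().split()))

    def retrieve(query: str) -> str:
        # Pick the single best-scoring document as grounding material.
        return max(DOCS, key=lambda d: score(query, d))

    def ask_llm(prompt: str) -> str:
        # Stand-in for a model call; a real deployment would query an LLM here.
        return "[model answers from the supplied source]\n" + prompt

    def answer(query: str) -> str:
        context = retrieve(query)
        prompt = f"Answer using only this source.\nSource: {context}\nQ: {query}"
        return ask_llm(prompt)

    print(answer("How must customer data be encrypted?"))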

But first-generation RAG had a hidden weakness. A major academic study (“RAGTruth”) showed that even when RAG retrieves accurate source material, AI systems can still misstate it or draw the wrong conclusion. The research comes from the ACL Anthology, the leading global library for peer-reviewed AI language research.

More broadly, today’s RAG systems rely on probabilistic similarity. Small changes in how a question is asked can push the model toward different source material, meaning two users may receive different answers with no clear audit trail. That instability limits trust in regulated environments.
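That sensitivity is easy to demonstrate even in a toy setting. In the sketch below, the policies and queries are invented and word overlap stands in for embedding similarity, but the effect is the same: two paraphrases of one question retrieve different grounding text.

    import re

    CHUNKS = [
        "Customer data must be encrypted at rest using approved algorithms.",
        "Client information requires cryptographic protection and key rotation.",
    ]

    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z]+", text.lower()))

    def retrieve(query: str) -> str:
        # Highest word overlap wins; real systems rank by embedding similarity.
        return max(CHUNKS, key=lambda c: len(words(query) & words(c)))

    print(retrieve("What encryption is required for customer data?"))
    print(retrieve("What cryptographic protection does client information require?"))
    # The two paraphrases land on different chunks.

Neither retrieval is wrong, exactly. The point is that the winner shifts with phrasing, and nothing in the pipeline records why one chunk won.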

A second wave of RAG innovation argues for something more deterministic. Instead of inferring relationships among documents, the system traverses only the links defined in authoritative frameworks, such as regulations or internal controls. Same question. Same source path. Verifiable answer.
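A rough sketch of the deterministic idea, using an invented three-node framework (the requirement, control, and evidence IDs below are hypothetical): the system follows only edges a human authored, so the same starting point always yields the same path, and the path itself doubles as an audit trail.

    # Deterministic traversal: only explicitly authored edges are followed,
    # so the same starting requirement always yields the same path.
    # Framework content below is invented for illustration.
    FRAMEWORK = {
        "REQ-1":  {"text": "Encrypt customer data at rest.",
                   "requires": ["CTRL-9"]},
        "CTRL-9": {"text": "Use AES-256 with managed keys.",
                   "requires": ["EVID-3"]},
        "EVID-3": {"text": "Key-management attestation on file.",
                   "requires": []},
    }

    def trace(node_id: str, path=None) -> list[str]:
        # Walk the authored 'requires' edges and return the full path.
        path = (path or []) + [node_id]
        for child in FRAMEWORK[node_id]["requires"]:
            path = trace(child, path)
        return path

    for step in trace("REQ-1"):
        print(step, "->", FRAMEWORK[step]["text"])
    # Same input, same path, every time: REQ-1 -> CTRL-9 -> EVID-3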

If this approach holds, regulated enterprises may gain a new way to trust AI in production. The Q&A that follows examines this emerging direction through the work of DivideGraph founder Tyler Messa.

LW: For leaders new to the topic, what problem was first-generation RAG trying to solve?

Messa: Think of early RAG as turning AI into an “open-book test.” Before it, models hallucinated because they were pulling answers from memory. RAG let them reference source material before responding.

For many low-risk business tasks, that was good enough. But in regulated environments, “good enough” isn’t a standard. Boards and regulators expect accuracy that can be demonstrated, not hoped for.

LW: Where did first-generation RAG fall short?

Messa: The weakness showed up when I tried using it with the Cyber Risk Institute Profile — a framework that harmonizes more than 2,500 cybersecurity requirements. I didn’t need creativity. I needed accuracy.

Instead, the AI treated the framework like searchable text rather than structured logic. Worse, it often invented relationships between requirements that didn’t exist. It could take the right source material and hallucinate itself into the wrong conclusion.

The other problem was instability. I couldn’t reliably get the same result twice, and I couldn’t get a complete audit trail. In compliance, that’s fatal. Regulators don’t accept “the AI thinks so.” They expect systems anchored to authoritative frameworks with verifiable reasoning.

LW: How is your approach different?

Messa: The analogy I use is Autocorrect vs. Google Maps.

Traditional RAG behaves like Autocorrect — it predicts what’s likely based on probability. That’s dangerous when the cost of being wrong can be billions of dollars.

DivideGraph works like Google Maps for compliance. We decomposed regulations into precise components and rebuilt the intended logic as a navigable system. When the system answers, it follows that map with a turn-by-turn audit trail.

The AI isn’t “thinking.” It’s the voice reading directions. The graph calculates the path. That means every answer is repeatable, verifiable, and anchored to frameworks regulators already recognize.
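As a schematic of what that turn-by-turn trail could look like as data, here is one possible shape in Python. To be clear, this is a sketch of the concept, not DivideGraph's implementation; the regulation, profile, and control identifiers are used loosely, and the mappings between them are invented for the example.

    # One possible data shape for a "turn-by-turn" compliance trail.
    # Illustrative only: not DivideGraph's implementation, and the
    # regulation-to-control mappings below are invented for the example.
    from dataclasses import dataclass

    @dataclass
    class Step:
        source: str    # the regulation or control the hop started from
        relation: str  # the authored edge type that was followed
        target: str    # where the edge led
        basis: str     # the authoritative text behind the edge

    trail = [
        Step("GLBA 501(b)", "mandates", "CRI Profile PR.DS-1",
             "Safeguard the security of customer records."),
        Step("CRI Profile PR.DS-1", "satisfied-by", "Control ENC-04",
             "Data-at-rest is protected."),
        Step("Control ENC-04", "evidenced-by", "KMS attestation 2026-Q1",
             "Encryption keys are centrally managed and rotated."),
    ]

    for i, step in enumerate(trail, 1):
        print(f"{i}. {step.source} --{step.relation}--> {step.target}")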

LW: Where does deterministic RAG make the biggest impact?

Messa: Anywhere a wrong answer creates real risk: fines, legal exposure, outages, breaches. More broadly, it closes the gap between policy and operations.

Compliance can become continuous instead of episodic. Change management becomes safer because the system understands dependencies. And leadership finally gets an accurate, real-time understanding of risk posture.

LW: Is anyone else doing this?

Messa: To my knowledge, no. Most platforms are still trying to predict compliance. Banks are uniquely positioned as early adopters because the industry already did foundational work: the CRI Profile provides a harmonized framework to compute against.

To adopt this model, two conditions matter: the cost of being wrong has to be high, and there has to be a standardized framework to anchor to.

LW: If this gains traction, how do you see it spreading?

Messa: Deterministic systems will become the trust layer for AI. You can’t responsibly build financial decisioning or fraud systems on probabilistic guesswork. We have decades of regulatory intelligence trapped in PDFs. Deterministic RAG operationalizes that intelligence.

This isn’t about replacing human oversight. It’s about making oversight computational.

LW: What would this change for auditors?

Messa: Everything. Today, AI compliance claims are hard to prove. You can show prompts and documents, but you can’t show reasoning because probabilistic systems don’t have explicit reasoning.

With a graph, every answer has a chain of logic. Auditors can see exactly which regulation required which control and how it maps to evidence. That levels the playing field for smaller banks that can't afford armies of consultants. And it gives regulators the ability to examine systemic risk at a sector level.

LW: What proof will enterprises need before trusting deterministic RAG?

Messa: The most important signal is the ability to say “no.” A trustworthy system refuses requests that violate law, logic, or safe operation. It understands time, so it doesn’t reference rescinded rules. It understands concepts rather than just matching words. And it produces complete, verifiable traceability.
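The "understands time" test can be made concrete. Below is a minimal sketch of such a temporal guard, with hypothetical rule IDs and dates: the system checks a rule's effective window before citing it, and refuses otherwise.

    from datetime import date

    # Hypothetical rule registry with effective and rescinded dates.
    RULES = {
        "RULE-2019-07": {"effective": date(2019, 7, 1),
                         "rescinded": date(2024, 3, 15)},
        "RULE-2024-03": {"effective": date(2024, 3, 15),
                         "rescinded": None},
    }

    def cite(rule_id: str, as_of: date) -> str:
        # Refuse to cite a rule that was not in force on the given date.
        r = RULES[rule_id]
        if as_of < r["effective"] or (r["rescinded"] and as_of >= r["rescinded"]):
            return f"REFUSED: {rule_id} was not in force on {as_of}."
        return f"OK: {rule_id} applies as of {as_of}."

    print(cite("RULE-2019-07", date(2026, 1, 21)))  # REFUSED: rescinded
    print(cite("RULE-2024-03", date(2026, 1, 21)))  # OK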

Confidence comes from precision.

Pulitzer Prize-winning business journalist Byron V. Acohido is dedicated to fostering public awareness about how to make the Internet as private and secure as it ought to be.


January 21st, 2026


Source: https://securityboulevard.com/2026/01/shared-intel-qa-ai-retrieval-systems-can-still-hallucinate-deterministic-logic-offers-a-fix/