Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
Abstract
When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
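The lookback-ratio feature is straightforward to compute. Below is a minimal sketch, assuming attention maps in the Hugging Face layout (one array of shape [batch, heads, seq_len, seq_len] per layer, e.g. obtained with `output_attentions=True` and converted to NumPy); the function and variable names are illustrative, not the authors' released code.

```python
# Minimal sketch (not the authors' released code) of the lookback-ratio feature:
# for each generated token and each attention head, the ratio of attention mass
# placed on the provided context versus on the newly generated tokens.
import numpy as np
from sklearn.linear_model import LogisticRegression

def lookback_ratio_features(attentions, context_len, gen_pos):
    """Return one feature per (layer, head) for the generated token at position gen_pos."""
    feats = []
    for layer_attn in attentions:                    # [batch, heads, seq, seq]
        attn = layer_attn[0, :, gen_pos, :]          # attention row for this token: [heads, seq]
        on_context = attn[:, :context_len].mean(axis=-1)          # avg attention on the context
        on_new = attn[:, context_len:gen_pos + 1].mean(axis=-1)   # avg attention on new tokens
        feats.append(on_context / (on_context + on_new + 1e-9))   # lookback ratio per head
    return np.concatenate(feats)                     # [num_layers * num_heads]

# Span-level detection: average the per-token features over a generated span,
# then fit a simple linear classifier (1 = faithful span, 0 = hallucinated).
# X: [num_spans, num_layers * num_heads], y: [num_spans]
# detector = LogisticRegression(max_iter=1000).fit(X, y)
```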
Community
- A simple approach that leverages only the attention maps (weights) in LLaMA to detect whether the generated content contains contextual hallucinations -- cases where LLMs generate facts that do not exist in the provided documents.
- Using the detector to guide LLMs' text generation can help reduce hallucinations (see the decoding sketch below). The detector also transfers across tasks and models.
Code & Trained Classifier & Data: https://github.com/voidism/Lookback-Lens
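To make the guided-decoding bullet concrete, here is a hedged sketch of the candidate-selection step: sample several candidate chunks, score each with the trained detector, and keep the one judged most faithful. `pick_most_faithful` is an illustrative name and the feature vectors are assumed to be span-averaged lookback ratios as above; this is not the repo's implementation.

```python
import numpy as np

def pick_most_faithful(candidates, detector):
    """candidates: list of (chunk_text, feature_vector) pairs, where each
    feature_vector is the span-averaged lookback-ratio vector for that chunk.
    Returns the chunk text the detector scores as most likely faithful."""
    feats = np.stack([fv for _, fv in candidates])   # [num_candidates, num_features]
    scores = detector.predict_proba(feats)[:, 1]     # P(faithful) for each candidate
    return candidates[int(np.argmax(scores))][0]
```

A full decoding loop would repeatedly sample several candidate chunks from the model, compute their lookback-ratio features, append the chunk this function returns to the running output, and continue until an end-of-sequence token is produced.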
Wow, this is a really interesting idea for the LLM hallucination problem.
The idea of using self-attention patterns to detect hallucination is similar to the paper "Attention Satisfies..." (https://arxiv.org/abs/2309.15098).
However, this paper proposes a new decoding strategy that can mitigate hallucination, whereas the paper above only provides an analysis of the causes of hallucination.
Thanks for sharing! This paper is super interesting!
But I found that that paper still focuses on closed-book hallucination settings -- it makes LLMs answer questions without any given documents. Our paper focuses on the setting where the correct facts exist in the document, but the LLM still hallucinates. We believe that in such cases the attention patterns over the context are more meaningful, as they record how the LLM looks at the context information.
We will include that paper in our related work in the next version!