
[Paper Summary] SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

oneonlee 2024. 9. 30. 17:34

(EMNLP 2023) SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

arXiv: https://arxiv.org/abs/2303.08896

code: https://github.com/potsawee/selfcheckgpt
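The authors also ship the method as a pip package (selfcheckgpt). As a quick orientation, the repo README shows usage roughly along the following lines; the class name and predict() signature are quoted from memory of the README and may differ across versions, so check the repo for the current API:

```python
# Hedged sketch of the released package's usage, per the repo README;
# verify class names and arguments against the current version.
# pip install selfcheckgpt
import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckNLI

device = "cuda" if torch.cuda.is_available() else "cpu"
selfcheck_nli = SelfCheckNLI(device=device)

# `sentences`: sentences of the main response to check;
# `sampled_passages`: N additional responses sampled for the same prompt.
sent_scores = selfcheck_nli.predict(
    sentences=["Paris is the capital of France."],
    sampled_passages=["France's capital is Paris.", "Paris is France's capital city."],
)
print(sent_scores)  # one inconsistency score per sentence (higher = more suspect)
```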


 

1. Problem

  • Hallucination Detection
    • Existing fact-verification methods may not work with black-box models such as ChatGPT, so a new approach is needed that can detect hallucinations without relying on any external resources

 

2. Related Works

  • intrinsic uncertainty metrics (e.g., token probability or entropy)
    • this token-level information may not be available to users when systems are accessed only through limited external APIs (a minimal sketch of such a baseline follows this list)
  • fact-verification approaches
    • facts can only be assessed relative to the knowledge present in the database
    • hallucinations are observed over a wide range of tasks beyond pure fact verification
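As a concrete illustration of the grey-box baseline above, here is a minimal sketch that computes the average token negative log-probability and entropy of a text, assuming access to an open model (GPT-2, purely for illustration). Black-box APIs typically do not expose these token-level quantities, which is exactly the gap SelfCheckGPT targets:

```python
# Minimal sketch of a grey-box uncertainty baseline: average token
# negative log-probability and predictive entropy over a text.
# GPT-2 is an illustrative assumption; the paper's point is that
# black-box APIs often do not expose these quantities at all.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_uncertainty(text: str) -> tuple[float, float]:
    ids = tokenizer(text, return_tensors="pt").input_ids      # (1, L)
    with torch.no_grad():
        logits = model(ids).logits                            # (1, L, vocab)
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)     # next-token dists
    targets = ids[0, 1:]
    nll = -log_probs[torch.arange(targets.size(0)), targets]  # per-token -log p
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)      # per-token entropy
    return nll.mean().item(), entropy.mean().item()

avg_nll, avg_ent = token_uncertainty("Paris is the capital of France.")
print(f"avg -log p = {avg_nll:.3f}, avg entropy = {avg_ent:.3f}")
```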

 

3. Proposed Key Ideas: SelfCheckGPT (sampling-based approach)

  • Introduces 'SelfCheckGPT', a sampling-based approach for detecting hallucinations in black-box LLMs without relying on any external resources
    • Measures consistency across multiple sampled responses using several variants: BERTScore, question answering, n-gram analysis, NLI, and LLM prompting (the NLI variant is sketched below, after this list)
  • The motivating idea
    • When an LLM has been trained on a given concept, the sampled responses are likely to be similar and contain consistent facts.
    • However, for hallucinated facts, stochastically sampled responses are likely to diverge and may contradict one another.
  • The result is a zero-resource hallucination detection solution that can be applied to black-box systems
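Below is a minimal from-scratch sketch of the NLI variant, assuming an off-the-shelf MNLI checkpoint (microsoft/deberta-large-mnli is an illustrative stand-in; the official implementation and its own NLI model live in the repo linked above). Each sentence of the main response is scored by the average probability that the sampled responses contradict it, with the probability normalized over the entailment and contradiction classes as in the paper:

```python
# Minimal sketch of SelfCheckGPT's NLI variant. The MNLI checkpoint below is
# an illustrative stand-in; see the linked repo for the official implementation.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def contradiction_prob(premise: str, hypothesis: str) -> float:
    """P(contradiction), normalized over {entailment, contradiction} as in the paper."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        z = model(**inputs).logits[0]
    c = model.config.label2id["CONTRADICTION"]
    e = model.config.label2id["ENTAILMENT"]
    return torch.softmax(z[[e, c]], dim=-1)[1].item()

def selfcheck_nli(sentences: list[str], samples: list[str]) -> list[float]:
    """Per-sentence inconsistency score in [0, 1]; higher = more likely hallucinated."""
    return [
        sum(contradiction_prob(s, sent) for s in samples) / len(samples)
        for sent in sentences
    ]

# `sentences` come from the main response; `samples` are N extra stochastic
# generations (e.g., temperature 1.0) for the same prompt.
sentences = ["Paris is the capital of France.", "Paris was founded in 1998."]
samples = [
    "Paris, the capital of France, has existed since antiquity.",
    "France's capital is Paris, an ancient city.",
]
print(selfcheck_nli(sentences, samples))
```

A sentence scoring near 0 is consistently supported by the samples, while a score near 1 means the samples contradict it, which is the signature of a hallucinated fact under the motivating idea above.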

 
