(EMNLP 2023) SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
arXiv: https://arxiv.org/abs/2303.08896
code: https://github.com/potsawee/selfcheckgpt
1. Problem
- Hallucination Detection
- Existing fact-verification methods may not work with black-box models such as ChatGPT, so a new approach is needed that can detect hallucinations without any external resources.
2. Related Work
- Intrinsic uncertainty metrics (e.g., token probability or entropy); see the sketch after this list.
  - Limitation: this information may not be available to users when systems are accessed through limited external APIs.
- Fact-verification approaches
  - Limitation: facts can only be assessed relative to the knowledge present in the database.
  - Limitation: hallucinations are observed over a wide range of tasks beyond pure fact verification.
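For context, the paper's grey-box baselines score a sentence by its average token negative log-probability or average token entropy. Below is a minimal sketch of that idea; using GPT-2 as the scoring model is an assumption for illustration (the real setting assumes logit access to the generating LLM itself).

```python
# Minimal sketch of token-level uncertainty scoring (the grey-box
# baselines Avg(-log p) and Avg(H)). GPT-2 is only a stand-in scorer.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def uncertainty_scores(text: str) -> dict:
    """Return average negative log-prob and average entropy over tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # (1, seq_len, vocab)
    # Token t is predicted from positions < t: shift logits and targets.
    log_probs = F.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(-1)  # per-position entropy
    return {
        "avg_neg_logprob": (-token_logp).mean().item(),
        "avg_entropy": entropy.mean().item(),
    }

print(uncertainty_scores("Paris is the capital of France."))
```

Higher scores suggest the model was less certain about what it generated; SelfCheckGPT's point is precisely that these quantities are unavailable when an API exposes only text output.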
3. Proposed Key Ideas: SelfCheckGPT (sampling-based approach)
- Introduces 'SelfCheckGPT', a sampling-based approach for detecting hallucinations in black-box LLMs without relying on external resources.
- Measures consistency across multiple sampled responses using several variants: BERTScore, question answering (MQAG), n-gram statistics, NLI, and LLM prompting (a minimal NLI-variant sketch follows this list).
- The motivating idea
- When an LLM has been trained on a given concept, the sampled responses are likely to be similar and contain consistent facts.
- However, for hallucinated facts, stochastically sampled responses are likely to diverge and may contradict one another.
- zero-resource hallucination detection solution that can be applied to black-box systems
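The sketch below illustrates the consistency idea with the NLI variant: each sentence of the main response is compared against every sampled response with an MNLI classifier, and the average contradiction probability serves as the hallucination score. The checkpoint name (microsoft/deberta-large-mnli) and the helper function are illustrative assumptions, not the paper's exact setup; the official implementation lives in the repo linked above.

```python
# Sketch of SelfCheckGPT's NLI variant: score each sentence of the main
# response by mean P(contradiction) against N stochastically sampled
# responses. Higher score => less consistent => more likely hallucinated.
# NOTE: this MNLI checkpoint is an illustrative choice, not the paper's.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

# Look up the contradiction label index from the model config instead of
# hardcoding it, since label order varies across MNLI checkpoints.
contra_idx = next(i for i, name in model.config.id2label.items()
                  if "contra" in name.lower())

def selfcheck_nli(sentence: str, sampled_responses: list[str]) -> float:
    """Average P(contradiction) of `sentence` against each sampled response."""
    scores = []
    for sample in sampled_responses:
        # Sampled response is the premise; the checked sentence is the hypothesis.
        inputs = tokenizer(sample, sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1)
        scores.append(probs[0, contra_idx].item())
    return sum(scores) / len(scores)

# Toy usage: a fact the samples support vs. one they contradict.
samples = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Completed in 1889, the Eiffel Tower stands in Paris.",
]
print(selfcheck_nli("The Eiffel Tower is in Paris.", samples))   # low score
print(selfcheck_nli("The Eiffel Tower is in London.", samples))  # high score
```

The paper aggregates these per-sentence scores to passage level; in its evaluation, the prompting and NLI variants were the strongest of the five.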