
[Paper Summary] SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

oneonlee 2024. 9. 30. 17:34

(EMNLP 2023) SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

arXiv: https://arxiv.org/abs/2303.08896

code: https://github.com/potsawee/selfcheckgpt
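The authors also ship the method as a pip package (selfcheckgpt). As a quick orientation, the repo README shows usage roughly along the following lines; the class name and predict() signature are quoted from memory of the README and may differ across versions, so check the repo for the current API:

```python
# Hedged sketch of the released package's usage, per the repo README;
# verify class names and arguments against the current version.
# pip install selfcheckgpt
import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckNLI

device = "cuda" if torch.cuda.is_available() else "cpu"
selfcheck_nli = SelfCheckNLI(device=device)

# `sentences`: sentences of the main response to check;
# `sampled_passages`: N additional responses sampled for the same prompt.
sent_scores = selfcheck_nli.predict(
    sentences=["Paris is the capital of France."],
    sampled_passages=["France's capital is Paris.", "Paris is France's capital city."],
)
print(sent_scores)  # one inconsistency score per sentence (higher = more suspect)
```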


 

1. Problem

  • Hallucination Detection
    • Existing fact-verification methods may not work with black-box models such as ChatGPT, so a new approach is needed that can detect hallucinations without relying on any external resources

 

2. Related Works

  • intrinsic uncertainty metrics (e.g., token probability or entropy)
    • this token-level information may not be available to users when systems are accessed only through limited external APIs (a minimal sketch of such a baseline follows this list)
  • fact-verification approaches
    • facts can only be assessed relative to the knowledge present in the database
    • hallucinations are observed over a wide range of tasks beyond pure fact verification
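As a concrete illustration of the grey-box baseline above, here is a minimal sketch that computes the average token negative log-probability and entropy of a text, assuming access to an open model (GPT-2, purely for illustration). Black-box APIs typically do not expose these token-level quantities, which is exactly the gap SelfCheckGPT targets:

```python
# Minimal sketch of a grey-box uncertainty baseline: average token
# negative log-probability and predictive entropy over a text.
# GPT-2 is an illustrative assumption; the paper's point is that
# black-box APIs often do not expose these quantities at all.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_uncertainty(text: str) -> tuple[float, float]:
    ids = tokenizer(text, return_tensors="pt").input_ids      # (1, L)
    with torch.no_grad():
        logits = model(ids).logits                            # (1, L, vocab)
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)     # next-token dists
    targets = ids[0, 1:]
    nll = -log_probs[torch.arange(targets.size(0)), targets]  # per-token -log p
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)      # per-token entropy
    return nll.mean().item(), entropy.mean().item()

avg_nll, avg_ent = token_uncertainty("Paris is the capital of France.")
print(f"avg -log p = {avg_nll:.3f}, avg entropy = {avg_ent:.3f}")
```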

 

3. Proposed Key Ideas: SelfCheckGPT (sampling-based approach)

  • Introduces 'SelfCheckGPT', a sampling-based approach for detecting hallucinations in black-box LLMs without relying on any external resources
    • Measures consistency across multiple sampled responses using several variants: BERTScore, question answering, n-gram analysis, NLI, and LLM prompting (the NLI variant is sketched below, after this list)
  • The motivating idea
    • When an LLM has been trained on a given concept, the sampled responses are likely to be similar and contain consistent facts.
    • However, for hallucinated facts, stochastically sampled responses are likely to diverge and may contradict one another.
  • The result is a zero-resource hallucination detection solution that can be applied to black-box systems
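Below is a minimal from-scratch sketch of the NLI variant, assuming an off-the-shelf MNLI checkpoint (microsoft/deberta-large-mnli is an illustrative stand-in; the official implementation and its own NLI model live in the repo linked above). Each sentence of the main response is scored by the average probability that the sampled responses contradict it, with the probability normalized over the entailment and contradiction classes as in the paper:

```python
# Minimal sketch of SelfCheckGPT's NLI variant. The MNLI checkpoint below is
# an illustrative stand-in; see the linked repo for the official implementation.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "microsoft/deberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def contradiction_prob(premise: str, hypothesis: str) -> float:
    """P(contradiction), normalized over {entailment, contradiction} as in the paper."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        z = model(**inputs).logits[0]
    c = model.config.label2id["CONTRADICTION"]
    e = model.config.label2id["ENTAILMENT"]
    return torch.softmax(z[[e, c]], dim=-1)[1].item()

def selfcheck_nli(sentences: list[str], samples: list[str]) -> list[float]:
    """Per-sentence inconsistency score in [0, 1]; higher = more likely hallucinated."""
    return [
        sum(contradiction_prob(s, sent) for s in samples) / len(samples)
        for sent in sentences
    ]

# `sentences` come from the main response; `samples` are N extra stochastic
# generations (e.g., temperature 1.0) for the same prompt.
sentences = ["Paris is the capital of France.", "Paris was founded in 1998."]
samples = [
    "Paris, the capital of France, has existed since antiquity.",
    "France's capital is Paris, an ancient city.",
]
print(selfcheck_nli(sentences, samples))
```

A sentence scoring near 0 is consistently supported by the samples, while a score near 1 means the samples contradict it, which is the signature of a hallucinated fact under the motivating idea above.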

 
