Motivation
Distributional semantics: a word's meaning is given by the words that frequently appear nearby.
-> Representing words by their context
Notation
- $t$: position in the text
- $c$: center word
- $o$: context words (outside words)
- $P(w_{t+j}|w_{t})$: the probability of a context word $o$ given the center word $c$ (or vice versa).
- $\theta$: all variables to be optimized
- $L(\theta)$: likelihood
- $J(\theta)$: objective function (average negative log likelihood)
Objective Function
For each position $t = 1, \cdots , T$, predict context words within a window of fixed size $m$, given center word $w_t$.
$$L(\theta)=\prod_{t=1}^{T} \prod_{-m\leq j\leq m \ (j\neq 0)} P(w_{t+j}|w_{t}; \theta)$$
The objective function $J(\theta)$ is the (average) negative log likelihood:
$$J(\theta) = -\frac{1}{T} \log{L(\theta)} = -\frac{1}{T} \sum_{t=1}^{T} \sum_{-m\leq j\leq m \ (j\neq 0)} \log{P(w_{t+j}|w_{t}; \theta)}$$
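As a minimal sketch (the toy corpus and window size are assumptions for illustration), the double sum in $J(\theta)$ just enumerates every (center, context) pair inside each window:

```python
# Enumerate the (center, context) training pairs that the sums in J(theta)
# range over, for a toy corpus and window size m (both illustrative).
corpus = "the quick brown fox jumps over the lazy dog".split()
m = 2  # window size

pairs = []
for t, center in enumerate(corpus):                # position t = 1..T
    for j in range(-m, m + 1):                     # -m <= j <= m, j != 0
        if j != 0 and 0 <= t + j < len(corpus):
            pairs.append((center, corpus[t + j]))  # (w_t, w_{t+j})

print(pairs[:6])
# J(theta) is then the average of -log P(context | center) over these pairs.
```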
How to calculate $P(w_{t+j}|w_{t})$?
We will use two vectors per word $w$:
- $v_{w}$ when $w$ is a center word
- $u_{w}$ when $w$ is a context word
Then for a center word $c$ and a context word $o$:
$$P(o|c)=\frac{\exp(u_{o}^{T}v_c)}{\sum_{w\in V} \exp(u_{w}^{T}v_c) }$$
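A minimal numpy sketch of this softmax, assuming small random toy vectors (the vocabulary, dimension, and vectors below are illustrative, not trained):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "quick", "brown", "fox"]
d = 8                                        # embedding dimension (illustrative)

V = {w: rng.normal(size=d) for w in vocab}   # v_w: center-word vectors
U = {w: rng.normal(size=d) for w in vocab}   # u_w: context-word vectors

def p_o_given_c(o, c):
    """P(o|c) = exp(u_o^T v_c) / sum over w in V of exp(u_w^T v_c)."""
    scores = np.array([U[w] @ V[c] for w in vocab])  # dot products u_w^T v_c
    scores -= scores.max()                   # stabilize exp against overflow
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[vocab.index(o)]

print(p_o_given_c("quick", "the"))           # probabilities over vocab sum to 1
```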