Word Vectors

oneonlee 2024. 4. 1. 22:52

Motivation

Distributional semantics: a word's meaning is given by the words that frequently appear close by.

-> Representing words by their context

Notation

  • $t$: position in the text
  • $c$: center word
  • $o$: context words (outside words)
  • $P(w_{t+j}|w_{t})$: the probability of the context word $o$ given the center word $c$ (or vice versa)
  • $\theta$: all variables to be optimized
  • $L(\theta)$: likelihood
  • $J(\theta)$: objective function (average negative log likelihood)

Objective Function

For each position $t = 1, \cdots , T$, predict context words within a window of fixed size $m$, given center word $w_t$.

$$L(\theta)=\prod_{t=1}^{T} \prod_{-m\leq j\leq m \ (j\neq 0)} P(w_{t+j}|w_{t}; \theta)$$

The objective function $J(\theta)$ is the (average) negative log likelihood:

$$J(\theta) = -\frac{1}{T} \log{L(\theta)} = -\frac{1}{T} \sum_{t=1}^{T} \sum_{-m\leq j\leq m \ (j\neq 0)} \log{P(w_{t+j}|w_{t}; \theta)}$$
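
To make the double sum concrete, here is a minimal Python sketch of $J(\theta)$ over a tokenized corpus. The `prob` argument is a hypothetical placeholder for $P(w_{t+j}|w_{t}; \theta)$, which is defined in the next section; windows are simply truncated at the corpus boundaries.

```python
import math

def avg_neg_log_likelihood(tokens, prob, m=2):
    """J(theta): average negative log likelihood over a tokenized corpus.

    tokens: list of T word tokens
    prob:   hypothetical callable prob(outside_word, center_word) standing in
            for P(w_{t+j} | w_t; theta) -- defined in the next section
    m:      fixed window size
    """
    total = 0.0
    for t, center in enumerate(tokens):
        for j in range(-m, m + 1):
            if j == 0 or not 0 <= t + j < len(tokens):
                continue  # skip the center word itself and out-of-range positions
            total -= math.log(prob(tokens[t + j], center))
    return total / len(tokens)  # the 1/T averaging factor
```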

How to calculate $P(w_{t+j}|w_{t})$?

We will use two vectors per word $w$:

  • $v_{w}$ when $w$ is a center word
  • $u_{w}$ when $w$ is a context word

Then for a center word $c$ and a context word $o$:
$$P(o|c)=\frac{\exp(u_{o}^{T}v_c)}{\sum_{w\in V} \exp(u_{w}^{T}v_c) }$$
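
This is a softmax over the dot products $u_{w}^{T}v_c$. A minimal NumPy sketch, assuming the vectors are stacked row-wise into two $|V| \times d$ matrices (the names `U` and `V_mat` are chosen here for illustration):

```python
import numpy as np

def prob_o_given_c(U, V_mat, o_idx, c_idx):
    """P(o|c) = exp(u_o . v_c) / sum over w of exp(u_w . v_c).

    U:     (|V|, d) matrix whose rows are the context vectors u_w
    V_mat: (|V|, d) matrix whose rows are the center vectors v_w
    """
    scores = U @ V_mat[c_idx]  # u_w^T v_c for every word w in the vocabulary
    scores -= scores.max()     # shift for numerical stability; the softmax is unchanged
    exp_scores = np.exp(scores)
    return exp_scores[o_idx] / exp_scores.sum()

# Toy usage: random vectors for a 5-word vocabulary with dimension 3
rng = np.random.default_rng(0)
U, V_mat = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
print(prob_o_given_c(U, V_mat, o_idx=1, c_idx=0))
```

Note that the denominator sums over the entire vocabulary $V$, so evaluating this softmax exactly becomes expensive for large vocabularies.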
