Word Vectors

oneonlee 2024. 4. 1. 22:52

Motivation

Distributional semantics: a word's meaning is given by the words that frequently appear close by.

-> Representing words by their context

Notation

  • $t$: position in the text
  • $c$: center word
  • $o$: context words (outside words)
  • $P(w_{t+j}|w_{t})$: the probability of the context word $o$ given the center word $c$ (or vice versa)
  • $\theta$: all variables to be optimized
  • $L(\theta)$: likelihood
  • $J(\theta)$: objective function (average negative log likelihood)

Objective Function

For each position $t = 1, \cdots , T$, predict context words within a window of fixed size $m$, given center word $w_t$.

$$L(\theta)=\prod_{t=1}^{T} \prod_{-m\leq j\leq m \ (j\neq 0)} P(w_{t+j}|w_{t}; \theta)$$

The objective function $J(\theta)$ is the (average) negative log likelihood:

$$J(\theta) = -\frac{1}{T} \log{L(\theta)} = -\frac{1}{T} \sum_{t=1}^{T} \sum_{-m\leq j\leq m \ (j\neq 0)} \log{P(w_{t+j}|w_{t}; \theta)}$$
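
To make the double sum concrete, here is a minimal Python sketch of $J(\theta)$ over a tokenized corpus. The `prob` argument is a hypothetical placeholder for $P(w_{t+j}|w_{t}; \theta)$, which is defined in the next section; windows are simply truncated at the corpus boundaries.

```python
import math

def avg_neg_log_likelihood(tokens, prob, m=2):
    """J(theta): average negative log likelihood over a tokenized corpus.

    tokens: list of T word tokens
    prob:   hypothetical callable prob(outside_word, center_word) standing in
            for P(w_{t+j} | w_t; theta) -- defined in the next section
    m:      fixed window size
    """
    total = 0.0
    for t, center in enumerate(tokens):
        for j in range(-m, m + 1):
            if j == 0 or not 0 <= t + j < len(tokens):
                continue  # skip the center word itself and out-of-range positions
            total -= math.log(prob(tokens[t + j], center))
    return total / len(tokens)  # the 1/T averaging factor
```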

How to calculate $P(w_{t+j}|w_{t})$?

We will use two vectors per word $w$:

  • $v_{w}$ when $w$ is a center word
  • $u_{w}$ when $w$ is a context word

Then for a center word $c$ and a context word $o$:
$$P(o|c)=\frac{\exp(u_{o}^{T}v_c)}{\sum_{w\in V} \exp(u_{w}^{T}v_c) }$$
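
This is a softmax over the dot products $u_{w}^{T}v_c$. A minimal NumPy sketch, assuming the vectors are stacked row-wise into two $|V| \times d$ matrices (the names `U` and `V_mat` are chosen here for illustration):

```python
import numpy as np

def prob_o_given_c(U, V_mat, o_idx, c_idx):
    """P(o|c) = exp(u_o . v_c) / sum over w of exp(u_w . v_c).

    U:     (|V|, d) matrix whose rows are the context vectors u_w
    V_mat: (|V|, d) matrix whose rows are the center vectors v_w
    """
    scores = U @ V_mat[c_idx]  # u_w^T v_c for every word w in the vocabulary
    scores -= scores.max()     # shift for numerical stability; the softmax is unchanged
    exp_scores = np.exp(scores)
    return exp_scores[o_idx] / exp_scores.sum()

# Toy usage: random vectors for a 5-word vocabulary with dimension 3
rng = np.random.default_rng(0)
U, V_mat = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
print(prob_o_given_c(U, V_mat, o_idx=1, c_idx=0))
```

Note that the denominator sums over the entire vocabulary $V$, so evaluating this softmax exactly becomes expensive for large vocabularies.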
