Softmax, Negative Sampling, and Noise Contrastive Estimation
\nabla_\theta J_\theta = \: \nabla_\theta \mathcal{E}(w) - \sum_{w_i \in V} P(w_i) \nabla_\theta \mathcal{E}(w_i)
Softmax, Negative Sampling, and Noise Contrastive Estimation
\nabla_\theta J_\theta = \: \nabla_\theta \mathcal{E}(w) - \sum_{w_i \in V} P(w_i) \nabla_\theta \mathcal{E}(w_i)