What is Word Embedding?
Word Embedding is a technique in Natural Language Processing that is used to represent words in a Deep Learning environment.
Why Word Embedding?
The main advantage of using word embedding is that it allows words with similar context to be grouped together, while dissimilar words are positioned far away from each other. This is done with the help of an Embedding Matrix. The similarity of two words can be measured with the help of Cosine Similarity.
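As a quick illustration, here is a minimal sketch of cosine similarity between two word vectors (using NumPy; the 3-dimensional vectors for "king" and "queen" are made-up values, not learnt embeddings):

import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (||u|| * ||v||)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical embeddings for two words with similar context
king = np.array([0.9, 0.8, 0.1])
queen = np.array([0.9, 0.9, 0.2])
print(cosine_similarity(king, queen))  # close to 1.0 for similar words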
Embedding Matrix

The embedding matrix is a randomly initialized matrix whose dimensions are N * (size of the vocabulary + 1), where N is a number that we have to choose manually and the size of the vocabulary is the total number of unique words in our document. So, each column of the embedding matrix represents a particular word from the document.
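A minimal NumPy sketch of such a matrix (assuming a vocabulary of 1000 words and N = 15, values chosen only for illustration):

import numpy as np

vocab_size = 1000  # total number of unique words in the document
N = 15             # manually chosen embedding dimension

# Randomly initialized matrix of shape N x (vocab_size + 1);
# column i holds the N-dimensional vector for the word with token i
embedding_matrix = np.random.randn(N, vocab_size + 1)
print(embedding_matrix.shape)  # (15, 1001)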
The embedding matrix will be trained over time using gradient descent, learning the values of the matrix in such a way that similar words are placed together. For example, King and Queen are highly royal, whereas a boy need not be that royal. And King and the boy are both male, so the value corresponding to male is very high for both the king and the boy.
It is important to know that, as shown in the picture, we don't explicitly define these features (Royal, Male, Age, etc.). This is just a randomly initialized matrix that learns these features and their corresponding values with the help of gradient descent.
Pre-Processing for the Embedding Matrix
We know that we cannot use non-numerical data for machine learning, and guess what: words are, of course, non-numerical. So, let's see how we have to convert them before the forward propagation.
There are a lot of algorithms for this:
1. One-Hot Encoding
2. Term Frequency-Inverse Document Frequency (TF-IDF)
3. Tokenization (Text to Sequence)
But, for this purpose, tokenization is the most preferred, and you will understand why in a few minutes.
Tokenization: Assigning a number to each unique word in the corpus is called tokenization.
Example: let's assume that we have a training set with 3 training examples: ["What is your name", "how are you", "where are you"]. If we tokenize this data, the result would be this:
What : 1, is : 2, your : 3, name : 4, how : 5, are : 6, you : 7, where : 8
Tokenized form of the first sentence: [1, 2, 3, 4]
Tokenized form of the second sentence: [5, 6, 7]
Tokenized form of the third sentence: [8, 6, 7]
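A minimal pure-Python sketch that reproduces this mapping, assigning numbers in order of first appearance (real tokenizers, such as the Keras Tokenizer, usually also lowercase words and order them by frequency):

sentences = ["What is your name", "how are you", "where are you"]

word_index = {}   # maps each unique word to its token number
tokenized = []
for sentence in sentences:
    tokens = []
    for word in sentence.split():
        if word not in word_index:
            word_index[word] = len(word_index) + 1  # next unused number
        tokens.append(word_index[word])
    tokenized.append(tokens)

print(word_index)  # {'What': 1, 'is': 2, 'your': 3, 'name': 4, 'how': 5, 'are': 6, 'you': 7, 'where': 8}
print(tokenized)   # [[1, 2, 3, 4], [5, 6, 7], [8, 6, 7]]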
Now, the data is pre-processed. Let's move on to the forward pass.
Forward Propagation

As we have already seen, each column represents a word from our training set. N is picked manually and represents the dimension of each word. For the example below, let's consider that the vocabulary size is 1000 and N is 15.
Let us consider an input example: The Weather is Nice
So, each word is given a number when we tokenize it. Hence, the tokenized representation of "The Weather is Nice" might be something like the array [123, 54, 792, 205].
Now, when we pass this array of tokens into the neural network for the forward pass, recall that the embedding matrix contains 1001 (vocabulary size + 1) columns. As the input is [123, 54, 792, 205], the columns 123, 54, 792 and 205 are selected from this embedding matrix.
Each of these 4 columns has 15 rows (N). We take all 4 columns and stack them on top of each other, flattening these 4 tensors to form a single tensor of size 15 * 4 = 60.
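A minimal NumPy sketch of this lookup and flattening step (the token numbers are the illustrative values from above):

import numpy as np

N, vocab_size = 15, 1000
embedding_matrix = np.random.randn(N, vocab_size + 1)  # as initialized earlier

tokens = [123, 54, 792, 205]              # tokenized "The Weather is Nice"
selected = embedding_matrix[:, tokens]    # pick the 4 matching columns; shape (15, 4)
flattened = selected.flatten()            # stack them into a single tensor of size 15 * 4
print(flattened.shape)                    # (60,)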
The flattened tensor is then passed to an RNN or a Dense layer, which finally predicts an output.
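In practice, deep learning frameworks handle the lookup, flattening and training in one place. Below is a minimal Keras sketch of the same pipeline (the Dense output layer and the binary loss are illustrative assumptions; note that Keras stores the embedding matrix with words as rows rather than columns):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Flatten, Dense

model = Sequential([
    Input(shape=(4,)),                         # a sequence of 4 token numbers
    Embedding(input_dim=1001, output_dim=15),  # the embedding matrix, trained by gradient descent
    Flatten(),                                 # 4 vectors of size 15 -> one tensor of size 60
    Dense(1, activation="sigmoid"),            # illustrative output layer for a binary task
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()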
Embedding Matrix Values

The embedding matrix values are nothing but parameters that are learnt over time with the help of gradient descent, like in any other supervised learning algorithm.
The matrix happens to learn these values in such a way that the cosine similarity of similar words is pretty high compared to that of dissimilar words.
Conclusion
Word embedding is one of the most effective techniques in Natural Language Processing and is used for creating great chatbots. Its main advantage lies in its ability to group similar words together with the help of the values learnt by the embedding matrix.
Thanks for reading.