Building a document topic classifier
To show you how to utilize the graph structure, in the following subsections we will use the topological information and the connections between entities and documents, provided by the bipartite entity-document graph, to train multi-label classifiers that predict the document topics.
To do so, we will analyze two different approaches:
- A shallow machine learning approach, where we use the embeddings extracted from the bipartite network to train traditional classifiers, such as a random forest classifier
- A more integrated and differentiable approach based on a graph neural network that operates directly on heterogeneous graphs (such as the bipartite graph)
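The shallow approach can be sketched roughly as follows. This is a minimal illustration, assuming the document embeddings have already been extracted from the bipartite graph and stored as a NumPy array with one row per document. The embeddings and topic names below are randomly generated, hypothetical stand-ins, not the real dataset. scikit-learn's MultiLabelBinarizer turns each document's list of topics into a binary indicator row, and RandomForestClassifier accepts such a 2D label matrix directly, predicting one binary output per topic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# HYPOTHETICAL data: random vectors stand in for the document embeddings
# extracted from the bipartite graph (one row per document).
rng = np.random.default_rng(42)
n_docs, dim = 200, 16
doc_embeddings = rng.normal(size=(n_docs, dim))

# HYPOTHETICAL topic names; each document gets the topics whose matching
# embedding coordinate is positive, so the labels are learnable from X.
topic_names = ["topic_a", "topic_b", "topic_c"]
doc_topics = [
    [t for j, t in enumerate(topic_names) if doc_embeddings[i, j] > 0] or ["topic_a"]
    for i in range(n_docs)
]

# Encode the lists of topics as a binary indicator matrix (one column per topic).
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(doc_topics)

# Simple hold-out split: first 150 documents for training, the rest for testing.
X_train, X_test = doc_embeddings[:150], doc_embeddings[150:]
y_train, y_test = y[:150], y[150:]

# RandomForestClassifier accepts a 2D label matrix and predicts one
# binary output per topic (multi-output classification).
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Micro-averaged F1 pools true/false positives across all topics.
score = f1_score(y_test, y_pred, average="micro")
print(y_pred.shape, mlb.classes_)
print(f"micro-F1: {score:.3f}")
```

The split sizes and hyperparameters here are arbitrary choices for illustration. Micro-averaged F1 is one reasonable summary for the multi-label setting, since it weights every document-topic decision equally regardless of which topic it belongs to.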
In the following code block, we will consider only the 10 most common topics, for which we have enough documents to train and evaluate our models:
from collections import Counter
topics = Counter(
[label
for document_labels...