Performing extractive summarization
In the chapter’s introduction, we mentioned that extractive summarization identifies important words or phrases and stitches them together to produce a condensed version of the original text. In this section, we use the previously created books.json file and apply several methods to extract a summary from an input document. Because space is limited and we want to focus on state-of-the-art techniques, we do not present the theory behind these methods. Many online resources cover it, however, and a good starting point is the following link: https://miso-belica.github.io/sumy/summarizators.html.
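To make the idea concrete, the following is a minimal sketch of a frequency-based extractive summarizer written with the Python standard library only. It is not one of the methods used in this section and it does not reproduce the API of any summarization library. The function names (split_sentences, tokenize, summarize), the small stop-word list, and the sample text are all assumptions made for illustration. Each sentence is scored by the average corpus frequency of its non-stop words, and the top-scoring sentences are returned in their original order.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that",
              "this", "for", "on", "with", "as", "are", "was", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def summarize(text, num_sentences=2):
    """Return the num_sentences highest-scoring sentences in original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


# Hypothetical product description used only to exercise the sketch.
sample = ("The book tells the story of a young wizard. "
          "The wizard attends a school of magic. "
          "Shipping was fast. "
          "The school of magic hides a dangerous secret.")
summary = summarize(sample, num_sentences=2)
print(summary)
```

Because the output is built only from sentences that already appear in the input, the summary is extractive by construction. More elaborate methods differ mainly in how they score the sentences.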
Let’s begin by loading the data from the file and printing a few examples:
import pandas as pd

# Load the book records created earlier into a DataFrame.
df = pd.read_json('books.json')
# Display the first five rows.
df.head()
>> title product_description...