What this book covers
Chapter 1, Getting Started with spaCy, gives an overview of the library and shows how to install it and how to visualize your texts with displaCy.
Chapter 2, Core Operations with spaCy, covers the basics of spaCy’s processing pipelines. We will dive deep into Tokenizer, usually the first component of our text processing pipelines. We’ll also get to know the main spaCy containers, such as Doc, Token, and Span.
Chapter 3, Extracting Linguistic Features, covers (you guessed it) linguistic features. We’ll learn how to use spaCy’s part-of-speech tags and dependency parsing tags to analyze and extract information from text.
Chapter 4, Mastering Rule-Based Matching, explores how to use linguistic features to create extraction modules using spaCy’s matchers and the SpanRuler component.
Chapter 5, Extracting Semantic Representations with spaCy Pipelines, walks through a use case: semantic parsing of utterances. Using everything you learned in Chapter 3 and Chapter 4, you’ll create your first custom spaCy pipeline component, which extracts the intent of texts.
Chapter 6, Utilizing spaCy with Transformers, teaches you how to train a custom spaCy component using spacy-transformers. In this chapter, you’ll be introduced to spaCy config files and spaCy’s CLI, which are very useful for maintaining and reproducing NLP pipelines.
Chapter 7, Enhancing NLP Tasks Using LLMs with spacy-llm, covers how to use LLMs in your NLP pipelines with spacy-llm.
Chapter 8, Training an NER Component with Your Own Data, focuses on how to label your own data to train an NER component. We’ll see how to prepare the data and also learn about annotation tools, including Prodigy, the annotation tool from Explosion, the team behind spaCy.
Chapter 9, Creating End-to-End spaCy Workflows with Weasel, covers how to use Weasel, a tool that helps us create reproducible and well-structured project workflows. We’ll also see how to use DVC Studio to manage models. DVC is another cool open-source project for data and model versioning and experiment tracking.
Chapter 10, Training an Entity Linker Model with spaCy, covers the entity linking task in NLP and best practices for creating high-quality datasets for NLP training. You’ll also learn how to use a custom corpus reader to train a spaCy component.
Chapter 11, Integrating spaCy with Third-Party Libraries, covers how to integrate spaCy projects with Streamlit to build beautiful web interfaces and FastAPI to build APIs for NLP models.