You're reading from Mastering spaCy: Build structured NLP solutions with custom components and models powered by spacy-llm

Product type: Paperback
Published: Feb 2025
Publisher: Packt
ISBN-13: 9781835880463
Length: 238 pages
Edition: 2nd Edition
Languages: Python
Tools: spaCy
Concepts: GPT/LLMs
Authors (2): Déborah Mesquita, Duygu Altınok

Token-based matching

Some NLU tasks can be solved without the help of any statistical model. One way to do this is with a regular expression (regex), which we use to match a predefined set of patterns against our text.

A regex is a sequence of characters that specifies a search pattern; it describes the set of strings that follow that pattern. These patterns can include letters, digits, and characters with special meanings, such as ?, ., and *. Python's built-in re module provides good support for defining and matching regexes.
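To make the three special characters concrete, here is a minimal sketch using Python's re module. The patterns and test strings (colou?r, b.t, go*gle) are illustrative choices of ours, not examples from the book; the helper name matches is likewise hypothetical. re.fullmatch is used so that a pattern counts as matching only when it covers the whole string.

```python
import re

# "?" makes the preceding character optional: "colou?r" accepts "color" and "colour".
optional = re.compile(r"colou?r")
# "." matches any single character except a newline: "b.t" accepts "bat", "bit", "but".
any_char = re.compile(r"b.t")
# "*" matches zero or more repetitions of the preceding character: "go*gle" accepts "ggle", "gogle", "google".
repeat = re.compile(r"go*gle")


def matches(pattern, strings):
    """Return the strings that the compiled pattern matches in full (hypothetical helper)."""
    return [s for s in strings if pattern.fullmatch(s)]


print(matches(optional, ["color", "colour", "colouur"]))  # ['color', 'colour']
print(matches(any_char, ["bat", "bit", "but", "bt"]))     # ['bat', 'bit', 'but']
print(matches(repeat, ["ggle", "gogle", "google", "gaggle"]))  # ['ggle', 'gogle', 'google']
```

Compiling the patterns once with re.compile is a convenience when a pattern is reused; calling re.fullmatch(pattern_string, text) directly behaves the same way.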

What does a regex look like, then? The following regex:

reg = r"Barack\s(Hussein\s)?Obama"

matches each of these strings:

"Barack Obama"
"Barack Hussein Obama"
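A quick way to check which strings the Barack/Obama pattern accepts is to run it through Python's re module. The sketch below is illustrative: apart from "Barack Obama" and "Barack Hussein Obama", the candidate strings (a tab-separated name, an initial, a double space) and the example sentence are our own additions, chosen to show that \s matches exactly one whitespace character of any kind. re.fullmatch checks a whole string, while re.search finds the pattern inside longer text.

```python
import re

# Pattern from the text: "Barack", one whitespace character,
# an optional "Hussein" plus one whitespace character, then "Obama".
reg = r"Barack\s(Hussein\s)?Obama"

candidates = [
    "Barack Obama",
    "Barack\tObama",         # \s matches any single whitespace character, including a tab
    "Barack Hussein Obama",
    "Barack H. Obama",       # "H. " is not "Hussein ", so no match
    "Barack  Obama",         # two spaces: \s consumes only one, so no match
]

# fullmatch succeeds only if the pattern covers the entire string.
results = {text: re.fullmatch(reg, text) is not None for text in candidates}
for text, ok in results.items():
    print(repr(text), "->", ok)

# search scans a longer string for the first place the pattern matches.
sentence = "Yesterday, Barack Hussein Obama gave a speech."
found = re.search(reg, sentence)
print(found.group(0))  # the whole match: Barack Hussein Obama
print(found.group(1))  # the optional group, including its trailing space
```

When the optional group does not take part in a match, as with "Barack Obama", group(1) returns None rather than an empty string.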

This pattern can be read as follows: the string Barack is optionally followed by the string Hussein (in a regex, the ? character means optional, that is, 0 or 1 occurrence of the preceding group) and must then be followed by the string Obama. The inter-word spaces can be a...