Summary
In this chapter, we explored how to train spaCy NER components for our own domain and data. First, we learned the key considerations for deciding whether we really need to train a custom model. Then, we went through an essential part of model training – data collection and labeling.
We learned about two annotation tools – Prodigy and nertk – and how to convert the annotated data into a format suitable for training. Then, we used spaCy CLI commands to train the component and to create a Python package for the pipeline.
Finally, we learned how to combine different NER components into a single pipeline. In the next chapter, we will learn how to manage and share end-to-end workflows for different use cases and domains using spaCy and Weasel.