Getting started with data preparation
spaCy's out-of-the-box models work very well for general NLP purposes, but sometimes we have to work in very specific domains that require custom training.
Training models requires time and effort, so before starting the training process you should decide whether training is necessary at all. A good starting point is to ask yourself the following questions:
- Do spaCy models perform well enough on your data?
- Does your domain include many labels that are absent in spaCy models?
- Is there already a pre-trained model or application on the Hugging Face Hub or elsewhere? (We wouldn’t want to reinvent the wheel.)
Let’s discuss the first two questions in detail in the following sections.
Do spaCy models perform well enough on your data?
In general, if the model performs well enough (as a rule of thumb, above 0.75 accuracy), then you can customize the model output...
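As a rough illustration of the accuracy check described above, the following sketch compares gold labels with predicted labels token by token and applies the 0.75 rule of thumb. The function names (token_accuracy, needs_custom_training) and the sample labels are hypothetical and not part of spaCy; in practice the predicted labels would come from running a spaCy pipeline over your own annotated sample, and token-level accuracy is only one of several possible metrics.

```python
from typing import Sequence

# Rule-of-thumb threshold taken from the text; adjust it for your project.
ACCURACY_THRESHOLD = 0.75


def token_accuracy(
    gold: Sequence[Sequence[str]],
    predicted: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of tokens whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    total = 0
    correct = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        if len(gold_labels) != len(predicted_labels):
            raise ValueError("each sentence needs one predicted label per gold label")
        total += len(gold_labels)
        correct += sum(1 for g, p in zip(gold_labels, predicted_labels) if g == p)
    if total == 0:
        raise ValueError("cannot compute accuracy on an empty sample")
    return correct / total


def needs_custom_training(accuracy: float, threshold: float = ACCURACY_THRESHOLD) -> bool:
    """Flag custom training when accuracy is not above the threshold."""
    return accuracy <= threshold


# Hypothetical sample: per-token entity labels for two short sentences.
gold_sample = [["O", "B-ORG", "O"], ["B-PERSON", "O"]]
predicted_sample = [["O", "B-ORG", "O"], ["O", "O"]]

accuracy = token_accuracy(gold_sample, predicted_sample)
print(f"accuracy={accuracy:.2f}, custom training needed: {needs_custom_training(accuracy)}")
```

A sample this small is only for demonstration; a meaningful estimate needs a reasonably sized, representative set of annotated texts from your own domain.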