Solving NER with BERT for any entity type with very little training data (compared to past approaches)

Ajit Rajasekharan
4 min read · May 20, 2019

One of the roadblocks to entity recognition for any entity type other than person, location, organization, disease, gene, drug, and species is the absence of labeled training data.

BERT offers a solution that works in practice for entity recognition of a custom type with very little labeled data — sometimes as few as roughly 300 labeled examples suffice for a first-cut working solution.

This is possible because we can leverage the model’s unsupervised pre-training on a large corpus and then fine-tune the model to recognize a specific entity type with very little labeled data.
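As a concrete illustration, here is a minimal sketch of that fine-tuning step using the Hugging Face transformers library (one possible tooling choice, not necessarily what this article uses); the CUSTOM label set, the toy sentence, and the hyperparameters are hypothetical placeholders:

```python
# Minimal sketch: fine-tuning BERT for token classification (NER).
# The label set and training example below are hypothetical.
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

labels = ["O", "B-CUSTOM", "I-CUSTOM"]  # hypothetical custom entity type
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels)
)

# One toy labeled sentence; in practice a few hundred such examples
# may be enough for a first-cut tagger.
tokens = ["Acme", "Widget", "arrived", "today"]
word_labels = [1, 2, 0, 0]  # B-CUSTOM, I-CUSTOM, O, O

enc = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")

# Align word-level labels to BERT's subword tokens; only the first
# subword of each word carries the label, the rest are ignored (-100).
label_ids = []
prev_word = None
for word_id in enc.word_ids():
    if word_id is None:
        label_ids.append(-100)      # [CLS]/[SEP] ignored in the loss
    elif word_id != prev_word:
        label_ids.append(word_labels[word_id])
    else:
        label_ids.append(-100)      # trailing subwords ignored
    prev_word = word_id

# One gradient step of fine-tuning.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
optimizer.zero_grad()
outputs = model(**enc, labels=torch.tensor([label_ids]))
outputs.loss.backward()
optimizer.step()
```

In a real run this loop would iterate over the full labeled set for a few epochs; the point is that only the small task-specific head and a light pass over the pre-trained weights need to be learned from the labeled data.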

Here is the sequence of steps to perform entity recognition with BERT:

  • Labeled data acquisition/preparation. For entity types like location, person, disease, etc., we can leverage existing labeled data sets. For a custom entity type, however, we can in some instances get reasonable results with as few as 300 labeled examples. One advantage of using a BERT model is that we can train it not just on sentences containing the entity of interest but also on instances of the entities themselves. For a person tagger, for instance, we can train the model on the names of persons alone, in addition to their mentions in sentences (see the sketch after this list). Since BERT composes words from subwords, we can leverage the learning from single entity mentions to generalize to other entity instances that share…
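Here is a minimal sketch of what such training data could look like in the common BIO labeling scheme; the CUSTOM entity type, the example texts, and the tokenizer checkpoint are hypothetical placeholders:

```python
# Sketch: training examples in BIO format for a hypothetical CUSTOM
# entity type. Note that a bare entity mention can itself be a
# training example, alongside full sentences.
examples = [
    # full-sentence mention of the entity
    (["The", "Acme", "Widget", "failed"],
     ["O", "B-CUSTOM", "I-CUSTOM", "O"]),
    # the entity name alone, with no surrounding sentence
    (["Acme", "Widget"],
     ["B-CUSTOM", "I-CUSTOM"]),
]

# BERT's WordPiece tokenizer splits unseen words into subwords, which
# is what lets learning from one entity mention generalize to other
# entities sharing those subwords.
from transformers import BertTokenizerFast
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
print(tokenizer.tokenize("Acmezilla"))  # e.g. ['Ac', '##me', '##zilla']
```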
