Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences

Domain adaptation or transfer learning using pre-trained language models such as BERT has proven to be an effective approach for many natural language processing tasks. In this work, we propose to formulate word sense disambiguation as a relevance ranking task, and fine-tune BERT on sequence-pair ranking task to select the most probable sense definition given a context sentence and a list of candidate sense definitions… We also introduce a data augmentation technique for WSD using existing example sentences from WordNet. […]