pySBD: Hidden Gem for Sentence Boundary Detection
Splitting text into sentences may seem simple, but human language is noisy and complex, and splitting on punctuation alone works only up to a point. The main strength of pySBD is that it handles a wide range of edge cases, including abbreviations, decimal numbers, and other difficult constructions that are common in legal, financial, and biomedical corpora. In contrast to many other libraries, which rely on neural networks, pySBD detects sentence boundaries with a rule-based approach.
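To make the problem concrete, here is a minimal sketch that compares a punctuation-only splitter with pySBD on a sentence containing abbreviations. The `naive_split` helper and the sample text are illustrative assumptions, not part of pySBD. The pySBD part uses the library's `Segmenter` class and its `segment` method, and is skipped if the package is not installed (`pip install pysbd`).

```python
import re


def naive_split(text):
    """Hypothetical baseline: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


text = "My name is Jonas E. Smith. Please turn to p. 55."

# Punctuation-only splitting also breaks after the abbreviations "E." and "p."
naive = naive_split(text)
print("naive:", naive)

try:
    import pysbd

    # Rule-based segmentation for English; clean=False leaves the text unmodified.
    segmenter = pysbd.Segmenter(language="en", clean=False)
    sentences = segmenter.segment(text)
    print("pysbd:", sentences)
except ImportError:
    # pySBD is an optional dependency for this sketch.
    sentences = None
    print("pysbd is not installed; skipping the comparison")
```

The naive splitter returns four fragments for this input because it treats every period followed by a space as a boundary. Running the pySBD half on the same text shows how a rule-based segmenter treats those abbreviations.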