(De)ToxiGen: Leveraging large language models to build more robust hate speech detection tools


It’s a well-known challenge that large language models (LLMs)—growing in popularity thanks to their adaptability across a variety of applications—carry risks. Because they’re trained on large amounts of data from across the internet, they’re capable of generating inappropriate and harmful language based on similar language encountered during training.  

Content moderation tools can be deployed to flag or filter such language in some contexts, but unfortunately, the datasets available to train these tools often fail to capture the complexity and nuance of toxic language, particularly implicit hate speech that contains no slurs or profanity.
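To make the flag-or-filter step concrete, the sketch below shows how such a tool might be wired up with the Hugging Face transformers pipeline. The checkpoint name ("tomh/toxigen_roberta", a RoBERTa classifier fine-tuned on ToxiGen), the label convention, and the 0.5 threshold are assumptions for illustration, not details taken from this article.

```python
# Minimal sketch of a flag-or-filter moderation step, under the assumptions
# stated above (model id, label convention, and threshold are illustrative).
from transformers import pipeline

# Binary toxicity classifier: returns a label and a confidence score per input.
classifier = pipeline("text-classification", model="tomh/toxigen_roberta")

def moderate(texts, threshold=0.5):
    """Return only the texts the classifier does not flag as toxic."""
    kept = []
    for text, result in zip(texts, classifier(texts)):
        # Assumption: LABEL_1 means "toxic" for this checkpoint.
        is_toxic = result["label"] == "LABEL_1" and result["score"] >= threshold
        if not is_toxic:
            kept.append(text)
    return kept

if __name__ == "__main__":
    print(moderate(["Have a great day!", "I can't stand people like you."]))
```

A classifier like this is only as good as its training data, which is exactly the gap the article points to: without examples of implicit, slur-free hate speech, the model has little chance of flagging it.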
