BARTScore: Evaluating Generated Text as Text Generation
BARTScore Evaluating Generated Text as Text Generation. Background There is a recent trend that leverages neural models for automated evaluation in different ways, as shown in Fig.1. (a) Evaluation as matching task. Unsupervised matching metrics aim to measure the semantic equivalence between the reference and hypothesis by using a token-level matching functions in distributed representation space (e.g. BERT) or discrete string space (e.g. ROUGE). (b) Evaluation as regression task. Regression-based metrics (e.g. BLEURT) introduce a parameterized regression layer, which would […]
Read more