# KoSimCSE
Korean Simple Contrastive Learning of Sentence Embeddings, implemented in PyTorch.
## Installation

```shell
git clone https://github.com/BM-K/KoSimCSE.git
cd KoSimCSE
git clone https://github.com/SKTBrain/KoBERT.git
cd KoBERT
pip install -r requirements.txt
pip install .
cd ..
pip install -r requirements.txt
```
## Training (supervised only)

```shell
bash run_example.sh
```
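Supervised SimCSE (per the original SimCSE recipe; the exact loss in this repo's training script may differ) trains on NLI triples: the entailment hypothesis serves as the positive, the contradiction hypothesis as a hard negative, and the objective is an InfoNCE loss over all in-batch candidates. A minimal NumPy sketch of that loss:

```python
import numpy as np

def supervised_simcse_loss(anchors, positives, hard_negatives, temperature=0.05):
    """InfoNCE loss over a batch of sentence embeddings.

    anchors, positives, hard_negatives: (batch, dim) arrays.
    For each anchor i, the positive is positives[i]; every other positive
    and every hard negative in the batch acts as a negative.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a = normalize(anchors)
    cand = normalize(np.concatenate([positives, hard_negatives], axis=0))
    # Cosine similarity of each anchor against every candidate, scaled by temperature.
    sim = a @ cand.T / temperature                      # shape (batch, 2*batch)
    # Cross-entropy with target index i: anchor i should match positives[i].
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    return -log_probs[idx, idx].mean()

# Toy usage: aligned positives should yield a lower loss than random ones.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
loss_aligned = supervised_simcse_loss(
    anchors, anchors + 0.01 * rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
loss_random = supervised_simcse_loss(
    anchors, rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
```

The low temperature (0.05, the value used in the SimCSE paper) sharpens the softmax so the model is pushed hard toward its positive pair.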
## Pre-Trained Models

- Uses the BERT [CLS] token representation
- Pre-trained model checkpoint
## Performance

| Model | Cosine Pearson | Cosine Spearman | Euclidean Pearson | Euclidean Spearman | Manhattan Pearson | Manhattan Spearman | Dot Pearson | Dot Spearman |
|---|---|---|---|---|---|---|---|---|
| KoSBERT_SKT* | 78.81 | 78.47 | 77.68 | 77.78 | 77.71 | 77.83 | 75.75 | 75.22 |
| KoSimCSE_SKT | 81.55 | 82.11 | 81.70 | 81.69 | 81.65 | 81.60 | 78.19 | 77.18 |
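The columns are Pearson and Spearman correlations (scaled by 100) between the model's pair-similarity scores and gold STS labels. Spearman correlation is simply Pearson correlation computed on ranks; an illustrative sketch of scoring an STS-style benchmark (not this repo's evaluation code — the data below is made up):

```python
import numpy as np

def spearman(x, y):
    """Spearman correlation = Pearson correlation of rank-transformed data.
    (Assumes no ties; tied values would need averaged ranks.)"""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

# Toy evaluation: cosine scores of sentence pairs vs. gold labels on a 0-5 scale.
cosine_scores = np.array([0.91, 0.15, 0.55, 0.72, 0.05])
gold_labels   = np.array([4.8,  0.6,  2.9,  3.7,  0.2])
score = 100 * spearman(cosine_scores, gold_labels)  # scaled like the table above
```

Here the cosine scores rank the pairs in exactly the gold order, so the score comes out at 100; real model scores land well below that, as in the table.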
## Example Downstream Task

### Semantic Search

```shell
python SemanticSearch.py
```
```python
import numpy as np
from model.utils import pytorch_cos_sim
from data.dataloader import convert_to_tensor, example_model_setting

def main():
    model_ckpt = './output/nli_checkpoint.pt'
    model, transform, device = example_model_setting(model_ckpt)
```
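Once sentences are embedded, semantic search reduces to ranking the corpus by cosine similarity against the query embedding, which is the operation `pytorch_cos_sim` performs. A self-contained NumPy sketch of that ranking step (the embeddings below are toy vectors, not model outputs):

```python
import numpy as np

def cos_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy corpus/query embeddings standing in for KoSimCSE sentence vectors.
corpus_emb = np.array([[1.0, 0.0, 0.1],
                       [0.0, 1.0, 0.0],
                       [0.9, 0.1, 0.2]])
query_emb = np.array([[1.0, 0.05, 0.1]])

scores = cos_sim(query_emb, corpus_emb)[0]
top_k = np.argsort(-scores)[:2]   # indices of the 2 most similar sentences
```

Sorting by negated score gives a descending ranking; for large corpora an approximate-nearest-neighbor index would replace the exhaustive matrix product.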