A sentence embedding method that provides semantic representations

InferSent is a sentence embedding method that provides semantic representations for English sentences. It is trained on natural language inference data and generalizes well to many different tasks. We provide the pre-trained English sentence encoder from our paper and our SentEval evaluation toolkit. Recent changes: removed train_nli.py and kept only the pretrained models for simplicity, as I no longer have time to maintain the repo beyond simple scripts for getting sentence embeddings. Dependencies: this code is written in […]
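As a rough sketch of how the pretrained encoder is used, following the usage pattern shown in the repo's README (the paths and the version-2 hyperparameters below are placeholders for whatever encoder and word vectors you downloaded):

```python
import torch
from models import InferSent  # models.py ships with the InferSent repo

# Placeholder paths: point these at the downloaded encoder and word vectors.
MODEL_PATH = "encoder/infersent2.pkl"
W2V_PATH = "fastText/crawl-300d-2M.vec"

params = {"bsize": 64, "word_emb_dim": 300, "enc_lstm_dim": 2048,
          "pool_type": "max", "dpout_model": 0.0, "version": 2}
model = InferSent(params)
model.load_state_dict(torch.load(MODEL_PATH))
model.set_w2v_path(W2V_PATH)

sentences = ["A man is playing a guitar.", "Someone strums an instrument."]
model.build_vocab(sentences, tokenize=True)
embeddings = model.encode(sentences, tokenize=True)  # one 4096-d vector each
```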

Read more

PyTorch implementation of Google AI’s 2018 BERT with simple annotation

BERT-pytorch is a PyTorch implementation of Google AI’s 2018 BERT, with simple annotation. Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/abs/1810.04805. Google AI’s BERT paper shows amazing results on various NLP tasks (new SOTA on 17 NLP tasks), including outperforming the human F1 score on the SQuAD v1.1 QA task. The paper proved that a Transformer (self-attention) based encoder, given a proper language-model training method, can be a powerful alternative to previous language models. And more importantly, they showed us that […]
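For intuition, here is a miniature masked-language-model training step written with PyTorch's built-in Transformer modules. This illustrates the pre-training idea only; it does not use the BERT-pytorch package's own classes, and all sizes are toy values:

```python
import torch
import torch.nn as nn

# Toy dimensions, chosen only for illustration.
vocab_size, hidden, mask_id = 1000, 128, 3

embed = nn.Embedding(vocab_size, hidden)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
    num_layers=2)
to_vocab = nn.Linear(hidden, vocab_size)

tokens = torch.randint(0, vocab_size, (8, 16))  # a batch of token ids
labels = tokens.clone()
masked = torch.rand(tokens.shape) < 0.15        # mask ~15% of positions
tokens[masked] = mask_id                        # replace with a [MASK] id

logits = to_vocab(encoder(embed(tokens)))
loss = nn.functional.cross_entropy(             # predict only the masked ids
    logits[masked], labels[masked])
loss.backward()
```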

Read more

The multitask and transfer learning toolkit for natural language processing research

The multitask and transfer learning toolkit for natural language processing research. Why should I use jiant? A few additional things you might want to know about jiant: it is configuration-file driven; it is built with PyTorch; it integrates with datasets to manage task data; and it integrates with transformers to manage models and tokenizers. Getting Started / Installation: to install jiant from source (recommended for researchers): git clone https://github.com/nyu-mll/jiant.git; cd jiant; pip install -r requirements.txt # Add the following to your […]
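Since jiant delegates data and model handling to the datasets and transformers libraries, a sketch of those two underlying calls may help; note this is not jiant's own API, and the task and model names are just examples:

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a GLUE task with `datasets` and a model with `transformers`,
# the two libraries jiant integrates with.
dataset = load_dataset("glue", "mrpc")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

example = dataset["train"][0]
batch = tokenizer(example["sentence1"], example["sentence2"],
                  return_tensors="pt")
outputs = model(**batch)  # logits over the two MRPC classes
```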

Read more

A library for Multilingual Unsupervised or Supervised word Embeddings

MUSE: Multilingual Unsupervised and Supervised Embeddings. A library for multilingual unsupervised or supervised word embeddings. MUSE is a Python library for multilingual word embeddings whose goal is to provide the community with state-of-the-art multilingual word embeddings (fastText embeddings aligned in a common space) and large-scale, high-quality bilingual dictionaries for training and evaluation. We include two methods: one supervised, which uses a bilingual dictionary or identical character strings, and one unsupervised, which does not use any parallel data (see Word Translation without […]
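The supervised path reduces to an orthogonal Procrustes problem: given dictionary-aligned source and target vectors, the best orthogonal map falls out of an SVD. A minimal sketch with random stand-ins for the fastText vectors:

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal map W with X @ W.T ~ Y, for row-aligned pairs.

    X: source dictionary vectors (n, d); Y: target vectors (n, d).
    The optimum is U @ Vt from the SVD of Y.T @ X.
    """
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 300))  # stand-ins for 300-d fastText vectors
Y = rng.normal(size=(5000, 300))
W = procrustes(X, Y)
aligned = X @ W.T                 # source embeddings in the target space
```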

Read more

A modular framework for vision & language multimodal research

MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-the-art vision and language models and has powered multiple research projects at Facebook AI Research. See the full list of projects inside or built on MMF here. MMF is powered by PyTorch, allows distributed training, and is unopinionated, scalable, and fast. Use MMF to bootstrap your next vision and language multimodal research project by following the installation instructions. Take […]

Read more

Sequence to Sequence Framework in PyTorch

nmtpytorch is a sequence-to-sequence framework in PyTorch. This project is not actively maintained, so issues created are unlikely to be addressed in a timely way. If you are interested, there is a recent fork of this repository called pysimt, which includes Transformer-based architectures as well. nmtpytorch allows training of various end-to-end neural architectures, including but not limited to neural machine translation, image captioning, and automatic speech recognition systems. The initial codebase was in Theano and was inspired by the famous dl4mt-tutorial codebase. nmtpytorch received valuable […]
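The end-to-end architectures it trains share the encoder-decoder shape sketched below; this is a plain-PyTorch illustration of that shape, not code from the framework itself:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy encoder-decoder: the skeleton shared by NMT-style models."""

    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))    # summarize the source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)                      # next-token logits

model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
logits = model(torch.randint(0, 8000, (4, 12)),
               torch.randint(0, 8000, (4, 10)))       # (4, 10, 8000)
```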

Read more

An implementation of WaveNet with fast generation

pytorch-wavenet is an implementation of the WaveNet architecture, as described in the original paper. Features: automatic creation of a dataset (training and validation/test set) from all sound files (.wav, .aiff, .mp3) in a directory; efficient multithreaded data loading; logging to TensorBoard (training loss, validation loss, validation accuracy, parameter and gradient histograms, generated samples); and fast generation, as introduced here. Requirements: Python 3, PyTorch 0.3, numpy […]
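WaveNet's core building block is a stack of dilated causal 1-D convolutions whose receptive field doubles at each layer. A minimal sketch of that block (the layer count and channel width are illustrative, not the repo's defaults):

```python
import torch
import torch.nn as nn

class DilatedCausalStack(nn.Module):
    """Dilated causal conv stack: left-padding keeps each layer causal."""

    def __init__(self, channels=32, layers=6, kernel=2):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel, dilation=2 ** i)
            for i in range(layers))
        self.kernel = kernel

    def forward(self, x):                      # x: (batch, channels, time)
        for conv in self.convs:
            pad = conv.dilation[0] * (self.kernel - 1)
            x = torch.relu(conv(nn.functional.pad(x, (pad, 0))))
        return x

net = DilatedCausalStack()
y = net(torch.randn(1, 32, 1024))              # output length is preserved
```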

Read more

PyTorch implementation of Tacotron

Tacotron-pytorch is a PyTorch implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model. Data: I used the LJSpeech dataset, which consists of pairs of text scripts and wav files. The complete dataset (13,100 pairs) can be downloaded here. I referred to https://github.com/keithito/tacotron for the preprocessing code. File description: hyperparams.py includes all hyperparameters that are needed; data.py loads training data and preprocesses text to indices and wav files to spectrograms (preprocessing code for text is in the text/ directory); module.py contains all methods, including […]
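The wav-to-spectrogram step that data.py performs looks roughly like the librosa sketch below; the filename is just an LJSpeech example, and the frame sizes are illustrative rather than the repo's exact hyperparams.py values:

```python
import librosa
import numpy as np

# Load one LJSpeech utterance and compute a log-mel spectrogram target.
wav, sr = librosa.load("LJ001-0001.wav", sr=22050)
stft = librosa.stft(wav, n_fft=1024, hop_length=256, win_length=1024)
mel = librosa.feature.melspectrogram(S=np.abs(stft) ** 2, sr=sr, n_mels=80)
log_mel = np.log(np.clip(mel, 1e-5, None))  # (80, frames) training target
```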

Read more

A deep learning NLP library inspired by the fast.ai library

Quick NLP is a deep learning NLP library inspired by the fast.ai library. It follows the same API as fastai and extends it, allowing for quick and easy running of NLP models. Features: Python 3.6 code; tight-knit integration with the fast.ai library: fast.ai-style DataLoader objects for sentence-to-sentence algorithms, fast.ai-style DataLoader objects for dialogue algorithms, and fast.ai-style DataModel objects for training NLP models. It can run a seq2seq model with a few lines of code, similar to […]
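To give a feel for the fastai-style workflow described, here is a hypothetical sketch; the quicknlp class and function names below are illustrative assumptions, not the library's documented API:

```python
# Hypothetical: these imports and names illustrate the fastai-style
# pattern the entry describes; they are not Quick NLP's verified API.
from quicknlp import S2SModelData, get_seq2seq_learner  # assumed names

data = S2SModelData.from_text_files(path="data/",   # paired src/tgt files
                                    train="train", valid="valid")
learner = get_seq2seq_learner(data)                 # fastai-style learner
learner.fit(1e-3, 2)                                # lr, number of cycles
```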

Read more

Neural speaker diarization with pyannote-audio

pyannote.audio is an open-source toolkit written in Python for speaker diarization, providing neural building blocks for speech activity detection, speaker change detection, overlapped speech detection, and speaker embedding. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pretrained models covering a wide range of domains for voice activity detection, speaker change detection, […]
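Applying one of the pretrained pipelines looks roughly like the sketch below; the pipeline identifier, the audio filename, and whether an access token is required all depend on the installed version, so treat them as assumptions:

```python
from pyannote.audio import Pipeline

# Assumed pipeline name; newer versions may also need an auth token.
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
diarization = pipeline("meeting.wav")  # placeholder audio file

# Print who speaks when, one labeled segment per line.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s to {turn.end:.1f}s")
```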

Read more