June 15, 2021 Machine Learning

Multimodal Neural Script Knowledge Models

merlot

MERLOT is a model for learning what we are calling “neural script knowledge” — representations about what is going on in videos, spanning multiple video frames with associated captions.

What’s here

We are releasing the following:

Code for the MERLOT model (in model/, with data processing in data/)
Code for running MERLOT over visual story ordering.

We plan to release:

Information about the videos used in this work
Code for adapting the model to other tasks (not strictly needed, but just to make things easier)

This is ongoing work. We hope to make it easier to adapt MERLOT to other tasks; please follow if interested!

Environment and setup

There are two different ways of running MERLOT right now:

Pretraining

