Articles About Machine Learning

Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

ABINet Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition The official code of ABINet (CVPR 2021, Oral). ABINet uses a vision model and an explicit language model to recognize text in the wild, which are trained in an end-to-end way. The language model (BCN) achieves a bidirectional language representation by simulating a cloze test, and additionally uses an iterative correction strategy. Runtime Environment We provide a pre-built docker image using the Dockerfile from docker/Dockerfile Running in Docker $ git@github.com:FangShancheng/ABINet.git $ […]
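Below is a minimal sketch of the iterative vision-language correction loop the excerpt describes, with hypothetical vision_model, language_model, and fusion callables standing in for the real modules; it illustrates the idea rather than the official ABINet code.

```python
import torch
import torch.nn.functional as F

def iterative_recognition(vision_model, language_model, fusion, image, num_iters=3):
    """Hypothetical sketch: the language model repeatedly refines the fused prediction."""
    vision_logits = vision_model(image)              # (T, num_chars) per-position logits
    fused = vision_logits
    for _ in range(num_iters):
        probs = F.softmax(fused, dim=-1)             # feed the current belief back in
        language_logits = language_model(probs)      # cloze-style bidirectional refinement
        fused = fusion(vision_logits, language_logits)
    return fused.argmax(dim=-1)                      # predicted character indices
```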

Read more

Simulate the notspot quadrupedal robot using Gazebo and ROS with python

Notspot robot simulation – Python version This repository contains all the files and code needed to simulate the notspot quadrupedal robot using Gazebo and ROS. The software runs on ROS Noetic and Ubuntu 20.04. If you want to use a different ROS version, you might have to make some changes to the source code. Setup cd src && catkin_init_workspace cd .. && catkin_make source devel/setup.bash roscd notspot_controller/scripts && chmod +x robot_controller_gazebo.py cp -r RoboticsUtilities ~/.local/lib/python3.8/site-packages roscd notspot_joystick/scripts && chmod +x […]

Read more

Fly DCS without a joystick with python

DCSNoJoy Fly DCS without a joystick with python. Usage Delete all mouse view axes Install DCSEasyControlExports to your “Saved Games/DCS/” path python DCSEasyControl/main.py Set DCS to the F12 view. Implementation Details For the reference and coordinate system of the DCS API, please see this doc. TODO Parameter system for different aircraft. GitHub https://github.com/xuhao1/DCSNoJoy

Read more

Orientation independent Möbius CNNs

MobiusCNNs This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks. Background (tl;dr) All derivations and a detailed description of the models are found in Section 5 of our paper. What follows is an informal tl;dr, summarizing the central aspects of Möbius CNNs. Feature fields on the Möbius strip: A key characteristic of the Möbius strip is its topological twist, making it a non-orientable manifold. Convolutional weight sharing on the […]
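As a toy illustration of that topological twist (purely hypothetical, not the repo's code), the sketch below implements a "twisted" circular padding on a discretized strip: slices that wrap around the gluing edge are flipped across the strip's width, which is what prevents a globally consistent orientation.

```python
import numpy as np

def mobius_pad(field, pad, flip_sign=False):
    """field: (circumference, width) scalar feature field on a discretized Moebius strip."""
    left = field[-pad:, ::-1]    # wrap around the gluing edge: reverse the transverse axis
    right = field[:pad, ::-1]
    if flip_sign:
        left, right = -left, -right  # orientation-sensitive features pick up a sign flip
    return np.concatenate([left, field, right], axis=0)

f = np.arange(12, dtype=float).reshape(6, 2)  # 6 positions around, width 2 across
padded = mobius_pad(f, pad=1)
print(padded.shape)  # (8, 2); the padded rows are transversally flipped copies
```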

Read more

Keep CALM and Improve Visual Feature Attribution

calm Keep CALM and Improve Visual Feature Attribution Abstract The class activation mapping, or CAM, has been the cornerstone of feature attribution methods for multiple vision tasks. Its simplicity and effectiveness have led to wide applications in the explanation of visual predictions and weakly-supervised localization tasks. However, CAM has its own shortcomings. The computation of attribution maps relies on ad-hoc calibration steps that are not part of the training computational graph, making it difficult for us to understand the real […]
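For context, here is a generic numpy sketch of vanilla CAM (not the CALM method): the classifier weights of a chosen class re-weight the last convolutional feature maps, followed by the kind of ad-hoc min-max calibration step the excerpt refers to; shapes are illustrative.

```python
import numpy as np

def cam(feature_maps, fc_weights, class_idx):
    """feature_maps: (C, H, W) last conv activations; fc_weights: (num_classes, C)."""
    # Class activation map: channel-wise weighted sum using the classifier weights.
    m = np.tensordot(fc_weights[class_idx], feature_maps, axes=([0], [0]))  # (H, W)
    # Ad-hoc calibration outside the training graph: min-max normalization.
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)
    return m

feats = np.random.rand(512, 7, 7).astype(np.float32)
weights = np.random.rand(1000, 512).astype(np.float32)
heatmap = cam(feats, weights, class_idx=283)  # (7, 7) map; upsample to image size as needed
```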

Read more

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

STCN We present Space-Time Correspondence Networks (STCN) as the new, effective, and efficient framework to model space-time correspondences in the context of video object segmentation. STCN achieves SOTA results on multiple benchmarks while running fast at 20+ FPS without bells and whistles. Its speed is even higher with mixed precision. Despite its effectiveness, the network itself is very simple with lots of room for improvement. See the paper for technical details. A Gentle Introduction There are two main contributions: STCN […]
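As a rough illustration of what "space-time correspondences" means here, the sketch below performs a generic memory read of the kind STM-style segmentation networks use: query-frame keys attend over memory keys from past frames and pull mask features from the memory values. The dot-product affinity and the shapes are assumptions for illustration, not STCN's exact formulation.

```python
import numpy as np

def memory_read(query_key, memory_key, memory_value):
    """query_key: (C, HW_q), memory_key: (C, THW_m), memory_value: (D, THW_m)."""
    affinity = memory_key.T @ query_key              # (THW_m, HW_q) similarity scores
    affinity -= affinity.max(axis=0, keepdims=True)  # stabilize the softmax
    weights = np.exp(affinity)
    weights /= weights.sum(axis=0, keepdims=True)    # softmax over memory locations
    return memory_value @ weights                    # (D, HW_q) read-out features

qk = np.random.rand(64, 30 * 30)
mk = np.random.rand(64, 3 * 30 * 30)
mv = np.random.rand(512, 3 * 30 * 30)
out = memory_read(qk, mk, mv)
print(out.shape)  # (512, 900)
```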

Read more

Neural Scene Flow Fields using pytorch-lightning with potential improvements

nsff_pl Neural Scene Flow Fields using pytorch-lightning. This repo reimplements the NSFF idea, but modifies several operations based on observation of NSFF results and discussions with the authors. For discussion details, please see the issues of the original repo. The code is based on my previous implementation. The main modifications are the following: Remove the blending weight in static NeRF. I adopt the addition strategy in NeRF-W. Compose static and dynamic also in image warping. Implementation details are in models/rendering.py. These […]
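A hedged numpy sketch of the NeRF-W style "addition" compositing mentioned above: the ray transmittance is computed from the summed static and dynamic densities, and each branch contributes its own alpha-weighted color; variable names and shapes are illustrative, not the repo's code.

```python
import numpy as np

def composite_static_dynamic(sigma_s, rgb_s, sigma_d, rgb_d, deltas):
    """Per-sample inputs along one ray: sigmas (N,), rgbs (N, 3), deltas (N,)."""
    alpha_s = 1.0 - np.exp(-sigma_s * deltas)        # static alpha per sample
    alpha_d = 1.0 - np.exp(-sigma_d * deltas)        # dynamic alpha per sample
    # Transmittance from the *sum* of densities (the "addition" strategy).
    total = (sigma_s + sigma_d) * deltas
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(total[:-1])]))
    rgb = (trans[:, None] * (alpha_s[:, None] * rgb_s + alpha_d[:, None] * rgb_d)).sum(axis=0)
    return rgb

N = 64
rgb = composite_static_dynamic(np.random.rand(N), np.random.rand(N, 3),
                               np.random.rand(N), np.random.rand(N, 3),
                               np.full(N, 0.02))
print(rgb.shape)  # (3,)
```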

Read more

Optimal Model Design for Reinforcement Learning

omd JAX code for the paper “Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation” Summary Model-based reinforcement learning typically trains the dynamics and reward functions by minimizing the error of predictions. The error is only a proxy to maximizing the sum of rewards, the ultimate goal of the agent, leading to the objective mismatch. We propose an end-to-end algorithm called Optimal Model Design (OMD) that optimizes the returns directly for model learning. OMD leverages the implicit function theorem to optimize the model parameters […]
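As a toy illustration of the implicit-function-theorem idea (not the OMD algorithm itself), the JAX sketch below differentiates an outer objective through the solution of an inner optimization problem; the losses and names are made up for the example.

```python
import jax
import jax.numpy as jnp

def inner_loss(theta, phi):
    # Toy inner "model fitting" problem: theta should track phi.
    return jnp.sum((theta - phi) ** 2) + 0.1 * jnp.sum(theta ** 2)

def outer_objective(theta):
    # Toy outer "return-like" objective evaluated at the fitted theta.
    return -jnp.sum(jnp.sin(theta))

def solve_inner(phi, steps=200, lr=0.1):
    # Crude gradient descent to (approximately) reach the inner optimum theta*(phi).
    g = jax.grad(inner_loss, argnums=0)
    theta = jnp.zeros_like(phi)
    for _ in range(steps):
        theta = theta - lr * g(theta, phi)
    return theta

def outer_grad_via_ift(phi):
    theta_star = solve_inner(phi)
    # IFT: d theta*/d phi = -[d^2 L/d theta^2]^-1 [d^2 L/d theta d phi]
    H = jax.hessian(inner_loss, argnums=0)(theta_star, phi)
    cross = jax.jacfwd(jax.grad(inner_loss, argnums=0), argnums=1)(theta_star, phi)
    dtheta_dphi = -jnp.linalg.solve(H, cross)
    # Chain rule: dJ/dphi = (d theta*/d phi)^T dJ/dtheta
    return dtheta_dphi.T @ jax.grad(outer_objective)(theta_star)

print(outer_grad_via_ift(jnp.array([0.3, -0.7])))
```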

Read more

Multimodal Neural Script Knowledge Models

merlot MERLOT is a model for learning what we are calling “neural script knowledge” — representations about what is going on in videos, spanning multiple video frames with associated captions. What’s here We are releasing the following: Code for the MERLOT model (in model/, with data processing in data/) Code for running MERLOT over visual story ordering. We plan to release: Information about the videos used in this work Code for adapting the model to other tasks (not strictly needed, […]

Read more

Multi-Speaker Adaptive Text-to-Speech Generation with python

StyleSpeech PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation. [x] StyleSpeech (naive branch) [x] Meta-StyleSpeech (main branch) Dependencies You can install the Python dependencies with pip3 install -r requirements.txt Inference You have to download pretrained models and put them in output/ckpt/LibriTTS/. For English single-speaker TTS, run python3 synthesize.py --text "YOUR_DESIRED_TEXT" --ref_audio path/to/reference_audio.wav --restore_step 200000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml The generated utterances will be put in output/result/. Your synthesized speech will have ref_audio's style. Batch Inference […]

Read more