Self-Damaging Contrastive Learning with Python

SDCLR The recent breakthrough achieved by contrastive learning accelerates the pace of deploying unsupervised training in real-world data applications. However, unlabeled data in the wild is commonly imbalanced and follows a long-tail distribution, and it is unclear how robustly the latest contrastive learning methods perform in such practical scenarios. This paper proposes to tackle this challenge explicitly, via a principled framework called Self-Damaging Contrastive Learning (SDCLR), which automatically balances representation learning without knowing the classes. Our main inspiration is […]
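Frameworks in this family typically build on a standard contrastive objective. Below is a minimal NumPy sketch of the NT-Xent loss that methods like SDCLR start from; it is an illustration of the base objective, not the authors' implementation.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss.
    z1, z2: (n, d) embeddings of two augmented views of the same n samples."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / temperature                        # (2n, 2n) cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # the positive for sample i is its other view, at index (i + n) mod 2n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Two nearly identical views yield a lower loss than two unrelated batches, which is exactly the alignment pressure the contrastive objective exerts.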

Read more

A novel attention-based architecture for vision-and-language navigation

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions. This code reproduces the results obtained with E.T. on the ALFRED benchmark. To learn more about the benchmark and the original code, please refer to the ALFRED repository. Quickstart: Clone the repo:
$ git clone https://github.com/alexpashevich/E.T..git ET
$ export ET_ROOT=$(pwd)/ET
$ export ET_LOGS=$ET_ROOT/logs
$ export ET_DATA=$ET_ROOT/data
$ export […]
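The key architectural idea is encoding language tokens together with the full episode history. The sketch below shows one plausible way such an input sequence could be laid out; the function name, shapes, and interleaving order are illustrative assumptions, not E.T.'s actual code.

```python
import numpy as np

def build_episode_sequence(lang_tokens, frames, actions):
    """Hypothetical layout of a multimodal transformer input: language
    tokens first, then the interleaved history of per-timestep visual and
    action embeddings (frame_0, action_0, frame_1, action_1, ...).
    lang_tokens: (L, d), frames: (T, d), actions: (T, d) float arrays."""
    T, d = frames.shape
    steps = np.empty((2 * T, d))
    steps[0::2] = frames    # visual observation at each timestep
    steps[1::2] = actions   # action taken at each timestep
    return np.concatenate([lang_tokens, steps], axis=0)
```

A single self-attention stack over this concatenated sequence lets every action token attend to the instruction and to the entire visual history at once.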

Read more

Self-Supervised Learning for Sketch and Handwriting

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, CVPR 2021. Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Yongxin Yang, Timothy Hospedales, Tao Xiang, Yi-Zhe Song, “Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting”, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021. Abstract: Self-supervised learning has gained prominence due to its efficacy at learning powerful representations from unlabelled data that achieve excellent performance on many challenging downstream tasks. However, supervision-free pretext tasks are challenging to design and usually […]
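Rasterization, one half of the paper's cross-modal pretext pair, maps a vector stroke sequence to a pixel grid. A toy NumPy version is sketched below to make the idea concrete; it is a simplified stand-in, not the paper's renderer.

```python
import numpy as np

def rasterize_strokes(strokes, size=32):
    """Toy rasterizer: draw vector strokes onto a binary grid.
    strokes: list of (k, 2) point sequences with coordinates in [0, 1].
    (Illustrative sketch of the vector-to-raster direction only.)"""
    canvas = np.zeros((size, size), dtype=np.uint8)
    for stroke in strokes:
        pts = np.asarray(stroke, dtype=float)
        for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
            # sample densely along each segment and snap to pixel centers
            t = np.linspace(0.0, 1.0, num=2 * size)
            xs = np.clip(((x0 + t * (x1 - x0)) * (size - 1)).round().astype(int), 0, size - 1)
            ys = np.clip(((y0 + t * (y1 - y0)) * (size - 1)).round().astype(int), 0, size - 1)
            canvas[ys, xs] = 1
    return canvas
```

The inverse direction, vectorization, predicts the point sequence back from the raster image; forcing a model to translate between the two modalities is what provides supervision without labels.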

Read more

LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling

Locus This repository is an open-source implementation of the ICRA 2021 paper Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling. More information: https://research.csiro.au/robotics/locus-pr/ Paper pre-print: https://arxiv.org/abs/2011.14497 Method overview: Locus is a global descriptor for large-scale place recognition using sequential 3D LiDAR point clouds. It encodes topological relationships and temporal consistency of scene components to obtain a discriminative and viewpoint-invariant scene representation. Usage: Set up the environment. This project has been tested on Ubuntu 18.04 (with Open3D 0.11, TensorFlow 1.8.0, pcl […]
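Higher-order pooling aggregates local features through their pairwise products rather than a simple mean. The NumPy sketch below shows the standard second-order pooling recipe that this family of descriptors relies on; it is an illustration, not the authors' implementation.

```python
import numpy as np

def second_order_pooling(features):
    """Second-order (covariance-style) pooling of local features into a
    global descriptor. features: (n, d) per-segment feature vectors.
    Returns a flattened, power- and L2-normalized (d*d,) descriptor."""
    f = np.asarray(features, dtype=float)
    pooled = f.T @ f / f.shape[0]                        # (d, d) averaged outer products
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))   # power normalization
    vec = pooled.ravel()
    return vec / (np.linalg.norm(vec) + 1e-12)           # L2 normalization
```

Because the outer-product average is symmetric in the input order, the descriptor is invariant to permutations of the scene components, which is one reason second-order pooling suits unordered point cloud segments.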

Read more

Towards Part-Based Understanding of RGB-D Scans

part-based-scan-understanding Towards Part-Based Understanding of RGB-D Scans (CVPR 2021). We propose the task of part-based scene understanding of real-world 3D environments: from an RGB-D scan of a scene, we detect objects, and for each object predict its decomposition into geometric part masks which, composed together, form the complete geometry of the observed object. Download Paper (.pdf). Demo samples. Get started: The core of this repository is a network which takes preprocessed scan voxel crops as input and produces voxelized part […]
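The composition step described above, where per-part masks jointly form the full object, can be sketched in a few lines of NumPy. This is a toy illustration of the idea, not the repository's code, and the IoU helper is a common evaluation choice assumed here for illustration.

```python
import numpy as np

def compose_parts(part_masks):
    """Union of per-part voxel masks: a voxel belongs to the object's
    geometry if any predicted part covers it. part_masks: (p, D, H, W) bool."""
    return np.asarray(part_masks, dtype=bool).any(axis=0)

def voxel_iou(pred, target):
    """Intersection-over-union between two voxel grids, a common way to
    score composed geometry against the observed object."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0
```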

Read more

Deep Networks from the Principle of Rate Reduction

redunet_paper Deep Networks from the Principle of Rate Reduction. This repository is the official NumPy implementation of the paper Deep Networks from the Principle of Rate Reduction (2021) by Kwan Ho Ryan Chan* (UC Berkeley), Yaodong Yu* (UC Berkeley), Chong You* (UC Berkeley), Haozhi Qi (UC Berkeley), John Wright (Columbia), and Yi Ma (UC Berkeley). For the PyTorch version of ReduNet, please visit https://github.com/ryanchankh/redunet. What is ReduNet? ReduNet is a deep neural network constructed naturally by deriving the gradients of the Maximal […]
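The quantity at the heart of the construction is the coding rate of a feature matrix, R(Z) = ½ logdet(I + d/(nε²) ZZᵀ); ReduNet's layers are derived from the gradients of the rate-reduction objective built on this term. A minimal NumPy sketch of the rate term (not the full MCR² objective or the authors' code):

```python
import numpy as np

def rate(Z, eps=0.5):
    """Coding rate R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z @ Z.T).
    Z: (d, n) matrix holding n feature vectors of dimension d as columns;
    eps is the allowed distortion."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps ** 2)) * Z @ Z.T)
    return 0.5 * logdet
```

Intuitively, the rate is zero for degenerate (all-zero) features and grows as the features spread out over more volume, which is why maximizing the difference between the rate of all features and the rates of within-class features expands the whole representation while compressing each class.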

Read more

A GAN implemented with the Perceptual Simplicity and Spatial Constriction constraints

PS-SC GAN This repository contains the main code for training a PS-SC GAN (a GAN implemented with the Perceptual Simplicity and Spatial Constriction constraints) introduced in the paper Where and What? Examining Interpretable Disentangled Representations. The code for computing the TPL for model checkpoints from disentanglement_lib can be found in this repository. Abstract: Capturing interpretable variations has long been one of the goals in disentanglement learning. However, unlike the independence assumption, interpretability has rarely been exploited to encourage disentanglement in the unsupervised setting. […]

Read more

Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

ABINet Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition. The official code of ABINet (CVPR 2021, Oral). ABINet uses a vision model and an explicit language model, trained in an end-to-end way, to recognize text in the wild. The language model (BCN) achieves bidirectional language representation by simulating a cloze test, and additionally utilizes an iterative correction strategy. Runtime Environment: We provide a pre-built docker image using the Dockerfile from docker/Dockerfile. Running in Docker: $ git clone git@github.com:FangShancheng/ABINet.git $ […]

Read more

Prioritized Architecture Sampling with Monte-Carlo Tree Search

NAS-Bench-Macro This repository includes the benchmark and code for NAS-Bench-Macro in the paper “Prioritized Architecture Sampling with Monte-Carlo Tree Search”, CVPR 2021. NAS-Bench-Macro is a NAS benchmark on a macro search space. It consists of 6561 networks and their test accuracies, parameters, and FLOPs on the CIFAR-10 dataset. Each architecture in NAS-Bench-Macro is trained from scratch in isolation. Benchmark: All the evaluated architectures are stored in the file nas-bench-macro_cifar10.json with the following format: { arch1: { test_acc: [float, float, float], // the test accuracies of […]
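Given that format, querying the benchmark is a plain JSON lookup. The sketch below follows the key names shown in the excerpt above (`test_acc` holding three run accuracies per architecture); any field detail beyond that excerpt is an assumption, and the helper name is hypothetical.

```python
import json

def best_architecture(path="nas-bench-macro_cifar10.json"):
    """Return the architecture key with the highest mean test accuracy,
    assuming the JSON layout {arch: {"test_acc": [float, float, float], ...}}."""
    with open(path) as f:
        bench = json.load(f)

    def mean_acc(entry):
        accs = entry["test_acc"]          # accuracies of the independent runs
        return sum(accs) / len(accs)

    return max(bench, key=lambda arch: mean_acc(bench[arch]))
```

Averaging over the stored runs rather than taking a single accuracy is the usual way such tabular benchmarks are queried, since each architecture was trained multiple times.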

Read more

Polygonal Building Segmentation by Frame Field Learning

Polygonization-by-Frame-Field-Learning This repository contains the code for our fast polygonal building extraction from overhead images pipeline. We add a frame field output to an image segmentation neural network to improve segmentation quality and to provide structural information for the subsequent polygonization step. Figure 1: Close-up of our additional frame field output on a test image. Figure 2: Given an overhead image, the model outputs an edge mask, an interior mask, and a frame field for buildings. The total loss includes terms that […]

Read more