6 Exciting Open Source Data Science Projects You Should Start Working on Today

Overview: Here are six open-source data science projects to enhance your skillset. These projects cover a diverse set of domains, from computer vision to natural language processing (NLP), among others. Pick your favorite open-source data science project(s) and get coding! Introduction: I recently helped out in a round of interviews for an open data scientist position. As you can imagine, there were candidates from all kinds of backgrounds – software engineering, learning and development, finance, marketing, etc. What stood […]

Read more

spaCy Tutorial to Learn and Master Natural Language Processing (NLP)

Introduction spaCy is my go-to library for Natural Language Processing (NLP) tasks. I’d venture to say that’s the case for the majority of NLP experts out there! Among the plethora of NLP libraries these days, spaCy really does stand out on its own. If you’ve used spaCy for NLP, you’ll know exactly what I’m talking about. And if you’re new to the power of spaCy, you’re about to be enthralled by how multi-functional and flexible this library is. The factors […]
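As a quick taste of what the tutorial covers, here is a minimal sketch of everyday spaCy usage; it assumes spaCy and the small English model en_core_web_sm are already installed, and the example sentence is purely illustrative.

```python
# Minimal spaCy example (assumes `pip install spacy` and
# `python -m spacy download en_core_web_sm` have been run).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Tokens carry part-of-speech tags and dependency labels from one pipeline call.
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities are available on the same Doc object.
for ent in doc.ents:
    print(ent.text, ent.label_)
```

Tokenization, tagging, parsing, and named entity recognition all fall out of the single nlp(...) call, which is a large part of why the library feels so multi-functional and flexible.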

Read more

MobileBERT: BERT for Resource-Limited Devices

For a second, let’s focus solely on the teacher. If we continue along the path past the MHA block, things remain the same as in a vanilla transformer block until we reach the second “Add & Norm” operation. After this layer, we have a bottleneck transform, this time to reduce the dimension back to that of the input. This allows us to perform another Add & Norm operation with the transformer block input before feeding the result into the next block. Stacked […]
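To make that description concrete, here is a minimal PyTorch-style sketch of a transformer block whose output is passed through a bottleneck back to the input dimension before a final Add & Norm with the block input. The layer sizes (512-dimensional input, 1024-dimensional block width) and the use of nn.MultiheadAttention are illustrative assumptions, not the exact MobileBERT teacher configuration.

```python
import torch
import torch.nn as nn

class BottleneckTransformerBlock(nn.Module):
    """Transformer block with input/output bottlenecks (dimensions are illustrative)."""
    def __init__(self, d_input=512, d_block=1024, n_heads=4, d_ff=4096):
        super().__init__()
        self.expand = nn.Linear(d_input, d_block)   # bottleneck up to the block width
        self.mha = nn.MultiheadAttention(d_block, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_block)
        self.ffn = nn.Sequential(nn.Linear(d_block, d_ff), nn.GELU(), nn.Linear(d_ff, d_block))
        self.norm2 = nn.LayerNorm(d_block)
        self.shrink = nn.Linear(d_block, d_input)   # bottleneck back down to the input dim
        self.norm3 = nn.LayerNorm(d_input)

    def forward(self, x):                           # x: (batch, seq, d_input)
        h = self.expand(x)
        attn, _ = self.mha(h, h, h)
        h = self.norm1(h + attn)                    # first Add & Norm
        h = self.norm2(h + self.ffn(h))             # second Add & Norm
        out = self.shrink(h)                        # reduce back to the input dimension
        return self.norm3(out + x)                  # Add & Norm with the block input

x = torch.randn(2, 16, 512)
print(BottleneckTransformerBlock()(x).shape)        # torch.Size([2, 16, 512])
```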

Read more

NLP Applications in Support Call Centers

This article was published as a part of the Data Science Blogathon. Introduction: This article is a continuation of my previous article on using machine learning in support environments. There, I shared my views on how, with simple Python code, we can enrich the call center/support activities of our own organization or a customer’s organization. In that article, I gave an insight into what we can do, and how, to make a difference to the current environment by using ML to give better service […]

Read more

Leveraging speaker attribute information using multi task learning for speaker verification and diarization

Deep speaker embeddings have become the leading method for encoding speaker identity in speaker recognition tasks. The embedding space should ideally capture the variations between all possible speakers, encoding the multiple aspects that make up speaker identity… In this work, using speaker age as an auxiliary variable on US Supreme Court recordings and speaker nationality with VoxCeleb, we show that by leveraging additional speaker attribute information in a multi-task learning setting, deep speaker embedding performance can be increased for […]
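The multi-task setup sketched in the abstract can be pictured as a shared embedding encoder with one head for speaker identity and one for an auxiliary attribute (age or nationality), trained with a weighted joint loss. The encoder, dimensions, and the 0.1 auxiliary weight below are placeholder assumptions, not the paper's architecture or hyperparameters.

```python
import torch
import torch.nn as nn

class MultiTaskSpeakerEmbedder(nn.Module):
    def __init__(self, n_feats=40, emb_dim=256, n_speakers=1000, n_attr_classes=10):
        super().__init__()
        # Stand-in for an x-vector/TDNN-style encoder producing the speaker embedding.
        self.encoder = nn.Sequential(nn.Linear(n_feats, 512), nn.ReLU(), nn.Linear(512, emb_dim))
        self.speaker_head = nn.Linear(emb_dim, n_speakers)   # main task: speaker identity
        self.attr_head = nn.Linear(emb_dim, n_attr_classes)  # auxiliary task: age/nationality

    def forward(self, feats):                                # feats: (batch, n_feats), pooled
        emb = self.encoder(feats)                            # shared speaker embedding
        return emb, self.speaker_head(emb), self.attr_head(emb)

model = MultiTaskSpeakerEmbedder()
feats = torch.randn(8, 40)
spk_labels = torch.randint(0, 1000, (8,))
attr_labels = torch.randint(0, 10, (8,))
emb, spk_logits, attr_logits = model(feats)

# Joint loss: speaker-classification loss plus a down-weighted auxiliary-attribute loss.
loss = nn.functional.cross_entropy(spk_logits, spk_labels) \
       + 0.1 * nn.functional.cross_entropy(attr_logits, attr_labels)
loss.backward()
```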

Read more

Memory Optimization for Deep Networks

Deep learning is slowly, but steadily, hitting a memory bottleneck. While the tensor computation in top-of-the-line GPUs increased by 32x over the last five years, the total available memory only grew by 2.5x… This prevents researchers from exploring larger architectures, as training large networks requires more memory for storing intermediate outputs. In this paper, we present MONeT, an automatic framework that minimizes both the memory footprint and computational overhead of deep networks. MONeT jointly optimizes the checkpointing schedule and the […]
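MONeT searches for an optimized schedule, but the memory/compute trade-off it builds on can be seen with plain activation checkpointing. The sketch below uses PyTorch's torch.utils.checkpoint.checkpoint_sequential on a toy stack of layers; it only illustrates that trade-off and is not the paper's framework.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack of blocks whose intermediate outputs would normally all be kept
# alive from the forward pass until the backward pass consumes them.
layers = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(32)])
x = torch.randn(64, 1024, requires_grad=True)

# Checkpointed forward: only activations at 4 segment boundaries are stored; the
# rest are recomputed during backward, trading extra compute for lower memory.
out = checkpoint_sequential(layers, 4, x)
out.sum().backward()
```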

Read more

Succinct and Robust Multi-Agent Communication With Temporal Message Control

Recent studies have shown that introducing communication between agents can significantly improve overall performance in cooperative multi-agent reinforcement learning (MARL). However, existing communication schemes often require agents to exchange an excessive number of messages at run-time over a reliable communication channel, which hinders their practicality in many real-world situations… In this paper, we present Temporal Message Control (TMC), a simple yet effective approach for achieving succinct and robust communication in MARL. TMC applies a temporal smoothing technique to drastically reduce […]
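The excerpt is cut off before the details, but the general flavor of cutting message traffic via temporal smoothing can be shown with a small toy: exponentially smooth an agent's outgoing message and only transmit when it drifts past a threshold. The smoothing rule, gating criterion, and parameters below are assumptions for illustration, not the TMC algorithm itself.

```python
import numpy as np

def temporally_smoothed_messages(raw_messages, alpha=0.8, threshold=0.1):
    """Exponentially smooth a stream of messages and transmit only on significant change."""
    smoothed = raw_messages[0].astype(float)
    last_sent = smoothed.copy()
    sent = [(0, last_sent.copy())]                   # always send the first message
    for t, m in enumerate(raw_messages[1:], start=1):
        smoothed = alpha * smoothed + (1 - alpha) * m
        if np.linalg.norm(smoothed - last_sent) > threshold:
            last_sent = smoothed.copy()
            sent.append((t, last_sent.copy()))       # transmit only when it changed enough
    return sent

# 100 noisy but slowly varying messages -> far fewer actual transmissions.
rng = np.random.default_rng(0)
msgs = [np.array([np.sin(t / 20.0)]) + 0.05 * rng.standard_normal(1) for t in range(100)]
print(f"transmitted {len(temporally_smoothed_messages(msgs))} of {len(msgs)} messages")
```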

Read more

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties… In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms. We view the quantized gradient of FQT as a stochastic estimator of […]
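A standard way to make a quantized gradient a stochastic, unbiased estimator of its full-precision counterpart is stochastic rounding; the small NumPy check below illustrates that property on a toy gradient vector. The quantization grid and scaling here are simplified assumptions, not the paper's exact scheme.

```python
import numpy as np

def stochastic_round(x, num_bits=4, rng=None):
    """Quantize x onto a coarse grid with stochastic rounding, so E[q(x)] = x."""
    rng = rng or np.random.default_rng()
    levels = 2 ** num_bits - 1
    max_abs = np.max(np.abs(x))
    scale = max_abs / levels if max_abs > 0 else 1.0
    scaled = x / scale
    floor = np.floor(scaled)
    prob_up = scaled - floor                          # probability of rounding up
    return (floor + (rng.random(x.shape) < prob_up)) * scale

# Empirical check of unbiasedness: the mean of many quantized samples approaches x.
rng = np.random.default_rng(0)
g = rng.standard_normal(8)                            # stand-in full-precision gradient
samples = np.stack([stochastic_round(g, num_bits=4, rng=rng) for _ in range(20000)])
print(np.abs(samples.mean(axis=0) - g).max())         # close to 0
```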

Read more

Wearing a MASK: Compressed Representations of Variable-Length Sequences Using Recurrent Neural Tangent Kernels

High dimensionality poses many challenges to the use of data, from visualization and interpretation, to prediction and storage for historical preservation. Techniques abound to reduce the dimensionality of fixed-length sequences, yet these methods rarely generalize to variable-length sequences… To address this gap, we extend existing methods that rely on the use of kernels to variable-length sequences via use of the Recurrent Neural Tangent Kernel (RNTK). Since a deep neural network with ReLU activation is a Max-Affine Spline Operator (MASO), we […]
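To picture what "kernel methods extended to variable-length sequences" looks like mechanically, the toy sketch below builds a Gram matrix over variable-length sequences with a simple stand-in kernel (a mean pairwise RBF) and feeds it to kernel PCA. The paper's contribution is to use the Recurrent Neural Tangent Kernel (RNTK) in place of such a hand-crafted kernel; the RNTK itself is not implemented here.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def sequence_kernel(a, b, gamma=0.5):
    """Stand-in kernel between two variable-length sequences: mean pairwise RBF."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2).mean()

rng = np.random.default_rng(0)
seqs = [rng.standard_normal((int(rng.integers(5, 15)), 3)) for _ in range(40)]  # varying lengths

# Precompute the Gram matrix between all sequences, then reduce to 2 dimensions.
K = np.array([[sequence_kernel(a, b) for b in seqs] for a in seqs])
low_dim = KernelPCA(n_components=2, kernel="precomputed").fit_transform(K)
print(low_dim.shape)  # (40, 2): fixed-length representations of variable-length inputs
```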

Read more

Acoustic echo cancellation with the dual-signal transformation LSTM network

This paper applies the dual-signal transformation LSTM network (DTLN) to the task of real-time acoustic echo cancellation (AEC). The DTLN combines a short-time Fourier transformation and a learned feature representation in a stacked network approach, which enables robust information processing in the time-frequency domain and in the time domain, the latter of which also includes phase information… The model is trained on only 60 h of real and synthetic echo scenarios. The training setup includes multi-lingual speech, data augmentation, additional noise and reverberation to create […]
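As a rough illustration of the stacked two-domain idea, the sketch below masks STFT magnitudes with a first LSTM core and then masks a learned 1-D convolutional feature representation of the resulting time-domain signal with a second core. Layer sizes are arbitrary, conditioning on the far-end/loudspeaker signal and the final synthesis stage are omitted, and this is not the paper's exact DTLN-AEC architecture.

```python
import torch
import torch.nn as nn

class TwoStageEnhancer(nn.Module):
    """Toy two-core sketch: STFT-magnitude masking followed by learned-feature masking."""
    def __init__(self, n_fft=512, hop=128, feat_dim=256):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        freq_bins = n_fft // 2 + 1
        self.core1 = nn.LSTM(freq_bins, 128, batch_first=True)
        self.mask1 = nn.Sequential(nn.Linear(128, freq_bins), nn.Sigmoid())
        self.encode = nn.Conv1d(1, feat_dim, kernel_size=n_fft, stride=hop)  # learned features
        self.core2 = nn.LSTM(feat_dim, 128, batch_first=True)
        self.mask2 = nn.Sequential(nn.Linear(128, feat_dim), nn.Sigmoid())

    def forward(self, mic):                                   # mic: (batch, samples)
        spec = torch.stft(mic, self.n_fft, self.hop, return_complex=True)
        mag, phase = spec.abs(), spec.angle()                 # (batch, freq, frames)
        h1, _ = self.core1(mag.transpose(1, 2))
        masked = mag.transpose(1, 2) * self.mask1(h1)         # time-frequency masking
        est = torch.istft(torch.polar(masked.transpose(1, 2), phase),
                          self.n_fft, self.hop, length=mic.shape[-1])
        feats = self.encode(est.unsqueeze(1))                 # (batch, feat_dim, frames)
        h2, _ = self.core2(feats.transpose(1, 2))
        return feats.transpose(1, 2) * self.mask2(h2)         # masking in the learned domain

print(TwoStageEnhancer()(torch.randn(2, 16000)).shape)        # decoder/overlap-add omitted
```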

Read more