NLP Essentials: Removing Stopwords and Performing Text Normalization using NLTK and spaCy in Python

Overview: Learn how to remove stopwords and perform text normalization in Python – an essential Natural Language Processing (NLP) read. We will explore the different methods to remove stopwords as well as talk about text normalization techniques like stemming and lemmatization. Put your theory into practice by performing stopwords removal and text normalization in Python using the popular NLTK, spaCy and Gensim libraries.

Introduction: Don’t you love how wonderfully diverse Natural Language Processing (NLP) is? Things we never imagined […]

An Exhaustive Guide to Detecting and Fighting Neural Fake News using NLP

Overview: Neural fake news (fake news generated by AI) can be a huge issue for our society. This article discusses different Natural Language Processing methods to develop a robust defense against neural fake news, including using the GPT-2 detector model and Grover (AllenNLP). Every data science professional should be aware of what neural fake news is and how to combat it.

Introduction: Fake news is a major concern in our society right now. It has gone hand-in-hand with the rise […]

What is Tokenization in NLP? Here’s All You Need To Know

Highlights: Tokenization is a key (and mandatory) aspect of working with text data. We’ll discuss the various nuances of tokenization, including how to handle Out-of-Vocabulary (OOV) words.

Introduction: Language is a thing of beauty. But mastering a new language from scratch is quite a daunting prospect. If you’ve ever picked up a language that wasn’t your mother tongue, you’ll relate to this! There are so many layers to peel off and syntaxes to consider – it’s quite a challenge. […]

A Comprehensive Step-by-Step Guide to Become an Industry-Ready Data Science Professional

Introduction to Artificial Intelligence and Machine Learning: Artificial Intelligence (AI) and its sub-field Machine Learning (ML) have taken the world by storm. From face recognition cameras and smart personal assistants to self-driving cars, we are moving towards a world enhanced by these emerging technologies. It’s the most exciting time to be in this career field! The global Artificial Intelligence market is expected to grow to $400 billion by the year 2025. From startups to big organizations, all want to join […]

PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents

Neural Machine Translation (NMT) has shown drastic improvement in its quality when translating clean input, such as text from the news domain. However, existing studies suggest that NMT still struggles with certain kinds of input with considerable noise, such as User-Generated Contents (UGC) on the Internet… To make better use of NMT for cross-cultural communication, one of the most promising directions is to develop a model that correctly handles these expressions. Though its importance has been recognized, it is still […]

Handwriting Classification for the Analysis of Art-Historical Documents

Digitized archives contain and preserve the knowledge of generations of scholars in millions of documents. The size of these archives calls for automatic analysis since a manual analysis by specialists is often too expensive… In this paper, we focus on the analysis of handwriting in scanned documents from the art-historic archive of the WPI. Since the archive consists of documents written in several languages and lacks annotated training data for the creation of recognition models, we propose the task of […]

Pixel-wise Dense Detector for Image Inpainting

Recent GAN-based image inpainting approaches adopt an average strategy to discriminate the generated image and output a scalar, which inevitably loses the position information of visual artifacts. Moreover, the adversarial loss and reconstruction loss (e.g., l1 loss) are combined with tradeoff weights, which are also difficult to tune… In this paper, we propose a novel detection-based generative framework for image inpainting, which adopts the min-max strategy in an adversarial process. The generator follows an encoder-decoder architecture to fill the missing […]

Lightweight Model For The Prediction of COVID-19 Through The Detection And Segmentation of Lesions in Chest CT Scans

We introduce a lightweight Mask R-CNN model that segments areas with the Ground Glass Opacity and Consolidation in chest CT scans. The model uses truncated ResNet18 and ResNet34 nets with a single layer of Feature Pyramid Network as a backbone net, thus substantially reducing the number of the parameters and the training time compared to similar solutions using deeper networks… Without any data balancing and manipulations, and using only a small fraction of the training data, COVID-CT-Mask-Net classification model with […]

BGGAN: Bokeh-Glass Generative Adversarial Network for Rendering Realistic Bokeh

A photo captured with bokeh effect often means objects in focus are sharp while the out-of-focus areas are all blurred. DSLR can easily render this kind of effect naturally… However, due to the limitation of sensors, smartphones cannot capture images with depth-of-field effects directly. In this paper, we propose a novel generator called Glass-Net, which generates bokeh images not relying on complex hardware. Meanwhile, the GAN-based method and perceptual loss are combined for rendering a realistic bokeh effect in the […]

Stochastic Hard Thresholding Algorithms for AUC Maximization

In this paper, we aim to develop stochastic hard thresholding algorithms for the important problem of AUC maximization in imbalanced classification. The main challenge is the pairwise loss involved in AUC maximization… We overcome this obstacle by reformulating the U-statistics objective function as an empirical risk minimization (ERM), from which a stochastic hard thresholding algorithm (SHT-AUC) is developed. To our best knowledge, this is the first attempt to provide stochastic hard thresholding algorithms for AUC maximization with a per-iteration cost […]
