Articles About Machine Learning

The Attention Mechanism from Scratch

The attention mechanism was introduced to improve the performance of the encoder-decoder model for machine translation. The idea behind the attention mechanism was to permit the decoder to utilize the most relevant parts of the input sequence in a flexible manner, by a weighted combination of all the encoded input vectors, with the most relevant vectors being attributed the highest weights. In this tutorial, you will discover the attention mechanism and its implementation. After completing this tutorial, you will know: […]

January 19, 2024 Machine Learning
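
The weighted combination described in the excerpt above can be sketched in a few lines of plain Python. This is a minimal illustration, not the tutorial's own code: the dot-product scoring function, the name `attention`, and the toy vectors are all assumptions made for the example.

```python
import math

def attention(query, keys, values):
    """Return (weights, context) for one query over a list of encoded vectors."""
    # Alignment scores: dot product of the query with each key
    # (an assumed scoring function; other choices are possible).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Softmax turns the scores into positive weights that sum to 1.
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted combination of all the value vectors.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Toy "encoded input vectors" (hypothetical data).
encoded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention([1.0, 1.0], encoded, encoded)
print(weights)
print(context)
```

The third vector aligns best with the query, so it receives the largest weight and contributes most to the context vector.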

A Tour of Attention-Based Architectures

As the popularity of attention in machine learning grows, so does the list of neural architectures that incorporate an attention mechanism. In this tutorial, you will discover the salient neural architectures that have been used in conjunction with attention. After completing this tutorial, you will better understand how the attention mechanism is incorporated into different neural architectures and for what purpose. Kick-start your project with my book Building Transformer Models with Attention. It provides self-study tutorials with working code to […]

January 19, 2024 Machine Learning

Adding a Custom Attention Layer to a Recurrent Neural Network in Keras

Deep learning networks have gained immense popularity in the past few years. The “attention mechanism” is integrated with deep learning networks to improve their performance. Adding an attention component to a network has led to significant improvements in tasks such as machine translation, image recognition, and text summarization. This tutorial shows how to add a custom attention layer to a network built using a recurrent neural network. We’ll illustrate an end-to-end application of time series forecasting using a very […]

January 19, 2024 Machine Learning

The Bahdanau Attention Mechanism

Conventional encoder-decoder architectures for machine translation encoded every source sentence into a fixed-length vector, regardless of its length, from which the decoder would then generate a translation. This made it difficult for the neural network to cope with long sentences, essentially resulting in a performance bottleneck. The Bahdanau attention was proposed to address the performance bottleneck of conventional encoder-decoder architectures, achieving significant improvements over the conventional approach. In this tutorial, you will discover the Bahdanau attention mechanism for neural machine […]

January 19, 2024 Machine Learning

The Luong Attention Mechanism

The Luong attention sought to introduce several improvements over the Bahdanau model for neural machine translation, notably by introducing two new classes of attentional mechanisms: a global approach that attends to all source words and a local approach that only attends to a selected subset of words in predicting the target sentence. In this tutorial, you will discover the Luong attention mechanism for neural machine translation. After completing this tutorial, you will know: The operations performed by the Luong attention […]

January 19, 2024 Machine Learning

An Introduction to Recurrent Neural Networks and the Math That Powers Them

Traditional feedforward networks are poorly suited to learning from and predicting sequential or time series data. A mechanism is required to retain past or historical information to forecast future values. Recurrent neural networks, or RNNs for short, are a variant of the conventional feedforward artificial neural networks that can deal with sequential data and can be trained to hold knowledge about the past. After completing this tutorial, you will know: Recurrent neural networks; What is meant by […]

January 19, 2024 Machine Learning

Understanding Simple Recurrent Neural Networks in Keras

This tutorial is designed for anyone who wants to understand how recurrent neural networks (RNNs) work and how to use them via the Keras deep learning library. While the Keras library provides all the methods required for solving problems and building applications, it is also important to gain an insight into how everything works. In this article, the computations taking place in the RNN model are shown step by step. Next, a complete end-to-end system for time series prediction […]

January 19, 2024 Machine Learning

The Transformer Attention Mechanism

Before the introduction of the Transformer model, the use of attention for neural machine translation was implemented by RNN-based encoder-decoder architectures. The Transformer model revolutionized the implementation of attention by dispensing with recurrence and convolutions and relying solely on a self-attention mechanism instead. We will first focus on the Transformer attention mechanism in this tutorial and subsequently review the Transformer model in a separate one. In this tutorial, you will discover the Transformer attention mechanism for neural machine translation. After […]

January 19, 2024 Machine Learning

The Transformer Model

We have already familiarized ourselves with the concept of self-attention as implemented by the Transformer attention mechanism for neural machine translation. We will now be shifting our focus to the details of the Transformer architecture itself to discover how self-attention can be implemented without relying on the use of recurrence and convolutions. In this tutorial, you will discover the network architecture of the Transformer model. After completing this tutorial, you will know: How the Transformer architecture implements an encoder-decoder structure […]

January 19, 2024 Machine Learning

A Gentle Introduction to Positional Encoding in Transformer Models, Part 1

In languages, the order of the words and their position in a sentence matter. The meaning of the entire sentence can change if the words are re-ordered. When implementing NLP solutions, recurrent neural networks have an inbuilt mechanism that deals with the order of sequences. The transformer model, however, does not use recurrence or convolution and treats each data point as independent of the others. Hence, positional information is added to the model explicitly to retain the information regarding […]
