A Gentle Introduction to Batch Normalization for Deep Neural Networks

Last Updated on December 4, 2019 Training deep neural networks with tens of layers is challenging as they can be sensitive to the initial random weights and configuration of the learning algorithm. One possible reason for this difficulty is that the distribution of the inputs to layers deep in the network may change after each mini-batch when the weights are updated. This can cause the learning algorithm to forever chase a moving target. This change in the distribution of inputs to […]

September 29, 2020 Machine Learning

How to Accelerate Learning of Deep Neural Networks With Batch Normalization

Last Updated on August 25, 2020 Batch normalization is a technique designed to automatically standardize the inputs to a layer in a deep learning neural network. Once implemented, batch normalization has the effect of dramatically accelerating the training process of a neural network, and in some cases improves the performance of the model via a modest regularization effect. In this tutorial, you will discover how to use batch normalization to accelerate the training of deep learning neural networks in Python […]

September 29, 2020 Machine Learning

How to Control the Stability of Training Neural Networks With the Batch Size

Last Updated on August 28, 2020 Neural networks are trained using gradient descent where the estimate of the error used to update the weights is calculated based on a subset of the training dataset. The number of examples from the training dataset used in the estimate of the error gradient is called the batch size and is an important hyperparameter that influences the dynamics of the learning algorithm. It is important to explore the dynamics of your model to ensure […]

September 29, 2020 Machine Learning

How to Configure the Learning Rate When Training Deep Learning Neural Networks

Last Updated on August 6, 2019 The weights of a neural network cannot be calculated using an analytical method. Instead, the weights must be discovered via an empirical optimization procedure called stochastic gradient descent. The optimization problem addressed by stochastic gradient descent for neural networks is challenging, and the space of solutions (sets of weights) may comprise many good solutions (called global optima) as well as solutions that are easy to find but low in skill (called local optima). The […]

September 29, 2020 Machine Learning

Understand the Impact of Learning Rate on Neural Network Performance

Last Updated on September 12, 2020 Deep learning neural networks are trained using the stochastic gradient descent optimization algorithm. The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing the learning rate is challenging as a value too small may result in a long training process that could get stuck, whereas a value too large may result in learning a sub-optimal set […]

September 29, 2020 Machine Learning

Loss and Loss Functions for Training Deep Learning Neural Networks

Last Updated on October 23, 2019 Neural networks are trained using stochastic gradient descent and require that you choose a loss function when designing and configuring your model. There are many loss functions to choose from and it can be challenging to know what to choose, or even what a loss function is and the role it plays when training a neural network. In this post, you will discover the role of loss and loss functions in training deep learning […]

September 29, 2020 Machine Learning

How to Choose Loss Functions When Training Deep Learning Neural Networks

Last Updated on August 25, 2020 Deep learning neural networks are trained using the stochastic gradient descent optimization algorithm. As part of the optimization algorithm, the error for the current state of the model must be estimated repeatedly. This requires the choice of an error function, conventionally called a loss function, that can be used to estimate the loss of the model so that the weights can be updated to reduce the loss on the next evaluation. Neural network models […]

September 29, 2020 Machine Learning

How to Use Greedy Layer-Wise Pretraining in Deep Learning Neural Networks

Last Updated on August 25, 2020 Training deep neural networks was traditionally challenging as the vanishing gradient meant that weights in layers close to the input layer were not updated in response to errors calculated on the training dataset. An innovation and important milestone in the field of deep learning was greedy layer-wise pretraining that allowed very deep neural networks to be successfully trained, achieving then state-of-the-art performance. In this tutorial, you will discover greedy layer-wise pretraining as a technique […]

September 29, 2020 Machine Learning

How to Use Data Scaling to Improve Deep Learning Model Stability and Performance

Last Updated on August 25, 2020 Deep learning neural networks learn how to map inputs to outputs from examples in a training dataset. The weights of the model are initialized to small random values and updated via an optimization algorithm in response to estimates of error on the training dataset. Given the use of small weights in the model and the use of error between predictions and expected values, the scale of inputs and outputs used to train the model […]

September 29, 2020 Machine Learning

How to Avoid Exploding Gradients With Gradient Clipping

Last Updated on August 28, 2020 Training a neural network can become unstable given the choice of error function, learning rate, or even the scale of the target variable. Large updates to weights during training can cause a numerical overflow or underflow, often referred to as “exploding gradients.” The problem of exploding gradients is more common with recurrent neural networks, such as LSTMs, given the accumulation of gradients unrolled over hundreds of input time steps. A common and relatively easy […]
