[ Neural Networks for Machine Learning by Geoffrey Hinton ]

 Why do we need machine learning? [13 min]
Slides (pdf available)
 What are neural networks? [8 min]
 Some simple models of neurons [8 min]
 A simple example of learning [6 min]
 Three types of learning [8 min]
 Types of neural network architectures [7 min]
Slides (pdf available)
 Perceptrons: The first generation of neural networks [8 min]
 A geometrical view of perceptrons [6 min]
 Why the learning works [5 min]
 What perceptrons can’t do [15 min]
 Learning the weights of a linear neuron [12 min]
Slides (pdf available)
 The error surface for a linear neuron [5 min]
 Learning the weights of a logistic output neuron [4 min]
 The backpropagation algorithm [12 min]
Reading: Learning representations by back-propagating errors
 Using the derivatives computed by backpropagation [10 min]
 Learning to predict the next word [13 min]
Slides (pdf available)
 A brief diversion into cognitive science [4 min]
 Another diversion: The softmax output function [7 min]
 Neuro-probabilistic language models [8 min]
Reading: Neural probabilistic language models
 Ways to deal with the large number of possible outputs [15 min]
Resource: Word Map
 Why object recognition is difficult [5 min]
Lecture 5 slides (pptx, pdf)
 Achieving viewpoint invariance [6 min]
 Convolutional nets for digit recognition [16 min]
 Convolutional nets for object recognition [17 min]
Reading (hard): Gradient-based learning applied to document recognition
Reading: Convolutional networks for images, speech, and time series
 Overview of minibatch gradient descent
Lecture 6 slides (pptx, pdf)
 A bag of tricks for minibatch gradient descent
 The momentum method
 Adaptive learning rates for each connection
 Rmsprop: Divide the gradient by a running average of its recent magnitude
 Modeling sequences: A brief overview
Lecture 7 slides (pptx, pdf)
 Training RNNs with back propagation
 A toy example of training an RNN
 Why it is difficult to train an RNN
 Long-term Short-term-memory
Reading (hard): A novel approach to online handwriting recognition based on bidirectional long short-term memory networks
 A brief overview of Hessian Free optimization
Lecture 8 slides (pptx, pdf)
 Modeling character strings with multiplicative connections [14 mins]
 Learning to predict the next character using HF [12 mins]
Reading: Generating Text with Recurrent Neural Networks
 Echo State Networks [9 min]
Reading: Echo State Networks (Scholarpedia)
 Overview of ways to improve generalization [12 min]
Lecture 9 slides (pptx, pdf)
 Limiting the size of the weights [6 min]
 Using noise as a regularizer [7 min]
 Introduction to the full Bayesian approach [12 min]
 The Bayesian interpretation of weight decay [11 min]
 MacKay’s quick and dirty method of setting weight costs [4 min]
 Why it helps to combine models [13 min]
Lecture 10 slides (pptx, pdf)
 Mixtures of Experts [13 min]
Reading: Adaptive mixtures of local experts
 The idea of full Bayesian learning [7 min]
 Making full Bayesian learning practical [7 min]
 Dropout [9 min]
Reading: Improving neural networks by preventing co-adaptation of feature detectors
 Hopfield Nets [13 min]
Lecture 11 slides (pptx, pdf)
 Dealing with spurious minima [11 min]
 Hopfield nets with hidden units [10 min]
 Using stochastic units to improve search [11 min]
 How a Boltzmann machine models data [12 min]
Reading: Boltzmann Machines (Scholarpedia)
 Boltzmann machine learning [12 min]
Lecture 12 slides (pptx, pdf)
 OPTIONAL VIDEO: More efficient ways to get the statistics [15 mins]
 Restricted Boltzmann Machines [11 min]
 An example of RBM learning [7 mins]
 RBMs for collaborative filtering [8 mins]
 The ups and downs of back propagation [10 min]
Lecture 13 slides (pptx, pdf)
 Belief Nets [13 min]
 Learning sigmoid belief nets [12 min]
Reading: Connectionist learning of belief networks
 The wake-sleep algorithm [13 min]
Reading: The “wake-sleep” algorithm for unsupervised neural networks
 Learning layers of features by stacking RBMs [17 min]
Lecture 14 slides (pptx, pdf)
Reading: Self-taught learning: transfer learning from unlabeled data
Reading (easy): To recognize shapes, first learn to generate images
Reading (hard): A fast learning algorithm for deep belief nets
 Discriminative learning for DBNs [9 mins]
 What happens during discriminative fine-tuning? [8 mins]
 Modeling real-valued data with an RBM [10 mins]
 OPTIONAL VIDEO: RBMs are infinite sigmoid belief nets [17 mins]
 From PCA to autoencoders [5 mins]
Lecture 15 slides (pptx, pdf)
 Deep auto encoders [4 mins]
 Deep auto encoders for document retrieval [8 mins]
 Semantic Hashing [9 mins]
Reading: Semantic Hashing
 Learning binary codes for image retrieval [9 mins]
Reading: Using Very Deep Autoencoders for Content-Based Image Retrieval
 Shallow autoencoders for pretraining [7 mins]
 OPTIONAL: Learning a joint model of images and captions [10 min]
Lecture 16 slides (pptx, pdf)
 OPTIONAL: Hierarchical Coordinate Frames [10 mins]
 OPTIONAL: Bayesian optimization of hyperparameters [13 min]
 OPTIONAL: The fog of progress [3 min]
Wiki:
Welcome to the wiki page.
This wiki will be collaboratively created by students in Prof. Geoffrey Hinton’s Neural Networks for Machine Learning course (Fall 2012).
Feel free to create pages as needed. To see how to edit or create a page, see “Help” on the left navbar.
Lecture 2a – An overview of the main types of neural network architecture
Lecture 2b – Perceptrons: The first generation of neural networks
Lecture 2c – A geometrical view of perceptrons
Lecture 2d – Why the learning works
Lecture 2e – What perceptrons can’t do
Lecture 4c – A quick note on the cross-entropy and derivative of a softmax unit
Lecture 13b – The math of Sigmoid Belief Networks
Online Books
Local Links
External resources
 Neuralnets:External Resources for Neural Networks for Machine Learning