NTU Hung-yi Lee
Hung-yi Lee is a professor at National Taiwan University. He is known for his clear, accessible teaching style and humorous lectures, and his courses on artificial intelligence, machine learning, and deep learning are popular with students and practitioners alike. He has received several international academic awards and actively takes part in academic exchanges, sharing his research with scholars around the world.
Outline / Contents
Lesson 1: Overview
What is machine learning
The task of finding the best function
Example: mapping audio to text
The hypothesis set of functions
The machine learning framework diagram
What is deep learning
A production line of many functions
Examples of deep learning
Speech recognition
Early pipeline: recognizing audio with MFCC features
Speech recognition with deep learning
Image recognition
Early image recognition pipeline
Image recognition with deep learning
Inspired by how human neurons work
The structure of a deep network
Input layer, hidden layers, output layer, activation function (see the sketch below)
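A minimal sketch of the input layer → hidden layer → output layer structure listed above, using NumPy; the layer sizes, random weights, and sigmoid activation are hypothetical toy choices, not values from the course.

```python
import numpy as np

def sigmoid(z):
    # Activation function applied element-wise to each neuron's input z
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy network: 3 inputs -> 4 hidden neurons -> 2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden layer -> output layer

x = np.array([0.5, -1.0, 2.0])                  # one input vector
a1 = sigmoid(W1 @ x + b1)                       # hidden-layer activations
y = sigmoid(W2 @ a1 + b2)                       # output-layer activations
```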
History of deep learning
1960: the perceptron
1969: limitations of the perceptron
1980: the multi-layer perceptron
1986: backpropagation
1989: one hidden layer is enough to represent any function
2006: RBM initialization
2009: GPU computing
2011: deep learning becomes popular in speech recognition
2012: ILSVRC image recognition competition
Why deep learning
A shallow network can represent any function
Comparing deep and shallow networks
Example: recognizing patterns on a 0/1 grid
Fewer neurons
Fewer parameters
Fewer training samples
Better results
Example: handwriting recognition accuracy
Same conclusion as above
Speech recognition
Same conclusion as above
Why deep outperforms shallow
Explained by analogy with logic circuits
Why it was not popular earlier
What comes next
Structured learning
How to handle structured inputs and outputs
Related material
"Deep Learning"
Lesson 2: DNN
What is the model
what is the task
binary classification
multi-class classification
what is the model
what is the function we are looking for
input, output format
A layer of neurons
A neuron's structure
Limitation of a single layer
Cannot handle XNOR; example
Math symbols
output: a
input: z
weight: w
bias: b
Structural expression (see the formula below)
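With the symbols above (output a, input z, weight w, bias b), the structural expression for one layer can be written in the standard matrix form below; this is a restatement rather than a quote from the slides.

```latex
z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad
a^{(l)} = \sigma\!\left(z^{(l)}\right)
```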
what is the "best" function
best function = best parameters
Cost function: C(θ)
How to pick the "best model"
Gradient descent
Idea of gradient descent
Formal derivation of gradient descent
Taylor series: the basis of gradient descent
Multivariable Taylor series
Gradient descent for neural networks: backpropagation
stuck at local minima
Practical issues for neural networks
Parameter Initialization
Learning rate: example
Stochastic gradient descent: faster, better; example
Mini-batch gradient descent: faster than stochastic gradient descent, easy to compute in parallel (see the sketch below)
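A minimal sketch of a mini-batch gradient descent loop, assuming a generic `grad(theta, X_batch, y_batch)` function supplied by the caller; the batch size, learning rate, and epoch count are hypothetical defaults.

```python
import numpy as np

def minibatch_gd(theta, X, y, grad, lr=0.1, batch_size=32, epochs=10):
    """Shuffle the data each epoch and update the parameters on one mini-batch at a time."""
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            theta = theta - lr * grad(theta, X[batch], y[batch])
    return theta
```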
Recipe for learning
Data provided in the homework
Common mistakes
Lesson 3: Backpropagation
background
Cost function
Gradient descent
Backpropagation is an efficient way to compute the gradients for gradient descent
Chain rule
One dimension
Multiple dimensions
Taking the partial derivative with respect to w decomposes it into two terms
Computing the first term
How to compute δ^L (the error at the output layer)
The relation between δ^l and δ^(l+1)
Conclusion (see the formulas below)
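A compact way to state the conclusion above, assuming sigmoid activations and the a/z/W/b notation from Lesson 2: the error signal δ at the output layer is computed first, then propagated backward layer by layer, and the gradient with respect to each weight factors into the two terms mentioned above.

```latex
\delta^{(L)} = \sigma'\!\left(z^{(L)}\right) \odot \nabla_{a^{(L)}} C, \qquad
\delta^{(l)} = \sigma'\!\left(z^{(l)}\right) \odot \left(W^{(l+1)}\right)^{\!\top} \delta^{(l+1)}, \qquad
\frac{\partial C}{\partial w^{(l)}_{ij}} = a^{(l-1)}_{j}\,\delta^{(l)}_{i}
```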
Lesson 4: Tips for training neural networks
Activation function
Rectified linear unit (ReLU)
Function plot
Derivative plot
Reason
Fast to compute
Biological reason
Infinite sigmoid with different biases
Vanishing gradient problem
Problem of sigmoid
Vanishing Gradient Problem
For the sigmoid function, σ'(z) is always smaller than 1
Error signal is getting smaller and smaller
ReLU
A thinner linear network
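A tiny numerical illustration of the point above, with hypothetical pre-activation values: the sigmoid derivative is at most 0.25, so multiplying many such factors shrinks the error signal toward zero, while the ReLU derivative is exactly 1 wherever the unit is active.

```python
import numpy as np

# Hypothetical pre-activation values, one per layer along a path through the network
z = np.array([0.3, -1.2, 0.8, -0.1, 2.0])

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
sig_grad = sigmoid(z) * (1.0 - sigmoid(z))      # each factor is at most 0.25
relu_grad = (z > 0).astype(float)               # 1 where the unit is active, else 0

print(np.prod(sig_grad))            # product of sigmoid derivatives: shrinks toward 0
print(np.prod(relu_grad[z > 0]))    # along the active units the ReLU factor stays exactly 1
```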
ReLU variants
Leaky ReLU
Parametric ReLU
Maxout
function
maxout training
Cost function
Output layer: Softmax
Using ReLU as the output layer makes it hard to express the distance between the true value and the output value
Computation
Cross Entropy
function
compute
i = r
i ≠ r
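For the softmax output and cross-entropy cost named above, with r the index of the correct class, the two cases in the gradient work out as follows; this is the standard result, restated here rather than copied from the slides.

```latex
y_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad
C = -\ln y_r, \qquad
\frac{\partial C}{\partial z_i} =
\begin{cases}
y_i - 1 & i = r \\
y_i & i \neq r
\end{cases}
```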
Data preprocessing
Normalize the input
Normalize the training data and the testing data in the same way (see the sketch below)
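A minimal sketch of the preprocessing rule above: the mean and standard deviation are estimated on the training data only, and the same statistics are then reused for the test data. The toy arrays here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 2.0, size=(100, 3))   # hypothetical toy training data
X_test = rng.normal(5.0, 2.0, size=(20, 3))     # hypothetical toy test data

mean = X_train.mean(axis=0)                     # statistics come from the training data only
std = X_train.std(axis=0) + 1e-8                # small constant avoids division by zero

X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std             # test data normalized with the SAME statistics
```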
Optimization
Learning rate
Reduce the learning rate by some factor every few epochs
E.g
Learning rate cannot be one-size-fits-all
Adagrad
Original formula
Derived result (see the formula below)
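The Adagrad update, written in the per-parameter form the "derived result" above refers to: each parameter's learning rate is divided by the root of the sum of its past squared gradients. The notation (g^t for the gradient at step t) is the common one, not copied from the slides.

```latex
w^{t+1} = w^{t} - \frac{\eta}{\sqrt{\sum_{i=0}^{t} \left(g^{i}\right)^{2}}}\; g^{t}
```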
Momentum
Original gradient descent: easy to get stuck
In the physical world: inertia
Momentum
compute (see the sketch below)
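A minimal sketch of the momentum update named above: the movement keeps a decaying memory (the "inertia") of previous movements, so the parameters can roll past small bumps and plateaus instead of stopping at the first flat region. The coefficients are hypothetical defaults.

```python
def momentum_step(theta, grad, movement, lr=0.01, lam=0.9):
    # New movement = inertia from the previous movement minus the current gradient step
    movement = lam * movement - lr * grad
    theta = theta + movement
    return theta, movement
```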
Generalization
Early stopping
Weight decay
w close to 0 is better
the reason
New cost function
update
Dropout
Dropout process
Training: in each iteration, every neuron is dropped with probability p%
Testing: if the dropout rate is p%, multiply all the weights by (1 − p%) at test time (see the sketch below)
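A minimal sketch of the two rules above, with a hypothetical dropout rate p: during training each activation is zeroed with probability p, and at test time nothing is dropped but everything is scaled by (1 − p) to match the training-time expectation.

```python
import numpy as np

def dropout_train(a, p, rng):
    # Each activation is dropped (set to 0) with probability p during training
    mask = rng.random(a.shape) >= p
    return a * mask

def dropout_test(a, p):
    # At test time nothing is dropped; scale by (1 - p) to match the training expectation
    return a * (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones(10)                     # hypothetical activations
print(dropout_train(a, 0.5, rng))
print(dropout_test(a, 0.5))
```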
Intuitive reason
Dropout ≈ Ensemble
Ensemble
Training of dropout
Practical suggestions for dropout
Large network
Longer training time
Higher learning rate
Larger momentum
Lesson 5: RNN
Memory is important
Sequence data
Vanilla Recurrent Neural Network
Applications
Speech recognition
Structure
Intuition
Unrolled over time (see the sketch below)
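A minimal sketch of the vanilla recurrent structure above: the same weights are applied at every time step, and the hidden state carries the memory forward when the network is unrolled. Sizes and weights here are hypothetical toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 4                                   # hypothetical input / hidden sizes
Wx = rng.normal(size=(d_h, d_in))                  # input-to-hidden weights
Wh = rng.normal(size=(d_h, d_h))                   # hidden-to-hidden (memory) weights
b = np.zeros(d_h)

xs = rng.normal(size=(5, d_in))                    # a toy sequence of 5 input vectors
h = np.zeros(d_h)                                  # initial memory
for x in xs:                                       # unrolled over time: same weights at every step
    h = np.tanh(Wx @ x + Wh @ h + b)
```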
Cost Function
compute
Training
BPTT
More applications
Named entity recognition
Information extraction
Variants of RNN
Elman Network & Jordan Network
Deep RNN
Bidirectional RNN
Many To One
Many To Many
One To Many
Long Short-Term Memory (LSTM)
Structure
Input gate
Forget gate
Output gate
Example
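A minimal sketch of one LSTM step with the three gates listed above (input, forget, output); the stacked weight layout and the sigmoid/tanh choices follow the standard formulation, and all concrete sizes and values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W, U, b hold stacked parameters for 4 parts:
    input gate, forget gate, output gate, and the candidate cell update."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input / forget / output gates
    c = f * c + i * np.tanh(g)                     # forget old memory, write new memory
    h = o * np.tanh(c)                             # output gate controls what is exposed
    return h, c

# Hypothetical toy sizes: 3-dim input, 4-dim hidden state
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = rng.normal(size=(4 * d_h, d_in))
U = rng.normal(size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
```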
What's the next wave?
Attention-based model
Lesson 6: RNN Training
Review of backpropagation
BPTT
RNNs are not easy to train
Possible solutions
Gradient clipping
NAG
process
RMSProp
Adagrad
compute
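Minimal sketches of two of the remedies above, gradient clipping and the RMSProp update; the threshold and decay rates are hypothetical defaults, and the RMSProp form shown is the common exponentially-decayed variant of Adagrad's accumulator.

```python
import numpy as np

def clip_gradient(grad, threshold=5.0):
    # Rescale the whole gradient vector if its norm exceeds the threshold
    norm = np.linalg.norm(grad)
    return grad * (threshold / norm) if norm > threshold else grad

def rmsprop_step(theta, grad, sq_avg, lr=1e-3, alpha=0.9, eps=1e-8):
    # Exponentially decayed average of squared gradients replaces Adagrad's full sum
    sq_avg = alpha * sq_avg + (1 - alpha) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(sq_avg) + eps)
    return theta, sq_avg
```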
LSTM can address the gradient vanishing problem
LSTM structure
Basic structure
Peephole
BPTT training
Other Simpler Variants
Better Initialization
Lesson 7: Basic structures for deep learning models
Three steps for deep learning
Neural network
Cost function
Optimization
Fully Connected Layer
Recurrent Structure
RNN
Deep RNN
Bidirectional RNN
Pyramidal RNN
Naive RNN
LSTM
GRU
Example Task
Target delay
Comparison of different structures
Lesson 8: Basic structures for deep learning models (continued)
Stack RNN
Convolutional/Pooling layer
Convolutional Layer
Sparse connectivity
Each neuron connects only to part of the previous layer's outputs
Parameter sharing
Neurons with different receptive fields can use the same set of parameters
Example
Pooling Layer
Pooling
Average Pooling
Max Pooling
L2 Pooling
What outputs should be grouped together
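A minimal sketch of the two layer types above: a 1-D convolution in which the same small kernel (parameter sharing) slides over local receptive fields (sparse connectivity), followed by max pooling that groups neighbouring outputs. The kernel values and sizes are hypothetical.

```python
import numpy as np

def conv1d(x, kernel):
    # The same kernel is reused at every position (parameter sharing),
    # and each output only sees a small window of the input (sparse connectivity).
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    # Group neighbouring outputs and keep only the largest value in each complete group
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, size)])

x = np.array([1.0, 3.0, 2.0, 5.0, 0.0, 4.0])   # hypothetical input signal
kernel = np.array([1.0, -1.0])                  # hypothetical shared parameters
print(max_pool(conv1d(x, kernel)))
```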
Combination of different basic layers
Lesson 9: Special Structures
CNN is not invariant to scaling and rotation
How to transform an image / feature map (see the affine map after this block)
Expansion, compression, translation
Rotation
Interpolation
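The transformations listed above (expansion/compression, translation, rotation) are all special cases of an affine map on the image or feature-map coordinates; a standard way to write it, not quoted from the slides, is:

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
+
\begin{pmatrix} e \\ f \end{pmatrix},
\qquad
\text{rotation: }
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\quad
\text{scaling: }
\begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix},
\quad
\text{pure translation: } a = d = 1,\ b = c = 0
```

Since the transformed coordinates (x', y') generally do not land on integer pixel positions, the output value is read off by interpolating (e.g. bilinearly) between the neighbouring pixels, which is what the interpolation item above refers to.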
Example
Street View House Numbers
Bird Recognition
Highway network & Grid LSTM
Highway Network
Feedforward vs. recurrent
GRU to Highway Network
Grid LSTM