NTU Hung-yi Lee
Hung-yi Lee is a professor at National Taiwan University. He is known for his clear, accessible teaching style and humorous lectures, and his courses on artificial intelligence, machine learning, and deep learning are popular with students and practitioners alike. He has received several international academic awards and actively takes part in academic exchanges, sharing his research with scholars around the world.
Outline / Contents
Lesson 1: Overview
What is machine learning
The task of finding the best function
Example: mapping audio to text
The hypothesis set of functions
The machine learning framework diagram
What is deep learning
A production line of many functions
Examples of deep learning
Speech recognition
Early pipeline: recognizing audio with MFCC features
Speech recognition with deep learning
Image recognition
Early image recognition pipeline
Image recognition with deep learning
Inspired by how human neurons work
The structure of a deep network
Input layer, hidden layers, output layer, activation function (see the sketch below)
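A minimal sketch of the input layer → hidden layer → output layer structure listed above, using NumPy; the layer sizes, random weights, and sigmoid activation are hypothetical toy choices, not values from the course.

```python
import numpy as np

def sigmoid(z):
    # Activation function applied element-wise to each neuron's input z
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy network: 3 inputs -> 4 hidden neurons -> 2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden layer -> output layer

x = np.array([0.5, -1.0, 2.0])                  # one input vector
a1 = sigmoid(W1 @ x + b1)                       # hidden-layer activations
y = sigmoid(W2 @ a1 + b2)                       # output-layer activations
```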
History of deep learning
1960: the perceptron
1969: limitations of the perceptron
1980: the multi-layer perceptron
1986: backpropagation
1989: one hidden layer is enough to represent any function
2006: RBM initialization
2009: GPU computing
2011: deep learning becomes popular in speech recognition
2012: ILSVRC image recognition competition
Why deep learning
A shallow network can represent any function
Comparing deep and shallow networks
Example: recognizing patterns on a 0/1 grid
Fewer neurons
Fewer parameters
Fewer training samples
Better results
Example: handwriting recognition accuracy
Same conclusion as above
Speech recognition
Same conclusion as above
Why deep outperforms shallow
Explained by analogy with logic circuits
Why it was not popular earlier
What comes next
Structured learning
How to handle structured inputs and outputs
Related material
"Deep Learning"
Lesson 2: DNN
What is the model
what is the task
binary classification
multi-class classification
what is the model
what is the function we are looking for
input, output format
A layer of neurons
A neuron's structure
Limitation of a single layer
Cannot handle XNOR; example
Math symbols
output: a
input: z
weight: w
bias: b
Structural expression (see the formula below)
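With the symbols above (output a, input z, weight w, bias b), the structural expression for one layer can be written in the standard matrix form below; this is a restatement rather than a quote from the slides.

```latex
z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad
a^{(l)} = \sigma\!\left(z^{(l)}\right)
```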
what is the "best" function
best function = best parameters
Cost function: C(θ)
How to pick the "best model"
Gradient descent
Idea of gradient descent
Formal derivation of gradient descent
Taylor series: the basis of gradient descent
Multivariable Taylor series
Gradient descent for neural networks: backpropagation
stuck at local minima
Practical issues for neural networks
Parameter Initialization
Learning rate: example
Stochastic gradient descent: faster, better; example
Mini-batch gradient descent: faster than stochastic gradient descent, easy to compute in parallel (see the sketch below)
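A minimal sketch of a mini-batch gradient descent loop, assuming a generic `grad(theta, X_batch, y_batch)` function supplied by the caller; the batch size, learning rate, and epoch count are hypothetical defaults.

```python
import numpy as np

def minibatch_gd(theta, X, y, grad, lr=0.1, batch_size=32, epochs=10):
    """Shuffle the data each epoch and update the parameters on one mini-batch at a time."""
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            theta = theta - lr * grad(theta, X[batch], y[batch])
    return theta
```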
Recipe for learning
Data provided in the homework
Common mistakes
Lesson 3: Backpropagation
background
Cost function
Gradient descent
Backpropagation is an efficient way to compute the gradients for gradient descent
Chain rule
One dimension
Multiple dimensions
Taking the partial derivative with respect to w decomposes it into two terms
Computing the first term
How to compute δ^L (the error at the output layer)
The relation between δ^l and δ^(l+1)
Conclusion (see the formulas below)
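A compact way to state the conclusion above, assuming sigmoid activations and the a/z/W/b notation from Lesson 2: the error signal δ at the output layer is computed first, then propagated backward layer by layer, and the gradient with respect to each weight factors into the two terms mentioned above.

```latex
\delta^{(L)} = \sigma'\!\left(z^{(L)}\right) \odot \nabla_{a^{(L)}} C, \qquad
\delta^{(l)} = \sigma'\!\left(z^{(l)}\right) \odot \left(W^{(l+1)}\right)^{\!\top} \delta^{(l+1)}, \qquad
\frac{\partial C}{\partial w^{(l)}_{ij}} = a^{(l-1)}_{j}\,\delta^{(l)}_{i}
```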
Lesson 4: Tips for training neural networks
Activation function
Rectified linear unit (ReLU)
Function plot
Derivative plot
Reason
Fast to compute
Biological reason
Infinite sigmoid with different biases
Vanishing gradient problem
Problem of sigmoid
Vanishing Gradient Problem
For the sigmoid function, σ'(z) is always smaller than 1
Error signal is getting smaller and smaller
ReLU
A thinner linear network
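A tiny numerical illustration of the point above, with hypothetical pre-activation values: the sigmoid derivative is at most 0.25, so multiplying many such factors shrinks the error signal toward zero, while the ReLU derivative is exactly 1 wherever the unit is active.

```python
import numpy as np

# Hypothetical pre-activation values, one per layer along a path through the network
z = np.array([0.3, -1.2, 0.8, -0.1, 2.0])

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
sig_grad = sigmoid(z) * (1.0 - sigmoid(z))      # each factor is at most 0.25
relu_grad = (z > 0).astype(float)               # 1 where the unit is active, else 0

print(np.prod(sig_grad))            # product of sigmoid derivatives: shrinks toward 0
print(np.prod(relu_grad[z > 0]))    # along the active units the ReLU factor stays exactly 1
```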
ReLU variants
Leaky ReLU
Parametric ReLU
Maxout
function
maxout training
Cost function
Output layer: Softmax
Using ReLU as the output layer makes it hard to express the distance between the true value and the output value
Computation
Cross Entropy
function
compute
i = r
i ≠ r
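For the softmax output and cross-entropy cost named above, with r the index of the correct class, the two cases in the gradient work out as follows; this is the standard result, restated here rather than copied from the slides.

```latex
y_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad
C = -\ln y_r, \qquad
\frac{\partial C}{\partial z_i} =
\begin{cases}
y_i - 1 & i = r \\
y_i & i \neq r
\end{cases}
```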
Data preprocessing
Normalize the input
Normalize the training data and the testing data in the same way (see the sketch below)
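A minimal sketch of the preprocessing rule above: the mean and standard deviation are estimated on the training data only, and the same statistics are then reused for the test data. The toy arrays here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 2.0, size=(100, 3))   # hypothetical toy training data
X_test = rng.normal(5.0, 2.0, size=(20, 3))     # hypothetical toy test data

mean = X_train.mean(axis=0)                     # statistics come from the training data only
std = X_train.std(axis=0) + 1e-8                # small constant avoids division by zero

X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std             # test data normalized with the SAME statistics
```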
Optimization
Learning rate
Reduce the learning rate by some factor every few epochs
E.g
Learning rate cannot be one-size-fits-all
Adagrad
Original formula
Derived result (see the formula below)
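The Adagrad update, written in the per-parameter form the "derived result" above refers to: each parameter's learning rate is divided by the root of the sum of its past squared gradients. The notation (g^t for the gradient at step t) is the common one, not copied from the slides.

```latex
w^{t+1} = w^{t} - \frac{\eta}{\sqrt{\sum_{i=0}^{t} \left(g^{i}\right)^{2}}}\; g^{t}
```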
Momentum
Original gradient descent: easy to get stuck
In the physical world: inertia
Momentum
compute (see the sketch below)
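A minimal sketch of the momentum update named above: the movement keeps a decaying memory (the "inertia") of previous movements, so the parameters can roll past small bumps and plateaus instead of stopping at the first flat region. The coefficients are hypothetical defaults.

```python
def momentum_step(theta, grad, movement, lr=0.01, lam=0.9):
    # New movement = inertia from the previous movement minus the current gradient step
    movement = lam * movement - lr * grad
    theta = theta + movement
    return theta, movement
```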
Generalization
Early stopping
Weight decay
w close to 0 is better
the reason
New cost function
update
Dropout
Dropout process
Training: in each iteration, every neuron is dropped with probability p%
Testing: if the dropout rate is p%, multiply all the weights by (1 − p%) at test time (see the sketch below)
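A minimal sketch of the two rules above, with a hypothetical dropout rate p: during training each activation is zeroed with probability p, and at test time nothing is dropped but everything is scaled by (1 − p) to match the training-time expectation.

```python
import numpy as np

def dropout_train(a, p, rng):
    # Each activation is dropped (set to 0) with probability p during training
    mask = rng.random(a.shape) >= p
    return a * mask

def dropout_test(a, p):
    # At test time nothing is dropped; scale by (1 - p) to match the training expectation
    return a * (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones(10)                     # hypothetical activations
print(dropout_train(a, 0.5, rng))
print(dropout_test(a, 0.5))
```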
Intuitive reason
Dropout ≈ Ensemble
Ensemble
Training of dropout
Practical suggestions for dropout
Large network
Longer training time
Higher learning rate
Larger momentum
Lesson 5: RNN
Memory is important
Sequence data
Vanilla Recurrent Neural Network
Applications
Speech recognition
Structure
Intuition
Unrolled over time (see the sketch below)
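A minimal sketch of the vanilla recurrent structure above: the same weights are applied at every time step, and the hidden state carries the memory forward when the network is unrolled. Sizes and weights here are hypothetical toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 4                                   # hypothetical input / hidden sizes
Wx = rng.normal(size=(d_h, d_in))                  # input-to-hidden weights
Wh = rng.normal(size=(d_h, d_h))                   # hidden-to-hidden (memory) weights
b = np.zeros(d_h)

xs = rng.normal(size=(5, d_in))                    # a toy sequence of 5 input vectors
h = np.zeros(d_h)                                  # initial memory
for x in xs:                                       # unrolled over time: same weights at every step
    h = np.tanh(Wx @ x + Wh @ h + b)
```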
Cost Function
compute
Training
BPTT
More applications
Named entity recognition
Information extraction
Variants of RNN
Elman Network & Jordan Network
Deep RNN
Bidirectional RNN
Many To One
Many To Many
One To Many
Long Short-Term Memory (LSTM)
Structure
Input gate
Forget gate
Output gate
Example
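A minimal sketch of one LSTM step with the three gates listed above (input, forget, output); the stacked weight layout and the sigmoid/tanh choices follow the standard formulation, and all concrete sizes and values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W, U, b hold stacked parameters for 4 parts:
    input gate, forget gate, output gate, and the candidate cell update."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input / forget / output gates
    c = f * c + i * np.tanh(g)                     # forget old memory, write new memory
    h = o * np.tanh(c)                             # output gate controls what is exposed
    return h, c

# Hypothetical toy sizes: 3-dim input, 4-dim hidden state
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = rng.normal(size=(4 * d_h, d_in))
U = rng.normal(size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
```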
What's the next wave?
Attention-based model
Lesson 6: RNN Training
Review of backpropagation
BPTT
RNNs are not easy to train
Possible solutions
Gradient clipping
NAG
process
RMSProp
Adagrad
compute
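Minimal sketches of two of the remedies above, gradient clipping and the RMSProp update; the threshold and decay rates are hypothetical defaults, and the RMSProp form shown is the common exponentially-decayed variant of Adagrad's accumulator.

```python
import numpy as np

def clip_gradient(grad, threshold=5.0):
    # Rescale the whole gradient vector if its norm exceeds the threshold
    norm = np.linalg.norm(grad)
    return grad * (threshold / norm) if norm > threshold else grad

def rmsprop_step(theta, grad, sq_avg, lr=1e-3, alpha=0.9, eps=1e-8):
    # Exponentially decayed average of squared gradients replaces Adagrad's full sum
    sq_avg = alpha * sq_avg + (1 - alpha) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(sq_avg) + eps)
    return theta, sq_avg
```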
LSTM can address the gradient vanishing problem
LSTM structure
Basic structure
Peephole
BPTT training
Other Simpler Variants
Better Initialization
Lesson 7: Basic structures for deep learning models
Three steps for deep learning
Neural network
Cost function
Optimization
Fully Connected Layer
Recurrent Structure
RNN
Deep RNN
Bidirectional RNN
Pyramidal RNN
Naive RNN
LSTM
GRU
Example Task
Target delay
Comparison of different structures
Lesson 8: Basic structures for deep learning models (continued)
Stack RNN
Convolutional/Pooling layer
Convolutional Layer
Sparse connectivity
Each neuron connects only to part of the previous layer's outputs
Parameter sharing
Neurons with different receptive fields can use the same set of parameters
Example
Pooling Layer
Pooling
Average Pooling
Max Pooling
L2 Pooling
What outputs should be grouped together
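A minimal sketch of the two layer types above: a 1-D convolution in which the same small kernel (parameter sharing) slides over local receptive fields (sparse connectivity), followed by max pooling that groups neighbouring outputs. The kernel values and sizes are hypothetical.

```python
import numpy as np

def conv1d(x, kernel):
    # The same kernel is reused at every position (parameter sharing),
    # and each output only sees a small window of the input (sparse connectivity).
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    # Group neighbouring outputs and keep only the largest value in each complete group
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, size)])

x = np.array([1.0, 3.0, 2.0, 5.0, 0.0, 4.0])   # hypothetical input signal
kernel = np.array([1.0, -1.0])                  # hypothetical shared parameters
print(max_pool(conv1d(x, kernel)))
```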
Combination of different basic layers
Lesson 9: Special Structures
CNN is not invariant to scaling and rotation
How to transform an image / feature map (see the affine map after this block)
Expansion, compression, translation
Rotation
Interpolation
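The transformations listed above (expansion/compression, translation, rotation) are all special cases of an affine map on the image or feature-map coordinates; a standard way to write it, not quoted from the slides, is:

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
+
\begin{pmatrix} e \\ f \end{pmatrix},
\qquad
\text{rotation: }
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\quad
\text{scaling: }
\begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix},
\quad
\text{pure translation: } a = d = 1,\ b = c = 0
```

Since the transformed coordinates (x', y') generally do not land on integer pixel positions, the output value is read off by interpolating (e.g. bilinearly) between the neighbouring pixels, which is what the interpolation item above refers to.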
Example
Street View House Numbers
Bird Recognition
Highway network & Grid LSTM
Highway Network
Feedforward vs. recurrent
GRU to Highway Network
Grid LSTM