Coursera Deep Learning Notes
Mind-map notes for the Coursera Deep Learning course Improving Deep Neural Networks, covering initialization, loss functions, optimization methods, basic TensorFlow usage, and some fundamental concepts and formulas.
Outline / Content
Improving Deep Neural Networks
Initialization
Zero initialization
Random initialization
Xavier initialization
The basic idea of Xavier initialization is to keep the variance of each layer's inputs and outputs consistent
He initialization
For layers with a ReLU activation.
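The two scaled initializations above can be sketched in NumPy; the layer sizes here are illustrative, and only the scaling factor differs between Xavier and He:

```python
import numpy as np

def initialize_he(layer_dims, seed=0):
    """He initialization: scale random weights by sqrt(2 / n_prev).

    layer_dims is a hypothetical list like [n_x, n_h, n_y].
    Xavier initialization would use np.sqrt(1.0 / layer_dims[l - 1]) instead."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = rng.standard_normal(
            (layer_dims[l], layer_dims[l - 1])) * np.sqrt(2.0 / layer_dims[l - 1])
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))  # biases start at zero
    return params

params = initialize_he([4, 3, 1])
```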
Regularization
L2 regularization
1/m * lambd/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
L2 regularization cost
Dropout
Forward prop: steps 1-4 are described below.
Step 1: initialize a random matrix D1 of the same shape as A1
D1 = np.random.rand(A1.shape[0], A1.shape[1])
Step 2: convert entries of D1 to 0 or 1 (using keep_prob as the threshold)
D1 = (D1 < keep_prob).astype(int)
Step 3: shut down some neurons of A1
A1 * D1
Step 4: scale the value of neurons that haven't been shut down
A1 / keep_prob
Backward prop: steps 1-2 are described below.
Step 1: apply mask D2 to shut down the same neurons as during the forward propagation
dA2 = dA2 * D2
Step 2: scale the value of neurons that haven't been shut down
dA2 = dA2 / keep_prob
You should use dropout (randomly eliminate nodes) only in training!
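The inverted-dropout steps above can be sketched end to end; the shape of A1 and the keep_prob value are illustrative:

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8
A1 = np.random.randn(3, 5)           # example activations

# Forward: Step 1 - random matrix the same shape as A1
D1 = np.random.rand(A1.shape[0], A1.shape[1])
# Step 2 - threshold against keep_prob to get a 0/1 mask
D1 = (D1 < keep_prob).astype(int)
# Step 3 - shut down masked neurons
A1 = A1 * D1
# Step 4 - scale surviving activations (inverted dropout)
A1 = A1 / keep_prob

# Backward: apply the SAME mask, then rescale
dA1 = np.random.randn(3, 5)          # example upstream gradients
dA1 = dA1 * D1
dA1 = dA1 / keep_prob
```

Scaling by keep_prob in both passes keeps the expected activation unchanged, so no rescaling is needed at test time, where dropout is off.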
Gradient Checking
θ⁺ = θ + ε; θ⁻ = θ − ε; J⁺ = J(θ⁺); J⁻ = J(θ⁻); gradapprox = (J⁺ − J⁻) / (2ε)
numerator = np.linalg.norm(grad - gradapprox)
denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
difference = numerator / denominator
difference < 1e-7: the gradient is correct!
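The checking formulas above can be sketched as one function; `J` and `grad_fn` are hypothetical callables standing in for the cost and its analytic gradient:

```python
import numpy as np

def grad_check(J, grad_fn, theta, eps=1e-7):
    """Compare an analytic gradient to the centered-difference approximation."""
    theta = np.asarray(theta, dtype=float)
    gradapprox = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus = theta.copy();  theta_plus[i] += eps    # θ⁺
        theta_minus = theta.copy(); theta_minus[i] -= eps   # θ⁻
        gradapprox[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    grad = grad_fn(theta)
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator

# Example: J(θ) = Σ θ², dJ/dθ = 2θ, so the difference should be tiny
diff = grad_check(lambda t: np.sum(t ** 2), lambda t: 2 * t, [1.0, -2.0, 3.0])
```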
Optimization
Batch Gradient Descent
Take gradient steps with respect to all m examples on each step.
Gradient Descent Figure
Stochastic Gradient Descent
Compute gradients on just one training example at a time.
Stochastic Gradient Descent Figure
Mini-Batch Gradient descent
Take gradient steps with respect to one mini-batch of examples on each step.
Mini-Batch Gradient Descent Figure
Two steps to perform Mini-Batch Gradient Descent: first shuffle the training set, then partition it into mini-batches.
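The shuffle-then-partition procedure can be sketched as follows; examples are stored as columns, matching the notes' convention, and the mini-batch size is illustrative:

```python
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Step 1: shuffle (X, Y) consistently. Step 2: partition into mini-batches."""
    np.random.seed(seed)
    m = X.shape[1]                               # number of examples (columns)
    permutation = np.random.permutation(m)
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation]
    mini_batches = []
    for k in range(0, m, mini_batch_size):       # last batch may be smaller
        mini_batches.append((shuffled_X[:, k:k + mini_batch_size],
                             shuffled_Y[:, k:k + mini_batch_size]))
    return mini_batches

batches = random_mini_batches(np.random.randn(2, 10), np.random.randn(1, 10), 4)
```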
Momentum
Momentum takes into account the past gradients to smooth out the update.
Equations of Momentum Optimization Method
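One momentum update step can be sketched for a single weight matrix; the beta and learning_rate values are illustrative:

```python
import numpy as np

beta, learning_rate = 0.9, 0.01
W = np.array([[1.0, 2.0]])
dW = np.array([[0.5, -0.5]])
v_dW = np.zeros_like(W)                 # velocity, initialized to zero

v_dW = beta * v_dW + (1 - beta) * dW    # exponentially weighted average of past gradients
W = W - learning_rate * v_dW            # parameter update uses the smoothed gradient
```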
RMSProp
Equations of RMSProp Optimization Method
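An RMSProp update step can be sketched the same way; it divides the gradient by a running average of its squared magnitude, and the hyperparameter values are illustrative:

```python
import numpy as np

beta2, learning_rate, epsilon = 0.999, 0.01, 1e-8
W = np.array([[1.0, 2.0]])
dW = np.array([[0.5, -0.5]])
s_dW = np.zeros_like(W)

s_dW = beta2 * s_dW + (1 - beta2) * dW ** 2            # running average of squared grads
W = W - learning_rate * dW / (np.sqrt(s_dW) + epsilon)  # dampens large-gradient directions
```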
Adam
Combines ideas from RMSProp and Momentum.
Equations of Adam Optimization Method
b[l] is updated with the same process as W[l]
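One Adam step, combining the Momentum velocity v with the RMSProp average s plus bias correction, can be sketched as follows (illustrative hyperparameter values):

```python
import numpy as np

beta1, beta2, learning_rate, epsilon = 0.9, 0.999, 0.01, 1e-8
t = 1                                    # timestep, starts at 1
W = np.array([[1.0, 2.0]])
dW = np.array([[0.5, -0.5]])
v_dW = np.zeros_like(W)
s_dW = np.zeros_like(W)

v_dW = beta1 * v_dW + (1 - beta1) * dW          # Momentum-style average
s_dW = beta2 * s_dW + (1 - beta2) * dW ** 2     # RMSProp-style average
v_corr = v_dW / (1 - beta1 ** t)                # bias correction for early steps
s_corr = s_dW / (1 - beta2 ** t)
W = W - learning_rate * v_corr / (np.sqrt(s_corr) + epsilon)
# b[l] would be updated with the same formulas, using db, v_db, s_db.
```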
Batch Normalization
Equations of Batch Normalization
Normalize the input to every neuron in each layer toward a distribution with mean 0 and variance 1, then apply a learnable scale (γ) and shift (β)
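The batch-norm forward pass can be sketched in NumPy; normalizing across the batch dimension (axis 1, with examples as columns) is an assumption matching the notes' layout:

```python
import numpy as np

def batchnorm_forward(Z, gamma, beta, eps=1e-8):
    """Normalize Z across the batch, then scale by gamma and shift by beta."""
    mu = np.mean(Z, axis=1, keepdims=True)
    var = np.var(Z, axis=1, keepdims=True)
    Z_norm = (Z - mu) / np.sqrt(var + eps)   # mean 0, variance 1 per unit
    return gamma * Z_norm + beta             # learnable scale and shift

Z = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
Z_tilde = batchnorm_forward(Z, gamma=np.ones((2, 1)), beta=np.zeros((2, 1)))
```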
Softmax Regression
Softmax regression generalizes logistic regression to C classes.
Equations of softmax
Loss function
Derivative
Cost function
"logits" and "labels"
logits = tf.transpose(ZL)
labels = tf.transpose(Y)
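The softmax activation and cross-entropy cost can be sketched in NumPy (with examples as columns); the max-shift is a standard numerical-stability trick, not part of the course formulas:

```python
import numpy as np

def softmax(Z):
    """Column-wise softmax: e^z_i / sum_c e^z_c, shifted by the max for stability."""
    Z_shift = Z - np.max(Z, axis=0, keepdims=True)
    expZ = np.exp(Z_shift)
    return expZ / np.sum(expZ, axis=0, keepdims=True)

def cross_entropy_cost(A, Y):
    """Cost: average over m examples of -sum_c y_c * log(a_c)."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A + 1e-12)) / m   # small constant avoids log(0)

A = softmax(np.array([[2.0, 1.0], [1.0, 3.0], [0.0, 0.2]]))
```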
TensorFlow Tutorial
Initializations
init = tf.global_variables_initializer()
session.run(init)
The loss variable will be initialized and ready to be computed.
A placeholder is an object whose value you can specify only later.
Operations
Functions
sigmoid = tf.sigmoid(x)
"One Hot" encoding
One element of each column is "hot" (meaning set to 1)
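A one-hot matrix with examples as columns can be sketched in plain NumPy (TensorFlow's `tf.one_hot` does the same thing); the labels and class count here are illustrative:

```python
import numpy as np

def one_hot_matrix(labels, C):
    """Return a (C, m) matrix where row labels[j] of column j is 1, all else 0."""
    one_hot = np.zeros((C, labels.size))
    one_hot[labels, np.arange(labels.size)] = 1  # set one "hot" entry per column
    return one_hot

Y = one_hot_matrix(np.array([1, 2, 3, 0, 2, 1]), C=4)
```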
W1 = tf.get_variable("W1", …)
b1 = tf.get_variable("b1", …)
A1 = tf.nn.relu(Z1)
to apply the ReLU activation
Calculate the correct predictions
Others
To flatten the training images
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)