Coursera Deep Learning Notes
2019-11-04 13:44:16
Coursera Deep Learning mind-map notes for Improving Deep Neural Networks. Covers initialization, loss functions, optimization methods, introductory TensorFlow usage, and some basic concepts and formulas.
Initialization
Zero initialization
Initializing all the weights to zero fails to break symmetry: every neuron in a layer computes the same output and receives the same gradient update, so the neurons stay identical.
Random initialization
np.random.randn(layers_dims[l],layers_dims[l-1])
Xavier initialization
np.random.randn(layers_dims[l],layers_dims[l-1]) * np.sqrt(1/layers_dims[l-1])
The basic idea of Xavier initialization is to keep the variance of each layer's outputs close to the variance of its inputs.
He initialization
np.random.randn(layers_dims[l],layers_dims[l-1]) * np.sqrt(2/layers_dims[l-1])
For layers with a ReLU activation.
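The four schemes above differ only in how each weight matrix is scaled. Below is a minimal NumPy sketch. It assumes the course convention that W[l] has shape (layers_dims[l], layers_dims[l-1]) and that biases start at zero. The function name `initialize_parameters` and the `method` strings are illustrative choices, not a fixed API.

```python
import numpy as np

def initialize_parameters(layers_dims, method="he", seed=3):
    """Return {"W1": ..., "b1": ..., ...} for a network with the given layer sizes.

    method is one of "zeros", "random", "xavier", "he" (labels chosen for this sketch).
    """
    np.random.seed(seed)
    parameters = {}
    for l in range(1, len(layers_dims)):
        shape = (layers_dims[l], layers_dims[l - 1])
        if method == "zeros":
            W = np.zeros(shape)  # fails to break symmetry
        else:
            W = np.random.randn(*shape)  # plain random: no scaling
            if method == "xavier":
                W *= np.sqrt(1 / layers_dims[l - 1])
            elif method == "he":
                W *= np.sqrt(2 / layers_dims[l - 1])  # suited to ReLU layers
        parameters["W" + str(l)] = W
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))  # biases can start at zero
    return parameters
```

With `method="he"`, the entries of W[l] have a standard deviation of about sqrt(2 / layers_dims[l-1]). Passing `"zeros"` reproduces the symmetry problem described above.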
Regularization
L2 regularization
1/m * lambd/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
L2 regularization cost for a 3-layer network; it is added to the cross-entropy cost
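The cost line above hard-codes three layers. The sketch below sums over any number of weight matrices. It also includes the matching backprop term: each dW gains `lambd/m * W`, which is the derivative of the penalty. It assumes parameters are stored in a dict with keys `"W1"`, `"b1"`, ... as in the course. Both helper names are hypothetical.

```python
import numpy as np

def l2_regularization_cost(params, lambd, m):
    """(1/m) * (lambd/2) * sum of squared entries over every weight matrix W[l]."""
    weights = [value for key, value in params.items() if key.startswith("W")]
    return (1 / m) * (lambd / 2) * sum(np.sum(np.square(W)) for W in weights)

def l2_gradient_term(W, lambd, m):
    """Extra term added to dW in backprop (derivative of the penalty w.r.t. W)."""
    return (lambd / m) * W
```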
Dropout
With dropout, each neuron becomes less sensitive to the activation of any one specific neuron, because that neuron may be shut down at any time.
Forward prop: steps 1-4 are described below.
Step 1: initialize matrix D1 = np.random.rand(..., ...)
np.random.rand(np.shape(A1)[0],np.shape(A1)[1])
Step 2: convert entries of D1 to 0 or 1 (using keep_prob as the threshold)
D1 = (D1 < keep_prob).astype(int)
Step 3: shut down some neurons of A1
A1 = A1 * D1
Step 4: scale the value of neurons that haven't been shut down
A1 = A1 / keep_prob
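The four forward-prop steps can be collected into one helper. This is a sketch of "inverted dropout". `dropout_forward` is a hypothetical name, and the function returns the mask so it can be reused in backprop.

```python
import numpy as np

def dropout_forward(A, keep_prob):
    """Inverted dropout on activations A; returns the new activations and the mask D."""
    D = np.random.rand(A.shape[0], A.shape[1])  # Step 1: uniform values in [0, 1)
    D = (D < keep_prob).astype(int)             # Step 2: 1 with probability keep_prob, else 0
    A = A * D                                   # Step 3: shut down the dropped neurons
    A = A / keep_prob                           # Step 4: keep the expected value of A unchanged
    return A, D
```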
Backward prop: steps 1-2 are described below.
Step 1: Apply mask D2 to shut down the same neurons as during forward propagation
After dA2 = np.dot(W3.T, dZ3): dA2 = D2 * dA2
Step 2: Scale the value of neurons that haven't been shut down
dA2 = dA2 / keep_prob, before dZ2 = np.multiply(dA2, np.int64(A2 > 0))
Use dropout (randomly eliminating nodes) only during training, never at test time!
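The backward-prop steps can be sketched the same way. This assumes the mask `D` was kept from the forward pass; the helper name is hypothetical. At test time neither step is applied: activations pass through unchanged. Because of the `/ keep_prob` scaling during training, no extra correction is needed at test time.

```python
import numpy as np

def dropout_backward(dA, D, keep_prob):
    """Backprop through inverted dropout, reusing the forward-pass mask D."""
    dA = dA * D          # Step 1: shut down the same neurons as in forward prop
    dA = dA / keep_prob  # Step 2: apply the same scaling as in forward prop
    return dA
```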
Gradient Checking
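$$\theta^{+} = \theta + \varepsilon,\quad \theta^{-} = \theta - \varepsilon,\quad J^{+} = J(\theta^{+}),\quad J^{-} = J(\theta^{-}),\quad \text{gradapprox} = \frac{J^{+} - J^{-}}{2\varepsilon}$$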
Compute the gradient using backward propagation, and store the result in a variable "grad"
numerator = np.linalg.norm(grad - gradapprox)
denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
difference = numerator / denominator; if difference < 1e-7, the gradient is correct!
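Below is a sketch of the whole check for a parameter vector `theta`. It assumes `J` returns a scalar cost and `grad_J` returns the backprop gradient with the same shape as `theta`. Both are hypothetical callables supplied by the caller.

```python
import numpy as np

def gradient_check(J, grad_J, theta, epsilon=1e-7):
    """Relative difference between the backprop gradient and a two-sided estimate."""
    theta = np.asarray(theta, dtype=float)
    grad = grad_J(theta)
    gradapprox = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_plus.flat[i] += epsilon
        theta_minus = theta.copy()
        theta_minus.flat[i] -= epsilon
        gradapprox.flat[i] = (J(theta_plus) - J(theta_minus)) / (2 * epsilon)
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator
```

For example, take J(θ) = Σθ³. The correct gradient 3θ² should give a difference well below 1e-7. A wrong gradient such as 2θ² gives about 0.2.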
Optimization
Batch Gradient Descent
Take gradient steps with respect to all m examples on each step.
Gradient Descent Figure
a, caches = forward_propagation(X, parameters)
Stochastic Gradient Descent
Compute gradients on just one training example at a time.
Stochastic Gradient Descent Figure
a, caches = forward_propagation(X[:,j], parameters)
Mini-Batch Gradient descent
Take a gradient step with respect to one mini-batch (mini_batch_size examples) on each step.
Mini-Batch Gradient Descent Figure
Two steps to build the mini-batches for mini-batch gradient descent:
Step1. Shuffle (X, Y)
permutation = list(np.random.permutation(m))
shuffled_X = X[:, permutation]
shuffled_Y = Y[:, permutation].reshape((1, m))
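Step 2. Partition (shuffled_X, shuffled_Y) into mini-batches of mini_batch_size examples. Handle the end case: the last mini-batch may be smaller than mini_batch_size.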
mini_batch_X = shuffled_X[:, mini_batch_size * k : mini_batch_size * (k+1)]
end case: mini_batch_X = shuffled_X[:, mini_batch_size * num_complete_minibatches : ]
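Putting both steps together gives the mini-batch generator sketched below. It assumes X has shape (n_x, m) and Y has shape (1, m), as in the notes. The name `random_mini_batches` and its default `mini_batch_size=64` are illustrative.

```python
import math
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Split (X, Y) into shuffled mini-batches; X is (n_x, m), Y is (1, m)."""
    np.random.seed(seed)
    m = X.shape[1]
    mini_batches = []

    # Step 1: shuffle the columns of X and Y with the same permutation
    permutation = list(np.random.permutation(m))
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation].reshape((1, m))

    # Step 2: partition into complete mini-batches
    num_complete_minibatches = math.floor(m / mini_batch_size)
    for k in range(num_complete_minibatches):
        mini_batch_X = shuffled_X[:, mini_batch_size * k : mini_batch_size * (k + 1)]
        mini_batch_Y = shuffled_Y[:, mini_batch_size * k : mini_batch_size * (k + 1)]
        mini_batches.append((mini_batch_X, mini_batch_Y))

    # End case: the last mini-batch holds the remaining m % mini_batch_size examples
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[:, mini_batch_size * num_complete_minibatches :]
        mini_batch_Y = shuffled_Y[:, mini_batch_size * num_complete_minibatches :]
        mini_batches.append((mini_batch_X, mini_batch_Y))
    return mini_batches
```

With m = 148 and mini_batch_size = 64, this yields two full mini-batches of 64 examples and a final one of 20.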
Momentum
Momentum takes into account the past gradients to smooth out the update.
Equations of Momentum Optimization Method
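The equation figure itself did not survive. The momentum update for layer l, in the form used in the course, is written out below. Here β is the momentum coefficient (often 0.9), α is the learning rate, and v starts at zero.

```latex
\begin{aligned}
v_{dW^{[l]}} &= \beta \, v_{dW^{[l]}} + (1 - \beta)\, dW^{[l]} \\
v_{db^{[l]}} &= \beta \, v_{db^{[l]}} + (1 - \beta)\, db^{[l]} \\
W^{[l]} &= W^{[l]} - \alpha \, v_{dW^{[l]}} \\
b^{[l]} &= b^{[l]} - \alpha \, v_{db^{[l]}}
\end{aligned}
```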
RMSProp
Equations of RMSProp Optimization Method
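This equation figure is also missing. One standard form of RMSProp is shown below, and b[l] is updated the same way with db[l]. Here β2 is the decay rate and ε is a small constant such as 1e-8 that avoids division by zero. Squaring and square roots are element-wise.

```latex
\begin{aligned}
s_{dW^{[l]}} &= \beta_2 \, s_{dW^{[l]}} + (1 - \beta_2)\, \big(dW^{[l]}\big)^2 \\
W^{[l]} &= W^{[l]} - \alpha \, \frac{dW^{[l]}}{\sqrt{s_{dW^{[l]}}} + \varepsilon}
\end{aligned}
```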
Adam
Combines ideas from RMSProp and Momentum.
Equations of Adam Optimization Method
Apply the same update to b[l] (using db[l]) as to W[l].
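Adam keeps a momentum-style average `v` and an RMSProp-style average `s`. It corrects both for their zero initialization and treats every parameter (W[l] and b[l]) the same way. Below is a minimal sketch. It assumes `params` holds arrays under keys such as `"W1"` and `"b1"`, and `grads` holds the matching `"dW1"` and `"db1"`. The defaults β1 = 0.9, β2 = 0.999, ε = 1e-8 are the commonly used values.

```python
import numpy as np

def adam_update(params, grads, v, s, t, learning_rate=0.01,
                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step for every parameter; t is the step count, starting at 1."""
    for key in params:
        g = grads["d" + key]
        v[key] = beta1 * v[key] + (1 - beta1) * g       # first moment (Momentum part)
        s[key] = beta2 * s[key] + (1 - beta2) * g ** 2  # second moment (RMSProp part)
        v_corrected = v[key] / (1 - beta1 ** t)         # bias correction
        s_corrected = s[key] / (1 - beta2 ** t)
        params[key] = params[key] - learning_rate * v_corrected / (np.sqrt(s_corrected) + epsilon)
    return params, v, s
```

`t` must start at 1: with t = 0, the bias-correction denominators would be zero.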
Batch Normalization
Equations of Batch Normalization
Batch normalization normalizes each neuron's pre-activation z over the mini-batch to mean 0 and variance 1. It then scales and shifts the result with learnable parameters γ and β.
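The figure with the batch-norm equations is missing. Below is a sketch of the forward computation for one layer. It assumes Z has shape (n_units, m) and that `gamma` and `beta` have shape (n_units, 1). ε guards against division by zero, and the function name is hypothetical.

```python
import numpy as np

def batchnorm_forward(Z, gamma, beta, epsilon=1e-8):
    """Batch-normalize pre-activations Z of shape (n_units, m) over the mini-batch axis."""
    mu = np.mean(Z, axis=1, keepdims=True)      # per-unit mean over the m examples
    var = np.var(Z, axis=1, keepdims=True)      # per-unit variance
    Z_norm = (Z - mu) / np.sqrt(var + epsilon)  # mean 0, variance 1 per unit
    return gamma * Z_norm + beta                # learnable scale and shift
```

Because γ and β are learned, the network can undo the normalization if that helps. Setting γ = sqrt(σ² + ε) and β = μ recovers the original Z.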
Softmax Regression
Softmax regression generalizes logistic regression to C classes.
Equations of softmax
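The softmax equation figure is missing. For a vector z of C logits, softmax is a_i = e^(z_i) / Σ_j e^(z_j). Below is a NumPy sketch applied column-wise to Z of shape (C, m). Subtracting the column maximum first does not change the result but prevents overflow in `np.exp`.

```python
import numpy as np

def softmax(Z):
    """Column-wise softmax for Z of shape (C, m): each column becomes a probability vector."""
    Z_shifted = Z - np.max(Z, axis=0, keepdims=True)  # subtract max for numerical stability
    T = np.exp(Z_shifted)
    return T / np.sum(T, axis=0, keepdims=True)
```

For the column [5, 2, -1, 3], it returns approximately [0.842, 0.042, 0.002, 0.114].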
Loss function
t_i is the true label (one-hot), and y_i here is ŷ_i, the softmax output.
Derivative
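The equations below use the notation above: t is the one-hot true label and y = softmax(z) is the prediction. They give the loss for one example, its derivative with respect to the logits z, and the cost over m examples. The derivative simplifies to y_i - t_i because the entries of t sum to 1.

```latex
\mathcal{L}(y, t) = -\sum_{i=1}^{C} t_i \log y_i,
\qquad y_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}

\frac{\partial \mathcal{L}}{\partial z_i} = y_i - t_i
\qquad\Longrightarrow\qquad dZ^{[L]} = \hat{Y} - Y

J = \frac{1}{m} \sum_{k=1}^{m} \mathcal{L}\big(y^{(k)}, t^{(k)}\big)
```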
Cost function
tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))
"logits" and "labels" are expected to have shape (number of examples, num_classes)
logits = tf.transpose(ZL)
labels = tf.transpose(Y)
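To check values outside TensorFlow, the NumPy sketch below mirrors what the cost line above computes. It takes the mean over examples of the softmax cross-entropy and applies the same transposes, from the course's (C, m) layout to (m, C). The function name is hypothetical, and the code uses a log-sum-exp shift for stability.

```python
import numpy as np

def softmax_cross_entropy_cost(ZL, Y):
    """Mean softmax cross-entropy; ZL (logits) and Y (one-hot labels) have shape (C, m)."""
    logits = ZL.T  # -> (m, C), the layout the TensorFlow op expects
    labels = Y.T
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))  # log-softmax
    return np.mean(-np.sum(labels * log_probs, axis=1))
```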
TensorFlow Tutorial
Initializations
tf.Variable((y - y_hat)**2, name='loss')
tf.constant(39, name='y')
init = tf.global_variables_initializer()
session.run(init)
The loss variable will be initialized and ready to be computed.
x = tf.placeholder(tf.int64, name = 'x')
sess.run(2 * x, feed_dict = {x: 3})
A placeholder is an object whose value you specify only later, through feed_dict when running the session.
Operations
tf.add(..., ...) to do an addition
tf.matmul(..., ...) to do a matrix multiplication
tf.multiply(...,...) to do an element-wise multiplication
Functions
sigmoid = tf.sigmoid(x)
one_hot_matrix = tf.one_hot(labels, C, axis=0)
"One Hot" encoding (axis=0 puts the classes along the rows, so each column is one example)
One element of each column is "hot" (meaning set to 1)
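The same column-wise one-hot layout can be built in plain NumPy. In this sketch, column j gets a 1 in row labels[j]. The function name mirrors the variable above but is hypothetical.

```python
import numpy as np

def one_hot_matrix(labels, C):
    """Return a (C, m) matrix whose column j has a single 1 at row labels[j]."""
    labels = np.asarray(labels).reshape(-1)
    Y = np.zeros((C, labels.size))
    Y[labels, np.arange(labels.size)] = 1  # one "hot" entry per column
    return Y
```

For labels [1, 2, 3, 0, 2, 1] and C = 4, it returns a 4 × 6 matrix.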
W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())
A1 = tf.nn.relu(Z1)
to apply the ReLU activation
correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
Calculate the correct predictions
Others
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
To flatten the training images
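Here is a quick shape check of the flattening line. It uses a random stand-in for `X_train_orig`, assumed to be m images of shape 64 × 64 × 3. Each column of the result is one image unrolled into 64 · 64 · 3 = 12288 values, which matches the second dimension of W1 above.

```python
import numpy as np

# Hypothetical stand-in for X_train_orig: 5 RGB images of 64x64 pixels, shape (m, h, w, c)
X_train_orig = np.random.randint(0, 256, size=(5, 64, 64, 3))

# Row i of the reshape is image i unrolled; the transpose makes each image a column
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T  # shape (12288, 5)
```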
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)
_ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})