
Coursera Deep Learning Notes

2019-11-04 13:44:16

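Mind-map notes for the Coursera Deep Learning course "Improving Deep Neural Networks", covering initialization, loss functions, optimization methods, introductory TensorFlow usage, and some basic concepts and formulas.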

Deep Learning

Mind map

Study notes

TensorFlow


Outline / Content

Initialization

Zero initialization

In general, initializing all the weights to zero results in the network failing to break symmetry.

Random initialization

np.random.randn(layers_dims[l],layers_dims[l-1])

Xavier initialization

np.random.randn(layers_dims[l],layers_dims[l-1]) * np.sqrt(1/layers_dims[l-1])

The basic idea of Xavier initialization is to keep the variance of each layer's outputs consistent with the variance of its inputs.

He initialization

np.random.randn(layers_dims[l],layers_dims[l-1]) * np.sqrt(2/layers_dims[l-1])

For layers with a ReLU activation.
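The scaling rules above can be sketched as one helper. This is a minimal illustration, not the course's own code: the function name `initialize_parameters`, the `method` argument, and the use of `np.random.default_rng` are assumptions made for the example.

```python
import numpy as np

def initialize_parameters(layers_dims, method="he", seed=0):
    """Return W1..WL and b1..bL for the given layer sizes (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Xavier scales by sqrt(1 / fan_in), He scales by sqrt(2 / fan_in).
    numerator = {"xavier": 1.0, "he": 2.0}[method]
    parameters = {}
    for l in range(1, len(layers_dims)):
        fan_in = layers_dims[l - 1]
        scale = np.sqrt(numerator / fan_in)
        parameters[f"W{l}"] = rng.standard_normal((layers_dims[l], fan_in)) * scale
        # Biases can safely start at zero; the random weights break symmetry.
        parameters[f"b{l}"] = np.zeros((layers_dims[l], 1))
    return parameters

params = initialize_parameters([4, 3, 1], method="he")
```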

Regularization

L2 regularization

1/m * lambd/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))

L2 regularization cost
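The same cost term can be written for any number of layers. A small sketch, assuming the weights are passed as a list; the function name `l2_regularization_cost` is made up for this example.

```python
import numpy as np

def l2_regularization_cost(weights, lambd, m):
    """Compute lambd / (2 * m) * sum of squared entries over all weight matrices."""
    squared_sum = sum(np.sum(np.square(W)) for W in weights)
    return (lambd / (2 * m)) * squared_sum

W1 = np.ones((2, 3))
W2 = np.ones((1, 2))
cost = l2_regularization_cost([W1, W2], lambd=0.5, m=4)
```

This term is added to the cross-entropy cost; the bias vectors are not included.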

Dropout

With dropout, each neuron becomes less sensitive to the activation of any one specific other neuron, because that neuron may be shut down at any time.

Forward prop: steps 1-4 are described below.

Step 1: initialize matrix D1 = np.random.rand(..., ...)

np.random.rand(np.shape(A1)[0],np.shape(A1)[1])

Step 2: convert entries of D1 to 0 or 1 (using keep_prob as the threshold)

(D1 < keep_prob)

Step 3: shut down some neurons of A1

A1 * D1

Step 4: scale the value of neurons that haven't been shut down

A1 / keep_prob
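The four forward steps can be put together as follows. This is a sketch under assumptions: the helper name `dropout_forward` is hypothetical, and a seeded `np.random.default_rng` generator stands in for `np.random.rand` so the result is reproducible.

```python
import numpy as np

def dropout_forward(A, keep_prob, rng):
    """Apply inverted dropout to activations A and return the mask for backprop."""
    # Steps 1-2: random matrix with the shape of A, thresholded at keep_prob.
    D = rng.random(A.shape) < keep_prob
    # Step 3: shut down the neurons where the mask is False.
    A = A * D
    # Step 4: rescale so the expected activation stays the same.
    A = A / keep_prob
    return A, D

rng = np.random.default_rng(1)
A1 = np.ones((3, 5))
A1_drop, D1 = dropout_forward(A1, keep_prob=0.8, rng=rng)
```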

Backward prop: steps 1-2 are described below.

Step 1: Apply mask D2 to shut down the same neurons as during the forward propagation

dA2 = D2 * dA2, applied right after dA2 = np.dot(W3.T, dZ3)

Step 2: Scale the value of neurons that haven't been shut down

dA2 = dA2 / keep_prob, applied before dZ2 = np.multiply(dA2, np.int64(A2 > 0))
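The two backward steps amount to reusing the forward mask on the gradient. A minimal sketch; the helper name `dropout_backward` and the toy values are assumptions for illustration.

```python
import numpy as np

def dropout_backward(dA, D, keep_prob):
    """Mask and rescale the gradient the same way the activations were treated."""
    dA = dA * D            # Step 1: same neurons shut down as in the forward pass.
    dA = dA / keep_prob    # Step 2: same scaling as in the forward pass.
    return dA

D2 = np.array([[True, False], [False, True]])
dA2 = np.full((2, 2), 0.4)
dA2_drop = dropout_backward(dA2, D2, keep_prob=0.8)
```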

You should use dropout (randomly eliminate nodes) only in training!

Gradient Checking

θ+ = θ + ε
θ− = θ − ε
J+ = J(θ+)
J− = J(θ−)
gradapprox = (J+ − J−) / (2ε)

Compute the gradient using backward propagation, and store the result in a variable "grad"

numerator = np.linalg.norm(grad - gradapprox)

denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)

difference = numerator / denominator

difference < 1e-7: The gradient is correct!
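The whole check can be sketched for a parameter vector as below. The function name `gradient_check`, the callables `J` and `grad_fn`, the toy quadratic cost, and the default `epsilon=1e-7` are assumptions for the example; in a real network `theta` would be all parameters flattened into one vector.

```python
import numpy as np

def gradient_check(J, grad_fn, theta, epsilon=1e-7):
    """Return the relative difference between analytic and numerical gradients."""
    grad = grad_fn(theta)
    gradapprox = np.zeros_like(theta)
    for i in range(theta.size):
        # Perturb one component at a time in both directions.
        theta_plus = theta.copy()
        theta_plus[i] += epsilon
        theta_minus = theta.copy()
        theta_minus[i] -= epsilon
        gradapprox[i] = (J(theta_plus) - J(theta_minus)) / (2 * epsilon)
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator

# Toy cost J(theta) = sum(theta^2), whose exact gradient is 2 * theta.
J = lambda theta: np.sum(theta ** 2)
theta0 = np.array([1.0, -2.0, 3.0])
difference = gradient_check(J, lambda theta: 2 * theta, theta0)
# A deliberately wrong gradient should give a much larger difference.
difference_wrong = gradient_check(J, lambda theta: 3 * theta, theta0)
```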

Optimization

Batch Gradient Descent

Take gradient steps with respect to all m examples on each step.

Gradient Descent Figure

a, caches = forward_propagation(X, parameters)

Stochastic Gradient Descent

Compute gradients on just one training example at a time.

Stochastic Gradient Descent Figure

a, caches = forward_propagation(X[:,j], parameters)

Mini-Batch Gradient descent

Take each gradient step with respect to the mini_batch_size examples of one mini-batch.

Mini-Batch Gradient Descent Figure

Two steps to perform Mini-Batch Gradient Descent:

Step1. Shuffle (X, Y)

permutation = list(np.random.permutation(m))
shuffled_X = X[:, permutation]
shuffled_Y = Y[:, permutation].reshape((1,m))

Step2. Partition (shuffled_X, shuffled_Y). Handle the end case.

mini_batch_X = shuffled_X[:, mini_batch_size * k : mini_batch_size * (k+1)]

end case: mini_batch_X = shuffled_X[:, mini_batch_size * num_complete_minibatches : ]
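The shuffle and partition steps can be combined into one function. This is a sketch rather than the course's implementation: the name `random_mini_batches` and the seeded generator are assumptions, and the end case is handled implicitly because a slice that runs past the last column simply returns the remaining examples.

```python
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Shuffle the columns of (X, Y) together and split them into mini-batches."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    # Step 1: one shared permutation keeps each example aligned with its label.
    permutation = rng.permutation(m)
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation]
    # Step 2: partition; the last slice is shorter when m is not a multiple
    # of mini_batch_size.
    mini_batches = []
    for start in range(0, m, mini_batch_size):
        end = start + mini_batch_size
        mini_batches.append((shuffled_X[:, start:end], shuffled_Y[:, start:end]))
    return mini_batches

X = np.arange(20, dtype=float).reshape(2, 10)
Y = np.arange(10).reshape(1, 10)
batches = random_mini_batches(X, Y, mini_batch_size=4)
```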

Momentum

Momentum takes into account the past gradients to smooth out the update.

Equations of Momentum Optimization Method
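The equation figure is not reproduced in these notes. As a sketch of the commonly taught form, momentum keeps an exponentially weighted average `v` of past gradients and steps along it; the function name and the hyperparameter values below are assumptions for illustration.

```python
import numpy as np

def update_with_momentum(W, dW, v, beta=0.9, learning_rate=0.01):
    """One momentum step: v = beta*v + (1-beta)*dW, then W = W - lr*v."""
    v = beta * v + (1 - beta) * dW
    W = W - learning_rate * v
    return W, v

W = np.array([1.0, 2.0])
v = np.zeros_like(W)
W, v = update_with_momentum(W, dW=np.array([1.0, 1.0]), v=v,
                            beta=0.9, learning_rate=0.1)
```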

RMSProp

Equations of RMSProp Optimization Method
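The equation figure is missing here as well. A hedged sketch of the usual RMSProp update, which divides each gradient by the root of a running average of its square; the function name, `beta`, and `epsilon` defaults are assumptions.

```python
import numpy as np

def update_with_rmsprop(W, dW, s, beta=0.9, learning_rate=0.01, epsilon=1e-8):
    """One RMSProp step: s = beta*s + (1-beta)*dW^2, W = W - lr*dW/(sqrt(s)+eps)."""
    s = beta * s + (1 - beta) * np.square(dW)
    W = W - learning_rate * dW / (np.sqrt(s) + epsilon)
    return W, s

W = np.array([1.0, 1.0])
s = np.zeros_like(W)
W, s = update_with_rmsprop(W, dW=np.array([2.0, 2.0]), s=s,
                           beta=0.9, learning_rate=0.1)
```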

Adam

Combines ideas from RMSProp and Momentum.

Equations of Adam Optimization Method

The same update process applies to b[l] as to W[l].
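Since the Adam equations are only referenced as a figure, here is a sketch of the standard update with bias correction. The function name `adam_update` and the default hyperparameters are assumptions; `t` is the step counter starting at 1.

```python
import numpy as np

def adam_update(W, dW, v, s, t, learning_rate=0.001,
                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step combining the momentum (v) and RMSProp (s) averages."""
    v = beta1 * v + (1 - beta1) * dW              # first moment (Momentum)
    s = beta2 * s + (1 - beta2) * np.square(dW)   # second moment (RMSProp)
    v_corrected = v / (1 - beta1 ** t)            # bias correction
    s_corrected = s / (1 - beta2 ** t)
    W = W - learning_rate * v_corrected / (np.sqrt(s_corrected) + epsilon)
    return W, v, s

W = np.array([1.0, 1.0])
v = np.zeros_like(W)
s = np.zeros_like(W)
W, v, s = adam_update(W, np.array([0.5, -2.0]), v, s, t=1, learning_rate=0.1)
```

On the first step the bias-corrected update has magnitude close to the learning rate for every parameter, whatever the gradient scale. The same function would be called for each b[l].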

Batch Normalization

Equations of Batch Normalization

Batch normalization normalizes the inputs z of each layer over the mini-batch to a mean of 0 and a variance of 1, then rescales and shifts them with the learnable parameters γ and β.
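A forward-pass sketch of this normalization for one layer, assuming `Z` has shape (units, examples) as elsewhere in these notes. The function name and the `epsilon` value are assumptions, and the running averages used at test time are left out.

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, epsilon=1e-8):
    """Normalize each unit over the mini-batch, then scale by gamma and shift by beta."""
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    Z_norm = (Z - mu) / np.sqrt(var + epsilon)
    return gamma * Z_norm + beta

rng = np.random.default_rng(0)
Z = rng.standard_normal((3, 50)) * 5 + 2
Z_tilde = batch_norm_forward(Z, gamma=np.ones((3, 1)), beta=np.zeros((3, 1)))
```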

Softmax Regression

Softmax regression generalizes logistic regression to C classes.

Equations of softmax

Loss function

t_i is the true label (one-hot encoded); y_i here is ŷ_i, the predicted probability after softmax.

Derivative
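The softmax, loss, and derivative figures are not reproduced, so the sketch below uses the standard definitions: softmax over the class axis, cross-entropy L = −Σ t_i log(y_i), and the well-known gradient dZ = ŷ − t. Function names and the toy data are assumptions; columns are examples.

```python
import numpy as np

def softmax(Z):
    """Column-wise softmax; subtracting the max avoids overflow in exp."""
    Z_shifted = Z - Z.max(axis=0, keepdims=True)
    exp_Z = np.exp(Z_shifted)
    return exp_Z / exp_Z.sum(axis=0, keepdims=True)

def cross_entropy_loss(Y_hat, T):
    """Per-example loss -sum_i t_i * log(y_i) for one-hot targets T."""
    return -np.sum(T * np.log(Y_hat), axis=0)

Z = np.array([[2.0, 0.0],
              [1.0, 0.0],
              [0.0, 0.0]])          # 3 classes, 2 examples
T = np.array([[1, 0],
              [0, 1],
              [0, 0]])              # one-hot true labels
Y_hat = softmax(Z)
loss = cross_entropy_loss(Y_hat, T)
dZ = Y_hat - T                      # gradient of the loss with respect to Z
```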

Cost function 

tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))

"logits" and "labels" are expected to be shape(number of examples, num_classes)

logits = tf.transpose(ZL)
labels = tf.transpose(Y)

TensorFlow Tutorial

Initializations

tf.Variable((y - y_hat)**2, name='loss')

tf.constant(39, name='y')

init = tf.global_variables_initializer()

session.run(init)

The loss variable will be initialized and ready to be computed.

x = tf.placeholder(tf.int64, name = 'x')

sess.run(2 * x, feed_dict = {x: 3})

A placeholder is an object whose value you can specify only later.

Operations

tf.add(..., ...) to do an addition

tf.matmul(..., ...) to do a matrix multiplication

tf.multiply(...,...) to do an element-wise multiplication

Functions

sigmoid = tf.sigmoid(x)

one_hot_matrix = tf.one_hot(labels, C, axis=0)

"One Hot" encoding

One element of each column is "hot" (meaning set to 1)
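To make the column layout concrete, here is a NumPy equivalent of the encoding (not the TensorFlow call itself). The function name mirrors the variable above but is otherwise an assumption.

```python
import numpy as np

def one_hot_matrix(labels, C):
    """Return a (C, number of examples) matrix with a single 1 per column."""
    labels = np.asarray(labels)
    one_hot = np.zeros((C, labels.size))
    # Row index = class label, column index = example index.
    one_hot[labels, np.arange(labels.size)] = 1
    return one_hot

encoded = one_hot_matrix([1, 2, 0, 1], C=3)
```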

W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))

b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())

A1 = tf.nn.relu(Z1)

to apply the ReLU activation

correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

Calculate the correct predictions

Others

X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T

To flatten the training images
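A quick shape check shows what the reshape does. The array below is a made-up stand-in for the image data, assuming a layout of (examples, height, width, channels).

```python
import numpy as np

# Two fake 4x4 RGB "images" in place of the real training set.
X_train_orig = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)

# Flatten each image into one row, then transpose so each column is one example.
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
```

Each column of `X_train_flatten` now holds the 48 pixel values of one image.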

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)

_ , minibatch_cost = sess.run([optimizer, cost],feed_dict={X:minibatch_X, Y: minibatch_Y})

zoesa
