Optimization Algorithms
Exponentially weighted averages, Momentum, RMSProp, and the Adam optimizer
Exponentially weighted averages
$v_t = \beta v_{t-1} + (1 - \beta)\theta_t$
Purpose: get a moving average of a sequence cheaply; an exact windowed average is not needed.
How it works: unroll the recursion, e.g. at $t = 100$ with $\beta = 0.9$:
$V_{100} = 0.9\,V_{99} + 0.1\,\theta_{100}$
$V_{100} = 0.9\,(0.9\,V_{98} + 0.1\,\theta_{99}) + 0.1\,\theta_{100}$
$V_{100} = 0.1\,\theta_{100} + 0.1\,(0.9)\,\theta_{99} + 0.1\,(0.9^2)\,\theta_{98} + \cdots$
A larger $\beta$ is equivalent to averaging over more days (roughly $\frac{1}{1-\beta}$ of them), so the first few estimates are biased toward zero; hence the bias correction below.
Bias correction:
Compute $v_t = \beta v_{t-1} + (1 - \beta)\theta_t$ as before,
then use $\frac{v_t}{1 - \beta^t}$ in its place (the divisor tends to 1 as $t$ grows), as in the sketch below.
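A minimal NumPy sketch of the weighted average with bias correction (the function name and the sample data are illustrative, not from the source):

import numpy as np

def ewa(theta, beta=0.9):
    """Exponentially weighted average of a sequence, with bias correction."""
    v = 0.0
    out = []
    for t, x in enumerate(theta, start=1):
        v = beta * v + (1 - beta) * x     # v_t = beta*v_{t-1} + (1-beta)*theta_t
        out.append(v / (1 - beta ** t))   # bias correction: divide by 1 - beta^t
    return np.array(out)

# Example: smooth a noisy series; early values are no longer biased toward zero
noisy = np.sin(np.linspace(0, 3, 100)) + 0.1 * np.random.randn(100)
smoothed = ewa(noisy, beta=0.9)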
Momentum
Idea: at each step, use not the raw mini-batch gradient but a moving average of the gradients (see the sketch after the formulas).
Formulas
$V_{dW} = \beta V_{dW} + (1 - \beta)\,dW$
$V_{db} = \beta V_{db} + (1 - \beta)\,db$
$W = W - \alpha V_{dW}$
$b = b - \alpha V_{db}$
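A minimal sketch of one momentum step in Python (parameter names and hyperparameter defaults are illustrative; works on scalars or NumPy arrays):

def momentum_step(W, b, dW, db, v_dW, v_db, beta=0.9, alpha=0.01):
    """One gradient-descent step using the moving average of the gradients."""
    v_dW = beta * v_dW + (1 - beta) * dW   # running average of weight gradients
    v_db = beta * v_db + (1 - beta) * db   # running average of bias gradients
    W = W - alpha * v_dW                   # step along the averaged direction
    b = b - alpha * v_db
    return W, b, v_dW, v_db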
RMSProp
Idea: normalize the update across the different dimensions of the weights, dividing each dimension by the root-mean-square of its recent gradients (see the sketch after the formulas).
Formulas
$S_{dW} = \beta S_{dW} + (1 - \beta)\,dW^2$
$S_{db} = \beta S_{db} + (1 - \beta)\,db^2$
$W = W - \alpha\,\frac{dW}{\sqrt{S_{dW}} + \epsilon}$
$b = b - \alpha\,\frac{db}{\sqrt{S_{db}} + \epsilon}$
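A matching sketch of one RMSProp step (again with illustrative names; eps is the small constant that prevents division by zero):

import numpy as np

def rmsprop_step(W, b, dW, db, s_dW, s_db, beta=0.9, alpha=0.01, eps=1e-8):
    """One RMSProp step: per-dimension normalization by the RMS of gradients."""
    s_dW = beta * s_dW + (1 - beta) * dW ** 2    # running average of squared grads
    s_db = beta * s_db + (1 - beta) * db ** 2
    W = W - alpha * dW / (np.sqrt(s_dW) + eps)   # large-gradient dims step smaller
    b = b - alpha * db / (np.sqrt(s_db) + eps)
    return W, b, s_dW, s_db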
Adam
Formulas
$V_{dW} = \beta_1 V_{dW} + (1 - \beta_1)\,dW$
$\beta_1$ is typically 0.9
$V_{db} = \beta_1 V_{db} + (1 - \beta_1)\,db$
$S_{dW} = \beta_2 S_{dW} + (1 - \beta_2)\,dW^2$
$\beta_2$ is typically 0.999
$S_{db} = \beta_2 S_{db} + (1 - \beta_2)\,db^2$
Bias correction
$V_{dW}^{corrected} = \frac{V_{dW}}{1 - \beta_1^t}$
$V_{db}^{corrected} = \frac{V_{db}}{1 - \beta_1^t}$
$S_{dW}^{corrected} = \frac{S_{dW}}{1 - \beta_2^t}$
$S_{db}^{corrected} = \frac{S_{db}}{1 - \beta_2^t}$
Gradient-descent update
$W = W - \alpha\,\frac{V_{dW}^{corrected}}{\sqrt{S_{dW}^{corrected}} + \epsilon}$
$b = b - \alpha\,\frac{V_{db}^{corrected}}{\sqrt{S_{db}^{corrected}} + \epsilon}$
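A sketch combining both moments and the bias correction into one Adam step (names are illustrative; the defaults follow the common 0.9 / 0.999 / 1e-8 choices):

import numpy as np

def adam_step(param, grad, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a parameter tensor; t is the 1-based iteration count."""
    v = beta1 * v + (1 - beta1) * grad        # momentum (first moment)
    s = beta2 * s + (1 - beta2) * grad ** 2   # RMSProp (second moment)
    v_hat = v / (1 - beta1 ** t)              # bias-corrected first moment
    s_hat = s / (1 - beta2 ** t)              # bias-corrected second moment
    param = param - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return param, v, s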
Saddle points vs. local optima
Saddle point: along some directions of $w$ the surface is at a local maximum, along others at a local minimum
Saddle points are plentiful in high dimensions
Local optimum: a local minimum along every direction
Local optima are rare: every one of the many dimensions would have to curve upward at once, which is unlikely
Learning rate scheduling
Decay at fixed intervals: StepLR
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
Assuming the optimizer uses lr = 0.05 for all groups:
lr = 0.05    if epoch < 30
lr = 0.005   if 30 <= epoch < 60
lr = 0.0005  if 60 <= epoch < 90
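A minimal PyTorch sketch of how a scheduler hooks into the training loop (the model and the epoch count are placeholders):

import torch
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(10, 1)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... per mini-batch: forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()   # decay the learning rate once per epoch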
Decay at specified milestones: MultiStepLR
scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
lr = 0.05    if epoch < 30
lr = 0.005   if 30 <= epoch < 80
lr = 0.0005  if epoch >= 80
Exponential decay: ExponentialLR
torch.optim.lr_scheduler.ExponentialLR
$lr = \alpha \cdot \gamma^{epoch}$ (with $\alpha$ the initial rate and $\gamma$ the decay factor)
Adaptive decay: ReduceLROnPlateau
Updates the learning rate from a monitored metric (accuracy or loss both work), reducing it when the metric stops improving; see the sketch below.
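A sketch reusing the optimizer from the loop above; the validation loss here is only a stand-in for a real one:

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=10)
for epoch in range(100):
    val_loss = 1.0 / (epoch + 1)   # placeholder for a real validation loss
    scheduler.step(val_loss)       # lr drops when the metric plateaus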
Custom schedule: LambdaLR
Updates the learning rate with a user-defined function of the epoch, as below.
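For example, a (purely illustrative) schedule that halves the rate every 10 epochs:

# lr = initial_lr * lr_lambda(epoch)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 0.5 ** (epoch // 10))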
Cosine annealing: CosineAnnealingLR
The learning rate follows a cosine curve between $\eta_{max}$ and $\eta_{min}$:
$$\begin{aligned} \eta_t &= \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right), & T_{cur} \neq (2k+1)T_{max}; \\ \eta_{t+1} &= \eta_t + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right), & T_{cur} = (2k+1)T_{max}. \end{aligned}$$
CosineAnnealingWarmRestarts
$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)$
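A short sketch of both cosine schedulers; the cycle lengths are illustrative:

# One cosine ramp from the initial lr down to eta_min over T_max epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-5)

# Warm restarts: first cycle lasts T_0 epochs, each later cycle T_mult times longer
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2)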