


 



Deep Neural Network: Transformer Model Architecture Diagram

2026-03-02 17:44:46





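A detailed diagram of the internal structure of the Transformer's Encoder and Decoder blocks, including components such as Multi-Head Attention, Add & Norm, and Feed Forward. Clear connecting lines show the direction of data flow, and text annotations on the right explain the function of each module. Suitable for teaching or technical documentation.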

Transformer

Large language model architecture


Outline / Content

Masked Multi-Head Attention

Decoder

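Input/Output Embedding: converts discrete token IDs into continuous vector representations (typically 512 or 1024 dimensions). Sharing one embedding matrix between the input and output sides reduces the number of parameters.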
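A minimal pure-Python sketch of the embedding lookup and of tying the same matrix to the output projection. The toy sizes, the names `embed`, `output_logits`, and `embedding_params`, and the vocabulary size of 32,000 used in the parameter arithmetic are assumptions for illustration; the scaling by sqrt(d_model) follows the original Transformer paper.

```python
import math
import random

# Toy sizes chosen for illustration (assumptions); real models use d_model = 512 or 1024.
VOCAB_SIZE = 10
D_MODEL = 4

rng = random.Random(0)
# A single matrix with one D_MODEL-dimensional row per token ID.
embedding = [[rng.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in range(VOCAB_SIZE)]


def embed(token_ids):
    """Map each token ID to its row, scaled by sqrt(d_model) as in the original paper."""
    scale = math.sqrt(D_MODEL)
    return [[scale * v for v in embedding[t]] for t in token_ids]


def output_logits(hidden):
    """Tied output projection: the logit for token t is the dot product with row t."""
    return [sum(h * e for h, e in zip(hidden, row)) for row in embedding]


def embedding_params(vocab_size, d_model, tied):
    """Parameters in the input embedding plus a bias-free output projection."""
    per_matrix = vocab_size * d_model
    return per_matrix if tied else 2 * per_matrix


vectors = embed([3, 1, 4])          # 3 token IDs -> 3 vectors of length D_MODEL
logits = output_logits(vectors[0])  # one score per vocabulary entry
saved = embedding_params(32000, 512, tied=False) - embedding_params(32000, 512, tied=True)
print(saved)  # 16384000
```

With a hypothetical 32,000-token vocabulary and d_model = 512, tying saves one full 32,000 × 512 matrix.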

Feed Forward Network

Add & Norm

Feed Forward

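Add & Norm: a residual connection (Add) followed by layer normalization (Layer Normalization). The residual connection helps gradients flow through the deep stack, and layer normalization stabilizes training. Formula: LayerNorm(x + Sublayer(x))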
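The formula LayerNorm(x + Sublayer(x)) can be sketched in plain Python for a single position. This is a simplified illustration: the learnable gain and bias of layer normalization are omitted, the epsilon value is an assumption, and the lambda standing in for Sublayer is hypothetical.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (learnable gain/bias omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add_and_norm(x, sublayer):
    """Post-norm residual block: LayerNorm(x + Sublayer(x))."""
    y = sublayer(x)                                   # Sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])  # Add, then Norm


# Hypothetical sublayer for illustration: doubles every component.
out = add_and_norm([1.0, 2.0, 3.0, 4.0], lambda v: [2 * t for t in v])
print([round(v, 4) for v in out])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```

This matches the post-norm placement in the formula above; many later models instead apply LayerNorm before the sublayer (pre-norm).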

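Cross Attention (Encoder-Decoder Attention): the Query comes from the decoder sublayer below, while the Key and Value come from the encoder output. This lets the decoder focus on the most relevant parts of the input sequence.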
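Cross attention uses the same scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, as the other attention blocks; only the source of Q versus K and V differs. The sketch below is single-head and omits the learned projection matrices W_Q, W_K, W_V; the toy encoder and decoder vectors are made up for illustration.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, one query row at a time."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy states: 3 source positions (encoder), 2 target positions (decoder).
encoder_output = [[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]]
decoder_states = [[1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]]

# Cross attention: Q from the decoder, K and V from the encoder output.
context, weights = attention(decoder_states, encoder_output, encoder_output)
```

Each row of `weights` has one entry per source position (3 here) although there are only 2 target positions, so source and target lengths may differ. The first decoder position puts its largest weight on source position 0, the second on source position 2.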

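Masked Multi-Head Attention: attention with a mask that ensures position i can attend only to itself and earlier positions, preventing the decoder from "peeking" at future tokens (the autoregressive property).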
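A causal mask can be built as a lower-triangular boolean matrix and applied before the softmax, so that blocked positions receive zero weight. This pure-Python sketch computes only the attention weights for one head, with Q = K = the input and no learned projections (simplifying assumptions); the function names are hypothetical.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, allowed):
    """Softmax over one row of scores, giving exactly zero weight to blocked positions."""
    m = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention_weights(x):
    """Decoder self-attention weights over sequence x (Q = K = x, projections omitted)."""
    n, d_k = len(x), len(x[0])
    mask = causal_mask(n)
    weights = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d_k) for j in range(n)]
        weights.append(masked_softmax(scores, mask[i]))
    return weights


weights = masked_self_attention_weights([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(weights[0])  # [1.0, 0.0, 0.0]
```

The resulting weight matrix is lower triangular: row i is zero beyond column i. Many implementations get the same effect by adding negative infinity to the blocked scores before the softmax.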

Residual connection (skip connection)

Main data flow (bottom-up)

Multi-Head Attention (Cross Attention)

Data flow direction

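Nx (number of layers): the Encoder and the Decoder each stack N layers of identical structure (N = 6 in the original paper). More layers can learn more complex features, but at a higher training cost.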
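Because every layer maps a sequence of d_model-dimensional vectors to a sequence of the same shape, N layers can be chained by feeding each layer's output into the next. The sketch below is a toy: each hypothetical layer contains only a position-wise ReLU feed-forward sublayer wrapped in Add & Norm (the Multi-Head Attention sublayer is omitted for brevity), and the sizes and random weights are assumptions.

```python
import math
import random

D_MODEL, D_FF = 4, 8  # toy sizes; the original paper's base model uses 512 and 2048


def layer_norm(x, eps=1e-6):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def make_layer(seed):
    """Hypothetical toy layer: ReLU feed-forward sublayer inside Add & Norm.
    Each layer gets its own weights, since stacked layers do not share parameters."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.5) for _ in range(D_MODEL)] for _ in range(D_FF)]
    w2 = [[rng.gauss(0, 0.5) for _ in range(D_FF)] for _ in range(D_MODEL)]

    def layer(seq):
        out = []
        for x in seq:  # applied to each position independently
            h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
            y = [sum(w * v for w, v in zip(row, h)) for row in w2]
            out.append(layer_norm([a + b for a, b in zip(x, y)]))
        return out

    return layer


def encoder_stack(seq, n_layers=6):
    """Apply N layers in sequence; each keeps shape (seq_len, D_MODEL), so they compose."""
    for layer in [make_layer(seed) for seed in range(n_layers)]:
        seq = layer(seq)
    return seq


out = encoder_stack([[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 1.0]])
```

Changing `n_layers` changes depth (and parameter count) without changing the input or output shape.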

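Linear + Softmax: the Linear layer maps the decoder output to a vector whose size equals the vocabulary, and Softmax turns it into a probability distribution; under greedy decoding, the token with the highest probability is chosen as the output.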
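The output head can be sketched as one matrix-vector product followed by softmax and an argmax (greedy decoding). The toy weights, bias, and 3-token vocabulary are hypothetical. The original paper decoded with beam search rather than pure greedy selection, and sampling-based decoding also draws from these probabilities.

```python
import math


def linear(h, weight, bias):
    """Project a d_model vector to vocab-size logits; weight has one row per vocab entry."""
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(h, weight, bias):
    """Return the index of the most probable token and the full distribution."""
    probs = softmax(linear(h, weight, bias))
    return max(range(len(probs)), key=probs.__getitem__), probs


# Hypothetical toy values: d_model = 2, vocabulary of 3 tokens.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, -0.5]
token, probs = greedy_next_token([2.0, 1.0], W, b)  # logits = [2.0, 1.0, 2.5]
print(token)  # 2
```

Since softmax preserves ordering, the greedy choice equals the argmax of the raw logits; the probabilities matter for sampling, beam search, and the training loss.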

Positional Encoding

Positional Encoding

Output Embedding

Softmax

Linear


Encoder-decoder attention connection

Transformer Model Architecture

Input Embedding

Multi-Head Attention

Module descriptions

Encoder

Output Probabilities

Multi-Head Attention

Outputs (shifted right)

Inputs
