[Figure: BERT architecture. Input Ids, Token Type Ids, and Position Ids feed TokenEmbedding, SegmentEmbedding, and PositionalEmbedding respectively; these are combined into the Input Embedding (Layer Normal & Dropout). BertEncoder stacks N Transformer Blocks (× N), each a BertLayer made of BertAttention (BertSelfAttention: Multi-head Attention; BertSelfOutput: Layer Normal & Dropout), BertIntermediate (Feed Forward), and BertOutput (Feed Forward, Layer Normal & Dropout). The encoder output feeds Downstream Tasks.]