Question:
Transformer - Multi-Head Attention
Author: Christian N
Answer:
1) Concatenate all the attention heads. 2) Multiply the result by a weight matrix W^O that was trained jointly with the model. 3) The result is the Z matrix, which captures information from all the attention heads. We can send this forward to the FFNN.
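The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full attention implementation: the per-head outputs and W^O are filled with random numbers standing in for values a trained model would produce, and the sizes (2 heads, sequence length 3, head dimension 4) are arbitrary.

```python
import numpy as np

# Hypothetical sizes: 2 heads, sequence length 3, per-head dimension 4.
num_heads, seq_len, d_head = 2, 3, 4
d_model = num_heads * d_head

rng = np.random.default_rng(0)
# One attention output per head, each of shape (seq_len, d_head).
head_outputs = [rng.standard_normal((seq_len, d_head)) for _ in range(num_heads)]

# 1) Concatenate the heads along the feature axis -> (seq_len, num_heads * d_head).
concat = np.concatenate(head_outputs, axis=-1)

# 2) Multiply by the output projection W^O (random stand-in for learned weights).
W_o = rng.standard_normal((d_model, d_model))

# 3) Z mixes information from all heads and is what gets fed to the FFNN.
Z = concat @ W_o
print(Z.shape)  # (3, 8)
```

Because W^O mixes all concatenated columns, every entry of Z can depend on every head, which is what lets the model combine the different attention patterns.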