Question:
Transformer - Add & Norm
Author: Christian N
Answer:
Add & Norm combines two operations: a residual (skip) connection that adds the sublayer's input to its output, followed by layer normalization. For the first attention block, the residual input is the input embedding; in general, output = LayerNorm(x + Sublayer(x)).
Benefits: faster and more stable training, and protection against exploding or vanishing gradients in deep stacks.
Types of normalization: batch normalization and layer normalization. Layer normalization is preferred in transformers, especially for natural language processing, because it normalizes each token's feature vector independently of the batch size and sequence length.
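A minimal NumPy sketch of the Add & Norm step described above (function names and the toy shapes are illustrative, not from the original):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's feature vector to zero mean and unit variance
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def add_and_norm(x, sublayer_out):
    # Residual connection followed by layer normalization:
    # LayerNorm(x + Sublayer(x))
    return layer_norm(x + sublayer_out)

# Toy example: batch of 1 sequence, 3 tokens, model dimension 4
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3, 4))         # input embeddings / previous layer output
attn_out = rng.normal(size=(1, 3, 4))  # stand-in for the attention block's output
y = add_and_norm(x, attn_out)
```

Note that the normalization statistics are computed per token over the feature dimension, which is why layer norm works the same regardless of batch size; a full transformer implementation would also include learnable gain and bias parameters.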