8 hours of sleep
decaffeination
8 hours of sleep
decaffeination
트윈스타정 40/5mg,
자몽(쥬스) 랑 복용 하면 안되고
태아에 안 좋고
나를 어지럽거나 실신 하게 만들 수 있는 까다로운 친구지만
기왕 만난거 잘 지내 보자.
a = \
1 # multiple line assignments may cause warning messages.
a = 1 # after I modified all the lines into this format then the warning messages have gone.
Linhao Dong, Shuang Xu, Bo Xu, Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition, ICASSP 2018
Attention Penalty (from Speech Transformer paper):
In addition, we encouraged the model attending to closer positions by adding a bigger penalty on the attention weights of more distant position-pairs.
There is no more specific description about attention penalty.
This is my imagination, adding negative value for non-diagonal elements on scaled_attention_logits except for the first multi-head attention in decoders.
I have no concrete idea about the attention penalty the authors explained.