💬 NLP/Attention & Transformer

[Paper Review] Attention Is All You Need (NIPS 2017)

Attention Is All You Need (NIPS 2017) 0. Abstract Sequence transduction models are based on `RNN` or `CNN` and include an `Encoder-Decoder` structure, and the best-performing models also connect the Encoder and Decoder through an `Attention mechanism`. This paper proposes the `Transformer`, a new network architecture based solely on attention mechanisms, dispensing with RNN and CNN entirely. The Transformer is parallelizable and requires less training time, while showing superior performance on two translation tasks: on the WMT 2014 English-German translation task it achieves 28.4 BLEU, the best ..
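
The operation at the heart of the paper is scaled dot-product attention, `Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V`. Below is a minimal NumPy sketch of that formula; the shapes, random inputs, and function name are assumptions for illustration, not the paper's reference code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # (seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (seq_q, d_v)

# Toy example: 4 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```

Running several such attentions in parallel over separately projected Q, K, V and concatenating the results gives the multi-head attention the model uses in every layer.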

[Paper Review] Effective Approaches to Attention-based Neural Machine Translation (EMNLP 2015)

Effective Approaches to Attention-based Neural Machine Translation (EMNLP 2015) 0. Abstract The `Attention` mechanism has been used to improve NMT (Neural Machine Translation) by selectively focusing on parts of the source sentence during translation. However, little work has explored architectures that use attention more effectively for NMT. This paper presents two simple and effective attention mechanisms: a `global` attentional model that always attends to all source words, and a `loc..
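
To make the distinction concrete, the sketch below contrasts the two mechanisms using the paper's simplest (`dot`) score: the global model attends over every source hidden state, while the local model attends only to a window around an aligned position p_t. The hard window is a simplification (the paper's local-p variant additionally weights the window with a Gaussian centered on a predicted p_t), and all shapes here are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(h_t, h_s):
    """Global: score ALL source states with the 'dot' score h_t . h_s."""
    a_t = softmax(h_s @ h_t)          # alignment over every source word
    return a_t @ h_s                  # context vector c_t

def local_attention(h_t, h_s, p_t, D=2):
    """Local: attend only to the window [p_t - D, p_t + D] of source states."""
    lo, hi = max(0, p_t - D), min(len(h_s), p_t + D + 1)
    window = h_s[lo:hi]
    a_t = softmax(window @ h_t)
    return a_t @ window

rng = np.random.default_rng(0)
h_s = rng.normal(size=(10, 16))       # 10 source hidden states
h_t = rng.normal(size=(16,))          # current target hidden state
print(global_attention(h_t, h_s).shape)          # (16,)
print(local_attention(h_t, h_s, p_t=4).shape)    # (16,)
```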

[Paper Review] ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR 2021)

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT) (ICLR 2021) Abstract While the `Transformer` architecture has become the de facto standard for NLP tasks, its application to Computer Vision remains limited. In Vision, `Attention` is either applied in combination with convolutional networks, or used to replace certain components of a convolutional network while keeping the overall structure intact. A pure transformer applied directly to sequences of image patches, with no reliance on CNNs, performs very well on image classification tasks. Trained on large amounts of da..
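
What lets a pure transformer consume images is the patch-embedding step: the image is cut into fixed-size patches, each patch is flattened, and a linear projection maps it into the model dimension (a sequence of "16x16 words"). Here is a minimal NumPy sketch, assuming a 224x224 RGB input, 16x16 patches, and a random matrix standing in for the learned projection:

```python
import numpy as np

def image_to_patch_embeddings(img, patch=16, d_model=64, rng=None):
    """Split an (H, W, C) image into patches, flatten, project linearly."""
    rng = rng or np.random.default_rng(0)
    H, W, C = img.shape                       # H and W assumed divisible by patch
    patches = (img.reshape(H // patch, patch, W // patch, patch, C)
                  .transpose(0, 2, 1, 3, 4)   # group the two grid axes together
                  .reshape(-1, patch * patch * C))
    E = rng.normal(size=(patch * patch * C, d_model))  # learned in the real model
    return patches @ E                        # (num_patches, d_model)

img = np.zeros((224, 224, 3))                 # a dummy 224x224 RGB image
print(image_to_patch_embeddings(img).shape)   # (196, 64): a 14x14 grid of patches
```

In the actual model, a learnable [class] token and position embeddings are added before the sequence enters a standard Transformer encoder.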

[NLP] Transformer - Self-Attention, Multi-Head Attention, Feed Forward Neural Network, Residual Connection, Layer Normalization

Transformer The `Transformer` is a model that uses `Attention` while making training easy to parallelize, thereby increasing speed. Unlike a `Seq2Seq` model, the Transformer does not process input tokens sequentially but handles them all at once. The basic structure of the Transformer model is as follows. Viewed as a single black box, the Transformer has the same inputs and outputs as an `RNN`-based `Encoder-Decoder` structure. Looking inside the Transformer, there is a separate Encoding component and Decoding component, and how these are connected is ultimately what differentiates it from the RNN structure. Encod..
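
As a rough sketch of the self-attention block this post describes: every token attends to every other token in a single matrix operation, which is exactly what removes the token-by-token recurrence of Seq2Seq. The head count, dimensions, and random weights below are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(X, num_heads=4, rng=None):
    """All tokens attend to all tokens at once; no sequential recurrence."""
    rng = rng or np.random.default_rng(0)
    seq, d_model = X.shape
    d_head = d_model // num_heads
    Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    # Project, then split into heads: (num_heads, seq, d_head).
    split = lambda M: M.reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    heads = softmax(scores) @ V                           # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ Wo                                    # final output projection

X = np.random.default_rng(1).normal(size=(5, 32))         # 5 tokens, d_model = 32
print(multi_head_self_attention(X).shape)                 # (5, 32)
```

In the full encoder block, this output is wrapped in a residual connection with layer normalization and then passed through the position-wise feed-forward network, the remaining components named in the title.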

Junyeong Son