💬 NLP/PLM

[Paper Review] Don't Stop Pretraining: Adapt Language Models to Domains and Tasks (ACL 2020)

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks 0. Abstract Models pretrained on text from a wide variety of sources form the foundation of today's NLP. The paper asks whether it is still helpful to tailor a pretrained model to the domain of the target task, and studies this across four domains (biomedical, computer science publications, news, reviews) and eight classification tasks. It shows that `domain-adaptive pretraining` improves performance in both high-resource and low-resource settings. Adapting to unlabeled data with `..
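
The recipe behind `domain-adaptive pretraining` is simply to keep running the original masked-LM objective, but on unlabeled text from the target domain, before fine-tuning. A minimal sketch, assuming the Hugging Face `transformers` library and a hypothetical in-domain corpus `domain_sentences`:

```python
# Minimal DAPT sketch: continue masked-LM pretraining of RoBERTa on unlabeled
# in-domain text. `domain_sentences` is a hypothetical list of raw domain text.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

domain_sentences = ["unlabeled in-domain text goes here ..."]  # e.g. biomedical abstracts
encodings = [tokenizer(s, truncation=True, max_length=128) for s in domain_sentences]

# The collator applies the usual 15% masking on the fly, i.e. the same objective
# used during the original pretraining, just on domain text.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
loader = DataLoader(encodings, batch_size=8, shuffle=True, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for batch in loader:
    loss = model(**batch).loss   # cross-entropy over the masked positions
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```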

[Paper Review] BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (2019)

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension 0. Abstract The paper proposes `BART`, a denoising autoencoder for pre-training sequence-to-sequence models. BART is trained in two steps: text is corrupted with an arbitrary `noising function`, and the model then learns to reconstruct the original text from the corrupted text. Despite its simplicity, it can be seen as generalizing BERT (thanks to its bidirectional encoder), GPT (with its left-to-right decoder), and other current state-of-the-art pre-training ..
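
To make the corrupt-then-reconstruct loop concrete, here is a toy sketch of one noising function in the spirit of BART's text infilling, where a whole span of tokens collapses into a single mask token; the span sampling and the `<mask>` string are illustrative simplifications rather than the paper's exact setup:

```python
# Toy noising function: collapse randomly chosen spans into a single <mask> token.
# The seq2seq model is then trained to reconstruct the original sequence from this
# corrupted input (cross-entropy on the decoder output).
import random

def text_infilling(tokens, mask_token="<mask>", mask_ratio=0.3, avg_span=3):
    corrupted, i = [], 0
    while i < len(tokens):
        if random.random() < mask_ratio / avg_span:
            span = max(1, int(random.expovariate(1 / avg_span)))  # rough stand-in for the paper's Poisson(3)
            corrupted.append(mask_token)   # the entire span becomes one mask token
            i += span
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted

original = "the quick brown fox jumps over the lazy dog".split()
print(text_infilling(original))  # e.g. ['the', '<mask>', 'fox', 'jumps', '<mask>', 'lazy', 'dog']
```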

[Paper Review] GPT-2: Language Models are Unsupervised Multitask Learners (2019)

Language Models are Unsupervised Multitask Learners 0. Abstract NLP tasks such as Question Answering, Machine Translation, Reading Comprehension, and Summarization typically rely on supervised learning with task-specific datasets. The paper demonstrates that a Language Model trained on `WebText`, a dataset built from millions of web pages, begins to perform these tasks without any explicit supervision. The capacity of the Language Model is essential to the success of `zero-shot` task transfer, and as it improves, task ..
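
Zero-shot here means the task is specified only through the input text, with no gradient updates. A minimal sketch with the `transformers` library, using the "TL;DR:" summarization hint mentioned in the paper; the model name and generation settings are illustrative:

```python
# Zero-shot summarization sketch: the language model is only conditioned on a prompt
# that ends with "TL;DR:"; it is never fine-tuned for summarization.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

article = "Long news article text goes here ..."   # hypothetical input document
prompt = article + "\nTL;DR:"                       # the task is stated in natural language

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```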

[Paper Review] RoBERTa: A Robustly Optimized BERT Pretraining Approach (2019)

RoBERTa: A Robustly Optimized BERT Pretraining Approach 0. Abstract Pre-training language models has brought substantial performance gains, but careful comparison between the different approaches is still needed: training is computationally expensive, is often done on private datasets of different sizes, and hyperparameter choices have a large impact on the final results. This paper is a replication study that carefully measures the effect of several key hyperparameters and of training data size on `BERT`. It finds that BERT was significantly undertrained, and that BERT alone can beat the performance of the models developed after it ..
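
One concrete training choice the paper revisits is masking: rather than fixing the masked positions once at preprocessing time, RoBERTa samples a fresh mask every time a sequence is seen. A minimal sketch with a hypothetical `mask_tokens` helper (not the library's implementation):

```python
# Dynamic masking sketch: a new 15% mask is drawn on every pass over a sequence,
# using BERT's 80/10/10 split (mask token / random token / keep original).
import random

def mask_tokens(token_ids, mask_id, vocab_size, p=0.15):
    out = list(token_ids)
    for i in range(len(out)):
        if random.random() < p:
            r = random.random()
            if r < 0.8:
                out[i] = mask_id                       # 80%: replace with [MASK]
            elif r < 0.9:
                out[i] = random.randrange(vocab_size)  # 10%: replace with a random token
            # else: 10%: keep the original token
    return out

sequence = [101, 2023, 2003, 2019, 2742, 102]   # illustrative token ids
for epoch in range(3):
    corrupted = mask_tokens(sequence, mask_id=103, vocab_size=30522)  # different mask each epoch
    # ... compute the masked-LM loss on `corrupted` against `sequence` ...
```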

[Paper Review] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 0. Abstract The paper presents `BERT (Bidirectional Encoder Representations from Transformers)`, a new language representation model. Unlike the language representation models available at the time, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in every layer. With just one additional output layer, BERT can be fine-tu..
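
The "one additional output layer" amounts to putting a single linear classifier on top of the pretrained encoder's `[CLS]` representation and training everything end to end. A minimal sketch, assuming the Hugging Face `transformers` encoder and an illustrative two-class task:

```python
# Fine-tuning sketch: pretrained BERT encoder + one linear output layer.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertClassifier(nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)  # the one extra layer

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        cls = hidden[:, 0]              # representation of the [CLS] token
        return self.classifier(cls)     # encoder and head are fine-tuned together

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["a great movie", "a dull movie"], padding=True, return_tensors="pt")
logits = BertClassifier()(batch["input_ids"], batch["attention_mask"])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1, 0]))   # standard classification loss
```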

[NLP] GPT (Generative Pre-Training of a Language Model)

Motivation The core idea is the same as `ELMo`: GPT goes through `pre-training` on an unlabeled text corpus to obtain embedding vectors, then goes through `fine-tuning` on a labeled text corpus for the specific task. Using more than word-level information from unlabeled text is hard: it is unclear which optimization objective is most effective for learning text representations that are useful for `transfer`, and transferring the learned representations to a target task involves making task-specific changes to the model architecture, intricate l..
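
A minimal sketch of that two-stage recipe: stage 1 trains a left-to-right language model on unlabeled text; stage 2 fine-tunes the same network on labeled data, optionally keeping the LM loss as an auxiliary term (the paper's L3 = L2 + λ·L1). The tiny `decoder` below is only a stand-in for the Transformer decoder, and all module names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

H, V, C = 256, 10000, 2
decoder = nn.Sequential(nn.Embedding(V, H), nn.Linear(H, H))  # stand-in for a Transformer decoder
lm_head = nn.Linear(H, V)     # maps hidden states to vocabulary logits
clf_head = nn.Linear(H, C)    # task-specific classification head

def lm_loss(token_ids):
    """Stage 1 (pre-training): predict each token from the prefix before it."""
    hidden = decoder(token_ids[:, :-1])
    return F.cross_entropy(lm_head(hidden).reshape(-1, V), token_ids[:, 1:].reshape(-1))

def finetune_loss(token_ids, labels, lam=0.5):
    """Stage 2 (fine-tuning): supervised loss on the last hidden state + auxiliary LM loss."""
    hidden = decoder(token_ids)
    task = F.cross_entropy(clf_head(hidden[:, -1]), labels)
    return task + lam * lm_loss(token_ids)

tokens = torch.randint(0, V, (4, 12))   # a batch of 4 sequences, 12 tokens each
print(lm_loss(tokens).item(), finetune_loss(tokens, torch.randint(0, C, (4,))).item())
```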

[NLP] ELMo (Embeddings from Language Models)

Pre-trained word representation Pre-trained word representations are a key component of many neural language understanding models. A high-quality representation should model two things: the complex characteristics of words (e.g., syntax, semantics), and how words are used differently across linguistic contexts, giving each usage a representation that matches it. For example, the Korean word "눈" can mean either "eye" or "snow", and its embedding should differ accordingly. Characteristics of ELMo (Embeddings from Language Models): moving beyond the earlier focus on individual words, it takes the entire input sentence into ..
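
Concretely, ELMo builds a token's vector from the whole sentence as a learned, softmax-normalized weighted sum of the biLM's layer representations (roughly ELMo_k = γ Σ_j s_j h_{k,j}, with task-specific weights s_j and scale γ). A minimal sketch with random stand-ins for the biLM outputs:

```python
# Scalar-mix sketch: combine the biLM layers into one context-dependent vector per token.
import torch
import torch.nn.functional as F

num_layers, seq_len, dim = 3, 5, 8                       # e.g. embedding layer + 2 biLSTM layers
layer_outputs = torch.randn(num_layers, seq_len, dim)    # stand-in for h_{k,j}: one row per layer

s = torch.randn(num_layers, requires_grad=True)          # task-specific layer weights
gamma = torch.ones(1, requires_grad=True)                # task-specific scale

weights = F.softmax(s, dim=0)                            # normalize the layer weights
elmo = gamma * torch.einsum("j,jkd->kd", weights, layer_outputs)   # one vector per token
print(elmo.shape)   # torch.Size([5, 8]): a context-dependent embedding for each of the 5 tokens
```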

Junyeong Son
List of posts in the '💬 NLP/PLM' category