๐Ÿง  Deep Learning

๐Ÿง  Deep Learning

[DL] ์„ฑ๋Šฅ ์ตœ์ ํ™” - Batch Normalization, Dropout, Early Stopping

์„ฑ๋Šฅ ์ตœ์ ํ™” ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•œ ์„ฑ๋Šฅ ์ตœ์ ํ™” ์ผ๋ฐ˜์ ์œผ๋กœ ML/DL ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ฐ์ดํ„ฐ์–‘์ด ๋งŽ์„์ˆ˜๋ก ์„ฑ๋Šฅ์ด ์ข‹๊ธฐ ๋•Œ๋ฌธ์— ๊ฐ€๋Šฅํ•œ ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ง‘ ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ง‘ํ•  ์ˆ˜ ์—†๋‹ค๋ฉด ์ง์ ‘ ๋ฐ์ดํ„ฐ๋ฅผ ๋งŒ๋“ค์–ด ์‚ฌ์šฉ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋กœ ์‹œ๊ทธ๋ชจ์ด๋“œ(0~1์˜ ๊ฐ’), ํ•˜์ดํผ๋ณผ๋ฆญ ํƒ„์  ํŠธ(-1~1์˜ ๊ฐ’) ๋“ฑ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ์…‹ ๋ฒ”์œ„๋ฅผ ์กฐ์ • ์ •๊ทœํ™”, ๊ทœ์ œํ™”, ํ‘œ์ค€ํ™” ๋“ฑ๋„ ์„ฑ๋Šฅ ํ–ฅ์ƒ์— ๋„์›€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•œ ์„ฑ๋Šฅ ์ตœ์ ํ™” ML/DL์„ ์œ„ํ•œ ๋‹ค์–‘ํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ค‘ ์œ ์‚ฌํ•œ ์šฉ๋„์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค์„ ์„ ํƒํ•˜์—ฌ ๋ชจ๋ธ์„ ํ›ˆ๋ จ์‹œ์ผœ ๋ณด๊ณ  ์ตœ์ ์˜ ์„ฑ๋Šฅ์„ ๋ณด์ด๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์„ ํƒํ•ด์•ผ ํ•œ๋‹ค. ์•Œ๊ณ ๋ฆฌ์ฆ˜ ํŠœ๋‹์„ ์œ„ํ•œ ์„ฑ๋Šฅ ์ตœ์ ํ™” ๋ชจ๋ธ์„ ํ•˜๋‚˜ ์„ ํƒํ•˜์—ฌ ํ›ˆ๋ จ์‹œํ‚ค๋ ค๋ฉด ๋‹ค์–‘ํ•œ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๋ณ€๊ฒฝํ•˜๋ฉด์„œ ํ›ˆ๋ จ์‹œํ‚ค๊ณ  ์ตœ์ ์˜ ์„ฑ๋Šฅ์„ ๋„์ถœํ•ด์•ผ ํ•œ๋‹ค. ์ง„๋‹จ : ์„ฑ๋Šฅ ํ–ฅ์ƒ์ด ์–ด๋Š ์ˆœ๊ฐ„ ๋ฉˆ์ท„์„..

๐Ÿง  Deep Learning

[DL] ํ•™์Šต ๊ด€๋ จ ๊ธฐ์ˆ ๋“ค - SGD, Momentum, AdaGrad, Adam, optimization, Xavier, He, Batch Normalization

๋งค๊ฐœ๋ณ€์ˆ˜ ๊ฐฑ์‹  ํ™•๋ฅ ์  ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(SGD) ์‹ ๊ฒฝ๋ง ํ•™์Šต์˜ ๋ชฉ์ ์€ ์†์‹ค ํ•จ์ˆ˜์˜ ๊ฐ’์„ ๊ฐ€๋Šฅํ•œ ํ•œ ๋‚ฎ์ถ”๋Š” ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ฐพ๋Š” ๊ฒƒ์œผ๋กœ ์ด๋Š” ๊ณง ๋งค๊ฐœ๋ณ€์ˆ˜์˜ ์ตœ์ ๊ฐ’์„ ์ฐพ๋Š” ๋ฌธ์ œ์ด๋ฉฐ, ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ‘ธ๋Š” ๊ฒƒ์„ `์ตœ์ ํ™”(optimization)`์ด๋ผ๊ณ  ํ•œ๋‹ค. ์ตœ์ ์˜ ๋งค๊ฐœ๋ณ€์ˆ˜ ๊ฐ’์„ ์ฐพ๊ธฐ ์œ„ํ•œ ๋‹จ์„œ๋กœ ๋งค๊ฐœ๋ณ€์ˆ˜์˜ ๊ธฐ์šธ๊ธฐ(๋ฏธ๋ถ„)๋ฅผ ์ด์šฉํ•˜๋Š”๋ฐ, ๋งค๊ฐœ๋ณ€์ˆ˜์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ตฌํ•ด ๊ธฐ์šธ์–ด์ง„ ๋ฐฉํ–ฅ์œผ๋กœ ๋งค๊ฐœ๋ณ€์ˆ˜ ๊ฐ’์„ ๊ฐฑ์‹ ํ•˜๋Š” ์ผ์„ ๋ฐ˜๋ณตํ•ด์„œ ์ตœ์ ์˜ ๊ฐ’์„ ํ–ฅํ•ด ๋‹ค๊ฐ€๊ฐ€๋Š” ๊ฒƒ์ด `ํ™•๋ฅ ์  ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(SGD)`์ด๋‹ค. SGD๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ˆ˜์‹์œผ๋กœ ํ‘œํ˜„์ด ๊ฐ€๋Šฅํ•˜๋‹ค. ๊ฒฐ๊ตญ SGD๋Š” ๊ธฐ์šธ์–ด์ง„ ๋ฐฉํ–ฅ์œผ๋กœ ์ผ์ • ๊ฑฐ๋ฆฌ๋งŒ ๊ฐ€๊ฒ ๋‹ค๋Š” ๋‹จ์ˆœํ•œ ๋ฐฉ๋ฒ•์ด๋‹ค. SGD๋ฅผ ํŒŒ์ด์ฌ ์ฝ”๋“œ๋กœ ๊ตฌํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. class SGD: def __init__(self, lr=0.01): se..

๐Ÿง  Deep Learning/RNN

[RNN] Seq2seq Learning - Encoder & Decoder, Attention, Feedforward Neural Network

Sequence-to-sequence model. A `Seq2Seq` model takes sequence data such as words, letters, or features of images as its inputs, and its outputs are again sequence data. The number of items in the input sequence need not equal the number of items in the output sequence. Sequence-to-sequence models are used as machine translators, in which case the input sequence consists of words and the output likewise consists of words. Encoder-Decoder: a Seq2Seq model consists of an `Encoder` and a `Decoder`, whose roles are as follows. Encoder : in..
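
A minimal encoder-decoder sketch, assuming PyTorch and GRU layers (the preview does not show the post's actual framework or layer choices):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, src):               # src: (batch, src_len)
        _, h = self.rnn(self.embed(src))  # h: final hidden state = "context"
        return h

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tgt, h):            # tgt: (batch, tgt_len)
        y, _ = self.rnn(self.embed(tgt), h)
        return self.out(y)                # per-step vocabulary scores

# Input and output lengths are independent, as the post notes (7 vs 5 here):
enc, dec = Encoder(1000, 64), Decoder(1200, 64)
context = enc(torch.randint(0, 1000, (2, 7)))
logits = dec(torch.randint(0, 1200, (2, 5)), context)
```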

๐Ÿง  Deep Learning/RNN

[RNN] ์ˆœํ™˜ ์‹ ๊ฒฝ๋ง - RNN, Vanilla RNN, encoder-decoder, BPTT, LSTM, GRU, Attention

๊ธฐ์–ต์„ ๊ฐ–๋Š” ์‹ ๊ฒฝ๋ง ๋ชจ๋ธ RNN ๊ธฐ์–ต์„ ์ „๋‹ฌํ•˜๋Š” ์ˆœํ™˜ ์‹ ๊ฒฝ๋ง ์‹œ๊ฐ„๊ณผ ๊ณต๊ฐ„์  ์ˆœ์„œ ๊ด€๊ณ„๊ฐ€ ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ `์ˆœ์ฐจ ๋ฐ์ดํ„ฐ(sequence data)`๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. ์ˆœ์ฐจ ๋ฐ์ดํ„ฐ๋Š” ์‹œ๊ณต๊ฐ„์˜ ์ˆœ์„œ ๊ด€๊ณ„๋กœ ํ˜•์„ฑ๋˜๋Š” ๋ฌธ๋งฅ ๋˜๋Š” `์ฝ˜ํ…์ŠคํŠธ(context)`๋ฅผ ๊ฐ–๋Š”๋‹ค. ํ˜„์žฌ ๋ฐ์ดํ„ฐ๋ฅผ ์ดํ•ดํ•  ๋•Œ ์•ž๋’ค์— ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ํ•จ๊ป˜ ์‚ดํŽด๋ณด๋ฉด์„œ ์ฝ˜ํ…์ŠคํŠธ๋ฅผ ํŒŒ์•…ํ•ด์•ผ ํ˜„์žฌ ๋ฐ์ดํ„ฐ์˜ ์—ญํ• ์„ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋‹ค. ์ธ๊ณต ์‹ ๊ฒฝ๋ง์ด ๋ฐ์ดํ„ฐ์˜ ์ˆœ์„œ๋ฅผ ๊ณ ๋ คํ•˜์—ฌ ์ฝ˜ํ…์ŠคํŠธ๋ฅผ ๋งŒ๋“ค๋ ค๋ฉด ๋ฐ์ดํ„ฐ์˜ ์ˆœ์ฐจ ๊ตฌ์กฐ๋ฅผ ์ธ์‹ํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•˜๊ณ , ๋ฐ์ดํ„ฐ์˜ ์ฝ˜ํ…์ŠคํŠธ ๋ฒ”์œ„๊ฐ€ ๋„“๋”๋ผ๋„ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•œ๋‹ค. ์ด๋Ÿฐ ์ ๋“ค์„ ๊ณ ๋ คํ•˜์—ฌ ๋งŒ๋“  ์ธ๊ณต ์‹ ๊ฒฝ๋ง์ด ๋ฐ”๋กœ `์ˆœํ™˜ ์‹ ๊ฒฝ๋ง(RNN: Recurrent Neural Network)`์ด๋‹ค. ์ˆœ๋ฐฉํ–ฅ ์‹ ๊ฒฝ๋ง์ด๋‚˜ ์ปจ๋ฒŒ๋ฃจ์…˜ ์‹ ๊ฒฝ๋ง๊ณผ๋Š” ๋‹ค๋ฅด๊ฒŒ ์ˆœํ™˜ ..

๐Ÿง  Deep Learning/RNN

[RNN] RNN, LSTM and GRU

์ˆœํ™˜ ์‹ ๊ฒฝ๋ง ํ•™์Šต ํ•™์Šต์ด๋ž€ ๋ฐ์ดํ„ฐ๋ฅผ ํ†ตํ•ด parameters๋ฅผ ์ถ”์ •ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค. ์ˆœํ™˜ ์‹ ๊ฒฝ๋ง์€ t์‹œ์ ๊นŒ์ง€์˜ ๊ณผ๊ฑฐ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•˜์—ฌ y๋ฅผ ์˜ˆ์ธกํ•˜๊ฒŒ ๋˜๋Š”๋ฐ, ํ•™์Šต ๋Œ€์ƒ์ด ๋˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” 3๊ฐ€์ง€์ด๋‹ค. ์ฒซ ๋ฒˆ์งธ๋Š” t ์‹œ์ ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ˜์˜ํ•œ W_xh ๊ฐ€์ค‘์น˜, ๋‘ ๋ฒˆ์งธ๋Š” t ์‹œ์  ์ด์ „์˜ ์ •๋ณด๋ฅผ ๋ฐ˜์˜ํ•˜๋Š” W_hh ๊ฐ€์ค‘์น˜, ๊ทธ๋ฆฌ๊ณ  t ์‹œ์ ์˜ y๋ฅผ ์˜ˆ์ธกํ•  ๋•Œ ํ™œ์šฉํ•˜๋Š” W_hy ๊ฐ€์ค‘์น˜์ด๋‹ค. ํ•ด๋‹น ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ๋งค ์‹œ์ ๋งˆ๋‹ค ๊ณต์œ ํ•˜๋Š” ๊ตฌ์กฐ์ด๋ฉฐ(parameter sharing), ๋งค ์‹œ์  ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” ๊ฐ’์ด ๊ฐ™๋‹ค. ๋˜ํ•œ ์ตœ์ ์˜ W๋Š” W๋ฅผ ๋งค ์‹œ์  ์ ์šฉํ–ˆ์„ ๋•Œ Loss๊ฐ€ ์ตœ์†Œ๊ฐ€ ๋˜๋Š” W์ด๋‹ค. hidden state์™€ ์˜ˆ์ธก๊ฐ’์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณ„์‚ฐ๋œ๋‹ค. ์ˆœํ™˜ ์‹ ๊ฒฝ๋ง์ด 3๊ฐ€์ง€์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์ถ”๋ก ํ•˜๋Š” ํ•™์Šต๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. Los..

๐Ÿง  Deep Learning/RNN

[RNN] Recurrent Neural Networks and Attention(Introduction)

์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ ์˜ˆ์ธก ๋ถ„์„ ๋ฐฉ๋ฒ•๋ก  ํŠธ๋ Œ๋“œ `์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ(Time Series Data)`๋ž€, ์‹œ๊ฐ„์˜ ํ๋ฆ„์— ๋”ฐ๋ผ ์ˆœ์„œ๋Œ€๋กœ ๊ด€์ธก๋˜์–ด ์‹œ๊ฐ„์˜ ์˜ํ–ฅ์„ ๋ฐ›๊ฒŒ ๋˜๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋งํ•œ๋‹ค. ์ด๋Ÿฌํ•œ ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์—๋Š” `์‹œ๊ณ„์—ด ๋‹จ๋ณ€๋Ÿ‰ ๋ฐ์ดํ„ฐ(Univariate time series data)`, `์‹œ๊ณ„์—ด ๋‹ค๋ณ€๋Ÿ‰ ๋ฐ์ดํ„ฐ(Multivariate time series data)`, `์‹œ๊ณ„์—ด ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ(Time series image data)` ๋“ฑ์ด ์žˆ๋‹ค. ์ „ํ†ต ํ†ต๊ณ„ ๊ธฐ๋ฐ˜ ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ ๋ถ„์„ ๋ฐฉ๋ฒ•๋ก  ์ด๋™ํ‰๊ท ๋ฒ•(Moving average) ์ง€์ˆ˜ํ‰ํ™œ๋ฒ•(Exponential smoothing) ARIMA(Autoregressive integrated moving average) ๋ชจ๋ธ SARIMA(Seasonal ARIMA) ๋ชจ๋ธ ..

๐Ÿง  Deep Learning

[DL] ์ดˆ๊ธฐํ™”์™€ ์ •๊ทœํ™” - Xavier Initialization, He Initialization, batch normalization, weight decay, early stopping, data augmentation, bagging, Dropout

๊ฐ€์ค‘์น˜ ์ดˆ๊ธฐํ™”(Weight Initialization) ์‹ ๊ฒฝ๋ง์„ ํ•™์Šตํ•  ๋•Œ ์†์‹ค ํ•จ์ˆ˜์—์„œ ์ถœ๋ฐœ ์œ„์น˜๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ๋ชจ๋ธ `์ดˆ๊ธฐํ™”(initialization)`์ด๋‹ค. ํŠนํžˆ ๊ฐ€์ค‘์น˜๋Š” ๋ชจ๋ธ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ์—์„œ ๊ฐ€์žฅ ํฐ ๋น„์ค‘์„ ์ฐจ์ง€ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๊ฐ€์ค‘์น˜์˜ ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์— ๋”ฐ๋ผ ํ•™์Šต ์„ฑ๋Šฅ์ด ํฌ๊ฒŒ ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ๋‹ค. ์ƒ์ˆ˜ ์ดˆ๊ธฐํ™” ์‹ ๊ฒฝ๋ง์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๋ชจ๋‘ 0์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•˜์—ฌ ๋‰ด๋Ÿฐ์˜ ๊ฐ€์ค‘์น˜๊ฐ€ 0์ด๋ฉด ๊ฐ€์ค‘ ํ•ฉ์‚ฐ ๊ฒฐ๊ณผ๋Š” ํ•ญ์ƒ 0์ด ๋˜๊ณ , ํ™œ์„ฑ ํ•จ์ˆ˜๋Š” ๊ฐ€์ค‘ ํ•ฉ์‚ฐ ๊ฒฐ๊ณผ์ธ 0์„ ์ž…๋ ฅ๋ฐ›์•„์„œ ๋Š˜ ๊ฐ™์€ ๊ฐ’์„ ์ถœ๋ ฅํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ํ™œ์„ฑ ํ•จ์ˆ˜๊ฐ€ ReLU๋‚˜ ํ•˜์ดํผ๋ณผ๋ฆญ ํƒ„์  ํŠธ๋ฉด ์ถœ๋ ฅ์€ 0์ด ๋˜๊ณ  ์‹œ๊ทธ๋ชจ์ด๋“œ๋ฉด ์ถœ๋ ฅ์€ ํ•ญ์ƒ 0.5๊ฐ€ ๋œ๋‹ค. 0์ด ์•„๋‹Œ ๋‹ค๋ฅธ ๊ฐ’์˜ ๊ฒฝ์šฐ์—๋„ ๋งŒ์•ฝ ๊ฐ€์ค‘์น˜๋ฅผ ๋ชจ๋‘ ๊ฐ™์€ ์ƒ์ˆ˜๋กœ ์ดˆ๊ธฐํ™”ํ•˜๋ฉด ์‹ ๊ฒฝ๋ง์— `๋Œ€์นญ์„ฑ(symmetry)`์ด ์ƒ๊ฒจ..

๐Ÿง  Deep Learning

[DL] ์ตœ์ ํ™” - Stochastic Gradient Descent, SGD Momentum, overshooting, Nesterov Momentum, AdaGrad, RMSProp, Adam

ํ™•๋ฅ ์  ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ• ํ™•๋ฅ ์  ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์€ ์†์‹ค ํ•จ์ˆ˜์˜ ๊ณก๋ฉด์—์„œ ๊ฒฝ์‚ฌ๊ฐ€ ๊ฐ€์žฅ ๊ฐ€ํŒŒ๋ฅธ ๊ณณ์œผ๋กœ ๋‚ด๋ ค๊ฐ€๋‹ค ๋ณด๋ฉด ์–ธ์  ๊ฐ€ ๊ฐ€์žฅ ๋‚ฎ์€ ์ง€์ ์— ๋„๋‹ฌํ•œ๋‹ค๋Š” ๊ฐ€์ •์œผ๋กœ ๋งŒ๋“ค์–ด์กŒ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๊ฐ€์ •์ด ๋‹จ์ˆœํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋‹ค์–‘ํ•œ ์ƒํ™ฉ์— ์ž˜ ๋Œ€์ฒ˜ํ•˜์ง€ ๋ชปํ•˜๊ณ  ํ•™์Šต ์†๋„๋„ ๋Š๋ฆฌ๊ธฐ ๋•Œ๋ฌธ์— ์„ฑ๋Šฅ์— ํ•œ๊ณ„๊ฐ€ ์žˆ๋‹ค. ๊ณ ์ •๋œ ํ•™์Šต๋ฅ  ํ•™์Šต๋ฅ ์ด๋ž€ ์ตœ์ ํ™”ํ•  ๋•Œ ํ•œ ๊ฑธ์Œ์˜ ํญ์„ ๊ฒฐ์ •ํ•˜๋Š” ์Šคํ…์˜ ํฌ๊ธฐ๋ฅผ ๋งํ•˜๋ฉฐ ํ•™์Šต ์†๋„๋ฅผ ๊ฒฐ์ •ํ•œ๋‹ค. ํ™•๋ฅ ์  ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์€ ์ง€์ •๋œ ํ•™์Šต๋ฅ ์„ ์‚ฌ์šฉํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๊ธฐ ๋•Œ๋ฌธ์— ๊ฒฝํ—˜์ ์œผ๋กœ ํ•™์Šต๋ฅ ์„ ์กฐ์ •ํ•  ์ˆ˜๋ฐ–์— ์—†์œผ๋ฉฐ, ์ด๋Š” ํšจ์œจ์ ์ด์ง€ ์•Š๊ณ  ์ตœ์ ์˜ ํ•™์Šต๋ฅ ์„ ์ •ํ•˜๊ธฐ ์–ด๋ ต๋‹ค. ๋˜ํ•œ ํ•™์Šต๋ฅ ์ด ๊ณ ์ •๋˜์–ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ตœ์ ํ™”๊ฐ€ ๋น„ํšจ์œจ์ ์œผ๋กœ ์ง„ํ–‰๋œ๋‹ค. ํ•™์Šต๋ฅ ์ด ๋ณ€ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค๋ฉด ์ฒ˜์Œ์—๋Š” ํฐ ํญ์œผ๋กœ ์ด๋™ํ•˜๋‹ค๊ฐ€ ์ตœ์ ํ•ด์— ๊ฐ€๊นŒ์›Œ์งˆ์ˆ˜๋ก ์ด๋™ ํญ์„ ์ค„์—ฌ์„œ ์•ˆ์ •..

๐Ÿง  Deep Learning

[DL] ์‹ ๊ฒฝ๋ง ํ•™์Šต - model parameter, optimization, loss function, cost function, Gradient Descent, chain rule, backpropagation, minibatch, cross-entropy

์‹ ๊ฒฝ๋ง ํ•™์Šต์˜ ์˜๋ฏธ ์‹ ๊ฒฝ๋ง์—๋Š” ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์™€ ํƒ€๊นƒ ๋ฐ์ดํ„ฐ๊ฐ€ ์ œ๊ณต๋  ๋ฟ, ์ถ”๋ก ์„ ์œ„ํ•œ ๊ทœ์น™์€ ์ œ๊ณต๋˜์ง€ ์•Š๋Š”๋‹ค. ์‹ ๊ฒฝ๋ง์„ `ํ•™์Šต(learning)`ํ•œ๋‹ค๋Š” ๊ฒƒ์€ ์ด ๊ทœ์น™์„ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•ด์„œ ์Šค์Šค๋กœ ์ฐพ๋Š” ๊ฒƒ์ด๋‹ค. ์ด๋Š” ํ•™์Šต ๋ฐ์ดํ„ฐ์— ๊ธฐ๋Œ€ํ•˜๋Š” ์ •๋‹ต์ด ๋“ค์–ด์žˆ๊ธฐ ๋•Œ๋ฌธ์— ๊ฐ€๋Šฅํ•˜๋‹ค. ์‹ ๊ฒฝ๋ง์— ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๊ฐ€ ๋“ค์–ด์™”์„ ๋•Œ ์–ด๋–ค ์ถœ๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ๋งŒ๋“ค์–ด์•ผ ํ• ์ง€๋ฅผ ์ •ํ•˜๋Š” ๊ทœ์น™์€ ํ•จ์ˆ˜์  ๋งคํ•‘ ๊ด€๊ณ„๋กœ ํ‘œํ˜„๋œ๋‹ค. ๊ฐ€์ค‘ ํ•ฉ์‚ฐ๊ณผ ํ™œ์„ฑ ํ•จ์ˆ˜๊ฐ€ ์—ฐ๊ฒฐ๋˜์–ด ๋‰ด๋Ÿฐ์„ ๊ตฌ์„ฑํ•˜๊ณ , ๋‰ด๋Ÿฐ์ด ๋ชจ์—ฌ ๊ณ„์ธต์„ ๊ตฌ์„ฑํ•˜๋ฉฐ, ๊ณ„์ธต์ด ์Œ“์—ฌ์„œ ์‹ ๊ฒฝ๋ง์˜ ๊ณ„์ธต ๊ตฌ์กฐ๊ฐ€ ์ •์˜๋œ๋‹ค. ์ด๋Ÿฌํ•œ ๋ณต์žกํ•œ ์‹ ๊ฒฝ๋ง์˜ ๊ณ„์ธต ๊ตฌ์กฐ ์ž์ฒด๊ฐ€ ์‹ ๊ฒฝ๋ง์˜ ํ•จ์ˆ˜์  ๋งคํ•‘ ๊ด€๊ณ„๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ์‹ ๊ฒฝ๋ง์˜ ํ•™์Šต ๊ณผ์ •์—์„œ ํ•จ์ˆ˜์  ๋งคํ•‘ ๊ด€๊ณ„๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ์ „์ฒด ๊ณ„์ธต ๊ตฌ์กฐ๋ฅผ ์ฐพ์•„์•ผ ํ•˜๋Š” ๊ฒƒ์€ ์•„๋‹ˆ๋‹ค. ์‹ ๊ฒฝ๋ง์˜ ..

๐Ÿง  Deep Learning

[DL] ์˜ค์ฐจ์—ญ์ „ํŒŒ๋ฒ•(backpropagation) - forward propagation, backward propagation, chain rule, affine transformation, softmax with cross entropy error, gradient check

์‹ ๊ฒฝ๋ง ํ•™์Šต์—์„œ ๊ฐ€์ค‘์น˜ ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋Œ€ํ•œ ์†์‹ค ํ•จ์ˆ˜์˜ ๊ธฐ์šธ๊ธฐ๋Š” ์ˆ˜์น˜ ๋ฏธ๋ถ„์„ ์‚ฌ์šฉํ•ด ๊ณ„์‚ฐํ–ˆ๋‹ค. ์ˆ˜์น˜ ๋ฏธ๋ถ„์€ ๋‹จ์ˆœํ•˜๊ณ  ๊ตฌํ˜„ํ•˜๊ธฐ๋„ ์‰ฝ์ง€๋งŒ ๊ณ„์‚ฐ ์‹œ๊ฐ„์ด ์˜ค๋ž˜ ๊ฑธ๋ฆฐ๋‹ค๋Š” ๋‹จ์ ์ด ์žˆ๋‹ค. ์ด์— ๋น„ํ•ด `์˜ค์ฐจ์—ญ์ „ํŒŒ๋ฒ•(backpropagation)`์€ ๊ฐ€์ค‘์น˜ ๋งค๊ฐœ๋ณ€์ˆ˜์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ํšจ์œจ์ ์œผ๋กœ ๊ณ„์‚ฐํ•œ๋‹ค. ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„ `๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„(computational graph)`๋Š” ๊ณ„์‚ฐ ๊ณผ์ •์„ ๊ทธ๋ž˜ํ”„๋กœ ๋‚˜ํƒ€๋‚ธ ๊ฒƒ์ด๋‹ค. ์ด๋Š” ๋ณต์ˆ˜์˜ `๋…ธ๋“œ(node)`์™€ ์—์ง€(edge)`๋กœ ํ‘œํ˜„๋˜๋ฉฐ, ๋…ธ๋“œ ์‚ฌ์ด์˜ ์ง์„ ์„ ์—์ง€๋ผ๊ณ  ํ•œ๋‹ค. ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„์˜ ๋ฌธ์ œํ’€์ด๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ๋ฆ„์œผ๋กœ ์ง„ํ–‰๋œ๋‹ค. ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„๋ฅผ ๊ตฌ์„ฑํ•œ๋‹ค. ๊ทธ๋ž˜ํ”„์—์„œ ๊ณ„์‚ฐ์„ ์™ผ์ชฝ์—์„œ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ์ง„ํ–‰ํ•œ๋‹ค. ์—ฌ๊ธฐ์„œ 2๋ฒˆ์งธ ๋‹จ๊ณ„์ธ ๊ทธ๋ž˜ํ”„์—์„œ ๊ณ„์‚ฐ์„ ์™ผ์ชฝ์—์„œ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ์ง„ํ–‰ํ•˜๋Š” ๋‹จ๊ณ„๋ฅผ `์ˆœ์ „ํŒŒ(forward..

Junyeong Son
'๐Ÿง  Deep Learning' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๊ธ€ ๋ชฉ๋ก