Advanced Applications of Deep Learning in NLP
- DMN
- Ask Me Anything
- attention score layer
- story layer
- episodic memory layer
- answer layer
Automatic Text Generation
Example sentence: I love you very much
- Build character-level time-series data
- Train with an LSTM
  - Build the network so that feeding in "I love you" yields the next character "v"; use a softmax activation on the output layer.
  - When compiling, use categorical_crossentropy as the loss function.
- Steps:
  - Load the raw data, which consists of sentences
  - Preprocess
  - Split at the character level, not the word level
  - Build the time-series x data (like x in the example above)
- Converting indices into vectorized format
  - Initialize X and Y with np.zeros (see the sketch below)
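A minimal sketch of the two preprocessing steps above (character windows, then one-hot X/Y built with np.zeros). The window length `maxlen`, the stride `step`, and the variable names are my assumptions, not the original notebook's code:

```python
import numpy as np

text = "I love you very much"
chars = sorted(set(text))
char2idx = {c: i for i, c in enumerate(chars)}

maxlen, step = 10, 1                      # assumed window length / stride
sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])  # input window
    next_chars.append(text[i + maxlen])   # character to predict

# one-hot vectorization with np.zeros
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.float32)
Y = np.zeros((len(sentences), len(chars)), dtype=np.float32)
for i, sent in enumerate(sentences):
    for t, ch in enumerate(sent):
        X[i, t, char2idx[ch]] = 1
    Y[i, char2idx[next_chars[i]]] = 1
```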
- Model Building (a Keras sketch follows below)
  - softmax: if the output layer produces values such as [0.3, 0.4, 0.8], recompute them as a probability distribution that sums to 1. When you want to widen the gaps between those values, use the version of the formula with a temperature term (beta). Then, at model.predict(x) time, the probability of the desired character rises (and, conversely, the probability of unwanted characters falls).
  - Skip-gram with a softmax output has the drawback of heavy computation; SGNS (skip-gram with negative sampling) compensates for that.
  - Write a function that takes the predicted softmax probabilities and re-weights them via log, temperature scaling, and exp (pred_indices below).
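A minimal model-building sketch matching the description above (single LSTM, softmax output, categorical_crossentropy). It reuses `maxlen` and `chars` from the previous sketch; the layer size and optimizer are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(128, input_shape=(maxlen, len(chars))),  # 128 units is an assumed size
    Dense(len(chars), activation="softmax"),      # one probability per character
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
```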
- Sampling with np.random.multinomial

```python
def pred_indices(preds, metric=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / metric          # temperature (diversity) scaling
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)   # renormalize to a probability distribution
    probs = np.random.multinomial(1, preds, 1)
    return np.argmax(probs)
```
> Multinomial distribution: a probability distribution over independent random variables that can each take several values; it gives the probability that each value occurs a specific number of times across several independent trials. When the number of outcomes is 2, the multinomial distribution reduces to the binomial distribution.
>
> Source: [Wikipedia, Multinomial distribution](https://ko.wikipedia.org/wiki/%EB%8B%A4%ED%95%AD_%EB%B6%84%ED%8F%AC)
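For intuition, a one-line illustration (not from the original notes) of how np.random.multinomial is used above: a single draw over three outcomes returns a one-hot count vector, and np.argmax recovers the sampled index.

```python
import numpy as np

probs = np.random.multinomial(1, [0.2, 0.5, 0.3], size=1)  # e.g. array([[0, 1, 0]])
print(np.argmax(probs))                                    # index of the sampled outcome
```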
- Train & Evaluate the Model
- batch
  - Use randint so that generation starts from a random position
- Generating text with the probability distribution adjusted (diversity)
  - Loop over several diversity (temperature) values, e.g. `for diversity in [0.2, 0.7, 1.2]:`
  - The effect of the temperature b on the softmax, for example:

```python
a = np.array([0.6, 0.2, 0.4])
b = 1.0
e = np.exp(a / b)
print(e / np.sum(e))
# [0.40175958 0.2693075  0.32893292]

a = np.array([0.9, 0.2, 0.4])
b = 1.0
e = np.exp(a / b)
print(e / np.sum(e))
# [0.47548496 0.23611884 0.2883962 ]
```

  - When one value is noticeably larger than the rest (like 0.9 here), the gap between its probability and the others widens (see the formula below).
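In formula form, the temperature softmax used above (written out here for reference; the temperature $b$ corresponds to `b` and `metric` in the code) is

$$p_i = \frac{\exp(z_i / b)}{\sum_j \exp(z_j / b)}$$

A small $b$ sharpens the distribution (the largest value dominates even more), while a large $b$ flattens it toward uniform.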
- model.predict(x):
  - Prediction: model.predict(x) => [0.01, 0.005, 0.3, 0.8, ...]
- Character extraction: `sys.stdout.write(pred_char)`; `sys.stdout.flush()` (a full generation-loop sketch follows)
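Putting the pieces above together, a hedged sketch of the generation loop: random seed selection with randint, predict, temperature sampling with pred_indices, then printing each character. It reuses `text`, `maxlen`, `chars`, `char2idx`, `model`, and `pred_indices` from the earlier sketches; the generation length and temperature are assumptions:

```python
import random
import sys

start = random.randint(0, len(text) - maxlen - 1)    # random starting point
generated = text[start:start + maxlen]               # seed window

for _ in range(40):                                   # generate 40 characters (assumed length)
    x = np.zeros((1, maxlen, len(chars)))
    for t, ch in enumerate(generated[-maxlen:]):
        x[0, t, char2idx[ch]] = 1
    preds = model.predict(x, verbose=0)[0]            # softmax probabilities
    next_idx = pred_indices(preds, metric=0.7)        # temperature sampling (assumed 0.7)
    pred_char = chars[next_idx]
    generated += pred_char
    sys.stdout.write(pred_char)
    sys.stdout.flush()
```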
DMN
- Dynamic Memory Networks
- A combination of the five networks below:
  - Input Module
  - Question Module
  - Episodic Memory Module
  - Answer Module
  - attention score network (FNN)
Ask Me Anything
- Q → A: Question & Answering
- Uses deep learning
- The paper's authors use GRUs
- Distinguishing feature: it has a mechanism that stores an episode, a single unit of experience, used to answer questions
- Steps:
  - Take the input sentences (a text sequence) and
  - a question, to which attention computation is applied
    - attention computation: attention score
  - construct the episodic memory, then
  - a network built so that it can produce a general answer
- Source: Ankit Kumar et al., 2016.05, Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (the 'attention score' label was added to the original figure)
- attention process
- Computing the attention score
  - attention score
    - Given an episodic story of several sentences, when looking for the answer, a computation scores the (stored) sentence that is most relevant to the question.
    - In other words, an algorithm that uses the attention score to decide which sentence to attend to in order to produce the answer (see the gating formula below).
  - Applicable to machine translation, text classification, part-of-speech tagging, image captioning, and dialog systems (chatbots).
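For reference, the gating function in the cited paper has roughly this form (paraphrased from Kumar et al. 2016, not taken from these notes), where $c_t$ is a candidate sentence vector, $m^{i-1}$ the previous memory, and $q$ the question:

$$g_t^i = G(c_t, m^{i-1}, q), \qquad G(c, m, q) = \sigma\!\left(W^{(2)} \tanh\!\left(W^{(1)} z(c, m, q) + b^{(1)}\right) + b^{(2)}\right)$$

with $z(c, m, q)$ a feature vector built from terms such as $c \circ q$, $c \circ m$, $|c - q|$, and $|c - m|$.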
- Mission: build the network so that it can grasp the meaning of the given question.
  - "Grasp the meaning": interpreting anaphora (anaphora resolution).
- input module (= story module)
  - Concatenate the input sentences into one row, separated by <EOS>, e.g. sentence 1 / sentence 2 / sentence 3:
    - When I was young, I passed test <EOS> But, Now Test is so crazy <EOS> Because The test level pretty hard more and more.
  - Feed into an Embedding layer
  - Pass through an RNN
  - The hidden-layer outputs are emitted as n sentence vectors (c1, c2, c3, ...)
  - Fed into the episodic memory module
- Question module
  - Feed in the question sentence
  - Feed into an Embedding layer
  - Pass through an RNN
  - Fed into the episodic memory module
- episodic memory module
  - Repeatedly takes the outputs of the input module (one per sentence), the question module, and the attention mechanism, and repeatedly updates its internal episodic memory
  - "How is it updated?"
    1. Compute the input module's embedding values together with the attention score and pass them through an RNN layer
       - Here the attention score is the value obtained by combining g (the weight w coming out of the attention score layer's output layer) with the input module's outputs c1, c2, c3, ...
    2. Pass the question module's embedding values through the RNN layer
       - Here, w = q
    3. Pass the result through the Answer Module's RNN layer, which outputs the answer
       - Here, w = m
  - Each time the episodic memory module's RNN layer repeats, the attention score is recomputed
  - attention mechanism
    - Find the highest of the resulting attention score values (g)
    - Use g to form the next pass of the network
    - This yields a two-layer (two-pass) structure
  - memory update mechanism
    - A weighted average using the attention scores (see the update equations below)
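For reference, the memory update in the cited paper (paraphrased from Kumar et al. 2016, not from these notes) can be written as

$$h_t^i = g_t^i \,\mathrm{GRU}(c_t, h_{t-1}^i) + (1 - g_t^i)\, h_{t-1}^i, \qquad e^i = h_{T_C}^i, \qquad m^i = \mathrm{GRU}(e^i, m^{i-1})$$

so each episode $e^i$ is a gate-weighted blend over the sentence vectors $c_t$, and the memory $m^i$ is updated from the previous memory and the new episode.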
- Terms:
  - c: output of the input module
  - m: output of the episodic memory module and input to the attention score
  - q: output of the question layer
  - g: output of the attention score
code:
- Steps:
  - 1. Load packages
  - 2. Preprocessing
    - 2-1. Document data processing (raw data)
  - 3. Load the data
    - 3-1. Load the raw document data
    - 3-2. Split into train/test data and load them
  - 4. Build the vocab
    - 4-1. Bundle the train & test data together and build the vocab with collections.Counter()
    - 4-2. Build word2indx / indx2word (including the padding token)
  - 5. Vectorization
    - 5-1. Set the vocab_size variable: len(word2indx)
    - 5-2. Set the max-len variables for story and question
      - max len is set so that padding can be applied in the next step
    - 5-3. Run the vectorization: feed the raw data, word2indx, and each module's (story, question) maxlen into a function that performs padding, to_categorical, etc.
- 6. Build the model
  - 6-1. train/test data split
    - Here Xstrain, Xqtrain, Ytrain = data_vectorization(data_train, word2indx, story_maxlen, question_maxlen), and data_vectorization returns pad_sequences(Xs, maxlen=story_maxlen), pad_sequences(Xq, maxlen=question_maxlen), and to_categorical(Y, num_classes=len(word2indx))
  - 6-2. Set the model parameters
  - 6-3. Inputs
  - 6-4. Story encoder embedding
  - 6-5. Question encoder embedding
  - 6-6. Build the modules
    - The Question module reuses the encoder built above
    - attention score layer
      - Built with a dot product
    - story module
      - This layer starts from the story input, skips the question layer, passes through another embedding layer, and exists so it can later be added to the dot layer
    - episodic memory module
      - Built by adding the dot layer and the story_encoder_c right above it
    - answer module
      - episodic memory layer (response) + question layer
  - compile
    - model = Model(inputs=[story_input, question_input], outputs=output)
    - Note that two inputs are passed: story and question
  - fit
  - loss plot
  - Measure accuracy (predict)
- 7. Apply
Load packages
- Load packages

```python
import collections
import itertools
import nltk
import numpy as np
import matplotlib.pyplot as plt
import random
from tensorflow.keras.layers import Input, Dense, Activation, Dropout
from tensorflow.keras.layers import LSTM, Permute
from tensorflow.keras.layers import Embedding
from tensorflow.keras.layers import Add, Concatenate, Dot
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
```
Preprocessing
- Raw document data processing

```python
# Example of the document content: a story (episodic story) of 3 sentences.
# So far, NLP has mostly stopped at grasping the meaning of words within a single
# sentence and analyzing one sentence at a time (step 1).
# An episodic story, by contrast, is about grasping the relations between words
# across sentences (= relations between sentences), which is much harder to
# analyze (step 2). Going further, relations between paragraphs also need to be
# understood (step 3).
"""
What the data looks like:
# 1 Mary moved to the bathroom.\n
# 2 Daniel went to the garden.\n
# 3 Where is Mary?\tbathroom\t1
"""
# The question and the answer are separated by a tab (\t).
# Return:
# Returns 3 things (stories, questions, answers), e.g.
# stories   = ['Mary moved to the bathroom.\n', 'John went to the hallway.\n']
# questions = 'Where is Mary? '
# answers   = 'bathroom'
# ----------------------------------------------------------------------------
def get_data(infile):
    stories, questions, answers = [], [], []
    story_text = []
    fin = open(infile, "r")
    for line in fin:
        lno, text = line.split(" ", 1)
        if "\t" in text:  # lines like line 3 above, which contain \t
            # split on \t to separate the question, the answer, and the
            # supporting-fact number (e.g. 1)
            question, answer, _ = text.split("\t")
            stories.append(story_text)
            questions.append(question)
            answers.append(answer)
            story_text = []
        else:  # in practice the function starts here, accumulating story lines
            story_text.append(text)
    fin.close()
    return stories, questions, answers
```
Load the data
- Load the raw document data

```python
Train_File = "./dataset/qa1_single-supporting-fact_train.txt"
Test_File = "./dataset/qa1_single-supporting-fact_test.txt"
```

- Get the data

```python
data_train = get_data(Train_File)   # returns: stories, questions, answers
data_test = get_data(Test_File)
print("\n\nTrain observations:", len(data_train[0]),
      "Test observations:", len(data_test[0]), "\n\n")
```

Output: Train observations: 10000 Test observations: 1000
Build the vocab
- Building the vocab dictionary from the train & test data
  - Bundle the train & test data together and build the vocab

```python
dictnry = collections.Counter()   # use collections.Counter() to count how often each word is used
for stories, questions, answers in [data_train, data_test]:
    for story in stories:
        for sent in story:
            for word in nltk.word_tokenize(sent):
                dictnry[word.lower()] += 1
    for question in questions:
        for word in nltk.word_tokenize(question):
            dictnry[word.lower()] += 1
    for answer in answers:
        for word in nltk.word_tokenize(answer):
            dictnry[word.lower()] += 1
```

- Build word2indx / indx2word

```python
# Same structure as collections.Counter(), but word indices start at 1.
word2indx = {w: (i + 1) for i, (w, _) in enumerate(dictnry.most_common())}
word2indx["PAD"] = 0   # padding
indx2word = {v: k for k, v in word2indx.items()}
# Because of word2indx["PAD"] above, print(indx2word) shows ", 0: 'PAD'" at the very end.
```
Vectorization
- Set the vocab_size variable

```python
vocab_size = len(word2indx)   # vocab_size = 22 -> i.e. only 21 words are actually used (one entry is the padding token)
print("vocabulary size:", len(word2indx))
print(word2indx)
```

  - vocabulary size: 22
  - {'to': 1, 'the': 2, '.': 3, 'where': 4, 'is': 5, '?': 6, 'went': 7, 'john': 8, 'sandra': 9, 'mary': 10, 'daniel': 11, 'bathroom': 12, 'office': 13, 'garden': 14, 'hallway': 15, 'kitchen': 16, 'bedroom': 17, 'journeyed': 18, 'travelled': 19, 'back': 20, 'moved': 21, 'PAD': 0}

- Set the max-len variables for story and question

```python
story_maxlen = 0
question_maxlen = 0

for stories, questions, answers in [data_train, data_test]:
    for story in stories:
        story_len = 0
        for sent in story:
            swords = nltk.word_tokenize(sent)
            story_len += len(swords)
        if story_len > story_maxlen:
            story_maxlen = story_len        # find the longest story (the one with the most words)
    for question in questions:
        question_len = len(nltk.word_tokenize(question))
        if question_len > question_maxlen:
            question_maxlen = question_len  # find the longest question

print("Story maximum length:", story_maxlen, "Question maximum length:", question_maxlen)
```

Output: Story maximum length: 14 Question maximum length: 4
- Converting data into vectorized form
  - Numericalize the sentences above

```python
def data_vectorization(data, word2indx, story_maxlen, question_maxlen):
    Xs, Xq, Y = [], [], []
    stories, questions, answers = data
    for story, question, answer in zip(stories, questions, answers):
        # mark each word with its vocab index (numericalize)
        xs = [[word2indx[w.lower()] for w in nltk.word_tokenize(s)] for s in story]
        # chain.from_iterable(['ABC', 'DEF']) --> ['A', 'B', 'C', 'D', 'E', 'F']
        xs = list(itertools.chain.from_iterable(xs))
        xq = [word2indx[w.lower()] for w in nltk.word_tokenize(question)]
        Xs.append(xs)
        Xq.append(xq)
        Y.append(word2indx[answer.lower()])   # Y = answer
    # pad_sequences unifies the sequence lengths to the longest one (maxlen=story_maxlen);
    # anything shorter is filled with padding (0).
    # Y is the answer, a single word here, i.e. one number; to_categorical one-hot encodes it
    # (sparse categorical could be used instead of to_categorical).
    return pad_sequences(Xs, maxlen=story_maxlen), pad_sequences(Xq, maxlen=question_maxlen), \
           to_categorical(Y, num_classes=len(word2indx))
```

- Inside data_vectorization():
  - xs = [[word2indx[w.lower()] for w in nltk.word_tokenize(s)] for s in story]
    - xs > Out[19]: [[8, 7, 20, 1, 2, 13, 3], [10, 19, 1, 2, 17, 3]]
    - story > Out[20]: ['John went back to the office.\n', 'Mary travelled to the bedroom.\n']
  - xs = list(itertools.chain.from_iterable(xs))
    - xs > Out[22]: [8, 7, 20, 1, 2, 13, 3, 10, 19, 1, 2, 17, 3]
  - Xs.append(xs) accumulates the outputs of the for loop in a list
  - Padding: pad_sequences(Xs, maxlen=story_maxlen)   # story_maxlen = 14
    - Out[31]: array([[ 0, 8, 7, 20, 1, 2, 13, 3, 10, 19, 1, 2, 17, 3]])
  - pad_sequences(Xq, maxlen=question_maxlen)
    - Out[32]: array([], shape=(0, 4), dtype=int32)
  - to_categorical(Y, num_classes=len(word2indx))
    - Out[33]: array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
Build the model
- train/test data split

```python
Xstrain, Xqtrain, Ytrain = data_vectorization(data_train, word2indx, story_maxlen, question_maxlen)
Xstest, Xqtest, Ytest = data_vectorization(data_test, word2indx, story_maxlen, question_maxlen)

print("Train story", Xstrain.shape, "Train question", Xqtrain.shape, "Train answer", Ytrain.shape)
print("Test story", Xstest.shape, "Test question", Xqtest.shape, "Test answer", Ytest.shape)
```

Output:
Train story (10000, 14) Train question (10000, 4) Train answer (10000, 22)
Test story (1000, 14) Test question (1000, 4) Test answer (1000, 22)
- Set the model parameters

```python
EMBEDDING_SIZE = 128
LATENT_SIZE = 64
BATCH_SIZE = 64
NUM_EPOCHS = 40
```

- Inputs

```python
story_input = Input(shape=(story_maxlen,))        # story_maxlen = 14
question_input = Input(shape=(question_maxlen,))
```

- Story encoder embedding

```python
story_encoder = Embedding(input_dim=vocab_size,       # vocab_size: 22
                          output_dim=EMBEDDING_SIZE,  # EMBEDDING_SIZE = 128 (each word becomes a 128-dim vector; the embedding layer's column size)
                          input_length=story_maxlen)(story_input)   # story_maxlen = 14
story_encoder = Dropout(0.2)(story_encoder)
```

- Question encoder embedding

```python
question_encoder = Embedding(input_dim=vocab_size,
                             output_dim=EMBEDDING_SIZE,
                             input_length=question_maxlen)(question_input)
question_encoder = Dropout(0.3)(question_encoder)
```
attention score layer
- attention score layer

```python
match = Dot(axes=[2, 2])([story_encoder, question_encoder])
```

- Match between story and question: perform a dot operation between story and question.
  - This dot operation is used as the attention score
- story_encoder = [None, 14, 128], question_encoder = [None, 4, 128]
  - match = [None, 14, 4]
- axes=[2, 2]? Take the dot product over the story's axis 2 (=128, the embedding vector) and the question's axis 2 (=128, the embedding vector)
  - i.e. transpose one side so the operation runs as (x, 128) · (128, y)
  - The output of the story input's embedding layer is story_encoder = (None, max number of words in a story (=14), embedding vector (128)).
  - The output of the question input's embedding layer is question_encoder = (None, max number of words in a question (=4), embedding vector (128)).
  - The dot drops (None) and multiplies the remaining (rows, columns): (14, 128) · (128, 4).
story layer
- story layer

```python
story_encoder_c = Embedding(input_dim=vocab_size,          # vocab_size = 22
                            output_dim=question_maxlen,    # question_maxlen = 4
                            input_length=story_maxlen)(story_input)   # story_maxlen = 14
story_encoder_c = Dropout(0.3)(story_encoder_c)            # story_encoder_c shape = (None, 14, 4)
```

- This layer starts from the story input, skips the question layer, passes through another embedding layer, and is later added to the dot layer.
episodic memory layer
- episodic memory layer

```python
response = Add()([match, story_encoder_c])   # add the dot layer and the story_encoder_c from just above => (14, 4)
response = Permute((2, 1))(response)         # resulting shape = (4, 14)
# Permute((2, 1)): transpose to (D2, D1); Permute gives freer control over axis order than a plain transpose
```
answer layer
- episodic memory layer (response) + question layer

```python
answer = Concatenate()([response, question_encoder])
answer = LSTM(LATENT_SIZE)(answer)      # LATENT_SIZE = 64
answer = Dropout(0.2)(answer)
answer = Dense(vocab_size)(answer)      # shape=(None, 22); the last Dense uses vocab_size=22 (total number of words)
output = Activation("softmax")(answer)  # shape=(None, 22)
```
compile
- Final step of building the model

```python
model = Model(inputs=[story_input, question_input], outputs=output)   # two inputs were combined, so pass them as a list []
model.compile(optimizer="adam", loss="categorical_crossentropy")
# if Y had not been one-hot encoded with to_categorical earlier, loss="sparse_categorical_crossentropy" would be needed instead
print(model.summary())
```
fit
- Train the model

```python
# Model Training
history = model.fit([Xstrain, Xqtrain], [Ytrain],   # Ytrain: the answers
                    batch_size=BATCH_SIZE,
                    epochs=NUM_EPOCHS,
                    validation_data=([Xstest, Xqtest], [Ytest]))
# Note: Ytest is one-hot (0, 0, 0, ...), whereas ytest (below) holds label indices such as 13, 14, ...
```

- Ytest.shape > Out[78]: (1000, 22)
- Ytest > Out[79]: array([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
- ytest.shape > Out[87]: (1000,)
- ytest > Out[92]: array([15, 12, 16, 15, 16, 15, 14 ... ])
loss plot
- loss plot

```python
plt.title("Episodic Memory Q & A Loss")
plt.plot(history.history["loss"], color="g", label="train")
plt.plot(history.history["val_loss"], color="r", label="validation")
plt.legend(loc="best")
plt.show()
```
Measure accuracy
- Get predictions of labels

```python
ytest = np.argmax(Ytest, axis=1)            # true label indices
Ytest_ = model.predict([Xstest, Xqtest])    # predicted probability distributions
ytest_ = np.argmax(Ytest_, axis=1)          # predicted label indices
```
Apply
- Apply
  - Select random questions and predict answers

```python
NUM_DISPLAY = 10
for i in random.sample(range(Xstest.shape[0]), NUM_DISPLAY):
    story = " ".join([indx2word[x] for x in Xstest[i].tolist() if x != 0])   # skip padding (0)
    question = " ".join([indx2word[x] for x in Xqtest[i].tolist()])
    label = indx2word[ytest[i]]
    prediction = indx2word[ytest_[i]]
    print(story, question, label, prediction)
```
Output layer
- When the output layer is a single unit (0 or 1)
  - Binary classification, so use sigmoid with binary_crossentropy

| y | yHat |
| --- | --- |
| 0 | 0 |
| 0 | 1 |
| 1 | 1 |

  - Accuracy: 2/3
- When the output layer has two or more units
  - Multi-class classification, so use softmax with categorical_crossentropy
  - One-hot structure

| y | yHat |
| --- | --- |
| 0 1 0 | 0 1 0 |
| 0 0 1 | 0 1 0 |
| 1 0 0 | 1 0 0 |

  - Accuracy: 2/3 (a prediction counts as correct only when the whole one-hot row matches)
- When the output layer can contain several '1's, i.e. not a one-hot structure
  - Multi-label classification, so use sigmoid with binary_crossentropy
  - Each output neuron is treated as its own binary classification (see the sketch after this list)

| y | yHat |
| --- | --- |
| 0 1 0 | 0 1 0 |
| 0 0 1 | 0 1 0 |
| 1 0 0 | 1 0 0 |

  - Accuracy: 7 of the 9 entries match, so 7/9
  - Unlike the case above, a prediction is not judged by whether the whole row matches; each entry (row and column) is evaluated individually.
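A small numpy illustration of the two ways of counting above (row-wise exact match vs. element-wise). The arrays reproduce the tables in these notes; the variable names are mine:

```python
import numpy as np

y    = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
yhat = np.array([[0, 1, 0], [0, 1, 0], [1, 0, 0]])

row_accuracy = np.mean(np.all(y == yhat, axis=1))   # one-hot / multi-class view: 2/3
elem_accuracy = np.mean(y == yhat)                  # multi-label view: 7/9

print(row_accuracy, elem_accuracy)                  # 0.666... 0.777...
```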
- References:
  - Amateur Quant, blog.naver.com/chunjein
  - Krishna Bhavsar et al. 2019.01.31. Natural Language Processing Cookbook with Python [60+ recipes for implementing NLP in Python]. Acorn Publishing.
- https://frhyme.github.io/python-libs/ML_multilabel_classfication/