
RECURRENT NEURAL NETWORK

USING LSTM MODEL

A recurrent neural network (RNN) is a class of artificial neural
network in which connections between nodes form a directed graph
along a sequence. This allows it to exhibit dynamic temporal behavior
for a time sequence. Using knowledge from an external embedding
can enhance the precision of your RNN because it integrates new
lexical and semantic information about the words, information
that has been trained and distilled on a very large corpus of data. The
pre-trained embedding we’ll be using is GloVe.
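As a rough sketch of how such an embedding can be wired in, the snippet below builds an embedding matrix from a GloVe vectors file. The file name glove.6B.100d.txt is only an example, and word_index is assumed to come from a Keras Tokenizer fitted on the corpus; neither is taken from the original code.

import numpy as np

embedding_dim = 100
embedding_index = {}
# Example GloVe file; each line is a word followed by its vector components
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        embedding_index[word] = np.asarray(values[1:], dtype='float32')

# word_index is assumed to be the vocabulary of a fitted Keras Tokenizer
vocab_size = len(word_index) + 1
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in word_index.items():
    vector = embedding_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector   # rows initialise the Embedding layer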
Input Layer : takes the sequence of words as input.

LSTM Layer : computes the output using LSTM units. I have added 100
units in the layer, but this number can be fine-tuned later.

Dropout Layer : a regularisation layer which randomly turns off the
activations of some neurons in the LSTM layer. It helps in preventing
overfitting.

Output Layer : computes the probability of the best possible next word as
output.
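A minimal Keras sketch of this stack is shown below, assuming the vocab_size, max_len and embedding_matrix names from the preprocessing step above; it illustrates the layer order rather than reproducing the exact code.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

# Sketch only: vocab_size, max_len and embedding_matrix are assumed names
model = Sequential([
    Embedding(vocab_size, 100, weights=[embedding_matrix],
              input_length=max_len, trainable=False),   # pre-trained GloVe vectors
    LSTM(100),                                           # 100 LSTM units (tunable)
    Dropout(0.2),                                        # regularisation
    Dense(vocab_size, activation='softmax')              # probability of the next word
])
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])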
Long Short-Term Memories

The Long Short-Term Memory (LSTM) unit was initially proposed by
Hochreiter and Schmidhuber, and since then a number of modifications to the
original unit have been made. Unlike the simple
recurrent unit, which computes a weighted sum of the input signal and
applies a nonlinear function, each LSTM unit maintains a memory c_t at time t,
which is subsequently used to determine
the output, or the activation, h_t, of the cell.

1. $h_t = o_t \odot \tanh(c_t)$
2. $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
3. $\tilde{c}_t = \tanh(x_t W_c + h_{t-1} U_c + b_c)$
4. $o_t = \sigma(x_t W_o + h_{t-1} U_o + b_o)$
5. $i_t = \sigma(x_t W_i + h_{t-1} U_i + b_i)$
6. $f_t = \sigma(x_t W_f + h_{t-1} U_f + b_f)$
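As a sanity check of these equations, here is a minimal NumPy sketch of a single LSTM step; the parameter dictionaries are placeholders the caller would initialise, not part of the original text.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b are dicts holding the parameters of eqs. (3)-(6)
    i_t = sigmoid(x_t @ W['i'] + h_prev @ U['i'] + b['i'])      # input gate, eq. (5)
    f_t = sigmoid(x_t @ W['f'] + h_prev @ U['f'] + b['f'])      # forget gate, eq. (6)
    o_t = sigmoid(x_t @ W['o'] + h_prev @ U['o'] + b['o'])      # output gate, eq. (4)
    c_tilde = np.tanh(x_t @ W['c'] + h_prev @ U['c'] + b['c'])  # new memory, eq. (3)
    c_t = f_t * c_prev + i_t * c_tilde                          # final memory, eq. (2)
    h_t = o_t * np.tanh(c_t)                                    # hidden state, eq. (1)
    return h_t, c_t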

See the cited reference for a good exposition of the motivation behind this
mathematical formulation.

The Popular LSTM Cell

[Figure: schematic of the LSTM cell, showing the input gate i_t, output gate o_t and forget gate f_t, each computed from x_t and h_{t-1} (e.g. f_t = σ(W_f [x_t; h_{t-1}] + b_f), and similarly for i_t and o_t); the memory is updated as c_t = f_t ⊗ c_{t-1} + i_t ⊗ tanh(W [x_t; h_{t-1}]) and the output as h_t = o_t ⊗ tanh(c_t). Dashed lines indicate a time-lag.]


1. Input Gate : uses the input word and the past hidden state to
determine whether or not the current input is worth preserving.

2. New Memory Generation : uses the input word and the past
hidden state to generate a new memory which includes aspects of the new word.

3. Forget Gate : uses the input word and the past hidden state to
assess whether the past memory cell is useful for the computation of the
current memory cell.

4. Final Memory Generation : first takes the advice of the forget
gate and accordingly forgets the past memory; it then takes the
advice of the input gate and accordingly gates the new memory, and
lastly it sums these two results to produce the final memory.

5. Output/Exposure Gate : assesses what
parts of the memory need to be exposed/present in the hidden
state.
Amazon review dataset
The dataset used for our work is based on the publicly available
• Amazon review dataset from He and McAuley (2016), and
• ground-truth spamming review dataset from Mukherjee et al. (2012).
We preprocessed the dataset as follows. We obtained a
manually labeled subset for 815 unique users in more than 2,000
potential groups. After averaging the manually labeled spamming
scores (ranging from 0 to 1; the closer the score is to 1,
the more likely the user is a spammer), we labeled users based on
this averaged score.
Loading and preparing the data
As a starting point, I loaded a csv file containing 1,780
customer reviews in English with the corresponding rating on a scale from 1 to 5,
where 1 is the lowest (negative) and 5 is the highest (positive) rating.
Here is a quick glance at the data frame:
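A loading step along these lines might look like the sketch below; the file name is a placeholder, and deriving the binary Sentiment label from the 1-5 rating with a threshold is an assumption for illustration, not taken from the original code.

import pandas as pd

# 'reviews.csv' is a placeholder file name with 'Review' and 'Rating' columns
data = pd.read_csv('reviews.csv')

# One possible way to derive a binary Sentiment label from the rating (assumption)
data['Sentiment'] = (data['Rating'] > 3).astype(int)
data.head()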
Data preprocessing
RNN input requires an array data type; therefore, we convert the
“Review” column into the X array and the “Sentiment” column into the y array
accordingly:

X, y = (data['Review'].values, data['Sentiment'].values)
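Before the split below, the reviews must be tokenised and padded to a fixed length to obtain X_pad. A sketch of this step is given here; the vocabulary size and sequence length are assumed values, while tk and X_pad match the names used later in the text.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumed hyper-parameters for illustration
vocab_size = 10000
max_len = 100

tk = Tokenizer(num_words=vocab_size)
tk.fit_on_texts(X)                        # fit the tokenizer on the raw reviews
X_seq = tk.texts_to_sequences(X)          # words -> integer indices
X_pad = pad_sequences(X_seq, maxlen=max_len, padding='post')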
TRAINING THE DATASET
• Now, we split the data set into training and
testing using sklearn’s train_test_split, keeping 25% of the original data as a hold-out set:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_pad, y, test_size = 0.25, random_state = 1)
Furthermore, the training set can be
split into a training and a validation set:

batch_size = 64
X_train1 = X_train[batch_size:]
y_train1 = y_train[batch_size:]
X_valid = X_train[:batch_size]
y_valid = y_train[:batch_size]
• Next, we add 1 hidden LSTM layer with 200
memory cells. Potentially, adding more layers
and cells can lead to better results.
• Finally, we add the output layer with a sigmoid
activation function to predict the probability of a
review being positive (see the sketch after this list).
• After training the model for 10 epochs, we
achieve an accuracy of 98.44% on the validation set
and 95.93% on the test (hold-out) set.
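A sketch of this model definition and training is shown below. The LSTM width, number of epochs and batch size follow the text; the embedding size, loss and optimizer are assumptions for illustration.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Sketch only: embedding dimension and optimizer are assumed, not stated in the text
model = Sequential([
    Embedding(vocab_size, 100, input_length=max_len),
    LSTM(200),                              # 1 hidden LSTM layer with 200 memory cells
    Dense(1, activation='sigmoid')          # probability that a review is positive
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train1, y_train1, batch_size=batch_size, epochs=10,
          validation_data=(X_valid, y_valid))
model.evaluate(X_test, y_test)              # hold-out performance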
From the screenshot above, you can immediately spot some cases of
misclassification. Out of 100 predicted cases, only sixteen reviews with
an actual 1-star rating are incorrectly classified as “Pos” (having positive
sentiment). However, if we dig deeper, we can see that the problem is
actually not as big as it seems to be:
DISADVANTAGES
• In order to prepare new reviews for prediction, the same
preprocessing steps have to be applied to the text before passing it
into the trained model.

# Prepare reviews for check
Check_set = df.Review.values
Check_seq = tk.texts_to_sequences(Check_set)
Check_pad = pad_sequences(Check_seq, maxlen = 100, padding = 'post')

# Predict sentiment
check_predict = model.predict_classes(Check_pad, verbose = 0)

# Prepare data frame
check_df = pd.DataFrame(list(zip(df.Review.values, df.Rating.values, check_predict)),
                        columns = ['Review', 'Rating', 'Sentiment'])
check_df.Sentiment = ['Pos' if x == [1] else 'Neg' for x in check_df.Sentiment]
check_df
RESULT ANALYSIS
CONCLUSION
• In this paper, we have proposed a model for collective anomaly
detection based on a Long Short-Term Memory Recurrent Neural
Network. We motivated this method by investigating LSTM
RNNs on time-series problems, and adapted it to detect collective
anomalies by proposing the measurements in Section 4. We investigated
the hyper-parameters, the suitable number of inputs and some
thresholds by using the validation set. The proposed model is evaluated
using the time-series version of the Kaggle dataset.

• The results suggest that the proposed model is capable of efficiently
detecting collective anomalies in the dataset. However, the model must be
used with caution. The training data fed into the network must be
organized in a coherent manner to guarantee the stability of the system.
In future work, we will focus on how to improve the classification
accuracy of the model. We also observed that varying an
LSTM RNN's number of inputs might trigger different output
reactions.
